Introduction
In the RosettaDock basics tutorial, you learned how to generate small
numbers of decoys with RosettaDock and how to set algorithm parameters
such as flags and constraints for special cases. Now you will begin
learning how to do large-scale runs.
Refinement is the first case of a large-scale run. It involves
sampling around a binding site in a crystal structure, or possibly a
decoy, to gain insight into the nature of the energy landscape around
the binding site and, in the theoretical case, the quality of the
model. Such sampling requires about 1000 decoys.
When do you want to do a refinement run?
Several cases exist for doing a refinement run, otherwise known as a
dock-perturbation.
1) Given a crystal structure, what does the energy landscape look
like around the native structure?
2) Given a crystal structure of protein A - protein B, predict a
plausible model for a structurally related complex:
a) Mutate residue X of protein A at the A-B interface and refine the
structure to compensate for structural effects of the mutation.
b) Predict the complex between a close homologue of protein A (>=
50% sequence identity) and protein B. Superimpose the homologue
of A onto the position of protein A in the complex and refine the
complex to compensate for structural differences between protein A and
the homologue.
3) Refine a low-resolution model for a complex and test the
plausibility of the low-resolution model.
How do you do a refinement run?
Generating 1000 decoys takes a great
deal of time on a single-processor machine, so generally a cluster or
other multiprocessor system is needed. We use a
56-processor machine with a common filesystem for all the nodes, and we
use the queuing system Condor to manage jobs. Because RosettaDock
generates large numbers of independent simulations, no explicit
communication between simulations is needed. Only a common
filesystem for the nodes of a cluster is needed.
See Using Condor to run RosettaDock for information on doing
large-scale runs on a Condor cluster. This should work on the Gray lab
cluster jazz and on the Baker lab clusters. Hereafter, I will
periodically reference terms from the 'Using Condor' page, so
familiarize yourself with that page before reading further.
Rosetta flags:
How to do a perturbation run is described on the 'Using Condor'
page. Briefly, I suggest using:
prefix='PT'
nstruct=1000
Njobs=100
and leave all the other flags in the condor config file alone.
If you need to, review more on docking flags for when it is
appropriate to use -dock_pert, -spin, -randomize, and other such
search flags.
How do you analyze the data from a refinement run?
To analyze the 1000 decoys from a dock-perturbation run, make a score
vs. rmsd or "binding funnel" plot from the score file. Try doing a
perturbation run on the samplerun pdb from the basics tutorial and
plotting the scorefile ('.fasc') that results. Basically, plot the rms
column of the scorefile on the x-axis and the score column on the
y-axis.
To plot a scorefile in gnuplot, open up gnuplot and then type
plot "ZZaa11.fasc" u 3:2
This plots the scorefile "ZZaa11.fasc" with rms (column 3) as the x
axis and score (column 2) as the y axis. Use the set xrange [:] and
set yrange [:] commands to control the dimensions of the plot.
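For readers who prefer a script to an interactive gnuplot session, the same two columns can be pulled out in Python. This is a minimal sketch, not part of RosettaDock: the function name read_score_rms is hypothetical, and it assumes a whitespace-delimited scorefile in which score is column 2 and rms is column 3, as in the gnuplot command above. The header line and the example rows are made up for illustration.

```python
def read_score_rms(lines, score_col=2, rms_col=3):
    """Return a list of (rms, score) pairs from scorefile lines.

    Column numbers are 1-based, matching gnuplot's `u 3:2`.
    Lines whose selected columns are not numeric (for example a
    header line) are skipped.
    """
    points = []
    for line in lines:
        fields = line.split()
        if len(fields) < max(score_col, rms_col):
            continue  # blank or too-short line
        try:
            score = float(fields[score_col - 1])
            rms = float(fields[rms_col - 1])
        except ValueError:
            continue  # header or malformed line
        points.append((rms, score))
    return points


# Made-up illustration data; a real scorefile has many more columns and rows.
example_lines = [
    "filename score rms",
    "PT0001.pdb -120.0 0.5",
    "PT0002.pdb -100.5 4.0",
]
points = read_score_rms(example_lines)
print(points)
```

To read a real file, pass an open file handle instead of the example list, for example with open("ZZaa11.fasc") as fh: points = read_score_rms(fh). The resulting pairs can then be handed to any plotting library.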
If you know R, you can plot a score file in R. R can load the
score file into a data frame, which makes it possible to plot columns
as objects. You can also easily analyze the columns of the score
file in R.
After plotting score vs. rmsd, two kinds of results are possible:
1) Absence of funnel. Poor correlation between score and rms, and low
density near the native structure. If your native is a crystal
structure, this means that RosettaDock is not effectively
discriminating the native from non-native structures. If your
native structure is a plausible model, then, assuming that RosettaDock
can accurately discriminate the true native from false structures, this
means that your plausible model probably does not represent the native
structure.
[Figure: score vs. rmsd plot without a funnel. x is rmsd and y is score.]
2) Presence of funnel. The funnel shown below is a very good
case; not all RosettaDock funnels are quite this sharp. Good
correlation between score and rms, and high density in the native
region. If your native structure is a crystal structure, such a funnel
means that RosettaDock is accurately discriminating the crystal
structure from decoys. If your native is a RosettaDock model, then
this means your plausible model could represent the native structure.
[Figure: score vs. rmsd plot with a funnel. x is rmsd and y is score.]
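The two criteria used above, correlation between score and rms and the density of decoys near the native, can also be put into rough numbers. The following is only a sketch under stated assumptions: the function names, the 2.0 rmsd cutoff for "near native", and the example points are all hypothetical, and the tutorial does not define numerical thresholds for what counts as a funnel. In a funnel, low rms goes with low score, so the correlation is expected to be strongly positive.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def funnel_summary(points, near_native_rms=2.0):
    """Summarize (rms, score) pairs.

    near_native_rms is an assumed cutoff, not a RosettaDock setting.
    """
    rms = [p[0] for p in points]
    score = [p[1] for p in points]
    near = sum(1 for r in rms if r <= near_native_rms)
    return {
        "correlation": pearson(rms, score),
        "near_native_fraction": near / len(points),
    }


# Made-up (rms, score) pairs resembling a funnel.
example_points = [
    (0.5, -120.0),
    (1.0, -115.0),
    (1.5, -110.0),
    (6.0, -95.0),
    (8.0, -90.0),
]
summary = funnel_summary(example_points)
print(summary)
```

These numbers are a supplement to the plot rather than a replacement for it; looking at the score vs. rmsd plot is still the way to judge whether a funnel is present.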
Using refinement for structure prediction
In cases 2 and 3 in "When do you want to do a refinement run?", the
goal is to refine a known near-native structure.
Case 2: If you find a strong funnel from your perturbation, then
the best choice for the refined model is at the bottom of the most
densely populated potential well (in the good funnel above there is
only one potential well and it is right at the native position).
Because your starting structure probably has some clashes from the
mutation or homology model, the lowest potential well is likely to be
between 0 Å and 5 Å rmsd. This means you have probably relaxed from
a position that had clashes. The lowest-scoring structure in the
bottom of the lowest potential well is your best pick.
Case 3: If your plausible model produces a strong funnel, you are
doing well. Your best refined model is either the original
plausible model or the lowest-scoring structure in the bottom of the
lowest potential well, whichever has the lower total score.
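The selection rule for Case 3 can be sketched in a few lines of Python. This is a simplified illustration, not a RosettaDock tool: the function name pick_refined_model, the decoy names, and the scores are hypothetical, and the sketch treats the decoy with the lowest total score as the bottom of the lowest potential well without checking how densely that well is populated, which still has to be judged from the plot.

```python
def pick_refined_model(decoys, start_score=None):
    """Return the name of the preferred model.

    decoys: list of (name, score, rms) tuples.
    start_score: total score of the original plausible model, if known.
    The original model wins only if its score is at least as low as
    that of the best decoy.
    """
    best_name, best_score, _ = min(decoys, key=lambda d: d[1])
    if start_score is not None and start_score <= best_score:
        return "starting model"
    return best_name


# Made-up decoys: (name, score, rms).
example_decoys = [
    ("PT0001", -110.0, 3.0),
    ("PT0002", -125.0, 1.2),
    ("PT0003", -90.0, 7.5),
]
pick_without_start = pick_refined_model(example_decoys)
pick_with_start = pick_refined_model(example_decoys, start_score=-130.0)
print(pick_without_start, pick_with_start)
```

For Case 2, where the starting structure is expected to contain clashes, the function can be called without start_score so that only the decoys are compared.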
Summary
Now you have seen some graphical illustrations of how Rosetta samples
and scores decoys. You have learned how to use score vs. rmsd
plots to pick out the most plausible model in simple cases. This
is the simplest form of structure prediction.
Often, however, you want to predict a complex in a case where the
answer is not known. These blind predictions are both the most
intriguing and the most useful predictions of Rosetta.
Next: blind predictions with RosettaDock