The docking search can be simplified in
many cases when there is biological information to indicate the general
region of binding on one or both partners. There are three major
ways to reduce the docking search space:
1) Pre-orient the suspected binding
site toward the other partner. This means you must design your
starting structure intelligently (see
preparing
PDB files), but then you can use -dock_pert on that
partner rather than randomizing it. This places the least
constraint on docking, so it is the safest and generally the smartest
approach to limit the search space.
2) If the protein is multidomain and biological information strongly
indicates that only one of the domains binds the other partner, then
reduce that docking partner to the high-affinity domain. This
must be done with great caution. Read the biological information
carefully and do not remove portions of the protein that might interact
with the other partner. Consider the size of the two partners and
be sure to leave a large enough portion of the trimmed partner to bind
the entire width of the other partner.
Note: this also introduces
another problem. By trimming the protein, you have created a
molecule that does not exist in reality. Portions of the molecule
that were previously buried will become hydrophobic surfaces that might
attract the other binding partner. For this reason it is
appropriate to set a
repulsive site
constraint on newly exposed surfaces so docking does not occur
there.
3) If binding sites are known on both partners, set a loose
distance constraint (25-30 Å) between the
two binding sites.
If you need to review docking flags, see
details
of docking flags.
The three key flags here are '-dock_pert', '-spin', and '-randomize'.
In the default case, where you want to put no constraints on either
docking partner, use '-randomize1 -randomize2' as search flags.
If you have pre-oriented partner 1 but wish to place no constraints on
docking partner 2, you will still want to perturb partner 1. The
way to do this is to use '-dock_pert 5 15 25 -randomize2'. The
numbers on -dock_pert are arbitrary but it is better to use larger
numbers for a blind search than for a refinement run.
If you have pre-oriented both partners 1 and 2, you still want to
perturb both partners and spin around the line of centers because you
do not usually know the orientation of the two patches relative to one
another. The search flags in this case should be '-dock_pert 5 15
25 -spin'.
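The three cases above can be summarized as shell variables (a sketch; the flag strings come straight from this tutorial, but the SEARCH_FLAGS variable name is just for illustration):

```shell
# Search-flag choices for the three cases described above.

# Neither partner constrained (fully blind search):
SEARCH_FLAGS='-randomize1 -randomize2'

# Partner 1 pre-oriented, partner 2 unconstrained:
SEARCH_FLAGS='-dock_pert 5 15 25 -randomize2'

# Both partners pre-oriented (spin around the line of centers):
SEARCH_FLAGS='-dock_pert 5 15 25 -spin'

echo "$SEARCH_FLAGS"
```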
If you have not already done so, read
Using
condor to run RosettaDock before you try a global run.
This method is the creation of Ora Furman at the University of
Washington. With no prior knowledge of the complex structure, it
is good to generate 5K-10K decoys and then look for low-scoring
outliers. The physical basis for this is that the vast majority
of orientational configurations will generate an unfavorable
interaction between the partners, while the native orientation, if it
is found, should stand out from the cloud.
Technically, this is done by simply generating 5000 decoys according to
the protocol used for refinement but with the appropriate flags for a
blind run as described above.
I suggest making a new config file called cal.config:
cp
$rosetta_scripts/condor_scripts/test.config cal.config
Then, change the following parameters:
prefix='ZZ'
nstruct=5000
search_flags='-dock_pert 5 15
25 -spin' (or the proper flag as described in the previous
section)
Njobs=100
This example is already provided in test.config (commented out).
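Put together, the edited cal.config would contain lines like these (a fragment only; every other setting inherited from the copied test.config is left untouched):

```shell
# cal.config - calibration run settings (fragment; the rest of the
# file comes from the copied test.config)
prefix='ZZ'
nstruct=5000
search_flags='-dock_pert 5 15 25 -spin'  # or the flags appropriate to your case
Njobs=100
```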
Then, run crun.bash
crun.bash
test cal
Condor should launch directly.
A pictorial illustration of a blind 5000-decoy run is given for the
Baker lab's (Ora Furman's) prediction of CAPRI target T12. The following is a score
vs. rmsd plot for 10,000 initial decoys from
this run:
Remember that rmsd has no meaning in this case since
the native is not defined; rather, it is just a way to spread the
points out to identify outliers.
In this case, the method identifies a group of points around (19,-230)
that stands out clearly from the cloud. Ora then refined these
decoys and submitted the results to CAPRI. This point turned out
to be very accurate, within 1.0 Å rmsd, a high-accuracy prediction by
CAPRI standards.
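Even without a plot, the same outlier hunt can be done on the score file directly. A minimal sketch, assuming a whitespace-separated file of (tag, score, rmsd) lines — an assumed format for illustration, not the real Rosetta output:

```shell
# Build a tiny fake score file (tag, score, rmsd; format is an
# assumption for illustration only).
cat > scores.fasc <<'EOF'
ZZ_0001 -180.2 32.1
ZZ_0002 -231.4 19.3
ZZ_0003 -175.8 28.7
EOF

# Sort by score (column 2, ascending): the best-scoring decoys rise to
# the top, and a clear outlier sits well below the rest of the cloud.
sort -k2,2n scores.fasc | head -n 1
```

Here the best line is ZZ_0002 at -231.4, the analogue of the (19, -230) cluster in the T12 plot.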
Refine the outlier(s)
If this method identifies clear outliers (I would say the case above is
clear), the next step is to refine the outlier structures as shown in
the previous tutorial,
refinement runs.
You should test for a binding funnel around that location. If the
refinement produces a better-scoring structure than the outlier, then
use the refined structure as your model.
Please note that only the initial random search combined with the
refinement can reliably identify correct answers. The
presence of a strong binding funnel in the refinement stage is a strong
case that the decoy is a plausible model; the identification of
outliers in the first stage is not strong evidence by itself.
If this method
fails to find
an outlier, the next step to try is to generate more
decoys to improve the sampling. If, after ~20K decoys, no outliers
are found, it is best to use
method 2
below.
The logic of method 2 is to overwhelm the sampling problem with
hundreds of thousands of decoys and then to analyze only the very best
(top 1%) of these. This is technically challenging because 100K+
decoys require a lot of disk space, so we must get around the problem
by using a 'smart scorefilter' which checks decoys against a reference
score and keeps only the best 1 percent.
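The 1% reference score itself is simple to picture. A sketch of the arithmetic on a toy calibration set (the file name and one-score-per-line format are made up; the real filter reads the calibration run's scores):

```shell
# Ten fake calibration scores, one per line (illustration only).
cat > cal_scores.txt <<'EOF'
-150
-230
-175
-210
-160
-190
-205
-165
-180
-220
EOF

N=$(wc -l < cal_scores.txt)
KEEP=$(( (N + 99) / 100 ))      # rank of the 1% cutoff, at least 1
REF=$(sort -n cal_scores.txt | sed -n "${KEEP}p")
echo "$REF"                     # decoys scoring at or below this are kept
```

With ten scores the cutoff rank is 1, so the reference score is the single best calibration score, -230; at 5000 calibration decoys the rank would be 50.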
This protocol must be done in two steps. The first is a
calibration run of 5000 decoys, which serves as a reference set for the 100K
run. You would have generated the calibration run in the course
of trying method 1, so just use this as your calibration set.
Then you will perform the 100K run.
If you have not already done a calibration run, make and launch the
config file 'cal.config' as described above.
Starting
with cal.config,
create a new condor config file called 'global.config'. You will
want to keep the search flags exactly the same as in cal.config.
Make the following changes from 'cal.config' to 'global.config':
prefix='GS' (do not use 'ZZ' - this will mix calibration and
global run decoys)
nstruct=1000
scorefilter='smart_scorefilter 0.01' (This option is provided for
you in test.config)
search_flags and
Njobs are the same as cal.config.
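So the global.config fragment differs from cal.config only in these lines (search_flags is shown here as the '-spin' case; use whatever cal.config used):

```shell
# global.config - global run settings (fragment)
prefix='GS'                           # must differ from the calibration 'ZZ'
nstruct=1000                          # decoys actually kept on disk
scorefilter='smart_scorefilter 0.01'  # save only the best 1% of ~100K
search_flags='-dock_pert 5 15 25 -spin'  # identical to cal.config
Njobs=100                             # identical to cal.config
```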
Note: The
'-smart_scorefilter' flag is very important. This is how you
generate 100K decoys but only save 1,000 of them. This is the
only case in which you want to use this flag. It will only work
after the calibration has been run; otherwise there is no way to
calculate a 1% reference score.
Now, launch the global script:
crun.bash
test global
After this run, you will have 1000 decoys in the directory 'GS'.
Next, you will want to extract the top 200 of these 1000 and
cluster them by rmsd. A set of post-processing scripts is
provided in $rosetta_scripts/docking for this purpose. The
protocol for post-processing is described in
post-processing
large scale runs.
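As a rough picture of the first post-processing step (the real scripts live in $rosetta_scripts/docking; the file name, two-column format, and two-of-four cutoff here are stand-ins for the top-200-of-1000 case):

```shell
# Fake global-run score file: tag and score (format is an assumption).
cat > GS_scores.txt <<'EOF'
GS_0003 -240.1
GS_0001 -225.7
GS_0002 -238.9
GS_0004 -219.3
EOF

# Keep the best-scoring fraction (here 2 of 4, standing in for 200 of
# 1000); the surviving tags then go on to rmsd clustering.
sort -k2,2n GS_scores.txt | head -n 2 | awk '{print $1}'
```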