Refinement with RosettaDock

Introduction

    In the RosettaDock basics tutorial, you learned how to generate small numbers of decoys with RosettaDock and how to set algorithm parameters such as flags and constraints for special cases.  Now you will begin learning how to do large-scale runs.

    Refinement is the first example of a large-scale run.  It involves sampling around a binding site in a crystal structure, or possibly in a decoy, to gain insight into the energy landscape around the binding site and, when the starting structure is a theoretical model, into the quality of that model.  Such sampling requires about 1000 decoys.

When do you want to do a refinement run?

There are several reasons to do a refinement run, otherwise known as a dock-perturbation run.

1) Given a crystal structure, what does the energy landscape look like around the native structure?

2) Given a crystal structure of protein A - protein B, predict a plausible model for a structurally related complex:
a) Mutate residue X of protein A at the A-B interface and refine the structure to compensate for structural effects of the mutation.
b) Predict the complex between a close homologue of protein A (>= 50% sequence identity) and protein B.  Superimpose the homologue of A onto the position of protein A in the complex and refine the complex to compensate for structural differences between protein A and the homologue.

3) Refine a low-resolution model for a complex and test the plausibility of the low-resolution model.

How do you do a refinement run?

Generating 1000 decoys takes a great deal of time on a single-processor machine, so a cluster or other multiprocessor system is generally needed.  We use a 56-processor machine with a common filesystem for all the nodes, and we use the queuing system Condor to manage jobs.  Because the RosettaDock simulations are independent of one another, they need no explicit communication between them; the only requirement is a filesystem shared by all the nodes of the cluster.

See Using Condor to run RosettaDock for information on doing large-scale runs on a Condor cluster.  This should work on the Gray lab cluster jazz and on the Baker lab clusters.  Hereafter, I will periodically reference terms from the 'Using Condor' page, so familiarize yourself with that page before reading further.

Rosetta flags:

How to set up a perturbation run is described on the 'Using Condor' page.  Briefly, I suggest using:

prefix='PT'
nstruct=1000
Njobs=100

and leaving all the other flags in the Condor config file alone.

If you need to, review the docking flags page to see when it is appropriate to use -dock_pert, -spin, -randomize, and other such search flags.

How do you analyze the data from a refinement run?

To analyze the 1000 decoys from a dock-perturbation run, make a score vs. rmsd ("binding funnel") plot from the score file.  Try doing a perturbation run on the samplerun pdb from the basics tutorial and plotting the resulting scorefile ('.fasc').  Plot the rms column of the scorefile on the x-axis and the score column on the y-axis.

To plot a scorefile in gnuplot, open up gnuplot and then type

plot "ZZaa11.fasc" u 3:2

This plots the scorefile "ZZaa11.fasc" with rms (column 3) as the x axis and score (column 2) as the y axis.  Use set xrange [:] and set yrange [:] commands to control the dimensions of the plot.
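If you prefer a scripting language to gnuplot, the same two columns can be read in Python.  The following is only a minimal sketch.  It assumes the scorefile is whitespace-delimited with score in column 2 and rms in column 3 (the same columns as the gnuplot command above) and that header or other non-numeric lines can be skipped.  The function names read_score_rms and plot_funnel are hypothetical, and plotting assumes matplotlib is installed.

```python
def read_score_rms(path, score_col=2, rms_col=3):
    """Return a list of (rms, score) pairs from a whitespace-delimited scorefile.

    Column numbers are 1-based, matching gnuplot's "u 3:2" convention.
    Lines whose selected columns are not numeric (such as a header) are skipped.
    """
    points = []
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < max(score_col, rms_col):
                continue  # blank or too-short line
            try:
                score = float(fields[score_col - 1])
                rms = float(fields[rms_col - 1])
            except ValueError:
                continue  # header or other non-numeric line
            points.append((rms, score))
    return points


def plot_funnel(points, out_png="funnel.png"):
    """Scatter-plot score (y axis) against rms (x axis) and save it as a PNG."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display needed on a cluster node
    import matplotlib.pyplot as plt

    rms = [p[0] for p in points]
    score = [p[1] for p in points]
    plt.scatter(rms, score, s=8)
    plt.xlabel("rms")
    plt.ylabel("score")
    plt.savefig(out_png)
```

For example, plot_funnel(read_score_rms("ZZaa11.fasc")) would write funnel.png in the current directory.  If your scorefile has a different column layout, change score_col and rms_col accordingly.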

If you know R, you can plot a score file in R.  R can load the score file into a data frame, which makes it possible to plot columns as objects.  You can also easily analyze the columns of the score file in R.

After plotting score vs. rmsd, two kinds of results are possible:

1) Absence of funnel.  Poor correlation between score and rms, low density near the native structure.  If your native is a crystal structure, this means that RosettaDock is not effectively discriminating the native from non-native structures.  If your native structure is a plausible model, then, assuming that RosettaDock can accurately discriminate the true native from false structures, this means that your plausible model probably does not represent the native structure.

<bad funnel plot>

x is rmsd and y is score.

2) Presence of funnel.  Good correlation between score and rms, high density in the native region.  The funnel shown below is a very good case; not all RosettaDock funnels are quite this sharp.  If your native structure is a crystal structure, such a funnel means that RosettaDock is accurately discriminating the crystal structure from decoys.  If your native is a RosettaDock model, then this means your plausible model could represent the native structure.

<good funnel>

x is rmsd and y is score.
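The two outcomes above are judged by eye, but the same two ideas (correlation between score and rms, and density of low-scoring decoys near the native) can be roughly quantified.  The sketch below is an illustration only, not a RosettaDock criterion.  The function names, the choice of the 10 lowest-scoring decoys, and the 5 Å cutoff are all assumptions made for the example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length number lists.

    Raises ZeroDivisionError if either list has zero variance.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def funnel_summary(points, top_n=10, near_rms=5.0):
    """Summarize a list of (rms, score) pairs.

    top_n and near_rms are illustrative assumptions, not RosettaDock defaults.
    Returns the score/rms correlation and the fraction of the top_n
    lowest-scoring decoys whose rms is at most near_rms.
    """
    rms = [p[0] for p in points]
    score = [p[1] for p in points]
    lowest = sorted(points, key=lambda p: p[1])[:top_n]
    near = sum(1 for r, _ in lowest if r <= near_rms)
    return {
        "score_rms_correlation": pearson(rms, score),
        "top_fraction_near_native": near / len(lowest),
    }
```

A strongly positive correlation together with a high fraction of low-scoring decoys near the native would be consistent with a funnel; values near zero would be consistent with its absence.  Always check the numbers against the plot itself.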

Using refinement for structure prediction

In cases 2 and 3 in "When do you want to do a refinement run?", the goal is to refine a known near-native structure.

Case 2:  If you find a strong funnel from your perturbation, then the best choice for the refined model is at the bottom of the most densely populated potential well (in the good funnel above there is only one potential well and it is right at the native position).  Because your starting structure probably has some clashes from the mutation or homology model, the lowest potential well is likely to lie between 0 Å and 5 Å rmsd from it, which means the structure has probably relaxed away from a position that had clashes.  The lowest-scoring structure in the bottom of the lowest potential well is your best pick.

Case 3:  If your plausible model produces a strong funnel, you are doing well.  Your best refined model is either the original plausible model or the lowest-scoring structure in the bottom of the lowest potential well, whichever has the lowest total score.
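The selection rules for cases 2 and 3 can be sketched in a few lines of Python.  This is a crude illustration under stated assumptions: the "most densely populated potential well" is approximated by the most populated rms bin, the 1 Å bin width is arbitrary, and the function names and the (name, rms, score) tuple layout are hypothetical.  Inspect the plot before trusting the result.

```python
from collections import defaultdict


def pick_from_densest_bin(decoys, bin_width=1.0):
    """Return the lowest-scoring decoy in the most populated rms bin.

    decoys is a list of (name, rms, score) tuples.  Binning by rms is a
    rough stand-in for locating the most densely populated well; bin_width
    is an illustrative assumption.
    """
    bins = defaultdict(list)
    for decoy in decoys:
        bins[int(decoy[1] // bin_width)].append(decoy)
    densest = max(bins.values(), key=len)
    return min(densest, key=lambda d: d[2])


def choose_refined_model(start, pick):
    """Case 3: keep whichever of the starting model and the pick scores lower."""
    return start if start[2] <= pick[2] else pick
```

Note that an isolated decoy with a very low score is ignored by pick_from_densest_bin if it does not fall in the most populated bin, which reflects the emphasis above on the most densely populated well rather than on the single lowest score.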

Summary

Now you have seen some graphical illustrations of how Rosetta samples and scores decoys.  You have learned how to use score vs. rmsd plots to pick out the most plausible model in simple cases.  This is the simplest form of structure prediction.

Often, however, you want to predict a complex in a case where the answer is not known.  These blind predictions are both the most intriguing and the most useful applications of Rosetta.

Next:  blind predictions with RosettaDock
