Using RosettaDock++
Mike Daily


Basics

As the Rosetta code changes, sometimes scripts and commands given in the tutorial will break.  If you find such a bug, please email docking-support@rosettacommons.org and we will update the tutorial accordingly.

Algorithm
Rosetta works by simultaneous optimization of side-chain conformation and rigid body position of the two docking partners.  The former is performed by a packing algorithm, and the latter is performed by a rigid-body Monte Carlo Minimization (MCM) strategy.  For more on the algorithm of Rosetta, see

Protein-Protein Docking with Simultaneous Optimization of Rigid Body Displacement and Side Chain Conformations, J.J. Gray, S.E. Moughon, C. Wang, O. Schueler-Furman, B. Kuhlman, C.A. Rohl and D. Baker, J. Mol. Biol., 331(1), 281-299 (2003).

Prepacking
Prior to docking, the sidechains of the native protein are removed and replaced using the Rosetta sidechain packing algorithm to prevent errors in docking due to irregularities (e.g. crystal contacts) in the native protein.

Docking
Depending on one's confidence in the native structure and the amount of prior experimental information, different protocols are used.  Sometimes biochemical and genetic information can be used to localize the binding site to a small region on one or both partners.  In this case, one performs a perturbation run, exploring only a small region of space around the suspected binding site.  For predictions where there is no biological information about the interface, one usually performs a global search, exploring all the conformational space of both partners.

A search generally involves two stages:

Low-Resolution search
The protein is represented as backbone plus centroid representation of sidechains, i.e. the sidechain is represented as one giant atom to save CPU time.  In this stage RosettaDock attempts to find the rough orientation of the two docking partners for the high-resolution search. This results in a complex of contacting, non-clashing partners.  

Full atom, High-Resolution search
All atoms in the protein are explicitly modeled, and the position found in the low-resolution search is optimized.  Rigid body MCM is alternated with sidechain repacking so that the sidechains can adjust to a new, more favorable orientation and vice versa.  The high-resolution stage consumes most of the CPU time in a RosettaDock run.
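
As a hedged illustration of the accept/reject step in a Monte Carlo search of this kind, the standard Metropolis criterion is shown below.  The symbols (E for the score, kT for the temperature factor) are notation introduced here, not taken from this tutorial; see the paper cited above for the exact scheme RosettaDock uses.  Under these assumptions, a trial rigid-body move that has been minimized is accepted with probability:

```latex
P_{\text{accept}} = \min\left(1,\; e^{-\Delta E / kT}\right),
\qquad \Delta E = E_{\text{trial}} - E_{\text{current}}
```

Moves that lower the score are always accepted, and moves that raise it are accepted only occasionally, which lets the search escape shallow local minima.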

Terms

Pay careful attention to these terms, especially the differences between the different types of structures.

Setup for rosetta

1) We have created an automated setup file called 'rosettarc' to automatically set up all of the paths for RosettaDock.  It is located in the rosetta_scripts/docking directory.  You will need to copy it to your home directory and then source it in your .bashrc:

In rosetta_scripts/docking:

cp rosettarc ~/.rosettarc

The copy changes the name to .rosettarc so that when you run ls on your home directory, .rosettarc won't show up.

Now, in your home directory, edit '.rosettarc' so that the rosetta_root environment variable points to your parent rosetta directory (~/simcode, ~/rosetta, or ~/RosettaDock)

Then, open .bashrc and add the line:

source ~/.rosettarc

Then, on the command line, type

source ~/.bashrc  (you only need to do this once).
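
Before compiling, it can help to confirm that the variables really are set.  The function below is a minimal sketch, not part of the RosettaDock distribution: the name check_rosetta_env is hypothetical, and the list of variables is simply the set this tutorial mentions, so your .rosettarc may define more or fewer.

```shell
# Report which of the RosettaDock environment variables are set.
# Hypothetical helper; the variable list is an assumption based on the
# names used in this tutorial.
check_rosetta_env() (
    # The body runs in a subshell, so these helper variables do not leak.
    missing=0
    for name in rosetta_root rosetta_src rosetta_bin rosetta_scripts \
                rosetta_database rosetta_exe; do
        # Look up the value of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "NOT SET: $name"
            missing=1
        else
            echo "ok: $name=$value"
        fi
    done
    # Exit status is 0 only when every variable was set.
    return "$missing"
)
```

After pasting the function into your shell, run check_rosetta_env; any line starting with "NOT SET" points to a path that still needs fixing in .rosettarc.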

2) Compile the source code.

cd $rosetta_src (environment variable created by .rosettarc)
make -j2 gcc

You will get a file called rosetta.gcc, the rosetta executable.

3) In your rosetta_root directory, make a directory called 'bin' (or wherever $rosetta_bin was set to in your .rosettarc)

cd $rosetta_root
mkdir bin
cd bin
ln -s $rosetta_src/rosetta.gcc

4) In the top-level Rosetta directory (rosetta or simcode) you will find the directory examples.  If you are using the web version of this tutorial, download examples.tar to a local directory and untar it instead.  In examples, you will find a directory called samplerun.

This directory initially contains the native pdb file test.pdb, the unbound pdb file test.unbound.pdb, and the constraint file test.cst.  Finally, paths.txt contains information about the location of input and output files.

Copy the following two scripts from $rosetta_scripts to samplerun ($rosetta_scripts and other such variables are environment variables initialized by .rosettarc to conveniently take you to key rosetta directories):

cd samplerun
cp $rosetta_scripts/ppk.bash .
cp $rosetta_scripts/farun.bash .

The meaning of these scripts and the content of paths.txt will become clearer shortly.

Link the location of the rosetta_database to the current directory.

ln -s $rosetta_database .

5) I have put the native structure (test.pdb) and constraints (test.cst) in samplerun/pdb/ and samplerun/cst/ for you.  Later on I will tell you how to prepare your own pdb from scratch, since this is a little bit harder.  I have also put the unbound structure (test.unbound.pdb) in samplerun/pdb/.  The unbound structure must have the 'unbound.pdb' extension so that Rosetta will know it is the unbound structure.
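
A quick way to confirm that samplerun is laid out as described is sketched below.  The function name check_samplerun is hypothetical, and the file list is only what this tutorial names (the pdb/ and cst/ subdirectories, paths.txt, and the two copied scripts); it is not an official manifest.

```shell
# check_samplerun [DIR]
# Report which of the files named in this tutorial exist under DIR
# (default: the current directory).  Hypothetical helper.
check_samplerun() (
    dir=${1:-.}
    missing=0
    for f in pdb/test.pdb pdb/test.unbound.pdb cst/test.cst \
             paths.txt ppk.bash farun.bash; do
        if [ -e "$dir/$f" ]; then
            echo "found:   $f"
        else
            echo "missing: $f"
            missing=1
        fi
    done
    # Exit status is 0 only when nothing was missing.
    return "$missing"
)
```

Run check_samplerun from inside samplerun, or pass the path to it as the first argument.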

Prepacking

1) Run the ppk.bash script to see the arguments it takes:

./ppk.bash

The script should work without any modifications in most cases.  It will invoke the following command line:

$rosetta_exe aa test 1 -dock -s test -prepack_rtmin -quiet -ex1 -ex2aro_only -unboundrot

You may want to add extra flags like -fab1 or -norepack1 in certain restricted cases.  Extra flags can be added to the command line directly or by modifying the file.  I have put comments inside ppk.bash to explain the meanings of the different flags and how to change them.  In addition, a section later in the tutorial describes rosetta flags in more detail.

2) If everything works properly, a directory called "prepack" will be created and in it will be the prepacked structure, test.ppk.pdb ($pdb.ppk.pdb).  Also, a directory called "test.ppk" ($pdb.ppk) will be created.  Go into test.ppk and look at the scorefile (aatest.fasc).

Type the following command:

cut -c 1-39,261-296,316-323,405- aatest.fasc
filename                  score    rms   bk_tot   fa_atr   fa_rep   fa_sol  fa_dun  description
test.pdb                -165.36   0.00  -303.08  -607.97    44.51   299.35  206.28  native
test.pdb                -171.60   0.00  -424.31  -595.83    47.19   287.43   76.08  nat_repacked
test.pdb                -171.60   0.00  -424.31  -595.83    47.19   287.43   76.08  nat_min
test.pdb                -165.37   0.00  -303.09  -607.97    44.51   299.35  206.28  input
test.pdb                -171.38   0.00  -424.05  -595.31    48.33   287.40   74.77  inp_repacked
test.pdb                -171.38   0.00  -424.05  -595.31    48.33   287.40   74.77  inp_min
before.pdb                 9.27   0.00  -424.05  -595.31    48.33   287.40   74.77  start
away.pdb                -165.37  99.00  -303.09  -607.97    44.51   299.35  206.28  away_not_repacked
minimized_away.pdb      -176.44  99.00  -449.93  -602.11    28.71   289.67   79.98  minimized_away
test.ppk.pdb            -176.45   0.00  -449.33  -602.11    28.71   289.67   79.98  prepacked
test.remin.pdb          -176.45   0.00  -449.33  -602.11    28.71   289.67   79.98  re_rtmined
no_pdbfile              -176.45   0.00  -449.33  -602.21    28.71   289.78   79.98  output_decoy
This will show you useful columns from the scorefile.
native - original crystal structure
minimized_away - slide partners away from each other, then minimize side chain conformations with rotamer trial minimization.
prepacked - slide partners back in from minimized_away.

The score of prepacked will always be equal to or higher than that of minimized_away, since the optimal positions of the sidechains in the away form will be nonoptimal when the partners are together.

re_rtmined - minimize the interface residues in the prepacked structure.

Note that this protocol optimizes the individual monomer structures with the rotamer trial minimization protocol.  In previous versions, we created the starting structures by repacking the monomers.  This is why they are called "prepacked", even though currently no actual "repacking" is involved.

If we want to perform a docking run with the correct backbone structure, but without knowledge of the side chains, we should add the flag -prepack_full, which will invoke a full repacking of all side chains, followed by a rotamer trial minimization step.

If we want to use the prepacked structure for a docking run that does not involve any rotamer trial minimization, we should prepack the initial structure accordingly: use -prepack_full without the -prepack_rtmin option.

Now for the columns (some of which are not displayed above):
  • rep - van der Waals repulsion
  • atr - van der Waals attraction
  • sol - solvation
  • hbsc - sidechain hbonds
  • hb_srbb and hb_lrbb - short range (sr) and long range (lr) backbone H-bonds
  • dun - rotamer probability (Dunbrack) score
  • pair - residue pair potentials
To see all of these columns, type
cut -c 1-39,261-296,236-251,261-340,405- aatest.fasc
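
The character offsets given to cut depend on the exact width of the scorefile, so they can break if the layout changes.  As a hedged alternative, the sketch below selects columns by header name with awk.  It assumes the scorefile is whitespace-delimited with a single header row and no spaces inside any field; the function name fasc_columns is hypothetical.

```shell
# fasc_columns FILE NAME [NAME...]
# Print the named columns (tab-separated) of a whitespace-delimited
# scorefile whose first line is a header row.  Hypothetical helper.
fasc_columns() (
    file=$1
    shift
    awk -v want="$*" '
        BEGIN { n = split(want, names, " ") }
        NR == 1 {
            # Map each header name to its field number.
            for (i = 1; i <= NF; i++) col[$i] = i
            for (j = 1; j <= n; j++)
                if (!(names[j] in col)) {
                    printf "no such column: %s\n", names[j] > "/dev/stderr"
                    exit 1
                }
        }
        {
            # Emit the requested fields in the requested order.
            line = ""
            for (j = 1; j <= n; j++)
                line = line (j > 1 ? "\t" : "") $(col[names[j]])
            print line
        }
    ' "$file"
)
```

For example, fasc_columns aatest.fasc description score fa_rep fa_dun prints those four columns in that order, header included.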

Keep an eye on the total score and on fa_rep in particular.  The fa_rep is usually high for the prepacked structure, but if it is also high (>500) for the repacked away or reppk structures, this may indicate a clash problem in your native structure or an error in the way you ran the code.  There are some cases where you might expect this, for example if your native structure is a homology model or has a computationally generated mutation.  You should not expect a high fa_rep in the repacked_away form of a crystal structure, however.
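
The fa_rep check can be automated with a few lines of awk.  This is a sketch under the same assumptions as any header-based parsing of the scorefile (whitespace-delimited, one header row containing fa_rep and description columns); the function name high_fa_rep is hypothetical, and the default threshold of 500 is the figure quoted above.

```shell
# high_fa_rep FILE [THRESHOLD]
# Print "description fa_rep" for every row whose fa_rep exceeds THRESHOLD
# (default 500).  Hypothetical helper.
high_fa_rep() (
    awk -v limit="${2:-500}" '
        NR == 1 {
            # Locate the two columns by their header names.
            for (i = 1; i <= NF; i++) {
                if ($i == "fa_rep") rep = i
                if ($i == "description") desc = i
            }
            if (!rep || !desc) {
                print "fa_rep or description column not found" > "/dev/stderr"
                exit 1
            }
            next
        }
        # Force a numeric comparison on both sides.
        $rep + 0 > limit + 0 { print $desc, $rep }
    ' "$1"
)
```

Running high_fa_rep aatest.fasc prints nothing when no structure exceeds the threshold; any line it does print names a structure worth inspecting for clashes.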

Also, if you want to look at some of the structures, look in test.ppk/aa and use rasmol to view the structures.

Full atom run

Congratulations! You have finished prepacking.  Now you are ready for the full atom run.

./farun.bash test

This will invoke the following command line:

$rosetta_exe aa test 1 -dock -dock_mcm -dock_rtmin -quiet -nstruct 1 -fake_native -dock_pert 3 8 8 -spin -ex1 -ex2aro_only -unboundrot -s test.ppk

Look at the farun scorefile (test/aatest.fasc) and check it in the same way as the prepacking score file which I showed above.

This concludes the basic rosetta tutorial.  You have now set up your account for rosetta, prepacked, and performed a full atom run.  Continue reading below to find out about details of rosetta (special flags, constraint files, etc.), and how to do large scale runs with RosettaDock.

Making your own RosettaDock runs; special cases

Detailed description of docking flags
(Worth reading once you are comfortable with the basics)
Preparing your own PDBs for Rosetta (You will need this in the near future)

Constraints
Rosetta has two types of constraints:  site constraints and distance constraints.  These are meant to restrain the docking search in accordance with known biological information.  See details on using constraints for more information.

Antibodies
Docking antibodies is more complicated.  It is possible to dock antibodies just like any other protein complex, but it is better to use the rosetta antibody (-fab) mode.  Since much is known experimentally about which parts of antibodies do and do not bind the antigen, it is possible to make specific antibody constraints to make sure an antibody-antigen complex meets these experimental rules.  If you need to use antibody mode, you will have to make a fab file.  See fab files and antibody docking by Kurt Piepenbrink for instructions.

Structure prediction with RosettaDock

RosettaDock has been designed to predict protein complexes from the independently determined structures of the docking partners.  In order to use the structure prediction features of RosettaDock, you need to learn how to run large-scale runs with RosettaDock.  I strongly recommend that you read the next tutorial, Structure prediction with Rosetta, which teaches how to do these kinds of runs.  It is not much more difficult than small-scale runs if you have the computing power.
