Simple Protocol Walkthrough

Metadata

Walkthrough by Steven Lewis, code by Doug Renfrew and Steven Lewis

Through The Simple Protocol

This file is a walkthrough of a simple Rosetta3 protocol (the protocol itself is at FILE). The code compiles without problem in the Rosetta3 release, although it is not a supported application. Inserting it into SCons is left as an exercise for the reader. This document assumes you know basic C++ stuff like what the int main() function is. Basically, what you should get from reading this file is an understanding of what the main sections of this sample protocol do, along with some insights as to why it works that way and some alternatives. This code is not meant to be ivory-tower perfect, but it is an example of what you can assemble, quickly(!), with some basic C++ hacking skills and the Rosetta API.

The protocol itself is a tool built around PDB 1KNE. This is a small protein bound to a short peptide. The protocol can:

The ultimate purpose of the protocol is designing new binding peptides for the target protein (using noncanonical amino acids; this latter part is not supported in this release).

Let us start from the code

The code is broken up into sections with big shiny headers to let you know what's going on. This document will reference those sections and briefly explain what's happening. The code is organized to be read straight through from the top, so we'll start there. First there's your friendly copyright stuff. Next are the includes. The includes in most Rosetta files are organized into “unit headers”, “project headers” and “utility headers”. Unit headers come first; these are the headers vital to this specific file (for MyFile.cc, this is where MyFile.hh is included). This protocol needs no header file of its own, so we don't see any of those. Project headers include things from the Rosetta libraries, usually organized into core library and protocol library inclusions. Sometimes the Movers are treated as a special case, especially when writing protocols/applications like this (you will include a LOT of Movers, so they get grouped). Note that MonteCarlo is in with the movers (this is poor organization).

Next are utility and C++ headers. Utility/numeric headers provide the options system and Rosetta's replacements for some standard C++ functionality (tracers for cout, vector1 for vectors).
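To make the vector1 idea concrete, here is a minimal sketch of a 1-indexed container. Vector1 and first_residue are hypothetical names written for this walkthrough; this is not the real utility::vector1, only an illustration of why 1-based indexing is convenient when positions are residue numbers.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a 1-indexed container; NOT the real utility::vector1.
template <typename T>
class Vector1 {
public:
    void push_back(const T& value) { data_.push_back(value); }
    std::size_t size() const { return data_.size(); }

    // Valid indices run from 1 to size(), matching residue numbering.
    T& operator[](std::size_t i) {
        if (i < 1 || i > data_.size()) throw std::out_of_range("Vector1 index");
        return data_[i - 1];
    }
    const T& operator[](std::size_t i) const {
        if (i < 1 || i > data_.size()) throw std::out_of_range("Vector1 index");
        return data_[i - 1];
    }

private:
    std::vector<T> data_;
};

// Builds a three-residue sequence and returns the residue at position 1.
char first_residue() {
    Vector1<char> seq;
    seq.push_back('A');
    seq.push_back('R');
    seq.push_back('T');
    return seq[1];
}
```

With this layout, index 0 is simply invalid, so an off-by-one mistake shows up as an exception rather than as silently reading the wrong residue.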

This file has a large pile of “using namespace” calls next. These are a stylistic choice.

The next block of code is a pile of OptionKey objects. These are “local” options. The option system (src/core/options/options_rosetta.py) is accessible from anywhere in Rosetta, but sometimes you'll want to add more options to this system for one specific application. These lines of code creating OptionKeys are the way to do it. Note that you CANNOT access these options from any file other than the one where they are created – the options system exists as a static global object and things break if you try to create options in library code from places other than the main option system.

There is a helper function declaration next – the implementation is at the bottom of the file. This code demonstrates how to set up a fold tree compatible first with docking, and then with loop modeling on top of that. The FoldTree is a directed acyclic graph that tells Rosetta how to fold the pose – how to convert internal coordinates into 3D coordinates – for all the nontrivial (non-peptide) cases. Regular stretches of peptide-bonded residues are handled with “peptide edges”; these are the edges marked -1 in this code. The other lines of code determine where the more complex FoldTree connections should go, and then set them up (for example, the connection between the protein & peptide, or the connections that allow loop modeling).
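The edge bookkeeping can be sketched without any Rosetta code. The example below is a simplified illustration, not the helper function from the protocol: Edge, two_chain_tree and covers_each_residue_once are hypothetical names, the residue numbers used with them are made up, and the extra jumps and cutpoints needed for loop modeling are left out. It keeps only the convention from the text that peptide edges carry the label -1 while the protein-to-peptide connection is a separately labeled jump.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const int PEPTIDE = -1;  // label for peptide edges, as in the walkthrough

struct Edge {
    int start;
    int stop;
    int label;  // PEPTIDE, or a positive jump number
};

// Hypothetical two-chain tree: chain A is residues 1..a_end, chain B is
// a_end+1..total. One jump (label 1) connects anchor_a to anchor_b.
std::vector<Edge> two_chain_tree(int a_end, int total, int anchor_a, int anchor_b) {
    std::vector<Edge> edges;
    // Chain A is built outward from its anchor in both directions.
    if (anchor_a > 1) edges.push_back({anchor_a, 1, PEPTIDE});
    if (anchor_a < a_end) edges.push_back({anchor_a, a_end, PEPTIDE});
    // The jump places chain B relative to chain A (the docking degree of freedom).
    edges.push_back({anchor_a, anchor_b, 1});
    // Chain B is built outward from where the jump lands.
    if (anchor_b > a_end + 1) edges.push_back({anchor_b, a_end + 1, PEPTIDE});
    if (anchor_b < total) edges.push_back({anchor_b, total, PEPTIDE});
    return edges;
}

// Walks the edges and counts how often each residue is built.
// A usable tree builds every residue exactly once.
bool covers_each_residue_once(const std::vector<Edge>& edges, int root, int total) {
    std::vector<int> built(static_cast<std::size_t>(total) + 1, 0);
    built[root] = 1;
    for (const Edge& e : edges) {
        if (e.label == PEPTIDE) {
            const int step = (e.stop > e.start) ? 1 : -1;
            for (int r = e.start + step; r != e.stop + step; r += step) ++built[r];
        } else {
            ++built[e.stop];
        }
    }
    for (int r = 1; r <= total; ++r) {
        if (built[r] != 1) return false;
    }
    return true;
}
```

Dropping the jump from such a tree leaves the second chain unreachable, which is the kind of mistake the coverage check is meant to catch.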

Next we'll move back up to the start of the main() function. The first calls are all option.add() calls. These add the locally created options to the option system. The next call is core::init. This should be the first call in your executable's main function (unless you have option.add calls to make, in which case, immediately after them). This function reads the command line into the options.
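The ordering constraint (add local options first, then initialize) can be illustrated with a toy registry. Everything here is an assumption made for the sake of the example: OptionRegistry is a hypothetical class, the option name pert_num is invented, and the "-name value" parsing is far simpler than the real option system. The point is only that a key which was never added cannot be filled in when the command line is read.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical miniature option registry; not the Rosetta option system.
class OptionRegistry {
public:
    // Register a key with a default value. Must happen before parse().
    void add(const std::string& name, const std::string& default_value) {
        values_[name] = default_value;
    }

    // Reads "-name value" pairs. Returns false on a malformed flag or on a
    // key that was never registered. A trailing unpaired token is ignored.
    bool parse(const std::vector<std::string>& args) {
        for (std::size_t i = 0; i + 1 < args.size(); i += 2) {
            const std::string& flag = args[i];
            if (flag.empty() || flag[0] != '-') return false;
            std::map<std::string, std::string>::iterator it = values_.find(flag.substr(1));
            if (it == values_.end()) return false;
            it->second = args[i + 1];
        }
        return true;
    }

    // Throws std::out_of_range if the key was never registered.
    std::string get(const std::string& name) const { return values_.at(name); }

private:
    std::map<std::string, std::string> values_;
};
```

In this sketch, calling parse() before add() fails for the same reason the walkthrough gives for putting the option.add() calls ahead of core::init: the command line is read once, against whatever keys exist at that moment.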

The next few lines create objects for the protocol to use – a scorefunction (score12), a pose filled from a PDB, and a loops object.

We next move into the perturbation phase, where we'll create the movers that do perturbation. Note that this code is concerned with CREATING the movers, not using them – no changes are made to the pose! The MonteCarlo object gets made, then a RigidBodyPerturbMover (for docking). In both these cases, note the syntax for creating objects – ObjectOP my_object( new Object(ctor stuff));. This is common syntax. The OP system will take care of deletion for you so you do not need (and should NOT use) the delete keyword to clobber those OPs manually.
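The ownership behavior described above can be sketched with a standard smart pointer. In this example std::shared_ptr is a stand-in for Rosetta's OP types, and Widget, WidgetOP and live_while_shared are hypothetical names; the sketch only shows the construction syntax and the fact that no delete is ever written.

```cpp
#include <cassert>
#include <memory>

// Hypothetical class that counts how many instances are currently alive.
struct Widget {
    static int live;
    explicit Widget(int size_in) : size(size_in) { ++live; }
    ~Widget() { --live; }
    int size;
};
int Widget::live = 0;

// std::shared_ptr is used here as a stand-in for an owning pointer (OP) type.
typedef std::shared_ptr<Widget> WidgetOP;

// Returns the number of live Widgets while two pointers share one object.
int live_while_shared() {
    WidgetOP first(new Widget(3));  // same shape as ObjectOP my_object( new Object(...) )
    WidgetOP second = first;        // shares ownership; the Widget is not copied
    return Widget::live;
}  // both pointers go out of scope here and the Widget is destroyed automatically
```

Calling delete on the underlying object by hand would destroy it a second time when the last owning pointer goes away, which is why the text warns against it.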

Loop setup is the most complicated part of the perturbation setup. This is CCD-style loop closure, which entails both loop breakage followed by mending with CCD. The MoveMap objects describe what parts of the pose can move, the SmallMover objects do the loop breaking, and then the CcdLoopClosureMover objects re-close the loop into a new conformation. We want to be able to call small + CCD in that sequence, so we package them into a SequenceMover. When we call the SequenceMover, it will call its two submovers in order. This code explicitly assumes two loops – this is true for the system at hand, although a poor assumption in general.
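The packaging of small + CCD into one callable unit can be sketched as follows. This is a simplified illustration rather than the Rosetta classes: Pose is a toy struct that just logs which movers touched it, TagMover is a hypothetical mover that appends a label instead of moving atoms, and the SequenceMover here keeps only the "apply submovers in order" behavior.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Toy stand-in for a pose: records which movers were applied, in order.
struct Pose {
    std::string log;
};

class Mover {
public:
    virtual ~Mover() {}
    virtual void apply(Pose& pose) = 0;
};
typedef std::shared_ptr<Mover> MoverOP;

// Hypothetical mover that appends a tag instead of changing coordinates.
class TagMover : public Mover {
public:
    explicit TagMover(const std::string& tag) : tag_(tag) {}
    void apply(Pose& pose) override { pose.log += tag_; }

private:
    std::string tag_;
};

// Applies each contained mover in the order it was added.
class SequenceMover : public Mover {
public:
    void add_mover(MoverOP mover) { movers_.push_back(mover); }
    void apply(Pose& pose) override {
        for (const MoverOP& mover : movers_) mover->apply(pose);
    }

private:
    std::vector<MoverOP> movers_;
};

// Packages a "small" step and a "ccd" step, then applies the package once.
std::string run_small_then_ccd() {
    std::shared_ptr<SequenceMover> sequence(new SequenceMover);
    sequence->add_mover(MoverOP(new TagMover("small;")));
    sequence->add_mover(MoverOP(new TagMover("ccd;")));
    Pose pose;
    sequence->apply(pose);
    return pose.log;
}
```

Because SequenceMover is itself a Mover, the package can be handed to anything that expects a single mover, which is what makes the later nesting possible.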

The peptide and termini setup involves small and shear moves, with examples of how to command-line control every little detail. There is no preference for small and shear moves here, so they are packaged into a RandomMover which will choose between them each time it runs.
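A random choice between movers can be sketched in the same toy style. The classes below are hypothetical stand-ins, not the Rosetta implementation: the movers only increment counters, the choice is uniform (equal weights are an assumption of this sketch), and the seed is fixed so the run is repeatable.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <random>
#include <vector>

// Toy pose that counts how often each kind of move was applied.
struct Pose {
    int small_moves = 0;
    int shear_moves = 0;
};

class Mover {
public:
    virtual ~Mover() {}
    virtual void apply(Pose& pose) = 0;
};
typedef std::shared_ptr<Mover> MoverOP;

class CountSmall : public Mover {
public:
    void apply(Pose& pose) override { ++pose.small_moves; }
};

class CountShear : public Mover {
public:
    void apply(Pose& pose) override { ++pose.shear_moves; }
};

// Picks one contained mover uniformly at random on every apply().
class RandomMover : public Mover {
public:
    explicit RandomMover(unsigned seed) : rng_(seed) {}
    void add_mover(MoverOP mover) { movers_.push_back(mover); }
    void apply(Pose& pose) override {
        if (movers_.empty()) return;
        std::uniform_int_distribution<std::size_t> pick(0, movers_.size() - 1);
        movers_[pick(rng_)]->apply(pose);
    }

private:
    std::vector<MoverOP> movers_;
    std::mt19937 rng_;
};

// Applies the random mover for the given number of cycles.
Pose run_random_moves(int cycles) {
    RandomMover chooser(42u);
    chooser.add_mover(MoverOP(new CountSmall));
    chooser.add_mover(MoverOP(new CountShear));
    Pose pose;
    for (int i = 0; i < cycles; ++i) chooser.apply(pose);
    return pose;
}
```

Each call applies exactly one of the two submovers, so the two counters always sum to the number of cycles.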

Next, we set up RotamerTrials. First we need to describe what parts of the protein we want to modify – we do this with a TaskFactory. TaskFactory objects create PackerTask objects, which then tell the packing machinery what to do. You can create PackerTasks directly, but it's often better to use a Factory because A) it has more tools, and B) PackerTasks are meant as one-use disposable objects, and if you apply an old PackerTask to a new pose, your new pose's sequence will get changed to the sequence used to create the old PackerTask. We add TaskOperations to the TaskFactory to tell it how to build our task (the common ones are initialize from command line, read from resfile, and sometimes restrict to repacking). Once we have this we create a RotamerTrials mover to do rotamer trials.
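The reason a factory is safer than a reused task can be sketched with toy types. All names below (Pose, PackerTask, TaskOperation, TaskFactory::create_task, restrict_to_repacking) are hypothetical simplifications chosen for this example, and the per-residue flags are far cruder than a real PackerTask. The sketch shows the pattern only: the factory stores the instructions, and a fresh task is built to fit whichever pose is passed in.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

struct Pose {
    std::string sequence;
};

// One flag pair per residue, sized for the pose the task was created from.
struct PackerTask {
    std::vector<bool> repack;
    std::vector<bool> design;
};

// A task operation edits a freshly created task.
typedef std::function<void(const Pose&, PackerTask&)> TaskOperation;

class TaskFactory {
public:
    void push_back(const TaskOperation& op) { ops_.push_back(op); }

    // Builds a new task for this pose, then applies every stored operation.
    PackerTask create_task(const Pose& pose) const {
        PackerTask task;
        task.repack.assign(pose.sequence.size(), true);
        task.design.assign(pose.sequence.size(), true);
        for (const TaskOperation& op : ops_) op(pose, task);
        return task;
    }

private:
    std::vector<TaskOperation> ops_;
};

// Hypothetical operation: keep rotamer changes, forbid sequence changes.
void restrict_to_repacking(const Pose&, PackerTask& task) {
    task.design.assign(task.design.size(), false);
}
```

Because every call to create_task starts from the pose in hand, the resulting task cannot carry stale information from an earlier pose.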

The perturbation setup so far has been mostly necessary – this is the only way to get the movers set up. The next step is packaging them up for use under MonteCarlo. This is optional; you could call the movers directly yourself instead of using the TrialMover setup here. Anyway, most of our movers are set into a RandomMover which allows for random remodeling of a different aspect of the pose each perturbation cycle. We set this RandomMover in with a SequenceMover so that we do RotamerTrials after each perturbing move to fix sidechain clashes. This is generally a good idea: whenever you do a backbone perturbation, do a quick RotamerTrials (not a slow packing) to fix sidechain clashes. The last step, wrapping the sequence in a TrialMover, is optional. Basically the TrialMover handles the calls to MonteCarlo for you, for the benefits of MC searching through space. (You can interface with MonteCarlo directly if you want.)
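What a trial step does (perturb, then ask MonteCarlo to accept or reject) can be sketched with a one-dimensional toy. This is a simplified illustration under stated assumptions: Pose holds a single number, score is an invented quadratic energy, and the MonteCarlo class below implements only a plain Metropolis test with lowest-score tracking. The method name boltzmann mirrors the Rosetta one, but the body is this sketch's own.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Toy pose with one degree of freedom.
struct Pose {
    double x;
};

// Invented energy function with its minimum at x = 2.
double score(const Pose& pose) { return (pose.x - 2.0) * (pose.x - 2.0); }

class MonteCarlo {
public:
    MonteCarlo(const Pose& start, double kT, unsigned seed)
        : last_(start), lowest_(start), kT_(kT), rng_(seed) {}

    // Metropolis test: accept downhill moves, accept uphill moves with
    // probability exp(-dE/kT), otherwise restore the last accepted pose.
    bool boltzmann(Pose& pose) {
        const double dE = score(pose) - score(last_);
        std::uniform_real_distribution<double> uniform(0.0, 1.0);
        if (dE <= 0.0 || uniform(rng_) < std::exp(-dE / kT_)) {
            last_ = pose;
            if (score(pose) < score(lowest_)) lowest_ = pose;
            return true;
        }
        pose = last_;
        return false;
    }

    const Pose& lowest() const { return lowest_; }

private:
    Pose last_;
    Pose lowest_;
    double kT_;
    std::mt19937 rng_;
};

// Hypothetical trial loop: perturb, then let MonteCarlo accept or reject.
Pose search(int cycles) {
    Pose pose{10.0};
    MonteCarlo mc(pose, 1.0, 7u);
    std::mt19937 rng(11u);
    std::uniform_real_distribution<double> step(-0.5, 0.5);
    for (int i = 0; i < cycles; ++i) {
        pose.x += step(rng);  // the "perturbing move"
        mc.boltzmann(pose);   // the part a TrialMover would handle for you
    }
    return mc.lowest();
}
```

The two lines inside the loop are what a TrialMover bundles into one apply() call; writing them out by hand is the "interface with MonteCarlo directly" alternative.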

We next start setting up the design + minimization phase. Design setup looks like RotamerTrials setup, except that now we're doing packing instead of rotamer trials. Note that there is no restrict to repacking here. We build a minimization mover next. Minimization can affect all degrees of freedom in the pose's AtomTree. In our case, we want to minimize sidechains, but only the sidechains that are repackable (the others would waste time). We use TaskAwareMinMover to wrap the MinMover in this case. The 'TaskAware' part modifies the MoveMap for minimization by piping in PackerTask information before minimization occurs. Notice here we do not bother with MonteCarlo – rejecting the design step would leave us with our original sequence, which negates the whole point of scoring different peptide sequences. Your mileage may vary; you'll probably want MonteCarlo control here too.
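The "piping in PackerTask information" step can be sketched as a small conversion. The types and the function movemap_from_task are hypothetical, and the rule used here (free the sidechain torsions of exactly those residues the packer may touch) is this sketch's reading of the description above, not a statement of how TaskAwareMinMover is implemented. Indices are 0-based in this toy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy task: one flag per residue saying whether the packer may change it.
struct PackerTask {
    std::vector<bool> being_packed;
};

// Toy move map: one flag per residue saying whether its sidechain may minimize.
struct MoveMap {
    std::vector<bool> chi;
};

// Frees sidechain torsions only where the task allows packing.
MoveMap movemap_from_task(const PackerTask& task) {
    MoveMap move_map;
    move_map.chi.assign(task.being_packed.size(), false);
    for (std::size_t i = 0; i < task.being_packed.size(); ++i) {
        if (task.being_packed[i]) move_map.chi[i] = true;
    }
    return move_map;
}

// Counts how many sidechains the minimizer would be allowed to move.
std::size_t count_free_sidechains(const MoveMap& move_map) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < move_map.chi.size(); ++i) {
        if (move_map.chi[i]) ++n;
    }
    return n;
}
```

Building the move map from the task at apply time keeps the two in agreement even when the set of packable residues changes between runs.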

The last part of this protocol is a mixture of protocol-specific stuff, and job-distribution stuff that could be elsewhere. The outer for loop goes over nstruct (how many models to make). The inner for loop is the perturbation phase (note it has pert_trial->apply()). We next recover the lowest energy model from perturbation and run the design phase once. The rest of this section is for pose IO to PDB, and resetting various variables between runs.
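The loop structure described above can be sketched as a control-flow skeleton. Everything in it is a toy assumption: Pose is one number, score and design are invented, the perturbation is a plain random walk with no accept/reject step, and pushing a score onto a vector stands in for writing a PDB file.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Toy pose with one degree of freedom.
struct Pose {
    double x;
};

// Invented energy function.
double score(const Pose& pose) { return pose.x * pose.x; }

// Hypothetical design step; here it always lowers the toy score.
void design(Pose& pose) { pose.x *= 0.5; }

// Outer loop over nstruct, inner loop over perturbation cycles.
std::vector<double> run_protocol(int nstruct, int pert_cycles) {
    const Pose start{4.0};
    std::mt19937 rng(3u);
    std::uniform_real_distribution<double> step(-1.0, 1.0);
    std::vector<double> final_scores;
    for (int model = 1; model <= nstruct; ++model) {
        Pose pose = start;    // reset state between runs
        Pose lowest = start;
        for (int i = 0; i < pert_cycles; ++i) {  // perturbation phase
            pose.x += step(rng);
            if (score(pose) < score(lowest)) lowest = pose;
        }
        pose = lowest;  // recover the lowest-energy model
        design(pose);   // design phase runs once per model
        final_scores.push_back(score(pose));  // stands in for PDB output
    }
    return final_scores;
}
```

The reset at the top of the outer loop matters: without it, each model would start from wherever the previous one finished.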

One big thing that you might choose to do differently in your protocol is to use a JobDistributor instead of having calls to PDB IO right in your code. The job distributor will provide your code with the IO. In exchange, you must wrap your code into the apply() function of a mover (meaning, write a new mover). Your mover can exist just in the .cc file for your application without trouble. (This code does not demonstrate the job distributor for development reasons; it was written with intent to use job distributor tools that are not being released in Rosetta 3.0.) Most of the protocols in the src/apps/public directory demonstrate this organization, where 99% of the protocol occurs inside a mover, and the application itself just sets the mover up and asks the job distributor to run it.
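The division of labor between a protocol mover and a job distributor can be sketched like this. The names are hypothetical (MyProtocolMover, run_jobs), the distributor is reduced to a single function, and the tag_0001 output naming is an assumption of this sketch rather than a description of the real tools. The idea it shows is the one from the text: the protocol lives in apply(), and the nstruct loop and IO live elsewhere.

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Toy pose; a real one would be filled from an input structure.
struct Pose {
    int times_applied = 0;
};

class Mover {
public:
    virtual ~Mover() {}
    virtual void apply(Pose& pose) = 0;
};

// Hypothetical protocol mover: the whole protocol would live in apply().
class MyProtocolMover : public Mover {
public:
    void apply(Pose& pose) override {
        ++pose.times_applied;  // stands in for perturbation + design
        ++runs;
    }
    int runs = 0;
};

// Hypothetical distributor: owns the nstruct loop and the output naming.
std::vector<std::string> run_jobs(Mover& mover, const std::string& input_tag, int nstruct) {
    std::vector<std::string> outputs;
    for (int n = 1; n <= nstruct; ++n) {
        Pose pose;          // stands in for reading the input structure
        mover.apply(pose);
        std::ostringstream name;
        name << input_tag << '_' << std::setfill('0') << std::setw(4) << n;
        outputs.push_back(name.str());  // stands in for writing the output
    }
    return outputs;
}
```

With this split, the application's main function shrinks to constructing the mover and handing it to the distributor.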


Generated on Fri Mar 6 12:55:46 2009 for Rosetta Projects by  doxygen 1.5.2