| Rosetta 3.3 Release Manual |
Last updated July 24, 2011; PI: Ora Schueler-Furman (oraf@ekmd.huji.ac.il).
Code and demo files:
Application source: src/apps/public/flexpep_docking/FlexPepDocking.cc
Protocol source: protocols/flexpep_docking/FlexPepDockingProtocol.cc
Integration test: test/integration/tests/flexpepdock/
Refinement demo folder: rosetta_demos/FlexPepDock_Refinement
ab-initio protocol capture: protocol_capture/FlexPepDock_AbInitio. The README file contains all the information needed to set up a new run on a query peptide-protein interaction.
The main references for the FlexPepDock Refinement protocol and the FlexPepDock ab-initio protocol include additional scientific background, in-depth technical details about the protocols, and a large-scale assessment of their performance over a large dataset of peptide-protein complexes:
Refinement protocol:
Raveh B*, London N* and Schueler-Furman O. Sub-angstrom Modeling of Complexes between Flexible Peptides and Globular Proteins. Proteins, 78(9):2029-2040 (2010).
ab-initio protocol:
Raveh B, London N, Zimmerman L and Schueler-Furman O. Rosetta FlexPepDock ab-initio: Simultaneous Folding, Docking and Refinement of Peptides onto Their Receptors. PLoS ONE, 6(4): e18934 (2011).
A wide range of regulatory processes in the cell are mediated by flexible peptides that fold upon binding to globular proteins. The FlexPepDock Refinement protocol and the FlexPepDock ab-initio protocol are designed to create high-resolution models of complexes between flexible peptides and globular proteins. Both protocols were benchmarked over a large dataset of peptide-protein interactions, including challenging cases such as docking to unbound (free-form) receptor models (see References).
Refinement vs. ab-initio protocol:
Refinement protocol: The input to the protocol is an initial coarse model of the peptide-protein complex in PDB format (approximate backbone coordinates for the peptide in the receptor binding site). Initial side-chain coordinates (such as the crystallographic side-chains of an unbound receptor) can optionally be provided as part of the input model. A preliminary step in the Refinement protocol is pre-packing of the input structure, which removes internal clashes in the protein monomer and the peptide (see prepack mode below). In the main part of the protocol, the peptide backbone and its rigid-body orientation are optimized relative to the receptor protein using the Monte-Carlo with Minimization approach, together with on-the-fly side-chain optimization. An optional low-resolution (centroid) pre-optimization may improve performance further. The main part of the protocol is repeated k times, producing k output models (k is set with the -nstruct flag). The user then ranks the output models by their energy score. The Refinement protocol is described in detail in the Methods section in Raveh, London et al., Proteins 2010 (see References).
ab-initio protocol: The input to the ab-initio protocol is a model of the peptide-protein complex in PDB format, similar to the Refinement protocol but starting from an arbitrary (e.g., extended) peptide backbone conformation. The peptide must initially be positioned in some proximity to the true binding pocket, but the exact starting orientation may vary. A preliminary step for the ab-initio protocol is the generation of fragment libraries for the peptide sequence, with 3-mer, 5-mer and 9-mer fragments (these can be generated automatically via a script from the starting structure, as shown in the protocol_capture/FlexPepDock_AbInitio/ demo files). Another preliminary step is pre-packing, as in the Refinement protocol. The first step in the main part of the protocol is a Monte-Carlo simulation for de-novo folding and docking of the peptide over the protein surface in low-resolution (centroid) mode, using a combination of fragment insertions, random backbone perturbations and rigid-body transformation moves. In the second step, the resulting low-resolution model is refined with FlexPepDock Refinement. As in the independent Refinement protocol, the output models are then ranked by the user based on their energy score, and can also be subjected to clustering for improved performance. The ab-initio protocol is described in detail in the Methods section in Raveh, London, Zimmerman et al., PLoS ONE 2011 (see References).
For more information, see the following tips about correct usage of FlexPepDock.
FlexPepDock requires the following inputs:
Initial structure: see test/integration/tests/flexpepdock/input/1ER8_rb1_tor10_5.pdb for an example of an initial structure in Refinement mode, or protocol_capture/FlexPepDock_AbInitio/input_file/2b1z.start.pdb for ab-initio mode. The exact way in which the starting conformation is created may vary depending on the specific application. For example, if similar structures exist (this is common in peptide binders with multiple specificity, as in PDZ domains and many signal peptides), the initial structure can be constructed from a homology model of a similar structure using the Rosetta tool for comparative modeling, or any other homology modeling tool. If only the binding site is known, the initial peptide chain can be created from a FASTA file using the BuildPeptide Rosetta utility or using external tools such as PyMol Builder. The chain can then be positioned manually in the vicinity of the binding site (e.g., in an extended backbone conformation) using external tools like PyMol and Chimera. Alternatively, the peptide may be positioned manually in a completely arbitrary orientation relative to the receptor protein, and brought to the vicinity of the binding site using FlexPepDock or the Rosetta docking application, together with a constraints file that uses appropriate distance constraints to position the peptide close to the specified binding site.
Native structure: see test/integration/tests/flexpepdock/input/1ER8.pdb for an example in Refinement mode, or protocol_capture/FlexPepDock_AbInitio/input_file/2b1z.native.pdb for ab-initio mode.
Fragment files (ab-initio protocol): the script protocol_capture/FlexPepDock_AbInitio/scripts/frags/shift.sh can be used to offset a fragment file. The protocol capture also enables automatic generation and offsetting of the entire fragment file (see README file).
See example runs below in the Tips section. Note that the -flexpep_prepack and -flexPepDockingMinimizeOnly flags are mutually exclusive with the -lowres_abinitio and -pep_refine flags, as they denote completely different modes of functionality (-pep_refine and -lowres_abinitio are commonly used together, for ab-initio peptide modeling followed by refinement).
| Flag | Description | Type | Default |
| -receptor_chain | chain-id of receptor protein | String | first chain in input |
| -peptide_chain | chain-id of peptide | String | second chain in input |
| -lowres_abinitio | Low-resolution ab-initio folding and docking mode. | Boolean | false |
| -pep_refine | Refinement mode (equivalent to the obsolete -rbMCM -torsionsMCM flags). | Boolean | false |
| -lowres_preoptimize | Perform a preliminary round of centroid mode optimization before Refinement. See more details in Tips. | Boolean | false |
| -flexpep_prepack | Prepacking mode. Optimize the side-chains of each monomer separately (without any docking). | Boolean | false |
| -flexpep_score_only | Read in a complex, score it and output interface statistics | Boolean | false |
| -flexPepDockingMinimizeOnly | Minimization mode. Perform only a short minimization of the input complex | Boolean | false |
| -ref_startstruct | Alternative start structure for scoring statistics, instead of the original start structure (useful as a reference for rescoring previous runs with the -flexpep_score_only flag). | File | N/A |
| -peptide_anchor | Set the peptide anchor residue manually. It is recommended to override the default value only if one strongly suspects the critical region for peptide binding is extremely remote from its center of mass. | Integer | Residue nearest to the peptide center of mass. |
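The rule that the prepack and minimization modes cannot be combined with -lowres_abinitio or -pep_refine can be checked before a run is launched. The following is a minimal Python sketch of such a check; it is not part of Rosetta, the function name `check_mode_flags` is hypothetical, and it encodes only the exclusivity rule stated in this manual.

```python
# Flags that select a stand-alone mode (no docking is performed).
STANDALONE_MODES = {"-flexpep_prepack", "-flexPepDockingMinimizeOnly"}
# Flags that select the docking modes; these two may be used together.
DOCKING_MODES = {"-lowres_abinitio", "-pep_refine"}


def check_mode_flags(args):
    """Return a list of conflict messages for a FlexPepDock argument list.

    Hypothetical helper: it only checks the rule quoted in this manual,
    not the full command-line grammar of the application.
    """
    present = set(args)
    standalone = sorted(present & STANDALONE_MODES)
    docking = sorted(present & DOCKING_MODES)
    return [
        f"{s} cannot be combined with {d}"
        for s in standalone
        for d in docking
    ]


if __name__ == "__main__":
    # ab-initio followed by refinement: an allowed combination.
    print(check_mode_flags(["-s", "start.pdb", "-lowres_abinitio", "-pep_refine"]))
    # prepack together with refinement: reported as a conflict.
    print(check_mode_flags(["-s", "start.pdb", "-flexpep_prepack", "-pep_refine"]))
```

An empty list means no conflict was found among the mode flags.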
More information on common Rosetta flags can be found in the relevant Rosetta manual pages. In particular, flags related to the job-distributor (jd2), scoring function, constraint files and packing resfiles are identical to those in any other Rosetta protocol.
| Flag | Description |
| -in:file:s or -in:file:silent | Specify starting structure (in:file:s for PDB format, in:file:silent for silent file format). |
| -in:file:silent_struct_type -out:file:silent_struct_type | Format of silent file to be read in/out. For silent output, use the binary file type since other types may not support ideal form. |
| -native | Specify the native structure to compare against in RMSD calculations. This flag is optional: when the native is not given, the starting structure is used as the reference. |
| -nstruct | Number of models to create in the simulation |
| -unboundrot | Add the position-specific rotamers of the specified structure to the rotamer library (usually used to include rotamers of the unbound receptor) |
| -use_input_sc | Include rotamer conformations from the input structure during side-chain repacking. Unlike the -unboundrot flag, not all rotamers from the input structure are added each time to the rotamer library; only those conformations accepted at the end of each round are kept, and the remaining conformations are lost. |
| -ex1/-ex1aro -ex2/-ex2aro -ex3 -ex4 | Adding extra side-chain rotamers (highly recommended). The -ex1 and -ex2aro flags were used in our own tests, and therefore are recommended as default values. |
| -database | The Rosetta database |
| -frag3 / -frag5 / -frag9 | 3-mer / 5-mer / 9-mer fragment files for ab-initio peptide docking (9-mer fragments for peptides longer than 9 residues). |
Pre-pack your initial complex:
FlexPepDocking.{ext} \
    -database ${minidb} -s input.pdb -native native.pdb -flexpep_prepack \
    -ex1 -ex2aro [-unboundrot unbound.pdb]
Generate 100 (or more) models with the -lowres_preoptimize flag, and an additional 100 (or more) models without this flag, in two separate runs (the low-resolution run can be skipped if you are in a hurry):
FlexPepDocking.{ext} \
    -database ${minidb} -s start.pdb -native native.pdb \
    -out:file:silent decoys.silent -out:file:silent_struct_type binary \
    -pep_refine -ex1 -ex2aro -use_input_sc \
    -nstruct 100 -unboundrot unbound_receptor.pdb [ -lowres_preoptimize ]
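The two refinement runs differ only in the -lowres_preoptimize flag, so their command lines can be assembled by one small helper. The following Python sketch is illustrative only: the function name `refinement_command`, the database path and the two silent-file names are assumptions made for this example, and the executable name is the placeholder used in this manual. It builds the argument lists but does not run anything.

```python
def refinement_command(executable, database, start_pdb, native_pdb,
                       silent_out, nstruct=100, unbound_pdb=None,
                       lowres_preoptimize=False):
    """Build the argument list for one FlexPepDock Refinement run.

    Hypothetical helper: it only uses flags listed in this manual.
    """
    cmd = [
        executable,
        "-database", database,
        "-s", start_pdb,
        "-native", native_pdb,
        "-out:file:silent", silent_out,
        "-out:file:silent_struct_type", "binary",
        "-pep_refine", "-ex1", "-ex2aro", "-use_input_sc",
        "-nstruct", str(nstruct),
    ]
    if unbound_pdb is not None:
        cmd += ["-unboundrot", unbound_pdb]
    if lowres_preoptimize:
        cmd.append("-lowres_preoptimize")
    return cmd


if __name__ == "__main__":
    # One run with and one run without the low-resolution pre-optimization.
    # Separate silent-file names (an assumption) keep the two model sets apart.
    for tag, lowres in (("lowres", True), ("highres", False)):
        cmd = refinement_command(
            "FlexPepDocking.{ext}",          # placeholder, as in this manual
            "/path/to/rosetta_database",     # assumed location
            "start.pdb", "native.pdb",
            f"decoys_{tag}.silent",
            unbound_pdb="unbound_receptor.pdb",
            lowres_preoptimize=lowres,
        )
        print(" ".join(cmd))
```

The returned list can be passed to `subprocess.run` once the placeholder executable name and the database path are replaced with real values.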
Open the output score files of both runs (score.sc by default), sort them by model score (second column), and choose the top-scoring (lowest-energy) models as candidate models.
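Sorting the score file can also be scripted. The sketch below is a minimal Python example, assuming the usual whitespace-delimited score-file layout in which every data line starts with a `SCORE:` tag and the first `SCORE:` line names the columns; the function names `read_score_file` and `best_models` are hypothetical. By default it sorts on the first column after the tag, which is the "second column" referred to above, and treats lower energy as better.

```python
import sys


def read_score_file(path):
    """Parse a score file into (column_names, rows); each row is a dict."""
    header = None
    rows = []
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if not fields or fields[0] != "SCORE:":
                continue  # skip blank lines and non-score lines
            if header is None:
                header = fields[1:]  # first SCORE: line names the columns
                continue
            if fields[1:] == header:
                continue  # repeated header, e.g. in concatenated files
            rows.append(dict(zip(header, fields[1:])))
    return header or [], rows


def best_models(header, rows, column=None, n=10):
    """Return the n rows with the lowest value in the chosen column."""
    if column is None:
        column = header[0]  # the score column that follows the SCORE: tag
    return sorted(rows, key=lambda row: float(row[column]))[:n]


if __name__ == "__main__" and len(sys.argv) > 1:
    names, models = read_score_file(sys.argv[1])
    for model in best_models(names, models):
        print(model[names[0]], model[names[-1]])
```

To pool the two runs, read both score files and concatenate the row lists before calling `best_models`.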
The file protocol_capture/FlexPepDock_AbInitio/README contains all the information on how to automate this process. For manual runs, the following steps are needed:
Create your initial complex structure (see Input files section for more information).
Pre-pack your initial complex as in FlexPepDock Refinement
Prepare 3-mer, 5-mer and 9-mer fragment files for the peptide using the fragment picker, as in any other Rosetta application (fragment libraries are not required for the receptor).
Assuming the receptor chain precedes the peptide chain, offset the residue indexing of the fragment file by the number of receptor residues. In UNIX, this can be done by running the following sequence of commands (csh/tcsh syntax):
set ifragfile=<input frag file name>
set ofragfile=<output frag file name>
set nResReceptor=<# receptor residues>
awk '{if ( substr ( $0,1,3 ) == "pos" ) {print substr ( $0,0,18 ) sprintf ("%4d",substr ( $0,19,4 ) + '"$nResReceptor"' ) substr ( $0,23,1000 ) ; } else {print ; }}' $ifragfile > $ofragfile
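For users who prefer not to rely on csh and awk, the same offset can be applied with a short script. The following Python sketch is an illustrative alternative, not the shift.sh script from the protocol capture; it assumes that each fragment block starts with a line beginning with `position:` followed by the residue index, and the function name `offset_fragment_lines` is hypothetical.

```python
import re
import sys

# A line that starts a fragment block, e.g. "position:    1 neighbors:  200"
POSITION_LINE = re.compile(r"^(position:)(\s*)(\d+)(.*)$")


def offset_fragment_lines(lines, n_receptor_residues):
    """Yield fragment-file lines with every 'position:' index shifted."""
    for line in lines:
        match = POSITION_LINE.match(line.rstrip("\n"))
        if match is None:
            yield line  # fragment rows and blank lines pass through unchanged
            continue
        label, pad, index, rest = match.groups()
        width = len(pad) + len(index)  # keep the original field width
        shifted = str(int(index) + n_receptor_residues).rjust(width)
        if not shifted.startswith(" "):
            shifted = " " + shifted  # never let the number touch the colon
        yield f"{label}{shifted}{rest}\n"


if __name__ == "__main__" and len(sys.argv) == 4:
    # usage: script.py <input frag file> <output frag file> <# receptor residues>
    in_path, out_path, n_receptor = sys.argv[1], sys.argv[2], int(sys.argv[3])
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(offset_fragment_lines(src, n_receptor))
```

Only the `position:` lines are rewritten; the fragment rows themselves are copied verbatim.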
Generate 50,000 (or other number of choice) output models using the FlexPepDock ab-initio protocol:
FlexPepDocking.{ext} \
    -database ${minidb} -s start.pdb -native native.pdb \
    -out:file:silent decoys.silent -out:file:silent_struct_type binary \
    -lowres_abinitio -pep_refine -ex1 -ex2aro -use_input_sc \
    -frag3 <frag3 file> -frag5 <frag5 file> -frag9 <frag9 file> \
    -nstruct 50000 -unboundrot unbound_receptor.pdb
You may rank the models according to the default score (second column in the score file). However, our benchmarks indicate that ranking the models according to a different score, called the reweighted score, may be helpful (look for the column labeled "reweighted_sc" in the score file).
We also found that clustering the top-500 models using the Rosetta clustering application and choosing the clusters with the lowest-energy representatives is helpful, and that good solutions are often found within the top 1-10 clusters. Clustering can be done (in UNIX) using the script protocol_capture/FlexPepDock_AbInitio/scripts/clustering/cluster.sh, assuming the models are stored in a silent file, as follows:
cluster.sh pdb-id 500 2 <scorefile> <reference-pdb> <models-silent-file> <score-type-column>
The last parameter is the column number of the score according to which you wish to choose the top-500 models. We recommend using the column labeled "reweighted_sc" for this, as described above.
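The column number of "reweighted_sc" can be looked up from the header line of the score file instead of being counted by hand. The following Python sketch assumes that cluster.sh counts whitespace-delimited columns the same way this manual does, with the leading `SCORE:` tag as column 1 and the total score as column 2; the function name `score_column_number` is hypothetical.

```python
def score_column_number(header_line, column_name):
    """Return the 1-based, whitespace-delimited column number of a score term.

    The leading 'SCORE:' tag counts as column 1, so the total score that
    follows it is column 2. Raises ValueError if the column is absent.
    """
    fields = header_line.split()
    return fields.index(column_name) + 1


if __name__ == "__main__":
    # Illustrative header; real score files contain many more columns.
    header = "SCORE: total_score reweighted_sc I_bsa description"
    print(score_column_number(header, "reweighted_sc"))
```

The resulting number would be passed as the last argument of cluster.sh; check it against the score file if the column-counting assumption is in doubt.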
The output of a FlexPepDock run is a score file (score.sc by default) and k model structures (as specified by the -nstruct flag and the other common Rosetta input and output flags). The score of each model is the second column of the score file. Model selection should be based on either the score column or the reweighted-score column (the latter exhibited superior performance in the ab-initio benchmarks).
Interpretation of FlexPepDock-specific score terms: (for the common Rosetta scoring terms, please also see the score-types manual page).
| total_score* | Total score of the complex |
| reweighted_sc* | Reweighted score of the complex, in which interface residues are given double weight, and peptide residues are given triple weight |
| I_bsa | Buried surface area of the interface |
| I_hb | Number of hydrogen bonds across the interface |
| I_pack | Packing statistics of the interface |
| I_sc | Interface score (sum over energy contributed by interface residues of both partners) |
| pep_sc | Peptide score (sum over energy contributed by the peptide to the total score; consists of the internal peptide energy and the interface energy) |
| I_unsat | Number of buried unsatisfied HB donors and acceptors at the interface. |
| rms (ALL/BB/CA) | RMSD between output model and the native structure, over all peptide (heavy/backbone/C-alpha) atoms |
| rms (ALL/BB/CA)_if | RMSD between output model and the native structure, over all peptide interface (heavy/backbone/C-alpha) atoms |
| startRMS(all/bb/ca) | RMSD between start and native structures, over all peptide (heavy/backbone/C-alpha) atoms |
*For all interface terms, the interface residues are defined as those whose C-beta atoms (C-alpha for glycines) are up to 8 Å away from any corresponding atom in the partner protein.
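The interface definition in the footnote can be illustrated with a few lines of code. The following Python sketch is not Rosetta's implementation; it assumes the caller has already extracted one representative atom per residue (C-beta, or C-alpha for glycine) from each chain, and the function name `interface_residues` is hypothetical.

```python
def interface_residues(chain_a, chain_b, cutoff=8.0):
    """Return the interface residues of two chains as two sorted lists.

    chain_a, chain_b: dicts mapping residue id -> (x, y, z) of the
    representative atom (C-beta, or C-alpha for glycine), in Angstroms.
    A residue is at the interface if its representative atom lies within
    `cutoff` of any representative atom of the partner chain.
    """
    cutoff_sq = cutoff * cutoff

    def is_close(p, q):
        # Compare squared distances to avoid taking square roots.
        return sum((a - b) ** 2 for a, b in zip(p, q)) <= cutoff_sq

    a_interface = sorted(
        res for res, xyz in chain_a.items()
        if any(is_close(xyz, other) for other in chain_b.values())
    )
    b_interface = sorted(
        res for res, xyz in chain_b.items()
        if any(is_close(xyz, other) for other in chain_a.values())
    )
    return a_interface, b_interface


if __name__ == "__main__":
    # Toy coordinates, for illustration only.
    receptor = {1: (0.0, 0.0, 0.0), 2: (20.0, 0.0, 0.0)}
    peptide = {101: (5.0, 0.0, 0.0), 102: (40.0, 0.0, 0.0)}
    print(interface_residues(receptor, peptide))
```

The all-pairs comparison is quadratic in the number of residues, which is acceptable for a single complex.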
Apart from model selection by total score or reweighted score, and possibly clustering (see Tips section), no special post-processing steps are needed. For the ab-initio protocol, the protocol capture README file in protocol_capture/FlexPepDock_AbInitio/ contains all the information needed for clustering. However, advanced users may optionally use the Rosetta cluster application directly to assess whether top-scoring models converge to a consensus solution. For FlexPepDock Refinement, clustering is an optional step and is not considered an integral part of the Refinement protocol as described and tested in Raveh, London et al., Proteins 2010.