| Rosetta 3.3 Release Manual |
This document updates documentation written in 2008 by Rhiju Das (rhiju [at] stanford.edu) into the latest documentation format. Last update: April 2011.
The central code for the rna_denovo application is in src/protocols/rna/RNA_DeNovoProtocol.cc.
For a 'minimal' demo example of the RNA fragment assembly and full-atom minimization protocol and input files, see
test/integration/tests/rna_denovo/ [in the developer's SVN repository]
or
rosetta_demos/RNA_Denovo [in the public release]
Das, R. and Baker, D. (2007), "Automated de novo prediction of native-like RNA tertiary structures", PNAS 104: 14664-14669. [for fragment assembly]
Das, R., Kudaravalli, M., et al. (2008), "Structural inference of native and partially folded RNA by high throughput contact mapping", PNAS 105: 4144-4149. [for modeling large RNAs with constraints]
Das, R., Karanicolas, J., and Baker, D. (2010), "Atomic accuracy in predicting and designing noncanonical RNA structure". Nature Methods 7:291-294. [for high resolution refinement]
Sripakdeevong, P., Kladwang, W., and Das, R. (2011), "Resolving a sampling bottleneck in biopolymer structure prediction: RNA loops by a stepwise ansatz", submitted. [for loop modeling]
(Preprints/reprints of these papers are available at http://daslab.stanford.edu/pubs.html).
This code is intended to give three-dimensional de novo models of single-stranded RNAs or multi-stranded RNA motifs, with the prospect of reaching high (near-atomic-resolution) accuracy.
The RNA structure modeling algorithm in Rosetta is based on the assembly of short (1 to 3 nucleotide) fragments from existing RNA crystal structures whose sequences match subsequences of the target RNA. The Fragment Assembly of RNA (FARNA) algorithm is a Monte Carlo process, guided by a low-resolution knowledge-based energy function. The models can then be further refined in an all-atom potential to yield more realistic structures with cleaner hydrogen bonds and fewer clashes; the resulting energies are also better at discriminating native-like conformations from non-native conformations. The two-step protocol has been named FARFAR (Fragment Assembly of RNA with Full Atom Refinement).
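To make the idea of Monte Carlo fragment assembly concrete, here is a toy sketch. It is not the Rosetta implementation: the model is just a list of one angle per residue, the fragment library and `toy_energy` function are invented for illustration, and the acceptance rule is a generic Metropolis criterion assumed here rather than taken from the FARNA code.

```python
import math
import random


def fragment_assembly(n_residues, fragments, energy, cycles, temperature=1.0, seed=0):
    """Toy Monte Carlo fragment assembly over a list of per-residue angles.

    Each cycle copies a random fragment (1 to 3 values) into a random window
    of the model and accepts or rejects the change with a Metropolis test.
    """
    rng = random.Random(seed)
    model = [0.0] * n_residues
    current = energy(model)
    for _ in range(cycles):
        frag = rng.choice(fragments)
        start = rng.randrange(n_residues - len(frag) + 1)
        trial = model[:start] + list(frag) + model[start + len(frag):]
        trial_energy = energy(trial)
        delta = trial_energy - current
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            model, current = trial, trial_energy
    return model, current


# Hypothetical fragment library and scoring function, for illustration only.
fragments = [(-60.0,), (-60.0, -60.0), (180.0, -60.0, 60.0)]


def toy_energy(model):
    # Made-up score that favors angles near -60 degrees.
    return sum((angle + 60.0) ** 2 for angle in model) / 1000.0


start_energy = toy_energy([0.0] * 8)
model, final_energy = fragment_assembly(8, fragments, toy_energy, cycles=1000)
print(start_energy, final_energy)
```

In the real protocol the fragments carry full sets of RNA torsions drawn from crystal structures, and the score is the low-resolution knowledge-based energy function described above.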
You need only one input file to run RNA structure modeling: a fasta file with the sequence of the RNA.
A sample command line is the following:
rna_denovo.<exe> -fasta chunk002_1lnt_.fasta -nstruct 2 -out::file::silent test.out -cycles 1000 -minimize_rna -database <path to database>
The code takes about 1 minute to generate two models.
The fasta file has the RNA name on the first line (after >), and the sequence on the second line. Valid letters are a, c, g, and u. The example fasta file is available in rosetta_source/test/integration/tests/rna_denovo/.
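The fasta layout described above can be checked before launching a run. The following is a minimal sketch, not part of Rosetta: the function name `parse_rna_fasta` is hypothetical, and the example name and sequence are made up rather than taken from the demo file.

```python
def parse_rna_fasta(text):
    """Return (name, sequence) from a two-line fasta string, or raise ValueError."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) != 2 or not lines[0].startswith(">"):
        raise ValueError("expected a '>' name line followed by one sequence line")
    name = lines[0][1:].strip()
    sequence = lines[1]
    # Only the lowercase letters a, c, g, and u are treated as valid.
    bad = sorted(set(sequence) - set("acgu"))
    if bad:
        raise ValueError("invalid letters in sequence: " + ", ".join(bad))
    return name, sequence


# Made-up example input, for illustration only.
example = ">example_rna\nggcaucgaaagc\n"
name, sequence = parse_rna_fasta(example)
print(name, len(sequence))
```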
RNA motifs are typically ensconced within Watson/Crick double helices, and involve several strands. [The most conserved loop of the signal recognition particle is an example, and is included here as chunk002_1lnt_RNA.pdb.] You can specify the bounding Watson/Crick base pairs in a "params file" with lines like the following:
CUTPOINT_OPEN 6 [means that one chain ends at residue 6]
STEM PAIR 1 12 W W A [means that residues 1 and 12 should form a base pair with their Watson-Crick edges in an antiparallel orientation]
and then run:
rna_denovo.<exe> -fasta chunk002_1lnt_.fasta -native chunk002_1lnt_RNA.pdb -params_file chunk002_1lnt_.prm -nstruct 2 -out::file::silent chunk002_1lnt.out -cycles 1000 -minimize_rna -database <path to database>
This command line also includes the "native" pdb, and will result in heavy-atom rmsd scores being calculated. Note that the native pdb should have residues marked rA, rC, rG, and rU (see notes on PDB format below). The code again takes about 1 minute to generate two models. Finally, there are some notes on forcing other kinds of pairs below [Can I specify non-Watson-Crick pairs?].
By default the RNA fragment assembly makes use of bond torsions derived from the large ribosome subunit crystal structure 1jj2, which have been pre-extracted in 1jj2.torsions (available in the database). If you want to use torsions drawn from a separate PDB (or set of PDBs), the following command will do the job.
rna_database.<exe> -vall_torsions -s my_new_RNA1.pdb my_new_RNA2.pdb -o my_new_set.torsions -database <path to database>
The resulting file is just a text file with the RNA's torsion angles listed for each residue. Then, when creating models, use the following flag with the rna_denovo application:
-vall_torsions my_new_set.torsions
Similarly, the database of base pair geometries can be created with rna_database -jump_library, and then specified in the rna_denovo application with -jump_library_file.
Required:
-in:database Path to rosetta databases. [PathVector]
-in::fasta Fasta-formatted sequence file. [FileVector]
Commonly used:
-out::file::silent Name of output file [scores and torsions, compressed format]. default="default.out" [String]
-params_file RNA params file name. For example: -params_file chunk002_1lnt_.prm [String]
-in::native Native PDB filename. [File].
-out::nstruct Number of models to make. default: 1. [Integer]
-minimize_rna Optimize the RNA at high resolution after fragment assembly. [Boolean]
-vary_geometry Vary bond lengths and angles (with harmonic constraints near Rosetta ideal) for backbone and sugar degrees of freedom [Boolean]
Less commonly used, but useful
-cycles Number of Monte Carlo cycles. [default: 10000] [Integer]
-filter_lores_base_pairs Filter for models that satisfy structure parameters. [Boolean]
-output_lores_silent_file If doing full-atom minimization, also save the models after fragment assembly but before refinement (the file will be called *.LORES.out). [Boolean]
-dump Output PDBs that occur during the run, even if using silent file output. [Boolean]
-vall_torsions Source of RNA fragments. [default: 1jj2.torsions] [String]
-jump_library_file Source of base-pair rigid body transformations if base pairs are specified. [default: 1jj2_RNA_jump_library.dat] [String]
-close_loops Attempt closure across chainbreaks by cyclic coordinate descent after fragment moves. [Boolean]
-cst_file Specify constraints (typically atom pairs) in a Rosetta-style constraint file. [String]
Input and output PDB models have residues marked rA, rC, rG, and rU, due to historical reasons. If you have a "standard" PDB file, there is a python script available to convert it to Rosetta format:
demo/rna/make_rna_rosetta_ready.py <pdb file>
You can also specify base pairs that must be forced, even at the expense of creating temporary chainbreaks, in the params file, with a line like:
OBLIGATE PAIR 2 11 W W A
This also allows the specification of non-Watson-Crick base pairs. In the line above, you can change the W's to H (Hoogsteen edge) or S (sugar edge), and the A to P (antiparallel to parallel). The base edges are essentially the same as those defined in the classification by Leontis & Westhof. The orientations (A/P) are determined by the relative orientation of base normals. [The cis/trans classification of Leontis & Westhof would be an alternative to A/P, but we found A/P more convenient to compute and to visually assess.] The base pairs are drawn from a library of base pairs extracted from the crystallographic model of the large ribosomal subunit 1JJ2.
When specifying pairs, if there are not sufficient CUTPOINT_OPEN's to allow all the pairs to form, the code will attempt to choose a (non-stem) RNA suite to put in a cutpoint, which can be closed during fragment assembly with the -close_loops option. If you want to pre-specify where this cutpoint will be chosen, add a line like
CUTPOINT_CLOSED 6
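A params file combining the line types described in this document can be sanity-checked before a run. The sketch below is not Rosetta code: `parse_params` is a hypothetical helper, and it assumes one cutpoint or one pair per line, exactly as in the examples shown here. The real parser may accept more than this.

```python
EDGES = {"W", "H", "S"}      # Watson-Crick, Hoogsteen, sugar edge
ORIENTATIONS = {"A", "P"}    # antiparallel, parallel


def parse_params(text):
    """Collect cutpoints and pairs from params-file text, raising on bad lines."""
    parsed = {"CUTPOINT_OPEN": [], "CUTPOINT_CLOSED": [],
              "STEM PAIR": [], "OBLIGATE PAIR": []}
    for raw in text.splitlines():
        fields = raw.split()
        if not fields:
            continue
        if fields[0] in ("CUTPOINT_OPEN", "CUTPOINT_CLOSED") and len(fields) == 2:
            parsed[fields[0]].append(int(fields[1]))
        elif (len(fields) == 7 and fields[0] in ("STEM", "OBLIGATE")
              and fields[1] == "PAIR"):
            i, j = int(fields[2]), int(fields[3])
            edge_i, edge_j, orientation = fields[4:7]
            if (edge_i not in EDGES or edge_j not in EDGES
                    or orientation not in ORIENTATIONS):
                raise ValueError(f"bad edge or orientation: {raw!r}")
            parsed[fields[0] + " PAIR"].append((i, j, edge_i, edge_j, orientation))
        else:
            raise ValueError(f"unrecognized line: {raw!r}")
    return parsed


params_text = """CUTPOINT_OPEN 6
STEM PAIR 1 12 W W A
OBLIGATE PAIR 2 11 W W A
"""
parsed = parse_params(params_text)
print(parsed)
```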
The most common question we get is on what the terms in the 'SCORE lines' of silent files mean. Here's a brief rundown, with more explanation in the papers cited above.
***Energy interpreter for low resolution silent output:
score Final total score
rna_rg Radius of gyration for RNA
rna_vdw Low resolution clash check for RNA
rna_base_backbone Bases to 2'-OH, phosphates, etc.
rna_backbone_backbone 2'-OH to 2'-OH, phosphates, etc.
rna_repulsive Mainly phosphate-phosphate repulsion
rna_base_pair_pairwise Base-base interactions (Watson-Crick and non-Watson-Crick)
rna_base_pair Base-base interactions (Watson-Crick and non-Watson-Crick)
rna_base_axis Force base normals to be parallel
rna_base_stagger Force base pairs to be in same plane
rna_base_stack Stacking interactions
rna_base_stack_axis Stacking interactions should involve parallel bases.
atom_pair_constraint Harmonic constraints between atoms involved in Watson-Crick base pairs specified by the user in the params file
rms all-heavy-atom RMSD to the native structure
***Energy interpreter for fullatom silent output:
score Final total score
fa_atr Lennard-Jones attractive between atoms in different residues
fa_rep Lennard-Jones repulsive between atoms in different residues
fa_intra_rep Lennard-Jones repulsive between atoms in the same residue
lk_nonpolar Lazaridis-Karplus solvation energy, over nonpolar atoms
hack_elec_rna_phos_phos Simple electrostatic repulsion term between phosphates
hbond_sr_bb_sc Backbone-sidechain hbonds close in primary sequence
hbond_lr_bb_sc Backbone-sidechain hbonds distant in primary sequence
hbond_sc Sidechain-sidechain hydrogen bond energy
ch_bond Carbon hydrogen bonds
geom_sol Geometric Solvation energy for polar atoms
rna_torsion RNA torsional potential.
atom_pair_constraint Harmonic constraints between atoms involved in Watson-Crick base pairs specified by the user in the params file
angle_constraint (not in use)
rms all-heavy-atom RMSD to the native structure
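The score terms listed above can be pulled out of a silent file for sorting or plotting. The sketch below assumes a particular layout: each score line starts with `SCORE:`, a header line names the columns, and the last column is a model tag. That layout, the helper name `parse_score_lines`, and the sample values are assumptions for illustration. Check them against your own output.

```python
def parse_score_lines(lines):
    """Return one dict per model from the 'SCORE:' lines of a silent file."""
    header = None
    rows = []
    for line in lines:
        if not line.startswith("SCORE:"):
            continue
        fields = line.split()[1:]
        if not fields:
            continue
        try:
            float(fields[0])
        except ValueError:
            # A non-numeric first column is treated as a header line.
            header = fields
            continue
        if header is None or len(fields) != len(header):
            continue
        row = {}
        for key, value in zip(header, fields):
            try:
                row[key] = float(value)
            except ValueError:
                row[key] = value  # e.g. the model tag
        rows.append(row)
    return rows


# Made-up sample lines, for illustration only.
sample = [
    "SEQUENCE: ggcaucgaaagc",
    "SCORE: score rna_rg rna_vdw rms description",
    "SCORE: -10.5 3.2 0.1 4.7 S_000001",
    "SCORE: -12.0 3.0 0.0 3.9 S_000002",
]
rows = parse_score_lines(sample)
best = min(rows, key=lambda row: row["score"])
print(best["description"], best["score"], best["rms"])
```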
To get a score of an input PDB, you can run the 'denovo' protocol but request no fragment assembly cycles and no rounds of minimization:
rna_denovo.<exe> -database <path to database> -cycles 0 -minimize_rna -minimize_rounds 0 -s <your pdb file> -fasta <the sequence of the pdb> -out:file:silent SCORE.out
Then you can check the score in SCORE.out:
grep SCORE SCORE.out
If you take a PDB created outside Rosetta, very small clashes may be strongly penalized by the Rosetta all-atom potential. Instead of scoring it directly, you should probably run a short minimization:
rna_denovo.<exe> -database <path to database> -cycles 0 -minimize_rna -s <your pdb file> -fasta <fasta with sequence of the pdb> -out:file:silent MINIMIZE.out
Then you can check the score in MINIMIZE.out:
grep SCORE MINIMIZE.out
You will have to change the "-out:file:silent <file>" flag for each input file, or you will get a message that the job is already done. This is admittedly cumbersome; future releases will include a separate executable for minimizing.
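When minimizing several inputs, one way to avoid the "job is already done" message is to derive a distinct silent file name from each PDB name. The sketch below only builds the command lines, using the flags shown above. The helper name `minimize_commands`, the `.MINIMIZE.out` naming scheme, and the example paths are assumptions, and the executable and database paths are left as the same placeholders used in this document.

```python
from pathlib import Path


def minimize_commands(jobs, executable, database):
    """Build one rna_denovo minimization command per (pdb, fasta) pair."""
    commands = []
    for pdb, fasta in jobs:
        # Distinct output name per input, e.g. model_a.pdb -> model_a.MINIMIZE.out
        silent = Path(pdb).stem + ".MINIMIZE.out"
        commands.append([
            executable, "-database", database,
            "-cycles", "0", "-minimize_rna",
            "-s", pdb, "-fasta", fasta,
            "-out:file:silent", silent,
        ])
    return commands


# Hypothetical input files, for illustration only.
jobs = [("models/model_a.pdb", "model_a.fasta"),
        ("models/model_b.pdb", "model_b.fasta")]
commands = minimize_commands(jobs, "rna_denovo.<exe>", "<path to database>")
for command in commands:
    print(" ".join(command))
```

Each command list could then be passed to `subprocess.run` once the placeholders are replaced with real paths. Note that inputs sharing a file name in different directories would still collide under this naming scheme.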
You will typically use the protocol to produce a silent file -- how do you get the models out?
The models from the above run are stored in compressed format in the file test.out, along with lines representing the score components. You can extract the models in PDB format with the following conversion command:
rna_extract.<exe> -in:file:silent test.out -in:file:silent_struct_type rna -database <path to database>
Note that the PDBs have residue types marked as rA, rC, rG, and rU.
The code has not been changed since the first release (Rosetta 3.0), but it was removed in release 3.2 because the documentation was not upgraded to the Rosetta community standards. Rosetta 3.3 onwards restores rna_denovo with proper documentation!