Using Condor to run RosettaDock

Back to index

Setup

In order to do large-scale runs, we use a supercomputer with 56 processors and split up the runs accordingly.  The management software for the cluster is called Condor.  I will give you the basics of Condor here; if you want to know more, see the Condor manual.

This tutorial assumes you have already installed and configured Condor on your cluster.  Our cluster is named 'jazz-mgmt'; wherever you see 'jazz-mgmt' in this tutorial, substitute the name of your own cluster.

I suggest you first set up a directory structure on your cluster's management node that parallels the one on your home Linux box.  You need rosetta_scripts, rosetta_database, and the executables (under bin in your rosetta parent directory) on your cluster.  Also copy .rosettarc into your home directory there.  To do this, use the Linux command scp.

scp has the following format:

scp [-r] sourcefile destination_machine:destinationfilepath

Note that destination_machine and destinationfilepath are separated by a colon (:).

Use -r (recursive) if you are copying a directory.

So to copy your rosetta_database directory from your machine to jazz-mgmt, do the following (from your home directory):

scp -r rosetta_database  jazz-mgmt:

Do this also for the other rosetta directories I mentioned earlier.

In this case, sourcefile is "rosetta_database", destination_machine is jazz-mgmt, and destinationfilepath is your home directory on jazz-mgmt (the default file path, hence it is not given).  Also copy over your .bashrc and .rosettarc, or again make the link on jazz-mgmt to the rosetta_scripts/docking/rosettarc file.
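Putting the copy steps together, a minimal sequence might look like the sketch below (run from your home directory on your local machine; the directory names follow this tutorial, so adjust them to your own layout):

```shell
# 'jazz-mgmt' is the example cluster name from this tutorial.
scp -r rosetta_scripts jazz-mgmt:     # scripts, including condor_scripts
scp -r rosetta_database jazz-mgmt:    # the rosetta database
scp -r bin jazz-mgmt:                 # compiled executables
scp .bashrc .rosettarc jazz-mgmt:     # shell and rosetta settings
```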

Condor config file

The condor config file contains all of the variables needed to run rosetta on condor, including both rosetta variables and condor variables.  I have provided an example, test.config, in condor_scripts.  This is a highly adaptable script that can be customized for all kinds of rosetta runs on condor.  A condor launch script in $rosetta_scripts translates the config file into a condor script and a wrapper script that runs rosetta, and then launches the condor jobs directly.

Read over 'test.config' for some helpful notes.

The condor config file contains the same groups of rosetta variables found in farun.bash, the rosetta run script from the basics tutorial, plus a condor variable.  Here are the major groups of flags:

1) prefix:  This is the pdb path for decoys (inside the pdb name directory).  You can set this to anything you want, but if you do more than one condor run in a directory, use different prefixes for the different runs.

2) nstruct:  the number of structures that Rosetta outputs.  For large-scale runs, this is usually in the thousands.  As I discuss different types of large-scale runs in the refinement and blind prediction tutorials, I will suggest numbers for nstruct.  Several examples are given in test.config.  nstruct cannot be bigger than 9999; otherwise Rosetta will overwrite files after decoy 9999 is produced.  The condor config file provides a way to run more than 10,000 structures; to see how, see the "More than 10,000 structures" section at the bottom.

3) search flags:  These control how the docking search is done, and they are the most important variables for large-scale runs.  As I discuss different types of large-scale runs in the refinement and blind prediction tutorials, I will suggest different combinations of search flags.  Several examples are given in test.config.

4) side chain flags:  These control how RosettaDock repacks side chains.  If you are using a bound structure for partner 1, add -norepack1 here, and similarly for partner 2.  This is described in the condor config file.

5) smart scorefilter:  I have this turned off in test.config.  Leave it turned off unless you are doing a 100K run (this is described in the blind runs tutorial).

6) antibodies:  insert '-fab1' or '-fab2' if you are running an antibody.

7) Njobs:  the number of jobs to queue for condor.  I like 100 for most runs and 10 for test runs.  Generally, you want to keep the cluster full to maximize your efficiency.

8) compiler:  make sure to change this if you are using a new executable.

A final note:  Do not specify more than one value for each variable in the condor config file (the last value to be assigned will be used).  I have provided several examples of each variable; you can turn one of these examples on by commenting out the default value and uncommenting the example value.  You can also insert your own value for a variable; just make sure you comment out the default value.
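As an illustration of this convention, an nstruct section of a config file might look like the sketch below (the values shown here are illustrative assumptions; see test.config for the real defaults and examples):

```shell
# Only one value per variable should be active; if several are
# uncommented, the last assignment wins.
nstruct=10          # default (active): small test run
#nstruct=1000       # example: perturbation run (commented out)
#nstruct=200        # example: per-job count for a >10K run (commented out)
```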

An example - perturbation run

Copy the samplerun directory from examples over to your cluster.  

If you have not already prepacked test.pdb in samplerun, do so now.  The large scale .bash scripts assume a prepacked pdb already exists.

The following should be done on jazz-mgmt (or your cluster), in samplerun:

Copy test.config from $rosetta_scripts/condor_scripts and rename it pert.config.

cp $rosetta_scripts/condor_scripts/test.config pert.config

Open up 'pert.config' and change nstruct to 1000 (deactivate the default option and activate the "perturbation run" option).  Also change Njobs to 100.  You could leave it at 10, but that would take forever to run.
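If you prefer to make these edits non-interactively, sed commands along the following lines would work, assuming the config file uses simple 'name=value' lines (an assumption; check test.config for the real syntax).  The stand-in file created below is only for demonstration; in practice you would edit your real pert.config:

```shell
# Create a small stand-in config so the edits can be demonstrated.
printf 'nstruct=10\nNjobs=10\n' > pert.config
sed -i 's/^nstruct=.*/nstruct=1000/' pert.config   # perturbation-run value
sed -i 's/^Njobs=.*/Njobs=100/' pert.config        # keep the cluster full
cat pert.config
```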

That's it.  Now launch the run on condor using crun.bash (located in $rosetta_scripts):

crun.bash test pert

Of course, if you are using a pdb other than the 'test.pdb' provided, then change 'test' to the name of your pdb.

crun.bash requires a copy of the condor config file in your local directory to run.

Run crun.bash without arguments to get the usage message.  The arguments are <pdb> and <config>, where <config> is the name of the config file without the '.config' extension.  The extension must be '.config' for the script to work properly.

crun.bash will create two files:  an executable 'test.pert.bash' and a condor script 'test.pert.con'.  It will then directly launch the condor script.  You should get a condor message that 100 jobs have been submitted.

If you want to see your jobs in the condor queue, type:

condor_q

You will get a list of jobs including the cluster number (first column), the owner of the jobs, and whether they are running or idle.

If you want to stop the run, use condor_rm:

condor_rm <your user name>  or
condor_rm <job cluster number>

You can get the cluster number by looking at the first column in condor_q.  The second method of removal is helpful if you only want to destroy one set of jobs.  The first command will remove all jobs owned by you.

Other cases

In the refinement tutorial and the blind runs tutorial, I discuss how to run other types of large scale runs in a similar way.

More than 10,000 structures

Earlier I mentioned that it is not possible to have nstruct > 9999.  That is because Rosetta will not put decoy numbers greater than 9999 in the decoy filenames, so if you run more than 10,000 structures, structure 10,001 will come out as '0001' and will overwrite structure 0001.
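The wraparound can be illustrated with a short shell sketch that mimics four-digit decoy numbering (this is not Rosetta code, just an illustration of the collision):

```shell
# Decoy numbers are written with only four digits, so decoy 10001
# gets the same suffix as decoy 1 and overwrites its file.
decoy_suffix() { printf '%04d\n' $(( $1 % 10000 )); }
decoy_suffix 1        # 0001
decoy_suffix 10001    # 0001 again: a filename collision
```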

There is a way to get around this and make more than 10K structures.  You may want to do this in a calibration run, for example when trying to find a low-scoring outlier in a blind prediction.

Several options are provided in the condor config file to do this.  First, under 'PREFIX', comment out the default option and activate the prefix assignment under "More than 10K structures."  'prefix=$1' causes the condor job number to be passed in as the prefix rather than setting prefix explicitly.  Rosetta then converts this to a directory name like 'aa', 'ab', 'ac', ...
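As a rough illustration of the job-number-to-prefix mapping described above, the sketch below converts a job number to a two-letter directory name.  The exact scheme Rosetta uses may differ; this is an assumption for illustration only:

```shell
# Map a job number to a two-letter prefix: 0 -> aa, 1 -> ab, 26 -> ba, ...
job_prefix() {
  local letters=abcdefghijklmnopqrstuvwxyz
  printf '%s%s\n' "${letters:$(( $1 / 26 )):1}" "${letters:$(( $1 % 26 )):1}"
}
job_prefix 0    # aa
job_prefix 1    # ab
job_prefix 27   # bb
```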

Because of this, nstruct now means the number of decoys per job instead of the total number of decoys.  As a result, the number of structures you produce is Njobs * nstruct.  If you want 20,000 structures, leave Njobs at 100 and set nstruct to 200.

To extract the structures into one directory afterward and to combine them into one scorefile, type (in the samplerun directory):

pp_extract_set.sh <pdb> <topN>

where pdb is your pdb name and topN says 'extract the top N structures.'  This will also create a directory called scorefiles with a combined scorefile from all 100 directories.  pp_extract_set.sh is a rosetta script (pp stands for 'post-processing').

All this is quite annoying, so it is usually best to start with fewer than 10K structures.

Back to index