Epi-Graft Match User's Guide
Yih-En Andrew Ban
Last Revision: 20080126
----------------------------

Notes: this is still a rough draft; please continue to add to it (or rewrite it!)
as necessary.  It might be a good idea to switch to a PDF document, since it is
hard to format well in a plain text file.

< Matching Pipeline >

 '*' indicates on by default
 '-' indicates off by default

[ Operations on Primary Matches ]
  * Find all primary matches or load existing matches.
  - Apply spatial filter.
  - Optimize by rigid body move.
  - Optimize by fluidizing takeoff.
  - Optimize by fluidizing landing.
  - Apply strict rms filter if optimization attempted.

[ Operations on Secondary Matches ]
  * Use the primary matches fed in from above as the basis for rough matching.
  - Optimize by fluidizing landing.
  - Apply strict rms filter if optimization attempted.


< Notes on Input/Output >
  Input/output directories are based on the Rosetta paths.txt file:
    - input scaffolds to match use the input 'starting structure' entry
    - output match results use the output 'score' entry
    - output pdbs (aligned loops/predesign) use the output 'pdb path' entry
  Note that the directory structure of the output pdb directory needs to match
  the directory structure of the input pdb directory.
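  If the scaffold list refers to pdbs in subdirectories, the same subdirectories
  must exist under the output 'pdb path'.  Below is a minimal Python sketch for
  pre-creating them.  It assumes the scaffold list holds paths relative to the
  input 'starting structure' directory and that the run does not create missing
  output subdirectories by itself (both are assumptions).  The function name
  `mirror_output_dirs` is hypothetical.

```python
import os

def mirror_output_dirs(scaffold_paths, output_root):
    """Create under output_root every subdirectory used by the (relative)
    scaffold paths, so the output pdb tree mirrors the input pdb tree.
    Hypothetical helper; returns the directories it made sure exist."""
    # Distinct relative subdirectories, ignoring blank lines.
    subdirs = sorted({os.path.dirname(p.strip()) for p in scaffold_paths if p.strip()})
    ensured = []
    for subdir in subdirs:
        if not subdir:
            continue  # pdb sits directly in the input root; nothing to mirror
        target = os.path.join(output_root, subdir)
        os.makedirs(target, exist_ok=True)
        ensured.append(target)
    return ensured
```

  For example, call it with the lines of the scaffold list and the output
  'pdb path' directory from paths.txt before starting the run.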


< Switches >

  [ Main modes ]
               '-epi_graft -help'  :  obtain all flags and defaults for match mode.
              '-epi_graft -match'  :  invoke match mode.

  [ Sub-modes ]
                   '-rough_match'  :  perform subsequent rough match.
                   '-combi_match'  :  perform subsequent combi match.
'-superposition_minrepack_refine'  :  perform minrepack refinement (no rigid body movement) of superposition
                                      protocol designs; only operates on a list of existing matches
                                      (i.e. -use_input_from_match)
            '-screen_with_repack'  :  perform screening using repack; only operates on a list of existing
                                      matches (i.e. -use_input_from_match)

  [ Required options ]
                 '-nres_Ab <int>'  :  number of residues in the antibody; 0 indicates that there is no
                                      antibody in the native_complex structure

  [ Additional alignment systems for primary loop matches ]
                '-skip_N2C_align'  :  skip N2C alignment trial
                '-skip_C2N_align'  :  skip C2N alignment trial
                       '-E_align'  :  include endpoint alignment trial
                       '-S_align'  :  include superposition alignment trial
                      '-SS_align'  :  include specific superposition alignment trial (superposition of
                                      specific residues within a given range)

  [ Input ]
       '-native_complex <string>'  :  filename of native antibody-antigen complex (default 'native_complex.pdb')
          '-loop_ranges <string>'  :  filename of epitope loop ranges (default 'loop_ranges.txt')
           '-input_file <string>'  :  filename of input file (default 'scaffold_list.txt'); if filename ends
                                      in '.gz', then the file is assumed to be compressed by gzip
          '-use_input_from_match'  :  indicates input is from a prior match run
                                      (changes '-input_file' default to 'graft_matches.in')
              '-input_pdb_has_Ab'  :  input pdbs contain Ab, so rigid body orientation is taken from Ab;
                                      only takes effect if '-use_input_from_match' is specified
                      '-Ab_first'  :  indicates antibody is first when scaffold files contain Ab
                '-scaffold_first'  :  indicates antigen/scaffold is first when scaffold files contain Ab

  [ Ranges ]
    '-termini_residue_skip <int>'  :  skip 'n' residues at the N- and C-termini (default 4)
         '-min_match_width <int>'  :  minimum number of contiguous residues in a match (default 3)
   '-max_match_width_delta <int>'  :  allowed difference between number of residues in loop subrange
                                      and number of residues in scaffold gap range (default 10)
'-max_rough_match_width_delta <int>':  allowed difference between number of residues in loop subrange
                                      and number of residues in scaffold gap range for secondary matches,
                                      i.e. rough matches (default: value of '-max_match_width_delta')
'-moveable_closure_residues <int>' :  number of _scaffold_ residues adjacent to a break that are allowed to move
                                      during loop closure (default 3); note that this number affects
                                      multiple matches, as the minimum number of scaffold residues between two
                                      matches must be 'moveable_closure_residues * 2 + 1'
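  The spacing rule above can be checked directly.  Below is a minimal Python
  sketch, assuming that "residues between two matches" counts the scaffold
  residues strictly between the end of one gap range and the beginning of the
  next.  The function names are hypothetical.

```python
def min_scaffold_spacing(moveable_closure_residues=3):
    """Minimum number of scaffold residues required between two matches."""
    return moveable_closure_residues * 2 + 1

def gaps_compatible(gap_a, gap_b, moveable_closure_residues=3):
    """gap_a, gap_b: inclusive (gap_begin, gap_end) scaffold ranges.
    True if enough scaffold residues separate the two ranges."""
    first, second = sorted([gap_a, gap_b])
    residues_between = second[0] - first[1] - 1  # negative if the ranges overlap
    return residues_between >= min_scaffold_spacing(moveable_closure_residues)
```

  With the default of 3, matches at scaffold residues 10-20 and 28-35 are
  compatible (7 residues between them), while 10-20 and 27-35 are not (6).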

  [ Filters ]
 '-match_distance_epsilon <real>'  :  DEPRECATED. epsilon added to CA-CA distance check during primary match
                                      (default 3.0).  The two CAs being checked as a potential match site should
                                      be the same distance apart as the end-to-end CA distance of the epitope
                                      loop, +/- a buffer of epsilon Angstroms; e.g. if your epitope loop has an
                                      end-to-end distance of 10 A and epsilon is 3.0, then any CA pairs in the
                                      scaffold that are 7-13 A apart will be considered, and shorter or longer
                                      pairs will be discarded.
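  Below is a minimal Python sketch of the (deprecated) distance window described
  above.  Treating the window as inclusive at both ends is an assumption, and
  `is_candidate_site` is a hypothetical name.  Coordinates are (x, y, z) tuples
  in Angstroms.

```python
import math

def is_candidate_site(ca_i, ca_j, loop_end_to_end, epsilon=3.0):
    """True if the scaffold CA-CA distance lies within
    loop_end_to_end +/- epsilon (inclusive)."""
    distance = math.dist(ca_i, ca_j)  # Euclidean distance, Python 3.8+
    return abs(distance - loop_end_to_end) <= epsilon
```

  For a 10 A loop and the default epsilon, CA pairs 7.0 A or 12.0 A apart pass,
  while a pair 13.5 A apart is discarded.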
        '-max_closure_rms <real>'  :  maximum closure rms (default 2.0)
    '-max_rms_over_length <real>'  :  maximum rms over length (default -1.0, which implies OFF);
                                      when set, this flag changes the 'S' and 'SS' alignment systems to filter by
                                      rms / length *instead* of rms, while all other alignment systems remain
                                      the same, so remember to set an appropriate '-max_closure_rms' if other
                                      alignment systems are in use
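  Below is a minimal Python sketch of which threshold applies to a given match.
  It assumes "length" is the number of residues the rms was computed over, that
  any non-positive '-max_rms_over_length' means OFF, and that the superposition
  systems are labeled "S" and "SS" (the "SS" label is an assumption).
  `passes_rms_filter` is a hypothetical name.

```python
def passes_rms_filter(align_sys, rms, length,
                      max_closure_rms=2.0, max_rms_over_length=-1.0):
    """Apply '-max_rms_over_length' to S/SS matches when it is switched on
    (a positive value); otherwise fall back to '-max_closure_rms'."""
    if max_rms_over_length > 0.0 and align_sys in ("S", "SS"):
        return rms / length <= max_rms_over_length
    return rms <= max_closure_rms
```

  With '-max_closure_rms 1.0 -max_rms_over_length 0.2', an S match with rms 1.8
  over 10 residues passes (0.18 per residue), while an N2C_CA match with the same
  rms fails the ordinary closure rms cutoff.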
        '-max_intra_clash <real>'  :  maximum clash allowed between loop and gapped scaffold (default 10000.0)
        '-max_inter_clash <real>'  :  maximum clash allowed between antibody and gapped scaffold (default 30000.0)
'-use_full_sidechain_inter_clash'  :  use the antibody with full sidechains during the inter clash check;
                                      if this flag is not specified, an all-ala Ab is used by default
'-rough_match_closure_rms <real>'  :  maximum closure rms for rough match (default 2.0)
'-rough_match_ca_distance <real>'  :  maximum allowed distance between the CA of an endpoint of an epitope
                                      secondary component and the CA of the corresponding endpoint of a
                                      potential secondary match on the scaffold during rough match
                                      (default: value of '-rough_match_closure_rms')
'-combi_match_ca_distance <real>'  :  maximum allowed CA-CA distance during combi match (default 1.0)

  [ Conformation ]
              '-fluidize_takeoff'  :  move two dihedrals (phi/psi) at +/-1 residue to takeoff for N2C and C2N matches
              '-fluidize_landing'  :  move four dihedrals (phi/psi/phi/psi) at +/-1 and +/-2 residues for broken
                                      endpoints (landing) of all matches
                       '-rb_move'  :  attempt rigid body movement during match
   '-recovery_rms_epsilon <real>'  :  epsilon added to '-max_closure_rms' during primary and rough match trials
                                      before fluidization and rb_move are attempted; fluidization and rb_move
                                      then attempt to recover/improve those matches to an rms <= '-max_closure_rms'
                                      (default 0.0)
'-allowed_intra_clash_increase <real>'  :  allowed intra clash increase during fluidization and rb-move trials
                                           (default 10.0)
     '-dihedral_deviation <real>'  :  move dihedrals by +/- 'n' degrees during fluidization (default 5.0)
          '-dihedral_step <real>'  :  move in steps of 'n' degrees during fluidization (default 5.0)
    '-rb_cube_side_length <real>'  :  length of side of translation cube in angstroms during rb-move (default 1)
    '-rb_translation_step <real>'  :  translate in steps of 'n' angstroms during rb-move (default 0.5)
     '-rb_angle_deviation <real>'  :  rotate +/- 'n' degrees for each axis around defined coordinate frame
                                      during rb-move (default 5.0)
          '-rb_angle_step <real>'  :  rotate in steps of 'n' degrees during rb-move (default 5.0)
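  To get a feel for how many trials these settings imply, below is a minimal
  Python sketch that counts grid points.  It assumes each dihedral (and each
  rigid-body axis) is sampled on a full grid from -deviation to +deviation in
  the given step, and that the translation cube is centered on the starting
  position; the actual enumeration in the code may differ.  Function names are
  hypothetical.

```python
import math

def grid_values(deviation, step):
    """Offsets -deviation, ..., 0, ..., +deviation in increments of step."""
    n = int(math.floor(deviation / step + 1e-9))  # tolerate float round-off
    return [i * step for i in range(-n, n + 1)]

def n_fluidize_trials(n_dihedrals, deviation=5.0, step=5.0):
    """Combinations when every dihedral is varied independently."""
    return len(grid_values(deviation, step)) ** n_dihedrals

def n_rb_trials(cube_side_length=1.0, translation_step=0.5,
                angle_deviation=5.0, angle_step=5.0):
    """Translations on a cube centered at the start, times rotations about three axes."""
    per_translation_axis = len(grid_values(cube_side_length / 2.0, translation_step))
    per_rotation_axis = len(grid_values(angle_deviation, angle_step))
    return per_translation_axis ** 3 * per_rotation_axis ** 3
```

  Under these assumptions the defaults give 3 values per dihedral, so 9
  combinations for takeoff (two dihedrals) and 81 for landing (four dihedrals),
  and 27 translations x 27 rotations = 729 rigid-body trials.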

  [ Additional Filters ]
              '-spatial_filter <string>'  :  pdb filename for the spatial filter (checks the number of cbeta
                                             on scaffold within the distance cutoff of heavy atoms in the spatial filter pdb)
'-spatial_filter_distance_cutoff <real>'  :  distance cutoff against external structure for spatial filter
                                             (default 5.5)
       '-spatial_filter_min_cbeta <int>'  :  minimum number of C-beta on (all-ala) scaffold within the
                                             distance cutoff of the spatial filter pdb structure (default 6)
       '-spatial_filter_max_cbeta <int>'  :  maximum number of C-beta on (all-ala) scaffold within the
                                             distance cutoff of the spatial filter pdb structure (default 10)
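  Below is a minimal Python sketch of the spatial filter count as described
  above.  It assumes the (all-ala) scaffold C-beta coordinates and the spatial
  filter heavy-atom coordinates are already in the same aligned frame and that
  the cutoff is inclusive.  It uses a simple all-pairs loop, which is fine for
  illustration.  The names are hypothetical.

```python
import math

def count_cbeta_near(scaffold_cbetas, filter_heavy_atoms, cutoff=5.5):
    """Number of scaffold C-beta atoms within cutoff of any filter heavy atom."""
    count = 0
    for cb in scaffold_cbetas:
        if any(math.dist(cb, atom) <= cutoff for atom in filter_heavy_atoms):
            count += 1
    return count

def passes_spatial_filter(scaffold_cbetas, filter_heavy_atoms,
                          cutoff=5.5, min_cbeta=6, max_cbeta=10):
    """True if the C-beta count lies within [min_cbeta, max_cbeta]."""
    n = count_cbeta_near(scaffold_cbetas, filter_heavy_atoms, cutoff)
    return min_cbeta <= n <= max_cbeta
```

  The '-compute_cbeta_neighbors' statistic below appears to be the same kind of
  count, taken against each epitope component's heavy atoms with
  '-cbeta_neighbors_distance_cutoff'.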

  [ Additional Statistics ]
              '-compute_cbeta_neighbors'  :  compute number of cbeta neighbors on scaffold within the
                                             specified distance of any heavy atom, for each epitope component
'-cbeta_neighbors_distance_cutoff <real>' :  distance cutoff for cbeta on scaffold versus heavy atom on
                                             epitope component (default 5.5)

  [ Output ]
                '-no_single_match_output'  :  skip single match output; only output rough matches
               '-no_partial_match_output'  :  skip partial matches; only output matches to all loops
                  '-output_file <string>'  :  filename for output table (default 'graft_matches.out');
                                              if filename ends in '.gz', then output file will be written
                                              using gzip compression
                  '-output_aligned_loops'  :  write pdb of aligned loops; alignment reflects rigid body
                                              transforms from takeoff optimization and rigid body move
            '-output_predesign_structure'  :  write pdb of pre-design structure
    '-output_predesign_structure_with_Ab'  :  write pdb of pre-design structure with aligned antibody
     '-override_pdb_output_path <string>'  :  override the standard Rosetta pdb output path in paths.txt

  [ Checkpointing ]
           '-checkpoint <string>'  :  turn on checkpointing and specify the checkpoint filename


< File Formats >

  [ native antibody-antigen complex pdb ]
   PDB file, Ab must be first!  Residues in antigen/epitope section must be
   in proper numerical order, e.g. 34, 35, 100, 101 .. to N.  Residue numbers
   themselves don't have to run from 1 to N.  All residues specified in each
   of the 'full ranges' in the loop ranges file must be in the antigen/epitope
   section.
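   Below is a minimal Python sketch for sanity-checking a native complex before a
   run.  It assumes standard fixed-column PDB ATOM records, that the first
   '-nres_Ab' residues are the antibody, and that every residue number from the
   start to the end of a full range must be present.  It checks that antigen
   residue numbers increase and that each full range is covered.  Names are
   hypothetical.

```python
def residue_numbers(pdb_lines):
    """Residue numbers in file order, one entry per residue (ATOM records only)."""
    numbers = []
    last_key = None
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        key = (line[21], line[22:27])  # chain id, residue number + insertion code
        if key != last_key:
            numbers.append(int(line[22:26]))
            last_key = key
    return numbers

def check_native_complex(pdb_lines, nres_Ab, full_ranges):
    """Return a list of problems found (empty if none)."""
    antigen = residue_numbers(pdb_lines)[nres_Ab:]  # Ab must come first
    problems = []
    if any(b <= a for a, b in zip(antigen, antigen[1:])):
        problems.append("antigen residues are not in increasing numerical order")
    present = set(antigen)
    for begin, end in full_ranges:
        missing = [r for r in range(begin, end + 1) if r not in present]
        if missing:
            problems.append(f"full_range {begin} {end} is missing residues {missing}")
    return problems
```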

  [ loop ranges file ]
   example:

   loop: 1
   full_range: 362 376
   nranges: 3
   range: 362 376
   range: 364 371
   range: 367 372

   loop: 2
   full_range: 277 284
   nranges: 2
   range: 279 284
   range: 278 281

  [ loop ranges optional tag ]
   Add the tag "disallow_primary" after the loop id to prevent that loop from
   being used as a primary match in the find_singleton ...
   Example:

   loop: 2	disallow_primary
   full_range: 277 284
   nranges: 2
   range: 279 284
   range: 278 281


  [ loop ranges file if using SS_align ]

   file must contain a line of the following form indicating the native
   epitope residues to be used in SS_align superposition:

   superposition_residues: 362 363 366 370
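   Below is a minimal Python sketch of a reader for the loop ranges format shown
   in the examples above, including the optional 'disallow_primary' tag and the
   'superposition_residues' line (treated here as a file-level line, which is an
   assumption).  Fields are whitespace-separated.  Two consistency checks are
   assumed rules rather than documented ones: 'nranges' must equal the number of
   'range' lines, and each range must lie inside its loop's 'full_range'.  The
   function name is hypothetical.

```python
def parse_loop_ranges(lines):
    """Parse loop ranges text into (loops, superposition_residues).

    Each loop is a dict: loop id, disallow_primary flag, full_range,
    declared nranges, and the list of (begin, end) ranges."""
    loops = []
    superposition_residues = []
    current = None
    for raw in lines:
        fields = raw.split()  # splits on spaces and tabs alike
        if not fields:
            continue
        key = fields[0]
        if key == "loop:":
            current = {"loop": int(fields[1]),
                       "disallow_primary": "disallow_primary" in fields[2:],
                       "full_range": None, "nranges": None, "ranges": []}
            loops.append(current)
        elif key == "full_range:":
            current["full_range"] = (int(fields[1]), int(fields[2]))
        elif key == "nranges:":
            current["nranges"] = int(fields[1])
        elif key == "range:":
            current["ranges"].append((int(fields[1]), int(fields[2])))
        elif key == "superposition_residues:":
            superposition_residues = [int(f) for f in fields[1:]]
        else:
            raise ValueError(f"unrecognized line: {raw!r}")
    # Consistency checks (assumed rules).
    for loop in loops:
        if loop["full_range"] is None:
            raise ValueError(f"loop {loop['loop']}: missing full_range")
        if loop["nranges"] != len(loop["ranges"]):
            raise ValueError(f"loop {loop['loop']}: nranges does not match the range lines")
        lo, hi = loop["full_range"]
        for begin, end in loop["ranges"]:
            if not lo <= begin <= end <= hi:
                raise ValueError(f"loop {loop['loop']}: range {begin} {end} is outside full_range")
    return loops, superposition_residues
```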

  [ input file if using a scaffold list ]
   one pdb filename per line

  [ minimal input file from match run ]
   a '#' at the beginning of a line marks a comment line, which is skipped
   columns/example:

  # filename   loop_id   alignment_system   gap_begin   gap_end   native_loop_begin   native_loop_end
    2ny7.pdb         1              N2C_N         168       182                 362               376

 - 'alignment_system' is one of:  N2C_N, N2C_CA, N2C_C, C2N_N, C2N_CA, C2N_C, S, E
 - 'loop_id' should be internally consistent within the file, but regardless, the program will determine
    and correct the loop id based on the given loop ranges file
 - 'gap_begin/end' are residues on the scaffold
 - 'native_loop_begin/end' specifies the residues on the epitope in native numbering
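 Below is a minimal Python sketch of a reader for this minimal input format.  It
 skips '#' comment lines and blank lines, accepts only the alignment system
 labels listed above, and reads '.gz' files with gzip (mirroring the
 '-input_file' behavior described earlier).  Ignoring any columns after the first
 seven is an assumption.  Names are hypothetical.

```python
import gzip

# Alignment system labels listed above for the minimal input file.
ALIGNMENT_SYSTEMS = {"N2C_N", "N2C_CA", "N2C_C", "C2N_N", "C2N_CA", "C2N_C", "S", "E"}

def parse_match_lines(lines):
    """Turn minimal match-input lines into dicts, skipping comments and blanks."""
    records = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        fields = stripped.split()
        if fields[2] not in ALIGNMENT_SYSTEMS:
            raise ValueError(f"unknown alignment_system {fields[2]!r}")
        gap_begin, gap_end, native_begin, native_end = (int(f) for f in fields[3:7])
        records.append({"filename": fields[0], "loop_id": int(fields[1]),
                        "alignment_system": fields[2],
                        "gap_begin": gap_begin, "gap_end": gap_end,
                        "native_loop_begin": native_begin,
                        "native_loop_end": native_end})
    return records

def read_match_input(path):
    """Read a match input file, transparently handling gzip ('.gz') files."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return parse_match_lines(handle)
```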


< Additional notes >

When screening with repack or minimize, remember to use '-ex1 -ex1aro -ex2 -extrachi_cutoff 0'
for additional rotamers.


< Output Columns >
order of columns in graft matches output file (in the file, the header line starts
with '#' and lists all columns on a single line):

  filename  loop_id  align_sys  gap_begin  gap_end  native_loop_begin  native_loop_end
  D_1  D_2  D_3  D_4  D_5  D_6  D_7  D_8
  R11  R21  R31  R12  R22  R32  R13  R23  R33
  T_x  T_y  T_z
  overall_rms  n_terminal_rms  c_terminal_rms  rms_over_length
  intra_clash  inter_clash  n_angle  c_angle  cbeta_neighbors  output_pdb_prefix
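Below is a minimal Python sketch for reading one data row of the output table
into named fields.  It assumes every column is a single whitespace-free token in
exactly the order listed above; header lines starting with '#' should be skipped
by the caller.  Reading Rij as "row i, column j" of a 3x3 rotation matrix is an
assumption based on the column names, which the file lists column by column.
File and loop number columns are converted to int, cbeta_neighbors and
output_pdb_prefix are kept as text (their exact format is not specified here),
and everything else is parsed as float.  Names are hypothetical.

```python
OUTPUT_COLUMNS = (
    ["filename", "loop_id", "align_sys", "gap_begin", "gap_end",
     "native_loop_begin", "native_loop_end"]
    + [f"D_{i}" for i in range(1, 9)]
    + ["R11", "R21", "R31", "R12", "R22", "R32", "R13", "R23", "R33"]
    + ["T_x", "T_y", "T_z"]
    + ["overall_rms", "n_terminal_rms", "c_terminal_rms", "rms_over_length",
       "intra_clash", "inter_clash", "n_angle", "c_angle",
       "cbeta_neighbors", "output_pdb_prefix"]
)
TEXT_COLUMNS = {"filename", "align_sys", "cbeta_neighbors", "output_pdb_prefix"}
INT_COLUMNS = {"loop_id", "gap_begin", "gap_end", "native_loop_begin", "native_loop_end"}

def parse_output_row(line):
    """Map one whitespace-separated data row onto the column names above."""
    fields = line.split()
    if len(fields) != len(OUTPUT_COLUMNS):
        raise ValueError(f"expected {len(OUTPUT_COLUMNS)} columns, got {len(fields)}")
    row = {}
    for name, value in zip(OUTPUT_COLUMNS, fields):
        if name in TEXT_COLUMNS:
            row[name] = value
        elif name in INT_COLUMNS:
            row[name] = int(value)
        else:
            row[name] = float(value)
    return row

def rotation_matrix(row):
    """3x3 nested list with R[i][j] taken from column 'R{i+1}{j+1}'."""
    return [[row[f"R{i}{j}"] for j in range(1, 4)] for i in range(1, 4)]
```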


< Some Common Tasks >

 Find primary loops with max_closure_rms = 1.0:

   -epi_graft -match -native_complex NATIVE_COMPLEX.PDB -nres_Ab 205 -loop_ranges LOOP_RANGES.TXT -input_file SCAFFOLD_LIST.TXT -paths paths.txt -max_closure_rms 1.0

 Find primary loops with max_closure_rms = 1.0 and then rough match with CA-CA distance of 1.5:

   -epi_graft -match -rough_match -native_complex NATIVE_COMPLEX.PDB -nres_Ab 205 -loop_ranges LOOP_RANGES.TXT -input_file SCAFFOLD_LIST.TXT -paths paths.txt -max_closure_rms 1.0 -rough_match_ca_distance 1.5

 Find primary loops with max_closure_rms = 1.0 and then rough match with CA-CA distance of 1.5 and attempt fluidization for both takeoff and landing:

   -epi_graft -match -rough_match -native_complex NATIVE_COMPLEX.PDB -nres_Ab 205 -loop_ranges LOOP_RANGES.TXT -input_file SCAFFOLD_LIST.TXT -paths paths.txt -max_closure_rms 1.0 -rough_match_ca_distance 1.5 -recovery_rms_epsilon 0.5 -fluidize_takeoff -fluidize_landing

 Find primary loops with max_closure_rms = 1.0 and then rough match with CA-CA distance of 1.5 and attempt rb moves:

   -epi_graft -match -rough_match -native_complex NATIVE_COMPLEX.PDB -nres_Ab 205 -loop_ranges LOOP_RANGES.TXT -input_file SCAFFOLD_LIST.TXT -paths paths.txt -max_closure_rms 1.0 -rough_match_ca_distance 1.5 -recovery_rms_epsilon 0.5 -rb_move

 Use matches from prior run and find rough match with CA-CA distance of 1.5:

   -epi_graft -match -rough_match -native_complex NATIVE_COMPLEX.PDB -nres_Ab 205 -loop_ranges LOOP_RANGES.TXT -use_input_from_match -input_file PRIOR_MATCHES.TXT -paths paths.txt -rough_match_ca_distance 1.5

 Use matches from prior run and fluidize takeoff and landing:

   -epi_graft -match -native_complex NATIVE_COMPLEX.PDB -nres_Ab 205 -loop_ranges LOOP_RANGES.TXT -use_input_from_match -input_file PRIOR_MATCHES.TXT -paths paths.txt -recovery_rms_epsilon 0.5 -fluidize_takeoff -fluidize_landing

 Run everything from the beginning:

   -epi_graft -match -rough_match -native_complex NATIVE_COMPLEX.PDB -nres_Ab 205 -loop_ranges LOOP_RANGES.TXT -input_file SCAFFOLD_LIST.TXT -paths paths.txt -max_closure_rms 1.0 -rough_match_ca_distance 1.5 -recovery_rms_epsilon 0.5 -fluidize_takeoff -fluidize_landing -rb_move

 Run everything from prior matches:

   -epi_graft -match -rough_match -native_complex NATIVE_COMPLEX.PDB -nres_Ab 205 -loop_ranges LOOP_RANGES.TXT -use_input_from_match -input_file PRIOR_MATCHES.TXT -paths paths.txt -max_closure_rms 1.0 -rough_match_ca_distance 1.5 -recovery_rms_epsilon 0.5 -fluidize_takeoff -fluidize_landing -rb_move
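 When running many variants of these commands, it can help to assemble the flag
 lists programmatically.  Below is a minimal Python sketch that uses only the
 flags shown in the examples above and emits them in the same order.  The
 function name is hypothetical, and the Rosetta executable is not included:
 prepend your own executable path when launching, e.g. via subprocess.

```python
import shlex

def build_match_args(input_file, *, native_complex="NATIVE_COMPLEX.PDB", nres_Ab=205,
                     loop_ranges="LOOP_RANGES.TXT", paths="paths.txt",
                     use_input_from_match=False, rough_match=False,
                     max_closure_rms=None, rough_match_ca_distance=None,
                     recovery_rms_epsilon=None, fluidize_takeoff=False,
                     fluidize_landing=False, rb_move=False):
    """Return the flag list for one epi-graft match run, in the order used above."""
    args = ["-epi_graft", "-match"]
    if rough_match:
        args.append("-rough_match")
    args += ["-native_complex", native_complex, "-nres_Ab", str(nres_Ab),
             "-loop_ranges", loop_ranges]
    if use_input_from_match:
        args.append("-use_input_from_match")
    args += ["-input_file", input_file, "-paths", paths]
    # Optional numeric flags are only emitted when a value is given.
    for flag, value in (("-max_closure_rms", max_closure_rms),
                        ("-rough_match_ca_distance", rough_match_ca_distance),
                        ("-recovery_rms_epsilon", recovery_rms_epsilon)):
        if value is not None:
            args += [flag, str(value)]
    # Switches without a value.
    for flag, enabled in (("-fluidize_takeoff", fluidize_takeoff),
                          ("-fluidize_landing", fluidize_landing),
                          ("-rb_move", rb_move)):
        if enabled:
            args.append(flag)
    return args

if __name__ == "__main__":
    # Reproduces the "Run everything from the beginning" example above.
    print(shlex.join(build_match_args(
        "SCAFFOLD_LIST.TXT", rough_match=True, max_closure_rms=1.0,
        rough_match_ca_distance=1.5, recovery_rms_epsilon=0.5,
        fluidize_takeoff=True, fluidize_landing=True, rb_move=True)))
```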


< Terminology >

Notes: this needs to be standardized both here and in the code.

 loop subrange - the first and last residues of the match on the loop of the epitope
 scaffold gap range - the first and last residues of the match on the scaffold
 match range - same as scaffold gap range

 strict closure RMS - number specified by "-max_closure_rms"

 primary/singleton match - the first loop match, which defines the rigid body orientation
 secondary match - a match for an additional loop
 spatial filter - pdb that defines additional distance/bump constraint on the match
 takeoff fluidization - move two dihedrals (phi/psi) at +/-1 to takeoff residue for N2C and C2N
 landing fluidization - move four dihedrals (phi/psi/phi/psi) at +/-1 and +/-2 residues for broken
                        endpoints (landing) for all matches

 S/Superposition alignment - superposition alignment over all residues from the loop subrange and scaffold gap range
 E/Endpoint alignment - superposition alignment on just the two endpoint residues from the loop subrange and scaffold gap range

