Rosetta 3.5
Patch.cc
1 // -*- mode:c++;tab-width:2;indent-tabs-mode:t;show-trailing-whitespace:t;rm-trailing-spaces:t -*-
2 // vi: set ts=2 noet:
3 //
4 // (c) Copyright Rosetta Commons Member Institutions.
5 // (c) This file is part of the Rosetta software suite and is made available under license.
6 // (c) The Rosetta software is developed by the contributing members of the Rosetta Commons.
7 // (c) For more information, see http://www.rosettacommons.org. Questions about this can be
8 // (c) addressed to University of Washington UW TechTransfer, email: license@u.washington.edu.
9 //////////////////////////////////////////////////////////////////////
10 /// @begin Patch.cc
11 ///
12 /// @brief
13 /// implementation of the Patch and PatchCase classes, which modify ResidueTypes
14 ///
15 /// @details
16 /// My understanding of this piece of code comes from various conversations with
17 /// Oliver Lang, Summer Thyme, and Andrew Leaver-Fay. If the ideas are incorrect, feel
18 /// free to correct, but I doubt you will, because no one ever comments code.
19 ///
20 /// General Overview
21 ///
22 /// What are patches? Patches are modifications to the original amino acid residues. These
23 /// modifications include capping the N and C termini, adding phosphorylation to residues,
24 /// etc. The actual files that are read and used are found in:
25 /// rosetta_database/chemical/residue_type_sets/fa_standard/patches
26 /// All of these patches are read at the beginning of a Rosetta application and are applied to all
27 /// residues that are identified as needing to be patched (identification of
28 /// residues is explained later). Which residues need to be
29 /// patched depends on what is in the actual patch file. For example, residues at the N and C termini
30 /// are patched with the N- and C-terminus patches, which are defined in the files NtermProteinFull.txt and
31 /// CtermProteinFull.txt, respectively.
32 ///
33 /// Overview of a Patch File
34 ///
35 /// Let's look at a patch file to see what goes on. The file we are looking at is NtermProteinFull.txt. The file is shortened
36 /// to make it easier to read. The important parts are shown:
37 ///
38 /// NAME NtermProteinFull #actual name of the patch
39 /// TYPES LOWER_TERMINUS #the type that this patch belongs to. Types are defined in VariantType.cc/.hh
40 ///
41 /// #This section is for general selection rules to apply the patch.
42 /// BEGIN_SELECTOR #Here is where we define how to select amino acids for the patch. Properties/variant types needed for the patch are found between the BEGIN_SELECTOR and END_SELECTOR "titles"
43 /// PROPERTY PROTEIN #for this patch to apply, the residue must be a protein, as defined in the parameters file
44 /// NOT VARIANT_TYPE LOWER_TERMINUS #We do not want to patch the variant type LOWER_TERMINUS, because we do not want to double patch a residue (if the residue is already variant type LOWER_TERMINUS)
45 /// END_SELECTOR #ending the selection process for the patch
46 ///
47 /// #This section is to modify specific residues that the patch encounters
48 /// BEGIN_CASE ## PROLINE # Within the BEGIN_CASE and END_CASE section is where the residue is modified. These are the operations that occur to apply the patch
49 /// BEGIN_SELECTOR #Once again, we are defining what to do in this specific BEGIN_CASE/END_CASE block by using the selector
50 /// AA PRO #We only want to modify aa PRO in the following way between the BEGIN_CASE and END_CASE block
51 /// END_SELECTOR #end selection requirements
52 ///
53 /// ADD_ATOM 1H Hpol HC 0.24 #straightforward: add atom 1H
54 /// ADD_ATOM 2H Hpol HC 0.24
55 /// ADD_BOND N 1H #atoms need bonds, add those bonds
56 /// ADD_BOND N 2H
57 /// SET_POLYMER_CONNECT LOWER NONE # setting a property
58 ///
59 /// ## totally making these up:
60 /// SET_ICOOR 1H 120 60 1 N CA C ## like to use CD but CD's parent is CG #Once new atoms are placed, new Icoor need to be made
61 /// SET_ICOOR 2H 120 60 1 N CA 1H
62 ///
63 /// ## modify properties of existing atoms
64 /// SET_ATOM_TYPE N Nlys
65 /// SET_MM_ATOM_TYPE N NP
66 /// SET_ATOMIC_CHARGE N -0.07
67 /// SET_ATOMIC_CHARGE CA 0.16
68 /// SET_ATOMIC_CHARGE HA 0.09
69 /// SET_ATOMIC_CHARGE CD 0.16
70 /// SET_ATOMIC_CHARGE 1HD 0.09
71 /// SET_ATOMIC_CHARGE 2HD 0.09
72 /// ADD_PROPERTY LOWER_TERMINUS ## implies terminus
73 /// END_CASE
74 ///
75 /// BEGIN_CASE ### THE GENERAL CASE # Here is the general case, which is specified at the very beginning of the patch file
76 ///
77 /// ## these are the operations involved
78 /// DELETE_ATOM H ## deletes all bonds to this atom
79 /// ADD_ATOM 1H Hpol HC 0.33
80 /// ADD_ATOM 2H Hpol HC 0.33
81 /// ADD_ATOM 3H Hpol HC 0.33
82 /// ADD_BOND N 1H
83 /// ADD_BOND N 2H
84 /// ADD_BOND N 3H
85 /// SET_POLYMER_CONNECT LOWER NONE
86 ///
87 /// ## totally making these up:
88 /// SET_ICOOR 1H 120 60 1 N CA C
89 /// SET_ICOOR 2H 120 60 1 N CA 1H
90 /// SET_ICOOR 3H 120 60 1 N CA 2H
91 ///
92 /// ## modify properties of existing atoms
93 /// SET_ATOM_TYPE N Nlys
94 /// SET_MM_ATOM_TYPE N NH3
95 /// SET_ATOMIC_CHARGE N -0.3
96 /// SET_ATOMIC_CHARGE CA 0.21
97 /// SET_ATOMIC_CHARGE HA 0.10
98 /// ADD_PROPERTY LOWER_TERMINUS ## implies terminus
99 /// END_CASE
100 ///
101 ///
102 /// So, let's go through what was put up there. In general, you name your patch and assign it a type. Then,
103 /// you tell the patch system in the BEGIN_SELECTOR / END_SELECTOR block which properties a residue must have
104 /// for the patch to apply. You then add operations on the residue in the BEGIN_CASE / END_CASE block.
105 /// You can specify specific selectors within the BEGIN_CASE / END_CASE by having another BEGIN_SELECTOR / END_SELECTOR block.
106 /// People generally specify specific amino acids within the BEGIN_SELECTOR / END_SELECTOR blocks within the BEGIN_CASE / END_CASE blocks.
107 /// At the end, there is one last BEGIN_CASE / END_CASE with no BEGIN_SELECTOR / END_SELECTOR block. This block is used to specify
108 /// what to do with the general case, which was defined at the very beginning of the file. A little more detail follows below.
109 ///
110 /// First, you must name your patch. Then you have to give it a "type". This type is defined in VariantType.cc/.hh.
111 /// All the type does is add a name that can be used later on in different protocols.
112 /// It also helps the patch system keep track of which residues are patched with which type.
113 /// The type has no magical meaning; it's a name that you give it that is "defined" in VariantType.cc. In short,
114 /// it's a name handler. Once the name and type have been assigned, you must define a general case in which your patch applies.
115 /// Generally, this means that you want it to apply to a specific type of aa, or all residues of a specific type or property.
116 /// After this block come the BEGIN_CASE / END_CASE statements. In these statements, you specify what modifications you want
117 /// to apply to the residues. If you have different requirements for different amino acids, you must specify this in a BEGIN_SELECTOR /
118 /// END_SELECTOR block within the BEGIN_CASE / END_CASE block. You must specify specific amino acids first, then in a new BEGIN_CASE / END_CASE
119 /// block, you specify the general modifications for everything matched by the first selector. Confusing, huh? The specific selectors must come
120 /// before the general case because Patch::apply() tries the cases in file order and returns the first one that applies and succeeds. So, if you want
121 /// to do something to a specific amino acid/residue before you get to the general case, you need to have a BEGIN_CASE / END_CASE block with
122 /// a selector in there before you get to the general BEGIN_CASE / END_CASE block.
123 ///
124 /// Using Patch Selectors to Apply a Patch Only to Specific Residues
125 ///
126 /// Whoa there! Now that you know how to use patches, you don't need to go all gung-ho crazy and create a ton of patches. This is because
127 /// all patches are read at the beginning of Rosetta and applied. This creates overhead if you have a ton of patch files. To circumvent this,
128 /// someone wrote an option called -residues:patch_selectors. I don't know who wrote it, or where it's located, sorry. This option allows
129 /// you to create patches that are only loaded when you use the option -residues:patch_selectors. To get this to work, you create a patch
130 /// like normal. Then, in the BEGIN_SELECTOR / END_SELECTOR block, you add a line CMDLINE_SELECTOR <name of command line>. This means, if you
131 /// use the CMDLINE_SELECTOR sc_orbitals, your patch will only be loaded if you use the option -residues:patch_selectors sc_orbitals. Cool, huh?
132 ///
133 ///
134 /// Troubleshooting
135 /// So, you added a new patch and used a cmd line selector, and nothing happens. Did you modify VariantType.cc to have your new type? Did you
136 /// name things correctly? Did you edit patches.txt to include your patch? Are you trying to use a command that does not exist? Look at PatchOperation.cc for commands
137 /// that exist. These are all things to check for.
138 ///
139 /// So, you have added a new patch and you get segmentation faults. The most common seg fault for me is in the stub atom. This is because
140 /// your patch might add or delete atoms that are used in other patches. In order to fix this, you must modify the other patches to
141 /// account for your modifications. Remember, patches are all loaded at the beginning of Rosetta startup! If your patch conflicts with other
142 /// patches, even if you are using a cmd line selector, you must resolve those problems. Write an integration test so that people won't
143 /// screw up your patches.
144 ///
145 ///
146 /// @author
147 /// Phil Bradley
148 /// Steven Combs only added comments
149 ///
150 /// @last_modified October 22 2010
151 /////////////////////////////////////////////////////////////////////////
152 
153 
154 // Unit headers
155 #include <core/chemical/Patch.hh>
156 
157 // Package Headers
158 // Commented by inclean daemon #include <core/chemical/PatchOperation.hh>
159 
160 
161 // ObjexxFCL headers
162 
163 // Numeric headers
164 
165 
166 // Utility headers
167 
168 
169 // C++ headers
170 // Commented by inclean daemon #include <sstream>
171 #include <fstream>
172 
173 #include <utility/vector1.hh>
174 #include <utility/io/izstream.hh>
175 
176 namespace core {
177 namespace chemical {
178 
179 /// @details Auto-generated virtual destructor
181 
182 /// @details Auto-generated virtual destructor
184 
185 static basic::Tracer tr("core.chemical");
186 
187 /// @brief the string used to generate new residue names
188 std::string const patch_linker( "_p:" );
189 
192 {
193  return rsd_type.name().substr( 0, rsd_type.name().find( patch_linker ) );
194 }
195 
198 {
199  Size spos = rsd_type.name().find( patch_linker );
200  if( spos < rsd_type.name().length() ) return rsd_type.name().substr( spos );
201  else return "";
202 }
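
The two helpers above split a patched residue-type name at the first occurrence of the patch linker `_p:`. A minimal standalone sketch of that naming convention follows. It works on plain strings instead of `ResidueType`, and the names `patch_linker_sketch`, `base_name` and `all_patches_name` are hypothetical stand-ins, not Rosetta functions.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for patch_linker; same value as in the listing.
std::string const patch_linker_sketch( "_p:" );

// Everything before the first "_p:". If the linker is absent, find() returns
// npos and substr( 0, npos ) yields the whole name unchanged.
std::string base_name( std::string const & full_name )
{
	return full_name.substr( 0, full_name.find( patch_linker_sketch ) );
}

// Everything from the first "_p:" onward, or an empty string for an
// unpatched name.
std::string all_patches_name( std::string const & full_name )
{
	std::string::size_type const pos( full_name.find( patch_linker_sketch ) );
	return pos == std::string::npos ? std::string() : full_name.substr( pos );
}
```

Under this convention a name such as `PRO_p:NtermProteinFull` splits into the base `PRO` and the patch suffix `_p:NtermProteinFull`.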
203 
204 
205 /// @brief handy function, return the first word from a line
207 tag_from_line( std::string const & line )
208 {
209  std::string tag;
210  std::istringstream l( line );
211  l >> tag;
212  if ( l.fail() ) return "";
213  else return tag;
214 }
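
As a rough illustration of how the first-word helper is used when a patch file is read, the sketch below pairs it with the comment filter applied in `read_file`. `first_word` and `is_kept_line` are hypothetical names chosen for this example. Note that only lines whose first token starts with `#` are dropped; a trailing `#` comment after an operation stays on the line.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Return the first whitespace-delimited token of a line, or "" if there is none.
std::string first_word( std::string const & line )
{
	std::string tag;
	std::istringstream l( line );
	l >> tag;
	return l.fail() ? std::string() : tag;
}

// A line is kept if it is non-blank and its first token does not start with '#'.
bool is_kept_line( std::string const & line )
{
	std::string const tag( first_word( line ) );
	return !tag.empty() && tag[0] != '#';
}
```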
215 
216 /// @brief create a PatchCase from input lines
217 /// @details add selector_ from lines enclosed by "BEGIN_SELECTOR" and "END_SELECTOR".\n
218 /// add operations_ from each input line containing a single operation
221  utility::vector1< std::string > const & lines
222 )
223 {
224  PatchCaseOP pcase( new PatchCase() );
225 
226  bool in_selector( false );
227  for ( uint i=1; i<= lines.size(); ++i ) {
228  std::string const tag( tag_from_line( lines[i] ) );
229 
230  if ( tag == "BEGIN_SELECTOR" ) {
231  assert( !in_selector );
232  in_selector = true;
233  } else if ( tag == "END_SELECTOR" ) {
234  in_selector = false;
235  } else if ( in_selector ) {
236  pcase->selector().add_line( lines[i] );
237  } else {
239  if ( operation ) pcase->add_operation( operation );
240  }
241  }
242 
243  return pcase;
244 }
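
The loop above sorts the lines of one case into selector lines and operation lines. The following sketch shows only that sorting step, assuming plain `std::vector` containers and a hypothetical `CaseLines` struct; it does not build selectors or operations the way `case_from_lines` does.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for the two kinds of lines found inside one case.
struct CaseLines {
	std::vector< std::string > selector_lines;
	std::vector< std::string > operation_lines;
};

CaseLines split_case_lines( std::vector< std::string > const & lines )
{
	CaseLines result;
	bool in_selector( false );
	for ( std::size_t i = 0; i < lines.size(); ++i ) {
		// First token of the line decides how the line is treated.
		std::string tag;
		std::istringstream l( lines[i] );
		l >> tag;

		if ( tag == "BEGIN_SELECTOR" ) {
			in_selector = true;
		} else if ( tag == "END_SELECTOR" ) {
			in_selector = false;
		} else if ( in_selector ) {
			result.selector_lines.push_back( lines[i] );
		} else {
			result.operation_lines.push_back( lines[i] );
		}
	}
	return result;
}
```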
245 
246 /// @details First clone the base ResidueType. Then patching for this case is done by applying all the operations. Finally
247 /// call finalize() to update all primary and derived data for the new ResidueType
249 PatchCase::apply( ResidueType const & rsd_in ) const
250 {
251  ResidueTypeOP rsd( rsd_in.clone() );
252 
254  iter_end= operations_.end(); iter != iter_end; ++iter ) {
255  bool const fail( (*iter)->apply( *rsd ) );
256  if ( fail ) return 0;
257  }
258  rsd->finalize();
259  return rsd;
260 }
261 
262 /// @details - first read in all lines from the file, discarding # comment lines
263 /// - parse input lines for Patch name and variant types (NAME, TYPES)
264 /// - parse input lines for general ResidueSelector defined for this Patch (BEGIN_SELECTOR, END_SELECTOR)
265 /// - parse input lines to create each case accordingly (BEGIN_CASE, END_CASE)
266 /// @note keep the order to avoid triggering parsing errors
267 void
269 {
270  // clear old data
271  tr.Debug << "Reading patch file: " << filename << std::endl;
272 
273  name_ = "";
274  types_.clear();
275  selector_.clear();
276  cases_.clear();
277  replaces_residue_type_ = false;
278 
280  { // read the lines file
281  utility::io::izstream data( filename.c_str() );
282  if( !data.good() ){
283  utility_exit_with_message("Cannot find patch file: "+filename);
284  }
285  std::string line;
286  while ( getline( data,line ) ) {
287  std::string const tag( tag_from_line( line ) );
288  if ( tag.size() && tag[0] != '#' ) lines.push_back( line );
289  }
290  }
291 
292  // misc parsing
293  for ( uint i=1; i<= lines.size(); ++i ) {
294  std::istringstream l(lines[i]);
295  std::string tag;
296  l >> tag;
297  if ( tag == "NAME" ) {
298  l >> name_;
299  } else if ( tag == "TYPES" ) {
300  VariantType t;
301  l >> t;
302  while ( !l.fail() ) {
303  types_.push_back( t );
304  l >> t;
305  }
306  } else if ( tag == "REPLACE_RES_TYPE" ) {
307  replaces_residue_type_ = true;
308  }
309  }
310 
311  // build the residue selector
312  {
313  bool in_selector( false );
314  for ( uint i=1; i<= lines.size(); ++i ) {
315  std::string tag( tag_from_line( lines[i] ) );
316  if ( tag == "BEGIN_CASE" ) {
317  break;
318  } else if ( tag == "BEGIN_SELECTOR" ) {
319  assert( !in_selector );
320  in_selector = true;
321  } else if ( tag == "END_SELECTOR" ) {
322  assert( in_selector );
323  in_selector = false;
324  } else if ( in_selector ) {
325  selector_.add_line( lines[i] );
326  }
327  }
328  }
329 
330  // get the cases
332  bool in_case( false );
333  while ( !lines.empty() ) {
334  // look for a case
335  std::string tag( tag_from_line( lines[1] ) );
336  if ( tag == "BEGIN_CASE" ) {
337  assert( case_lines.empty() );
338  assert( !in_case );
339  in_case = true;
340  } else if ( tag == "END_CASE" ) {
341  PatchCaseOP new_case( case_from_lines( case_lines ) );
342  if ( new_case ) cases_.push_back( new_case );
343  case_lines.clear();
344  in_case = false;
345  } else if ( in_case ) case_lines.push_back( lines[1] );
346 
347  lines.erase( lines.begin() );
348  }
349 }
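
The last stage of `read_file` groups the lines between each BEGIN_CASE / END_CASE pair. A simplified sketch of that grouping is shown here. The function name `group_cases` is hypothetical, and the sketch walks the input by index rather than erasing the front element on every pass as the listing does, which is a choice made for this example only.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Collect the lines found between each BEGIN_CASE / END_CASE pair.
// Lines outside any case (NAME, TYPES, the general selector) are skipped.
std::vector< std::vector< std::string > >
group_cases( std::vector< std::string > const & lines )
{
	std::vector< std::vector< std::string > > cases;
	std::vector< std::string > current;
	bool in_case( false );
	for ( std::size_t i = 0; i < lines.size(); ++i ) {
		std::string tag;
		std::istringstream l( lines[i] );
		l >> tag;

		if ( tag == "BEGIN_CASE" ) {
			current.clear();
			in_case = true;
		} else if ( tag == "END_CASE" ) {
			if ( in_case ) cases.push_back( current );
			current.clear();
			in_case = false;
		} else if ( in_case ) {
			current.push_back( lines[i] );
		}
	}
	return cases;
}
```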
350 
351 /// @details loop through the cases in this patch and, if one is applicable to this ResidueType, apply the corresponding patch
352 /// operations to create a new variant type of the basic ResidueType. The new type's name and its
353 /// variant type info are updated together with all other primary and derived ResidueType data.
355 Patch::apply( ResidueType const & rsd_type ) const
356 {
357  if ( !applies_to( rsd_type ) ) return 0; // I don't know how to patch this residue
358 
359  using namespace basic;
360  static basic::Tracer core_chemical("core.chemical");
361 
363  iter_end = cases_.end(); iter != iter_end; ++iter ) {
364 
365  if ( (*iter)->applies_to( rsd_type ) ) {
366  // this patch case applies to this rsd_type
367  ResidueTypeOP patched_rsd_type( (*iter)->apply( rsd_type ) );
368  if ( patched_rsd_type ) {
369  // patch succeeded!
370  // std::cout << "name of patched residue " << patched_rsd_type->name() << std::endl;
371  if ( !replaces_residue_type_ ) {
373  iter_end = types_.end(); iter != iter_end; ++iter ) {
374  patched_rsd_type->add_variant_type( *iter );
375  }
376  std::string name_new = patched_rsd_type->name() + patch_linker + name_;
377  patched_rsd_type->name( name_new );
378  }
379  tr.Debug << "successfully patched: " << rsd_type.name() << " to: " << patched_rsd_type->name() << std::endl;
380  return patched_rsd_type;
381  }
382  }
383  }
384  return 0;
385 }
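
`Patch::apply` returns as soon as one case both applies and patches successfully, which is why a patch file lists specific cases before the general one. The toy sketch below illustrates that first-match behavior. `CaseSketch`, `applies_to` and `first_matching_case` are hypothetical, a case selector is reduced to a single amino acid name, and the fall-through to the next case when an operation fails is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified case. An empty `aa` means "no case-level selector",
// which corresponds to the general case in a patch file.
struct CaseSketch {
	std::string aa;     // amino acid required by the case selector, or "" for any
	std::string label;  // stands in for the operations the case would apply
};

bool applies_to( CaseSketch const & c, std::string const & residue_aa )
{
	return c.aa.empty() || c.aa == residue_aa;
}

// Return the label of the first case that applies, or "" if none does.
std::string first_matching_case(
	std::vector< CaseSketch > const & cases,
	std::string const & residue_aa
)
{
	for ( std::size_t i = 0; i < cases.size(); ++i ) {
		if ( applies_to( cases[i], residue_aa ) ) return cases[i].label;
	}
	return std::string();
}
```

With the proline case listed first, a PRO residue gets the proline operations and every other residue falls through to the general case. If the order is reversed, the general case matches PRO as well and the proline case is never reached.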
386 
387 
388 } // chemical
389 } // core