Rosetta 3.5
Matcher.hh
1 // -*- mode:c++;tab-width:2;indent-tabs-mode:t;show-trailing-whitespace:t;rm-trailing-spaces:t -*-
2 // vi: set ts=2 noet:
3 // :noTabs=false:tabSize=4:indentSize=4:
4 //
5 // (c) Copyright Rosetta Commons Member Institutions.
6 // (c) This file is part of the Rosetta software suite and is made available under license.
7 // (c) The Rosetta software is developed by the contributing members of the Rosetta Commons.
8 // (c) For more information, see http://www.rosettacommons.org. Questions about this can be
9 // (c) addressed to University of Washington UW TechTransfer, email: license@u.washington.edu.
10 
11 /// @file protocols/match/Matcher.hh
12 /// @brief
13 /// @author Alex Zanghellini (zanghell@u.washington.edu)
14 /// @author Andrew Leaver-Fay (aleaverfay@gmail.com), porting to mini
15 
16 #ifndef INCLUDED_protocols_match_Matcher_hh
17 #define INCLUDED_protocols_match_Matcher_hh
18 
19 // Unit headers
21 
22 // Package headers
28 
33 
37 
39 
40 // Project headers
43 
44 #include <core/types.hh>
46 #include <core/id/AtomID.fwd.hh>
47 #include <core/pose/Pose.fwd.hh>
48 
49 // Utility headers
50 #include <utility/pointer/ReferenceCount.hh>
51 // AUTO-REMOVED #include <utility/fixedsizearray1.hh>
52 #include <utility/LexicographicalIterator.fwd.hh>
53 // AUTO-REMOVED #include <utility/vector1.hh>
54 #include <utility/vector1_bool.hh>
55 
56 // Numeric headers
57 #include <numeric/geometry/BoundingBox.hh>
58 #include <numeric/xyzVector.hh>
59 
60 // C++ headers
61 #include <list>
62 #include <map>
63 
64 #include <utility/vector1.hh>
65 
66 
67 //auto headers
68 #ifdef WIN32
69 #include <core/id/AtomID.hh>
78 #endif
79 
80 
81 namespace protocols {
82 namespace match {
83 
84 /// Overview:
85 /// The matcher algorithm was originally conceived within the domain of enzyme design.
86 /// The transition state for the desired reaction is contacted by several amino acids,
87 /// each with a particular geometry. The goal of the matcher is to find a set of backbone
88 /// positions on a given protein-backbone scaffold where those amino acids could be grafted
89 /// such that they would contact the ligand in the desired geometry.
90 ///
91 /// Consider a case where the transition state is contacted by an asparagine, an aspartate
92 /// and a histidine. The user designing an enzyme for this transition state knows the geometry
93 /// that describes the orientation of the transition state with respect to each of these
94 /// side chains; what they do not know is into what protein and at what positions they should
95 /// introduce these amino acids. The user will give the matcher a description of the geometry
96 /// between the amino acids and the transition state. This geometry is in the form of 6 parameters:
97 /// 3 dihedrals, 2 angles, and 1 distance (more on these later). Given the coordinates
98 /// of a particular side chain and the geometry describing the transition state relative to
99 /// the side chain, the coordinates of the transition state may be computed. In a sense,
100 /// the transition state may be grown off of the end of a side chain in the desired geometry.
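
The idea of "growing" coordinates from a distance, an angle, and a dihedral can be illustrated with a small, standalone sketch. It is not the Matcher's own code: `Vec3`, `place_atom`, and `dihedral_angle` are hypothetical names introduced here, and angles are taken in radians. Given three already-placed atoms a, b, c, `place_atom` computes a fourth atom d at a chosen distance from c, with a chosen b-c-d angle and a chosen a-b-c-d dihedral.

```cpp
#include <cmath>

// Hypothetical minimal 3-vector type; Rosetta itself uses numeric::xyzVector.
struct Vec3 { double x, y, z; };

Vec3 sub(Vec3 const & a, Vec3 const & b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
double dot(Vec3 const & a, Vec3 const & b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 cross(Vec3 const & a, Vec3 const & b) {
	return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
Vec3 normalized(Vec3 const & v) {
	double const len = std::sqrt(dot(v, v));
	return {v.x / len, v.y / len, v.z / len};
}

// Place atom d so that |d - c| == dist, angle(b, c, d) == angle and
// dihedral(a, b, c, d) == dihedral. Angles are in radians.
// Assumes a, b, c are not collinear.
Vec3 place_atom(Vec3 const & a, Vec3 const & b, Vec3 const & c,
	double dist, double angle, double dihedral)
{
	Vec3 const bc = normalized(sub(c, b));               // unit vector along b -> c
	Vec3 const n = normalized(cross(sub(b, a), bc));     // normal of the a-b-c plane
	Vec3 const m = cross(n, bc);                         // completes the local frame
	double const u = -dist * std::cos(angle);            // component along bc
	double const v = dist * std::sin(angle) * std::cos(dihedral);
	double const w = dist * std::sin(angle) * std::sin(dihedral);
	return {c.x + u * bc.x + v * m.x + w * n.x,
		c.y + u * bc.y + v * m.y + w * n.y,
		c.z + u * bc.z + v * m.z + w * n.z};
}

// Signed dihedral a-b-c-d in radians, used here only to check place_atom.
double dihedral_angle(Vec3 const & a, Vec3 const & b, Vec3 const & c, Vec3 const & d) {
	Vec3 const b1 = sub(b, a), b2 = sub(c, b), b3 = sub(d, c);
	Vec3 const n2 = cross(b2, b3);
	return std::atan2(std::sqrt(dot(b2, b2)) * dot(b1, n2), dot(cross(b1, b2), n2));
}
```

One plausible way to spend all six parameters, stated here as an assumption rather than as the Matcher's documented procedure, is to apply this step three times: the first downstream atom uses the distance, one angle, and one dihedral; the second uses the other angle and a second dihedral; the third uses the last dihedral, with the remaining distances and angles fixed by the downstream partner's internal geometry.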
101 ///
102 /// (Usually, the user will specify many different possible values for each of the 6 parameters,
103 /// and the matcher will then consider all combinations of those values. E.g. the ideal
104 /// distance might be 2.0 A, but the user might ask the matcher to consider the values
105 /// 1.95 and 2.05 A additionally. Each assignment of values to these 6 parameters fully
106 /// specifies the coordinates of the transition state.)
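
Considering "all combinations" of sampled values amounts to enumerating a Cartesian product. The sketch below shows one generic way to do that with an odometer-style index vector. The function name `enumerate_combinations` and the use of `std::vector` are assumptions made for illustration; the header itself refers to utility::LexicographicalIterator, whose interface is not shown in this listing and is not reproduced here.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: samples[i] lists the values to try for parameter i.
// Returns every combination that takes exactly one value per parameter,
// with the last parameter varying fastest.
std::vector< std::vector< double > >
enumerate_combinations( std::vector< std::vector< double > > const & samples )
{
	std::vector< std::vector< double > > combos;
	if ( samples.empty() ) return combos;
	for ( auto const & values : samples ) {
		if ( values.empty() ) return combos; // no value for a parameter: no combinations
	}

	std::vector< std::size_t > index( samples.size(), 0 );
	while ( true ) {
		// Record the combination selected by the current index vector.
		std::vector< double > combo;
		for ( std::size_t i = 0; i < samples.size(); ++i ) {
			combo.push_back( samples[ i ][ index[ i ] ] );
		}
		combos.push_back( combo );

		// Advance the odometer from the last dimension, carrying as needed.
		std::size_t dim = samples.size();
		while ( dim > 0 ) {
			--dim;
			if ( ++index[ dim ] < samples[ dim ].size() ) break;
			index[ dim ] = 0;
			if ( dim == 0 ) return combos; // carried past the first dimension: done
		}
	}
}
```

With three distance samples (1.95, 2.0, 2.05), two samples for one angle, and a single value for each of the other four parameters, this yields 3 x 2 = 6 assignments, each of which would fix one placement of the transition state.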
107 ///
108 /// The matcher examines each geometric constraint one at a time. It builds rotamers
109 /// for one or more amino acids capable of satisfying a desired geometry (e.g. both ASP and GLU
110 /// if an acid group is needed) at each of several active-site positions, and for each rotamer, it
111 /// grows the transition state. The matcher does a quick collision check between the atoms of the
112 /// transition state and the backbone of the protein, rejecting transition-state conformations
113 /// that collide. If the conformation is collision-free, then the matcher measures the coordinates
114 /// of the transition state as a point in a 6-dimensional space. (It is no coincidence that
115 /// there are 6 geometric parameters and that there are 6 dimensions in the space describing the
116 /// transition state's coordinates). With this 6-dimensional coordinate, the matcher can recover
117 /// the coordinates for the transition state -- the 6-D coordinate and the full Euclidean coordinates
118 /// of the transition state are interconvertible. The matcher can also bin the coordinate. If two
119 /// coordinates in 6-D are close, they will be assigned to the same bin. This is the fundamental insight of
120 /// the matching algorithm: the matcher will grow the transition state from different catalytic
121 /// residues, and when the 6-D coordinates from different catalytic residues are assigned to the
122 /// same bin, then the matcher has found a set of conformations of the transition state that are
123 /// compatible with more than one catalytic geometry.
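
The binning step can be sketched as a per-dimension floor division. This is a simplified illustration, not the Matcher's hashing code: `Real6`, `Bin6`, and `bin_of` are hypothetical names, and the sketch deliberately ignores the periodicity of the Euler angles and the case of two nearby points that straddle a bin boundary.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

using Real6 = std::array< double, 6 >; // x, y, z, then three Euler angles
using Bin6 = std::array< long, 6 >;    // one integer bin index per dimension

// Hypothetical helper: map a 6-D coordinate to the bin that contains it.
// lower holds the lower corner of the binned region and widths the bin
// width in each dimension (Angstroms for the first three entries,
// degrees for the last three).
Bin6 bin_of( Real6 const & coord, Real6 const & lower, Real6 const & widths )
{
	Bin6 bin{};
	for ( std::size_t i = 0; i < 6; ++i ) {
		bin[ i ] = static_cast< long >( std::floor( ( coord[ i ] - lower[ i ] ) / widths[ i ] ) );
	}
	return bin;
}
```

Two placements of the transition state grown from different catalytic residues are then candidates for a match when `bin_of` returns the same `Bin6` for both.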
124 ///
125 /// Each collision-free placement of the transition state is called a "hit". If there are N
126 /// geometric constraints that the matcher is asked to satisfy, then a set of N hits, one per
127 /// constraint, that fall into the same bin are called a "match".
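
The definition of a match can be made concrete with a small counting sketch. It assumes each hit has already been reduced to a constraint index and a single integer bin id; `BinnedHit` and `count_matches_per_bin` are hypothetical names, and constraint indices are 0-based here, unlike Rosetta's 1-based vector1. A bin holds matches only if every constraint contributes at least one hit to it, and the number of matches in that bin is the product of the per-constraint hit counts.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical reduced hit: which geometric constraint produced it and
// which (flattened) 6-D bin it fell into.
struct BinnedHit {
	std::size_t constraint; // 0-based, must be < n_constraints
	long bin;
};

// Returns, for every bin that contains at least one hit from each of the
// n_constraints constraints, the number of distinct matches in that bin.
std::map< long, std::size_t >
count_matches_per_bin( std::vector< BinnedHit > const & hits, std::size_t n_constraints )
{
	// bin id -> number of hits from each constraint
	std::map< long, std::vector< std::size_t > > counts;
	for ( auto const & hit : hits ) {
		auto & per_constraint = counts[ hit.bin ];
		per_constraint.resize( n_constraints, 0 );
		++per_constraint[ hit.constraint ];
	}

	std::map< long, std::size_t > matches;
	for ( auto const & entry : counts ) {
		std::size_t product = 1;
		for ( std::size_t n : entry.second ) product *= n;
		if ( product > 0 ) matches[ entry.first ] = product; // zero means a constraint is missing
	}
	return matches;
}
```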
128 ///
129 /// In the general case, the Matcher builds hits for each of several geometric constraints. The
130 /// protein scaffold in the enzyme-design example generalizes to any macro-molecular polymer
131 /// scaffold. The protein rotamers in the enzyme-design example generalize to a set of conformations
132 /// for the "upstream" partner. The transition state in the enzyme-design example generalizes to
133 /// a "downstream" partner, which itself may have multiple conformations. "Upstream" and "Downstream"
134 /// refer to the order in which the coordinates of the two partners are computed. The upstream
135 /// coordinates are built first, the downstream coordinates second. Changes to the coordinates of
136 /// the upstream partner propagate to the coordinates of the downstream partner.
137 /// In the enzyme-design example, the transition state is considered to be rigid; in the general case
138 /// the transition state may have multiple conformations. The downstream partner could also
139 /// be an entire protein -- and may have its own set of rotameric states. E.g., one might want to
140 /// match a hydrogen-bond donor on the scaffold to a serine side-chain on the target (downstream) protein.
141 /// The downstream partner should then be able to examine many serine rotamers for each conformation of
142 /// the upstream rotamer.
143 ///
144 /// A hit is represented in two parts: a discrete part and a continuous part. The discrete portion consists
145 /// of four integers: 1. the build-point index on the scaffold, 2. the rotamer index on the upstream partner,
146 /// 3. the external-geometry index, and 4. the rotamer index on the downstream partner. The continuous portion
147 /// consists of 6 double-precision values representing the coordinate of the downstream partner in 6D.
148 /// The first three values are the x,y and z coordinates of a particular atom in the downstream partner.
149 /// The second three values are the phi, psi, and theta values describing the coordinate frame at this atom.
150 /// These three "Euler angle" parameters describe three rotations: Z(psi) * X(theta) * Z(phi) * I.
151 /// They are described in greater detail in src/numeric/HomogeneousTransform.hh.
152 /// "Phi" and "psi" here have nothing to do with the protein-backbone angles. When a hit is binned, there
153 /// are two sets of parameters that describe how wide the bins in each dimension should be: the Euclidean
154 /// bin widths are for the xyz coordinates, and the Euler bin widths are for the Euler angles. The
155 /// Euclidean bin widths are in Angstroms and the Euler bin widths are in degrees.
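
The composition Z(psi) * X(theta) * Z(phi) can be written out as a product of three elementary rotation matrices. The sketch below is an illustration under stated assumptions: angles are given in degrees, the matrices are right-handed active rotations, and `Mat3`, `rot_z`, `rot_x`, and `euler_zxz` are hypothetical names. The authoritative sign and handedness conventions are the ones in src/numeric/HomogeneousTransform.hh, which this listing does not show.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

using Mat3 = std::array< std::array< double, 3 >, 3 >;

// Plain 3x3 matrix product a * b.
Mat3 multiply( Mat3 const & a, Mat3 const & b ) {
	Mat3 out{};
	for ( std::size_t i = 0; i < 3; ++i ) {
		for ( std::size_t j = 0; j < 3; ++j ) {
			for ( std::size_t k = 0; k < 3; ++k ) out[ i ][ j ] += a[ i ][ k ] * b[ k ][ j ];
		}
	}
	return out;
}

// Rotation about the z axis by rad radians (assumed right-handed, active).
Mat3 rot_z( double rad ) {
	double const c = std::cos( rad ), s = std::sin( rad );
	return Mat3{{ {{ c, -s, 0.0 }}, {{ s, c, 0.0 }}, {{ 0.0, 0.0, 1.0 }} }};
}

// Rotation about the x axis by rad radians (assumed right-handed, active).
Mat3 rot_x( double rad ) {
	double const c = std::cos( rad ), s = std::sin( rad );
	return Mat3{{ {{ 1.0, 0.0, 0.0 }}, {{ 0.0, c, -s }}, {{ 0.0, s, c }} }};
}

// Z(psi) * X(theta) * Z(phi), with all three angles in degrees.
Mat3 euler_zxz( double phi_deg, double psi_deg, double theta_deg ) {
	double const to_rad = std::acos( -1.0 ) / 180.0;
	return multiply( rot_z( psi_deg * to_rad ),
		multiply( rot_x( theta_deg * to_rad ), rot_z( phi_deg * to_rad ) ) );
}
```

Because the product starts from the identity, all three angles at zero give the identity frame, and setting only phi or only theta reduces to a single rotation about z or x respectively.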
156 ///
157 /// A Matcher object should be initialized from a MatcherTask object through the
158 /// initialize_from_task() method. A MatcherTask will contain an EnzConstraintIO object, and the function
159 /// Matcher::initialize_from_file() will be invoked as the Matcher is initialized from a MatcherTask.
160 /// The documentation for Matcher::initialize_from_file() describes the format of extra data
161 /// that may be included in the enzyme-design constraint file. This data should live within an
162 /// ALGORITHM_INFO:: match ... ALGORITHM_INFO::END block inside a CST::BEGIN ... CST::END block in
163 /// the constraint file.
164 ///
165 /// find_hits() is the main worker function. After the matcher finishes find_hits(),
166 /// the matches can be read by a MatchProcessor in a call to process_matches.
168 public:
169  typedef core::Real Real;
170  typedef core::Size Size;
172  typedef numeric::geometry::BoundingBox< Vector > BoundingBox;
173  typedef std::list< Hit > HitList;
174  typedef std::list< Hit >::iterator HitListIterator;
175  typedef std::list< Hit >::const_iterator HitListConstIterator;
176 
177 public:
178  /// Construction and Destruction
179  Matcher();
180  virtual ~Matcher();
181 
182 
183 public:
184 
185  /// Setup
186  void set_upstream_pose( core::pose::Pose const & pose );
187  void set_downstream_pose(
188  core::pose::Pose const & pose,
189  utility::vector1< core::id::AtomID > orientation_atoms
190  );
191 
193 
195  Size cst_id,
196  utility::vector1< Size > const & resids
197  );
198 
199  void set_n_geometric_constraints( Size n_constraints );
200 
203  }
204 
205 
207  Size cst_id,
209  );
210 
212  Size cst_id,
214  Size chi,
215  upstream::SampleStrategyData const & strat
216  );
217 
218  void
220  Size cst_id,
222  core::Real fa_dun_cutoff
223  );
224 
226  Size cst_id,
228  utility::vector1< std::string > const & upstream_launch_atoms,
229  utility::vector1< core::id::AtomID > const & downstream_3atoms,
231  Size const exgeom_id,
232  bool enumerate_ligand_rotamers = false,
233  bool catalytic_bond = false
234  );
235 
237  Size geom_cst_id,
238  Size target_geom_cst_id,
239  core::chemical::ResidueTypeCOP candidate_restype,
240  core::chemical::ResidueTypeCOP target_restype,
241  utility::vector1< Size > const & candidate_atids,
242  utility::vector1< Size > const & target_atids,
244  std::string SecMatchStr,
246  );
247 
249  Size geom_cst_id,
250  core::chemical::ResidueTypeCOP candidate_restype,
251  core::chemical::ResidueTypeCOP downstream_restype,
252  utility::vector1< Size > const & candidate_atids,
253  utility::vector1< Size > const & target_atids,
255  std::string SecMatchStr,
257  bool catalytic_bond
258  );
259 
260  void set_occupied_space_bounding_box( BoundingBox const & bb );
261  void set_hash_euclidean_bin_width( Real width );
262  void set_hash_euler_bin_width( Real width );
263  void set_hash_euclidean_bin_widths( Vector widths );
264  void set_hash_euler_bin_widths( Vector widths );
265 
266  void set_bump_tolerance( Real permitted_overlap );
267 
268  /// @brief The primary way to initialize a Matcher is through a MatcherTask.
269  void
271  MatcherTask const & task
272  );
273 
274  /// @brief Initialize the geometric constraints from the EnzConstraintIO object.
277  MatcherTask const & task
278  );
279 
280 public:
281 
282  /// @brief Main worker function
283  bool find_hits();
284 
285  /// @brief After find_hits completes, use this function to have the
286  /// Matcher generate the hit-combinations (matches) and send those matches
287  /// to the specified match-processor. The match processor may do what it
288  /// pleases with the matches.
289  void
290  process_matches( output::MatchProcessor & processor ) const;
291 
292 public:
293 
294  /// Data accessors
296  upstream_pose() const;
297 
299  downstream_pose() const;
300 
302  build_point( Size index ) const;
303 
305  upstream_builder( Size cst_id ) const;
306 
307  //Author: Kui Chan
308  //access function to pose_build_resids_
309  //Reason: Use to update the SecondaryMatcherToUpstreamResidue hit.second()
311  get_pose_build_resids() const;
312 
313  /// @brief Return const access to a representative downstream builder for a particular
314  /// geometric constraint. All downstream builders for a single geometric constraint
315  /// are required to behave the same when reconstructing the coordinates of the downstream
316  /// partner from a hit; therefore, a single representative is sufficient to recover
317  /// hit coordinates for any hit from a particular geometric constraint.
319  downstream_builder( Size cst_id ) const;
320 
321  std::list< downstream::DownstreamAlgorithmCOP >
322  downstream_algorithms( Size cst_id ) const;
323 
326 
327  HitList const &
328  hits( Size cst_id ) const;
329 
331  occ_space_hash() const;
332 
334  per_constraint_build_points( Size cst_id ) const;
335 
336 /// Non-const access
337 
339  build_point( Size index );
340 
342  upstream_builder( Size cst_id );
343 
344  bool
346 
347  /// @brief Return non-const access to a representative downstream builder
348  /// for a particular geometric constraint
351 
352  /// @brief Return non-const access to all of the downstream builders
353  /// for a particular geometric constraint
354  std::list< downstream::DownstreamBuilderOP > const &
355  downstream_builders( Size cst_id ) const;
356 
357 
358  /// @brief Non-const access to the set of downstream algorithms for a
359  /// particular geometric constraint -- note that the list containing
360  /// these algorithms is itself const.
361  std::list< downstream::DownstreamAlgorithmOP > const &
363 
365  occ_space_hash();
366 
367  /// @brief Return a non-constant iterator to a HitList for a particular geometric constraint.
368  /// DANGER DANGER DANGER.
369  /// This access is intended to allow a DownstreamAlgorithm to delete its own non-viable hits
370  /// and also to allow a DownstreamAlgorithm to delete another algorithm's non-viable hits;
371  /// Actual deletion requires invoking the method Matcher::erase_hit().
372  /// This non-const access is not intended for any other class.
374  hit_list_begin( Size geom_cst_id );
375 
376  /// @brief Return a non-constant iterator to the end position for a
377  /// HitList for a particular geometric constraint. See comments for hit_list_begin()
379  hit_list_end( Size geom_cst_id );
380 
381  /// @brief To be invoked by a downstream algorithm. Downstream algorithms may prune their
382  /// old, inviable hits through this method -- they should pass themselves in as an argument
383  /// -- and they may also prune the hits for other rounds.
384  /// If they prune other-round hits, then they will trigger an update to the
385  /// hit_lists_with_primary_modificiations_ list, leading to an additional pass over the
386  /// geometric constraints in a primary/peripheral pattern.
387  void erase_hit(
388  downstream::DownstreamAlgorithm const & dsalg,
389  Size geom_cst_id_for_hit,
390  HitListIterator const & iter
391  );
392 
393 private:
394  bool generate_hits();
396  void generate_hits_for_constraint( Size cst_id );
398 
400  void initialize_bump_grids();
404 
407  Size const cst_id,
409  utility::vector1< std::string > const & upstream_launch_atoms,
410  utility::vector1< core::id::AtomID > const & downstream_3atoms,
411  bool enumerate_ligand_rotamers,
412  bool catalytic_bond
413  );
414 
415  /// @brief Selects a subset of all possible hit combinations (e.g. by clustering hits)
416  /// for a particular bin. Useful if there are too many matches found.
417  void
419  utility::vector1< utility::vector1< Hit const * > > const & hit_vectors,
420  utility::vector1< Size > & n_hits_per_geomcst,
422  ) const;
423 
424  bool
426  match_dspos1 const & m1,
427  utility::LexicographicalIterator & lex,
428  output::MatchProcessor const & processor
429  ) const;
430 
431  bool
433  match const & m,
434  utility::LexicographicalIterator & lex
435  ) const;
436 
437 
438  bool
440  match const & m,
441  utility::vector1< HitPtrListCOP > const & upstream_only_hits,
442  utility::vector1< std::list< Hit const * >::const_iterator > & upstream_only_hit_iterators,
443  Size & last_upstream_only_geomcst_advanced,
444  output::MatchProcessor const & processor
445  ) const;
446 
447  bool
449  match_dspos1 const & m1,
450  utility::vector1< HitPtrListCOP > const & upstream_only_hits,
451  utility::vector1< std::list< Hit const * >::const_iterator > & upstream_only_hit_iterators,
452  Size & last_upstream_only_geomcst_advanced,
453  output::MatchProcessor const & processor
454  ) const;
455 
456  bool
458  utility::vector1< HitPtrListCOP > const & upstream_only_hits,
459  Size starting_point,
460  utility::vector1< std::list< Hit const * >::const_iterator > & upstream_only_hit_iterators,
461  Size & last_upstream_only_geomcst_advanced
462  ) const;
463 
465 
468  Vector & good_euclidean_bin_widths,
469  Vector & good_euler_bin_widths,
470  utility::vector1< std::list< Hit const * > > const & neighbor_hits
471  ) const;
472 
475  Vector const & euclidean_bin_widths,
476  Vector const & euler_bin_widths,
477  utility::vector1< std::list< Hit const * > > const & neighbor_hits
478  ) const;
479 
480 
481  Size
483  Vector const & euclidean_bin_widths,
484  Vector const & euler_bin_widths,
485  utility::vector1< std::list< Hit const * > > const & neighbor_hits,
486  Size accuracy_threshold
487  ) const;
488 
491  output::MatchProcessor & processor,
492  HitHasher & hit_hasher,
493  utility::vector1< std::list< Hit const * > > const & neighbor_hits
494  ) const;
495 
496 
497  void
499 
500  /// @brief Note that a change has occurred for a particular hit list
501  void
503 
504 private:
505  /// uncopyable -- unimplemented
506  Matcher( Matcher const & );
507  Matcher const & operator = ( Matcher const & rhs );
508 
509 private:
510 
512 
515 
519 
522 
523 
525 
527 
532 
533  /// Does the downstream algorithm want the 6D coordinate stored in hit.second() to
534  /// be hashed?
536 
539 
541  std::list< downstream::DownstreamBuilderOP > all_downstream_builders_;
542  std::list< downstream::DownstreamAlgorithmOP > all_downstream_algorithms_;
543 
546 
551 
553 
556  std::list< std::pair< Size, Real > > upstream_resids_and_radii_defining_active_site_;
557  std::list< core::id::AtomID > downstream_atoms_required_inside_active_site_;
559 
561 
563  bool output_matches_as_singular_downstream_positioning_; // use match_dspos1 output pathway?
566 };
567 
569 {
570 public:
573  num_sent_to_proc( 0 ),
577  all_lex_states( 0 ),
579  num_empty_uplist( 0 )
580  {}
581 
583  {
592 
593  return *this;
594  }
595 
604 
605 };
606 
607 }
608 }
609 
610 #endif