libRosetta was designed to meet an ambitious set of goals, drawn from experience using and debugging Rosetta and from discussions with its developers about which tasks they found most difficult.
Goals
Develop a robust, modular Rosetta core that can grow outward to provide a 20+ year platform for research.
Follow a phased development plan to provide tangible benefits to the project as work progresses:
- Develop a representation of the chemical entities and of molecular structure topology and conformation, since this is the inner core on which scoring, kinematics, optimization, and scientific protocols are built; growing a solid system outward from a robust core is much easier than growing it inward
- Add the ability to score those conformations
- Integrate these components into Rosetta using existing systems for kinematics and optimization
- Extend libRosetta to support the full range of Rosetta entities and protocols
- Add kinematics and optimization layers to extend the benefits of the library up to the protocol level, so that new protocols are easier to write and a range of Rosetta applications can be built in scripting languages
Principles
The design grew out of some basic principles:
- The components should be easy to use and hard to misuse
- Common tasks should be simple
- New types of biomolecular objects (amino acids, atoms, etc.) should be easy to add
- Rosetta's entities can belong to multiple collections, so the design should let researchers focus on membership rather than ownership
- Object ownership and lifetimes should be handled safely and automatically via shared-ownership smart pointers; ownership flows from collections to their members to avoid reference cycles
- Object associations can be held in non-owning smart pointers that cannot be used to delete the object they refer to
- The biomolecular components should be unencumbered by exotic software idioms such as template meta-programming
- A layered design should separate the chemical, conformational, scoring, kinematics, and optimization systems, retaining flexibility and keeping each layer's responsibilities limited and well-defined to ensure testability and maintainability
- Performance is vital and must be considered in every part of the design
- Performance must not be degraded by a new design and must be assessed at each development stage to identify any performance problems early
- Performance-critical algorithms should be identified and redesigned when possible for improved speed
- Programming errors should be caught at compile time when possible, and otherwise at run time in debug builds
- libRosetta must enhance testability by being stable under small changes and providing a deterministic computation sequence mode
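The ownership principles above can be illustrated with standard C++ smart pointers. This is a minimal sketch, not libRosetta's code: the `Atom` and `Residue` types are hypothetical, `std::shared_ptr` stands in for the shared-ownership pointers, and `std::weak_ptr` stands in for the non-owning association pointers (libRosetta's own pointer classes may behave differently). A residue owns its atoms, an atom keeps only a non-owning link back to its residue, and the same atom can also sit in a second collection without either collection deciding who deletes it.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Residue;  // forward declaration for the back-link

// Hypothetical atom type, for illustration only.
struct Atom {
    std::string name;
    // Non-owning association: it does not keep the residue alive, so the
    // residue -> atom ownership edge never turns into a cycle.
    std::weak_ptr<Residue> residue;
    explicit Atom(std::string n) : name(std::move(n)) {}
};

typedef std::shared_ptr<Atom> AtomOP;  // shared-ownership handle

// Hypothetical residue: a collection that owns its member atoms.
struct Residue : std::enable_shared_from_this<Residue> {
    std::string name;
    std::vector<AtomOP> atoms;  // ownership flows from collection to member
    explicit Residue(std::string n) : name(std::move(n)) {}

    // Precondition: this residue is itself managed by a shared_ptr.
    void add_atom(const AtomOP& atom) {
        atom->residue = shared_from_this();
        atoms.push_back(atom);
    }
};
```

Because the back-link is a `weak_ptr`, releasing the last owning handle to a residue frees it even though its atoms still refer to it; code that follows the link calls `lock()` and checks for an empty result.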
Software Design Requirements
These principles led to some concrete software design requirements:
- Global data should be limited to constants and values set at start-up
- No magic-number type codes, which harm program maintainability and extensibility; type-safe named lookup keys will be used instead
- No fixed size limits and no wasted space: data structures should be dynamically right-sized
- Rosetta is memory-constrained for some protocols on certain platforms, so memory use should be reduced through refactored data structures and algorithms
- Appropriate layered data structures should replace higher-dimensional arrays, which waste a lot of space and require a Fortran-style slicing mechanism to access the object-level sub-arrays
- Abstract interfaces should be used to lower coupling between subsystems and improve build times
- Class hierarchies should be used to express the type relationships and to remove the use of explicit type testing and special case code blocks in the science functions
- Classes should provide well-defined abstractions and services with documented and assert-tested pre- and post-conditions
- Objects should perform their own bookkeeping where possible
- Classes should have consistent interfaces and style to make function names easy to guess and remember
- Class hierarchies will provide abstractions that can support new types, separate computational code from specific I/O sources and sinks, and let new algorithm variants be plugged in easily
- Operations should be well-defined without obscure side-effects
- Smart pointers should be used to control object lifetime and allow developers to make objects members of multiple collections without worrying about ownership
- Components should be "pluggable": special purpose types can be linked in by special purpose builds without disturbing Rosetta's core
- Numerical scalar types should be typedef names for documentation and to allow libRosetta to be easily built with float or double precision
- Random numbers used at different locations should be generated by decoupled generators, at least for repeatability in test runs
- Rosetta is highly modal, so modes should be eliminated through appropriate generalizations where possible
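The rule against magic-number type codes need not cost speed. In this minimal sketch (all names, including `AtomTypeIndex`, `AtomTypeSet`, and the `radius` parameter, are hypothetical), atom types are registered by name at start-up and resolved once to a strongly typed index; performance-critical code then indexes a plain vector with that type, which the compiler will not accept in place of a bare integer.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Strongly typed index: wraps a size_t but is a distinct type, so a plain
// integer (or another kind of index) cannot be passed in its place.
class AtomTypeIndex {
public:
    explicit AtomTypeIndex(std::size_t i) : i_(i) {}
    std::size_t value() const { return i_; }
private:
    std::size_t i_;
};

// Hypothetical per-type parameter table, filled in by name at start-up.
class AtomTypeSet {
public:
    // Register a type once during set-up and return its typed index.
    AtomTypeIndex add(const std::string& name, double radius) {
        if (index_.count(name) != 0)
            throw std::invalid_argument("atom type registered twice: " + name);
        AtomTypeIndex idx(radius_.size());
        index_.emplace(name, idx.value());
        radius_.push_back(radius);
        return idx;
    }

    // Named lookup: done once, outside performance-critical loops.
    AtomTypeIndex index(const std::string& name) const {
        auto it = index_.find(name);
        if (it == index_.end())
            throw std::out_of_range("unknown atom type: " + name);
        return AtomTypeIndex(it->second);
    }

    // Typed O(1) access for inner loops; range-checked in debug builds.
    double radius(AtomTypeIndex t) const {
        assert(t.value() < radius_.size());
        return radius_[t.value()];
    }

private:
    std::map<std::string, std::size_t> index_;
    std::vector<double> radius_;
};
```

Because the `AtomTypeIndex` constructor is `explicit`, a call such as `types.radius(3)` does not compile, which moves this class of error from run time to compile time.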
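The right-sizing and layering requirements, together with the scalar typedef, might look like the following minimal sketch. `Real`, `Vec3`, `ResidueConf`, and `Conformation` are hypothetical names used only for illustration: each residue stores exactly as many atom positions as it has, instead of every residue reserving a fixed number of slots in one large multidimensional array.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scalar typedef: changing this one line rebuilds the
// code below in single precision.
typedef double Real;

struct Vec3 {
    Real x, y, z;
};

// One layer per object: a residue holds only its own atom positions,
// sized to fit, rather than a MAX_ATOM-wide row of a global array.
struct ResidueConf {
    std::vector<Vec3> xyz;
};

struct Conformation {
    std::vector<ResidueConf> residues;  // grows with the structure; no MAX_RES

    // Object-level access is a plain member lookup, not an array slice.
    const ResidueConf& residue(std::size_t i) const {
        assert(i < residues.size());  // debug-build range check
        return residues[i];
    }

    std::size_t total_atoms() const {
        std::size_t n = 0;
        for (const ResidueConf& r : residues) n += r.xyz.size();
        return n;
    }
};
```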
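Replacing explicit type tests with a class hierarchy could look like this sketch. `EnergyTerm`, `HarmonicTerm`, `ConstantTerm`, and `total_score` are hypothetical, and the functional forms are toy placeholders rather than Rosetta's energy functions; the point is that the scoring loop dispatches through a virtual call instead of switching on a type code, so a new term can be linked in without editing the loop.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical energy-term interface. The scoring loop never asks which
// concrete term it holds, so there is no chain of per-type special cases.
class EnergyTerm {
public:
    virtual ~EnergyTerm() = default;
    virtual std::string name() const = 0;
    virtual double evaluate(double distance) const = 0;
};

// Toy term: quadratic penalty around an ideal distance (placeholder form).
class HarmonicTerm : public EnergyTerm {
public:
    explicit HarmonicTerm(double ideal) : ideal_(ideal) {}
    std::string name() const override { return "harmonic"; }
    double evaluate(double d) const override { return (d - ideal_) * (d - ideal_); }
private:
    double ideal_;
};

// Toy term: constant offset (placeholder form).
class ConstantTerm : public EnergyTerm {
public:
    explicit ConstantTerm(double value) : value_(value) {}
    std::string name() const override { return "constant"; }
    double evaluate(double) const override { return value_; }
private:
    double value_;
};

// Adding a new EnergyTerm subclass requires no change to this function.
double total_score(const std::vector<std::unique_ptr<EnergyTerm>>& terms,
                   double distance) {
    double sum = 0.0;
    for (const auto& t : terms) sum += t->evaluate(distance);
    return sum;
}
```

The cost is one virtual call per term evaluation; where that matters, a design might instead dispatch once per term over a whole batch of atom pairs, keeping the abstraction while amortizing the call.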
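One way to meet the decoupled-generator requirement is a registry that hands each call site its own engine. This is a minimal sketch with a hypothetical `RandomGeneratorRegistry` class: each named site gets a `std::mt19937` seeded from a base seed plus a hash of the site name, so drawing more numbers in one place cannot shift the sequence seen elsewhere, and two runs with the same base seed repeat exactly.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <random>
#include <string>

// Hypothetical registry of independent, named random number generators.
class RandomGeneratorRegistry {
public:
    explicit RandomGeneratorRegistry(std::uint32_t base_seed) : base_seed_(base_seed) {}

    // Returns the engine for a call site, creating it on first use.
    // std::map references stay valid as other sites are added.
    std::mt19937& get(const std::string& site) {
        auto it = gens_.find(site);
        if (it == gens_.end()) {
            std::seed_seq seq{base_seed_, hash32(site)};
            std::mt19937 gen(seq);
            it = gens_.emplace(site, gen).first;
        }
        return it->second;
    }

private:
    // 32-bit FNV-1a: deterministic across runs and platforms, unlike std::hash.
    static std::uint32_t hash32(const std::string& s) {
        std::uint32_t h = 2166136261u;
        for (unsigned char c : s) {
            h ^= c;
            h *= 16777619u;
        }
        return h;
    }

    std::uint32_t base_seed_;
    std::map<std::string, std::mt19937> gens_;
};
```

The output sequences of `std::mt19937` and `std::seed_seq` are fully specified by the C++ standard, but the standard distributions such as `std::uniform_real_distribution` are not, so bit-for-bit repeatability across compilers would also require hand-written distribution code.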
Scientific Requirements
The scientific requirements for libRosetta subsystems were and are being elicited from the researchers and from an examination of the existing Rosetta source code.
Agile Process
Given the range of science requirements for Rosetta, an agile development process was selected for libRosetta. This allowed a continuum of functional libraries of increasing capability to be released and tested starting very soon after development began. Each release can be evaluated for usability and performance, so any needed design adjustments can be made early, when such changes are easiest. Agile development has proven much more successful at building robust systems than attempting to create an up-front master design and implement it in one large development phase.