|
Rosetta 3.5
|
#include <MPIFileBufJobDistributor.hh>


Public Member Functions | |
| virtual | ~MPIFileBufJobDistributor () |
| dtor WARNING WARNING! SINGLETONS' DESTRUCTORS ARE NEVER CALLED IN MINI! DO NOT TRY TO PUT THINGS IN THIS FUNCTION! here's a nice link explaining why: http://www.research.ibm.com/designpatterns/pubs/ph-jun96.txt More... | |
| core::Size | increment_client_rank () |
| core::Size | min_client_rank () const |
| return rank of first worker process (there might be more dedicated processes, e.g., ArchiveManager...) More... | |
| virtual void | go (protocols::moves::MoverOP mover) |
| dummy for master/slave version More... | |
| virtual core::Size | get_new_job_id () |
| dummy for master/slave version More... | |
| virtual void | mark_current_job_id_for_repetition () |
| dummy for master/slave version More... | |
| virtual void | remove_bad_inputs_from_job_list () |
| dummy for master/slave version More... | |
| virtual void | job_succeeded (core::pose::Pose &pose, core::Real runtime) |
| dummy for master/slave version More... | |
| virtual void | job_failed (core::pose::Pose &pose, bool will_retry) |
| This function is called when we give up on the job. It has been virtualized so that BOINC and MPI can delay or protect output; the base implementation is just a call to the job outputter. More... | |
Public Member Functions inherited from protocols::jd2::JobDistributor | |
| virtual | ~JobDistributor () |
| void | go (protocols::moves::MoverOP mover, JobOutputterOP jo) |
| invokes go, after setting JobOutputter More... | |
| JobOP | current_job () const |
| Movers may ask their controlling job distributor for information about the current job. They may also load information into this job for later output. More... | |
| std::string | current_output_name () const |
| Movers may ask their controlling job distributor for the output name as defined by the Job and JobOutputter. More... | |
| JobOutputterOP | job_outputter () const |
| Movers (or derived classes) may ask for the JobOutputter. More... | |
| void | set_job_outputter (const JobOutputterOP &new_job_outputter) |
| Movers (or derived classes) may set the JobOutputter. More... | |
| JobInputterOP | job_inputter () const |
| JobInputter access. More... | |
| virtual void | mpi_finalize (bool finalize) |
| should the go() function call MPI_finalize()? It probably should, this is true by default. More... | |
| JobInputterInputSource::Enum | job_inputter_input_source () const |
| The input source for the current JobInputter. More... | |
| virtual void | restart () |
| core::Size | total_nr_jobs () const |
| core::Size | current_job_id () const |
| integer access - which job are we on? More... | |
| std::string | get_current_batch () const |
| what is the current batch ? — name refers to the flag-file used for this batch More... | |
| virtual void | add_batch (std::string const &, core::Size id=0) |
| add a new batch ( name will be interpreted as flag_file ) More... | |
| core::Size | current_batch_id () const |
| what is the current batch number ? — refers to position in batches_ More... | |
Protected Member Functions | |
| MPIFileBufJobDistributor () | |
| ctor is protected; singleton pattern More... | |
| MPIFileBufJobDistributor (core::Size master_rank, core::Size file_buf_rank, core::Size min_client_rank, bool start_empty=false) | |
| protected ctor for child-classes More... | |
| virtual void | handle_interrupt () |
| This function is called when a job has not yet finished and is terminated abnormally (ctrl-c, kill, etc.). When implementing it in subclasses, make sure to delete all in-progress data that your job spawned. More... | |
| virtual bool | process_message (core::Size msg_tag, core::Size slave_rank, core::Size slave_job_id, core::Size slave_batch_id, core::Real runtime) |
| virtual bool | next_batch () |
| switch current_batch_id_ to next batch More... | |
| void | master_go (protocols::moves::MoverOP mover) |
| Handles the receiving of job requests and the sending of job ids to and from slaves. More... | |
| core::Size | master_get_new_job_id () |
| Always returns zero; simply increments next_job_to_assign_ to the next job that should be run, based on what has been completed and the overwrite flags. More... | |
| core::Size | slave_get_new_job_id () |
| requests, receives, and returns a new job id from the master node or returns the current job id if the repeat_job_ flag is set to true More... | |
| void | master_mark_current_job_id_for_repetition () |
| This should never be called as this is handled internally by the slave nodes, it utility_exits. More... | |
| void | slave_mark_current_job_id_for_repetition () |
| Sets the repeat_job_ flag to true. More... | |
| void | master_remove_bad_inputs_from_job_list () |
| Simply increments next_job_to_assign_ to the next job that should be run based on what has been completed and if the input job tag of the job marked as having bad input. More... | |
| void | slave_remove_bad_inputs_from_job_list () |
| Sends a message to the head node that contains the id of a job that had bad input. More... | |
| void | master_job_succeeded (core::pose::Pose &pose) |
| This should never be called as this is handled internally by the slave nodes, it utility_exits. More... | |
| void | slave_job_succeeded (core::pose::Pose &pose) |
| Sends a message to the head node upon successful job completion to avoid output interleaving. More... | |
| void | slave_to_master (core::Size tag) |
| send a message to master More... | |
| void | send_job_to_slave (core::Size slave_rank) |
| called by master to send and by slave to receive job More... | |
| core::Size | rank () const |
| return rank of this process More... | |
| core::Size | master_rank () const |
| return rank of master process ( where JobDistributor is running ) More... | |
| core::Size | file_buf_rank () const |
| return rank of file-buffer process ( where output data (via ozstream )is handled ) More... | |
| core::Size | number_of_processors () |
| how many processes — this includes dedicated processes More... | |
| core::Size | n_rank () |
| how many processes — this includes dedicated processes More... | |
| core::Size | n_worker () |
| how many workers — important to keep track during spin-down process More... | |
| void | set_n_worker (core::Size setting) |
| how many workers — important to keep track during spin-down process More... | |
| virtual void | mark_job_as_completed (core::Size job_id, core::Size batch_id, core::Real runtime) |
| marks job as completed in joblist More... | |
| virtual void | mark_job_as_bad (core::Size job_id, core::Size batch_id) |
| marks job as bad in joblist More... | |
| void | eat_signal (core::Size signal, int source) |
| receive a certain signal and ignore it.... this is needed, for instance, when MPIArchiveJobDistributor triggers an ADD_BATCH signal by sending QUEUE_EMPTY to the ArchiveManager... More... | |
Protected Member Functions inherited from protocols::jd2::JobDistributor | |
| JobDistributor () | |
| Singleton instantiation pattern; Derived classes will call default ctor, but their ctors, too must be protected (and the JDFactory must be their friend.) More... | |
| JobDistributor (bool empty) | |
| MPIArchiveJobDistributor starts with an empty job-list... More... | |
| void | go_main (protocols::moves::MoverOP mover) |
| Non-virtual get-job, run it, & output loop. This function is pretty generic and your subclass may be able to use it. It is NOT virtual - this implementation can be shared by (at least) the simple FileSystemJobDistributor, the MPIWorkPoolJobDistributor, and the MPIWorkPartitionJobDistributor. Do not feel that you need to use it as-is in your class - but DO plan on implementing all its functionality! More... | |
| Jobs const & | get_jobs () const |
| Read access to private data for derived classes. More... | |
| void | mark_job_as_completed (core::Size job_id, core::Real run_time) |
| Jobs is the container of Job objects; non-const access is needed to mark Jobs as completed on the master in the MPI JobDistributor. More... | |
| void | mark_job_as_bad (core::Size job_id) |
| ParserOP | parser () const |
| Parser access. More... | |
| void | begin_critical_section () |
| void | end_critical_section () |
| bool | obtain_new_job (bool re_consider_current_job=false) |
| this function updates the current_job_id_ and current_job_ fields. The boolean return states whether or not a new job was obtained (if false, quit distributing!) More... | |
| virtual void | current_job_finished () |
| Derived classes are allowed to clean up any temporary files or data relating to the current job after the current job has completed. Called inside go_main loop. Default implementation is a no-op. More... | |
| virtual void | note_all_jobs_finished () |
| Derived classes are allowed to perform some kind of action when the job distributor runs out of jobs to execute. Called inside go_main. Default implementation is a no-op. More... | |
| void | clear_current_job_output () |
| void | set_batch_id (core::Size setting) |
| set current_batch_id — eg for slave nodes in MPI framework More... | |
| virtual void | batch_underflow () |
| if end of batches_ reached via next_batch or set_batch_id ... More... | |
| virtual void | load_new_batch () |
| called by next_batch() or set_batch_id() to switch-over and restart JobDistributor on new batch More... | |
| core::Size | nr_batches () const |
| how many batches are in our list ... this can change dynamically More... | |
| std::string const & | batch (core::Size batch_id) |
| give name of batch with given id More... | |
Private Types | |
| typedef JobDistributor | Parent |
Private Attributes | |
| core::Size | n_rank_ |
| total number of processing elements More... | |
| core::Size | n_worker_ |
| core::Size | rank_ |
| rank of the "local" instance More... | |
| core::Size | slave_current_job_id_ |
| where slave jobs store current job id More... | |
| core::Size | slave_current_batch_id_ |
| batch_id allows running multiple batches of jobs More... | |
| core::Real | slave_current_runtime_ |
| runtime of last job More... | |
| core::Size | bad_job_id_ |
| where master stores next job to assign (in a good state after get_new_job_id up until it's used) More... | |
| bool | repeat_job_ |
| where slave stores whether it should repeat its current job id More... | |
| core::Size | jobs_assigned_ |
| keep some statistics about the jobs this is mostly just for silly tr.Info messages... More... | |
| core::Size | jobs_returned_ |
| jobs that have returned (either, bad or good ) More... | |
| core::Size | bad_jobs_ |
| jobs that have returned bad More... | |
| core::Size | n_nodes_left_to_spin_down_ |
| how many more to spin down More... | |
| core::Size const | master_rank_ |
| keep here the ranks of different functional processes More... | |
| core::Size const | file_buf_rank_ |
| the File-Buffer More... | |
| core::Size | min_client_rank_ |
| the first slave node More... | |
| core::Real | cumulated_runtime_ |
| keep track of average timings for time-outs More... | |
| core::Size | cumulated_jobs_ |
Friends | |
| class | JobDistributorFactory |
Additional Inherited Members | |
Static Public Member Functions inherited from protocols::jd2::JobDistributor | |
| static JobDistributor * | get_instance () |
Static Protected Member Functions inherited from protocols::jd2::JobDistributor | |
| static void | setup_system_signal_handler (void(*prev_fn)(int)=jd2_signal_handler) |
| Sets up the callback function that will be called when our process is about to terminate. More... | |
| static void | remove_system_signal_handler () |
| Set signal handler back to default state. More... | |
| static void | jd2_signal_handler (int Signal) |
| Default callback function for signal handling. More... | |
This JobDistributor is intended for machines with a large number of processors. Two dedicated processes handle job distribution and file I/O; all other processes (higher ranks) are used for computation. The file_buf_rank_ process runs the MpiFileBuffer, which is the receiving end of all ozstream output rerouted via MPI from the slave nodes. All slaves can therefore write to the same file without file-system congestion or interleaved output, because I/O is handled by a single dedicated process. The other dedicated process (master_rank) runs the actual JobDistributor and is used only to distribute jobs to slaves and to receive their notifications of successful or failed execution. If you have only a small number of processors, you can run, say, 10 MPI processes on 8 processors to get optimal CPU usage.
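The rank layout described above can be sketched as follows. This is a minimal illustration, not the Rosetta implementation: `Role`, `role_for_rank` and `worker_count` are hypothetical names, and the rule that every rank at or above `min_client_rank` is a worker is an assumption drawn from the description of min_client_rank().

```cpp
#include <cstddef>

// Hypothetical roles, named after the kinds of process described above.
enum class Role { Master, FileBuffer, Worker, Dedicated };

// Classify a rank given the layout parameters. Ranks at or above
// min_client_rank are treated as workers (assumption); any remaining rank
// that is neither master nor file buffer is some other dedicated process.
Role role_for_rank(std::size_t rank, std::size_t master_rank,
                   std::size_t file_buf_rank, std::size_t min_client_rank) {
    if (rank == master_rank) return Role::Master;
    if (rank == file_buf_rank) return Role::FileBuffer;
    if (rank >= min_client_rank) return Role::Worker;
    return Role::Dedicated;
}

// Number of worker processes among n_rank total processes.
std::size_t worker_count(std::size_t n_rank, std::size_t min_client_rank) {
    return n_rank > min_client_rank ? n_rank - min_client_rank : 0;
}
```

Assuming the master is rank 0, the file buffer rank 1 and the first worker rank 2, `worker_count(10, 2)` yields 8 workers, which is the arithmetic behind running 10 MPI processes on 8 processors.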
Definition at line 60 of file MPIFileBufJobDistributor.hh.
|
private |
Definition at line 62 of file MPIFileBufJobDistributor.hh.
|
protected |
ctor is protected; singleton pattern
constructor. Notice it calls the parent class! It also builds some internal variables for determining which processor it is in MPI land.
Definition at line 66 of file MPIFileBufJobDistributor.cc.
References min_client_rank_, n_rank_, n_worker_, and rank_.
|
protected |
protected ctor for child-classes
constructor. Notice it calls the parent class! It also builds some internal variables for determining which processor it is in MPI land.
Definition at line 94 of file MPIFileBufJobDistributor.cc.
References min_client_rank_, n_rank_, n_worker_, and rank_.
|
virtual |
dtor WARNING WARNING! SINGLETONS' DESTRUCTORS ARE NEVER CALLED IN MINI! DO NOT TRY TO PUT THINGS IN THIS FUNCTION! here's a nice link explaining why: http://www.research.ibm.com/designpatterns/pubs/ph-jun96.txt
Definition at line 130 of file MPIFileBufJobDistributor.cc.
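One common reason a singleton's destructor never runs is that the instance is allocated on the heap and intentionally never deleted. The sketch below illustrates that general pattern only; `Registry` is a hypothetical class, and whether Rosetta's singletons are implemented exactly this way is an assumption.

```cpp
// Hypothetical singleton whose instance is created once and never deleted.
class Registry {
public:
    static Registry* get_instance() {
        // Allocated on first use; no matching delete exists anywhere,
        // so ~Registry() is not run, even at program exit.
        static Registry* instance = new Registry();
        return instance;
    }
    static int destructor_calls;  // counts destructor invocations

protected:
    Registry() {}
    ~Registry() { ++destructor_calls; }
};

int Registry::destructor_calls = 0;
```

Any cleanup placed in such a destructor would silently never execute, which is why the warning above asks for it to be left empty.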
|
protected |
receive a certain signal and ignore it.... this is needed, for instance, when MPIArchiveJobDistributor triggers an ADD_BATCH signal by sending QUEUE_EMPTY to the ArchiveManager...
receive message of certain type – and ignore it ... sometimes needed in communication protocol
Definition at line 243 of file MPIFileBufJobDistributor.cc.
References protocols::jd2::MPI_JOB_DIST_TAG(), process_message(), and protocols::jd2::tr().
Referenced by protocols::jd2::archive::MPIArchiveJobDistributor::batch_underflow().
|
inlineprotected |
return rank of file-buffer process ( where output data (via ozstream )is handled )
Definition at line 181 of file MPIFileBufJobDistributor.hh.
References file_buf_rank_.
Referenced by protocols::jd2::archive::MPIArchiveJobDistributor::go().
|
virtual |
dummy for master/slave version
Implements protocols::jd2::JobDistributor.
Reimplemented in protocols::jd2::MPIMultiCommJobDistributor.
Definition at line 345 of file MPIFileBufJobDistributor.cc.
References master_get_new_job_id(), master_rank_, rank_, and slave_get_new_job_id().
|
virtual |
dummy for master/slave version
Reimplemented from protocols::jd2::JobDistributor.
Reimplemented in protocols::jd2::archive::MPIArchiveJobDistributor.
Definition at line 134 of file MPIFileBufJobDistributor.cc.
References file_buf_rank_, protocols::jd2::JobDistributor::go_main(), master_go(), master_rank_, min_client_rank_, rank_, protocols::jd2::MpiFileBuffer::run(), protocols::jd2::MpiFileBuffer::stop(), and protocols::jd2::tr().
|
inlineprotectedvirtual |
This function is called when a job has not yet finished and is terminated abnormally (ctrl-c, kill, etc.). When implementing it in subclasses, make sure to delete all in-progress data that your job spawned.
Implements protocols::jd2::JobDistributor.
Reimplemented in protocols::jd2::archive::MPIArchiveJobDistributor, and protocols::jd2::MPIMultiCommJobDistributor.
Definition at line 70 of file MPIFileBufJobDistributor.hh.
|
inline |
Definition at line 77 of file MPIFileBufJobDistributor.hh.
References min_client_rank_.
|
virtual |
This function is called when we give up on the job. It has been virtualized so that BOINC and MPI can delay or protect output; the base implementation is just a call to the job outputter.
no-op implementation in the base class
Reimplemented from protocols::jd2::JobDistributor.
Reimplemented in protocols::jd2::MPIMultiCommJobDistributor.
Definition at line 507 of file MPIFileBufJobDistributor.cc.
References protocols::jd2::JOB_FAILED_NO_RETRY, min_client_rank_, rank_, and slave_to_master().
|
virtual |
dummy for master/slave version
Reimplemented from protocols::jd2::JobDistributor.
Reimplemented in protocols::jd2::MPIMultiCommJobDistributor.
Definition at line 496 of file MPIFileBufJobDistributor.cc.
References master_job_succeeded(), master_rank_, rank_, slave_current_runtime_, and slave_job_succeeded().
|
virtual |
dummy for master/slave version
Implements protocols::jd2::JobDistributor.
Definition at line 420 of file MPIFileBufJobDistributor.cc.
References protocols::jd2::JobDistributor::clear_current_job_output(), master_mark_current_job_id_for_repetition(), master_rank_, rank_, and slave_mark_current_job_id_for_repetition().
|
protectedvirtual |
marks job as bad in joblist
mark job as failed — remove future versions of same input from list
Reimplemented in protocols::jd2::archive::MPIArchiveJobDistributor.
Definition at line 234 of file MPIFileBufJobDistributor.cc.
References bad_job_id_, protocols::jd2::JobDistributor::current_batch_id(), and remove_bad_inputs_from_job_list().
Referenced by protocols::jd2::archive::MPIArchiveJobDistributor::mark_job_as_bad(), and process_message().
|
protectedvirtual |
marks job as completed in joblist
mark job as completed
Reimplemented in protocols::jd2::archive::MPIArchiveJobDistributor.
Definition at line 223 of file MPIFileBufJobDistributor.cc.
References cumulated_jobs_, cumulated_runtime_, protocols::jd2::JobDistributor::current_batch_id(), and protocols::jd2::JobDistributor::mark_job_as_completed().
Referenced by protocols::jd2::archive::MPIArchiveJobDistributor::mark_job_as_completed(), master_get_new_job_id(), and process_message().
|
protected |
Always returns zero; simply increments next_job_to_assign_ to the next job that should be run, based on what has been completed and the overwrite flags.
work out what next job is
Definition at line 358 of file MPIFileBufJobDistributor.cc.
References protocols::jd2::JobDistributor::current_batch_id(), protocols::jd2::JobDistributor::current_job_id(), protocols::jd2::JobDistributor::get_jobs(), protocols::jd2::JobDistributor::job_outputter(), mark_job_as_completed(), and protocols::jd2::tr().
Referenced by get_new_job_id().
|
protected |
Handles the receiving of job requests and the sending of job ids to and from slaves.
the main message loop — master cycles thru until all slave nodes have been spun down
Definition at line 274 of file MPIFileBufJobDistributor.cc.
References bad_jobs_, cumulated_jobs_, cumulated_runtime_, protocols::jd2::JobDistributor::current_job_id(), jobs_assigned_, jobs_returned_, master_rank_, MPI_ANY_SOURCE, protocols::jd2::MPI_JOB_DIST_TAG(), n_nodes_left_to_spin_down_, n_worker(), protocols::jd2::JobDistributor::obtain_new_job(), process_message(), rank_, and protocols::jd2::tr().
Referenced by go(), and protocols::jd2::archive::MPIArchiveJobDistributor::go().
|
protected |
This should never be called as this is handled internally by the slave nodes, it utility_exits.
Definition at line 522 of file MPIFileBufJobDistributor.cc.
References master_rank_, rank_, and protocols::jd2::tr().
Referenced by job_succeeded().
|
protected |
This should never be called as this is handled internally by the slave nodes, it utility_exits.
Definition at line 431 of file MPIFileBufJobDistributor.cc.
References master_rank_, rank_, and protocols::jd2::tr().
Referenced by mark_current_job_id_for_repetition().
|
inlineprotected |
return rank of master process ( where JobDistributor is running )
Definition at line 176 of file MPIFileBufJobDistributor.hh.
References master_rank_.
Referenced by protocols::jd2::archive::MPIArchiveJobDistributor::batch_underflow(), protocols::jd2::archive::MPIArchiveJobDistributor::go(), protocols::jd2::archive::MPIArchiveJobDistributor::load_new_batch(), protocols::jd2::archive::MPIArchiveJobDistributor::mark_job_as_bad(), protocols::jd2::archive::MPIArchiveJobDistributor::mark_job_as_completed(), protocols::jd2::archive::MPIArchiveJobDistributor::master_to_archive(), protocols::jd2::archive::MPIArchiveJobDistributor::process_message(), and protocols::jd2::archive::MPIArchiveJobDistributor::sync_batches().
|
protected |
Simply increments next_job_to_assign_ to the next job that should be run based on what has been completed and if the input job tag of the job marked as having bad input.
Definition at line 458 of file MPIFileBufJobDistributor.cc.
References bad_job_id_, protocols::jd2::JobDistributor::get_jobs(), protocols::jd2::JobDistributor::job_outputter(), protocols::jd2::JobDistributor::mark_job_as_bad(), master_rank_, protocols::jd2::JobDistributor::obtain_new_job(), rank_, and protocols::jd2::tr().
Referenced by remove_bad_inputs_from_job_list().
|
inline |
return rank of first worker process (there might be more dedicated processes, e.g., ArchiveManager...)
Definition at line 82 of file MPIFileBufJobDistributor.hh.
References min_client_rank_.
Referenced by protocols::jd2::archive::MPIArchiveJobDistributor::go().
|
inlineprotected |
how many processes — this includes dedicated processes
Definition at line 191 of file MPIFileBufJobDistributor.hh.
References n_rank_.
|
inlineprotected |
how many workers — important to keep track during spin-down process
Definition at line 196 of file MPIFileBufJobDistributor.hh.
References n_worker_.
Referenced by master_go().
|
protectedvirtual |
switch current_batch_id_ to next batch
Reimplemented from protocols::jd2::JobDistributor.
Definition at line 389 of file MPIFileBufJobDistributor.cc.
References cumulated_jobs_, cumulated_runtime_, master_rank_, protocols::jd2::JobDistributor::next_batch(), and rank_.
|
inlineprotected |
how many processes — this includes dedicated processes
Definition at line 186 of file MPIFileBufJobDistributor.hh.
References n_rank_.
Referenced by protocols::jd2::archive::MPIArchiveJobDistributor::notify_archive().
|
protectedvirtual |
messages are received constantly by Master JobDistributor and then the virtual process_message() method is used to assign some action to each message ... this allows child-classes to answer to more messages or change behaviour of already known messages
Reimplemented in protocols::jd2::archive::MPIArchiveJobDistributor.
Definition at line 189 of file MPIFileBufJobDistributor.cc.
References protocols::jd2::BAD_INPUT, bad_jobs_, protocols::jd2::JobDistributor::current_job(), protocols::jd2::JobDistributor::current_job_id(), protocols::jd2::JOB_FAILED_NO_RETRY, protocols::jd2::JOB_SUCCESS, jobs_assigned_, jobs_returned_, mark_job_as_bad(), mark_job_as_completed(), n_nodes_left_to_spin_down_, protocols::jd2::NEW_JOB_ID, protocols::jd2::JobDistributor::obtain_new_job(), send_job_to_slave(), and protocols::jd2::tr().
Referenced by eat_signal(), master_go(), and protocols::jd2::archive::MPIArchiveJobDistributor::process_message().
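The extension mechanism described for process_message() can be sketched as a virtual handler that a child class overrides to answer additional messages and otherwise defers to its parent. All names and tag values below are hypothetical, and treating the boolean return as "message was recognized" is an assumption; the real signature takes more arguments.

```cpp
#include <cstddef>

// Hypothetical tag values, for illustration only.
const std::size_t NEW_JOB_ID_TAG = 1;
const std::size_t ADD_BATCH_TAG = 10;

class BaseDistributor {
public:
    virtual ~BaseDistributor() {}

    // Returns true if the message was recognized and handled (assumption).
    virtual bool process_message(std::size_t msg_tag) {
        if (msg_tag == NEW_JOB_ID_TAG) {
            ++jobs_requested;
            return true;
        }
        return false;
    }

    std::size_t jobs_requested = 0;
};

class ArchiveLikeDistributor : public BaseDistributor {
public:
    // Handles one extra tag, then falls back to the parent's behaviour.
    bool process_message(std::size_t msg_tag) override {
        if (msg_tag == ADD_BATCH_TAG) {
            ++batches_added;
            return true;
        }
        return BaseDistributor::process_message(msg_tag);
    }

    std::size_t batches_added = 0;
};
```

Because the master loop calls the handler through the base interface, a derived distributor can add new message types without changing the loop itself.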
|
inlineprotected |
return rank of this process
Definition at line 171 of file MPIFileBufJobDistributor.hh.
References rank_.
Referenced by protocols::jd2::archive::MPIArchiveJobDistributor::batch_underflow(), protocols::jd2::archive::MPIArchiveJobDistributor::go(), protocols::jd2::archive::MPIArchiveJobDistributor::is_archive_rank(), protocols::jd2::archive::MPIArchiveJobDistributor::load_new_batch(), protocols::jd2::archive::MPIArchiveJobDistributor::mark_job_as_bad(), protocols::jd2::archive::MPIArchiveJobDistributor::mark_job_as_completed(), protocols::jd2::archive::MPIArchiveJobDistributor::master_to_archive(), protocols::jd2::archive::MPIArchiveJobDistributor::process_message(), protocols::jd2::archive::MPIArchiveJobDistributor::set_archive(), and protocols::jd2::archive::MPIArchiveJobDistributor::sync_batches().
|
virtual |
dummy for master/slave version
Reimplemented from protocols::jd2::JobDistributor.
Definition at line 448 of file MPIFileBufJobDistributor.cc.
References master_rank_, master_remove_bad_inputs_from_job_list(), rank_, and slave_remove_bad_inputs_from_job_list().
Referenced by mark_job_as_bad().
|
protected |
called by master to send and by slave to receive job
This is the heart of the MPIFileBufJobDistributor. It consists of two while loops: the job distribution loop (JDL) and the node spin down loop (NSDL). The JDL has three functions. The first is to receive and process messages from the slave nodes requesting new job ids. The second is to receive and process messages from the slave nodes indicating a bad input. The third is to receive and process job_success messages from the slave nodes and block while the slave node is writing its output. This is to prevent interleaving of output in score files and silent files. The function of the NSDL is to keep the head node alive while there are still slave nodes processing. Without the NSDL, if a slave node finished its allocated job after the head node had finished handing out all of the jobs and exited (a very likely scenario), it would wait indefinitely for a response from the head node when requesting a new job id.
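The message handling described above can be sketched without MPI by replaying a queue of incoming messages. This is a simplified model under stated assumptions: `Tag`, `Message`, `MasterStats` and `run_master` are hypothetical names, a reply of 0 is assumed to mean "no more jobs, spin down", the two loops are collapsed into one, and the blocking while a worker writes output is not modelled.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical message tags; the real ones live in protocols::jd2.
enum class Tag { NewJobId, BadInput, JobSuccess };

struct Message {
    Tag tag;
    std::size_t worker;  // rank of the sending worker
};

struct MasterStats {
    std::size_t jobs_assigned = 0;
    std::size_t jobs_returned = 0;
    std::size_t bad_jobs = 0;
    std::size_t workers_spun_down = 0;
    std::vector<std::size_t> replies;  // job id sent per request; 0 = stop
};

// Serve messages in arrival order. Job ids run from 1 to n_jobs. The loop
// keeps answering until every worker has been told to stop, so that a late
// request is never left waiting for a reply.
MasterStats run_master(std::deque<Message> inbox, std::size_t n_jobs,
                       std::size_t n_workers) {
    MasterStats s;
    std::size_t next_job = 1;
    while (s.workers_spun_down < n_workers && !inbox.empty()) {
        Message m = inbox.front();
        inbox.pop_front();
        switch (m.tag) {
        case Tag::NewJobId:
            if (next_job <= n_jobs) {
                s.replies.push_back(next_job++);
                ++s.jobs_assigned;
            } else {
                s.replies.push_back(0);  // assumed spin-down signal
                ++s.workers_spun_down;
            }
            break;
        case Tag::BadInput:
            ++s.bad_jobs;
            ++s.jobs_returned;
            break;
        case Tag::JobSuccess:
            ++s.jobs_returned;
            break;
        }
    }
    return s;
}
```

In this model the master exits only after each worker has received a stop reply, which is the role the NSDL plays in the description above.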
Definition at line 167 of file MPIFileBufJobDistributor.cc.
References protocols::jd2::JobDistributor::current_batch_id(), protocols::jd2::JobDistributor::current_job_id(), master_rank_, protocols::jd2::MPI_JOB_DIST_TAG(), rank_, slave_current_batch_id_, slave_current_job_id_, and protocols::jd2::tr().
Referenced by process_message(), and slave_get_new_job_id().
|
inlineprotected |
how many workers — important to keep track during spin-down process
Definition at line 201 of file MPIFileBufJobDistributor.hh.
References n_worker_.
|
protected |
requests, receives, and returns a new job id from the master node or returns the current job id if the repeat_job_ flag is set to true
Definition at line 397 of file MPIFileBufJobDistributor.cc.
References protocols::jd2::JobDistributor::get_current_batch(), master_rank_, protocols::jd2::NEW_JOB_ID, rank_, repeat_job_, send_job_to_slave(), protocols::jd2::JobDistributor::set_batch_id(), slave_current_batch_id_, slave_current_job_id_, slave_to_master(), and protocols::jd2::tr().
Referenced by get_new_job_id().
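The repeat behaviour of slave_get_new_job_id() can be sketched as below. `WorkerState` and `worker_get_new_job_id` are hypothetical names, the callback stands in for the MPI round trip to the master, and clearing the flag after it has been honoured is an assumption.

```cpp
#include <cstddef>
#include <functional>

// Hypothetical per-worker state mirroring slave_current_job_id_ and repeat_job_.
struct WorkerState {
    std::size_t current_job_id = 0;
    bool repeat_job = false;
};

// Return the current job id again if a repeat was requested; otherwise ask
// the master (modelled here by a callback) for a new id and remember it.
std::size_t worker_get_new_job_id(
        WorkerState& w,
        const std::function<std::size_t()>& request_from_master) {
    if (w.repeat_job) {
        w.repeat_job = false;  // assumption: the flag is consumed once
        return w.current_job_id;
    }
    w.current_job_id = request_from_master();
    return w.current_job_id;
}
```

A repeated job therefore costs no message to the master, since the worker already holds the id it needs.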
|
protected |
Sends a message to the head node upon successful job completion to avoid output interleaving.
Definition at line 532 of file MPIFileBufJobDistributor.cc.
References protocols::jd2::JobDistributor::current_job(), protocols::jd2::JobDistributor::job_outputter(), protocols::jd2::JOB_SUCCESS, master_rank_, rank_, and slave_to_master().
Referenced by job_succeeded().
|
protected |
Sets the repeat_job_ flag to true.
Definition at line 439 of file MPIFileBufJobDistributor.cc.
References protocols::jd2::JobDistributor::current_job_id(), master_rank_, rank_, repeat_job_, and protocols::jd2::tr().
Referenced by mark_current_job_id_for_repetition().
|
protected |
Sends a message to the head node that contains the id of a job that had bad input.
Definition at line 489 of file MPIFileBufJobDistributor.cc.
References protocols::jd2::BAD_INPUT, and slave_to_master().
Referenced by remove_bad_inputs_from_job_list().
|
protected |
send a message to master
Definition at line 475 of file MPIFileBufJobDistributor.cc.
References master_rank_, protocols::jd2::MPI_JOB_DIST_TAG(), rank_, slave_current_batch_id_, slave_current_job_id_, and slave_current_runtime_.
Referenced by protocols::jd2::archive::MPIArchiveJobDistributor::batch_underflow(), job_failed(), slave_get_new_job_id(), slave_job_succeeded(), and slave_remove_bad_inputs_from_job_list().
|
friend |
Definition at line 118 of file MPIFileBufJobDistributor.hh.
|
private |
where master stores next job to assign (in a good state after get_new_job_id up until it's used)
where master temporarily stores id of jobs with bad input
Definition at line 238 of file MPIFileBufJobDistributor.hh.
Referenced by mark_job_as_bad(), and master_remove_bad_inputs_from_job_list().
|
private |
jobs that have returned bad
Definition at line 254 of file MPIFileBufJobDistributor.hh.
Referenced by master_go(), and process_message().
|
private |
Definition at line 274 of file MPIFileBufJobDistributor.hh.
Referenced by mark_job_as_completed(), master_go(), and next_batch().
|
private |
keep track of average timings for time-outs
Definition at line 272 of file MPIFileBufJobDistributor.hh.
Referenced by mark_job_as_completed(), master_go(), and next_batch().
|
private |
the File-Buffer
Definition at line 265 of file MPIFileBufJobDistributor.hh.
Referenced by file_buf_rank(), and go().
|
private |
keep some statistics about the jobs this is mostly just for silly tr.Info messages...
jobs sent to slave nodes
Definition at line 248 of file MPIFileBufJobDistributor.hh.
Referenced by master_go(), and process_message().
|
private |
jobs that have returned (either, bad or good )
Definition at line 251 of file MPIFileBufJobDistributor.hh.
Referenced by master_go(), and process_message().
|
private |
keep here the ranks of different functional processes
the job-distributor (master)
Definition at line 262 of file MPIFileBufJobDistributor.hh.
Referenced by get_new_job_id(), go(), job_succeeded(), mark_current_job_id_for_repetition(), master_go(), master_job_succeeded(), master_mark_current_job_id_for_repetition(), master_rank(), master_remove_bad_inputs_from_job_list(), next_batch(), remove_bad_inputs_from_job_list(), send_job_to_slave(), slave_get_new_job_id(), slave_job_succeeded(), slave_mark_current_job_id_for_repetition(), and slave_to_master().
|
private |
the first slave node
Definition at line 269 of file MPIFileBufJobDistributor.hh.
Referenced by go(), increment_client_rank(), job_failed(), min_client_rank(), and MPIFileBufJobDistributor().
|
private |
how many more to spin down
Definition at line 257 of file MPIFileBufJobDistributor.hh.
Referenced by master_go(), and process_message().
|
private |
total number of processing elements
Definition at line 218 of file MPIFileBufJobDistributor.hh.
Referenced by MPIFileBufJobDistributor(), n_rank(), and number_of_processors().
|
private |
Definition at line 220 of file MPIFileBufJobDistributor.hh.
Referenced by MPIFileBufJobDistributor(), n_worker(), and set_n_worker().
|
private |
rank of the "local" instance
Definition at line 223 of file MPIFileBufJobDistributor.hh.
Referenced by get_new_job_id(), go(), job_failed(), job_succeeded(), mark_current_job_id_for_repetition(), master_go(), master_job_succeeded(), master_mark_current_job_id_for_repetition(), master_remove_bad_inputs_from_job_list(), MPIFileBufJobDistributor(), next_batch(), rank(), remove_bad_inputs_from_job_list(), send_job_to_slave(), slave_get_new_job_id(), slave_job_succeeded(), slave_mark_current_job_id_for_repetition(), and slave_to_master().
|
private |
where slave stores whether it should repeat its current job id
Definition at line 241 of file MPIFileBufJobDistributor.hh.
Referenced by slave_get_new_job_id(), and slave_mark_current_job_id_for_repetition().
|
private |
batch_id allows running multiple batches of jobs
Definition at line 229 of file MPIFileBufJobDistributor.hh.
Referenced by send_job_to_slave(), slave_get_new_job_id(), and slave_to_master().
|
private |
where slave jobs store current job id
Definition at line 226 of file MPIFileBufJobDistributor.hh.
Referenced by send_job_to_slave(), slave_get_new_job_id(), and slave_to_master().
|
private |
runtime of last job
Definition at line 232 of file MPIFileBufJobDistributor.hh.
Referenced by job_succeeded(), and slave_to_master().