Job processing (introduction)
BOINC has evolved over 20+ years to handle a wide range of high-throughput computing use cases.
As a result, BOINC provides many options. There are 7 degrees of freedom, or 'choice points', in a BOINC job-processing pipeline, and BOINC offers multiple choices for each point. You (the BOINC project admin) must pick one choice per point. These choices are mostly independent, though in some cases your choice at one point limits the available choices at others.
Some of the choices require you to write programs (usually in C++, sometimes in Python) that run on the server.
In general each BOINC app has its own pipeline, and you can make different choices for each app.
The choices for a given app are embodied in the set of daemons and tasks in your config.xml file.
In some cases other configuration is needed as well.
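As a rough illustration of how a pipeline shows up in config.xml, the sketch below parses a small fragment and lists its daemons and tasks. The fragment is hypothetical: the program names prefixed with `my_`, the flags, and the task period are assumptions made for this example, not settings taken from a real project.

```python
import xml.etree.ElementTree as ET

# Hypothetical config.xml fragment. The "my_*" programs, the flags and the
# task period are illustrative assumptions, not a recommended configuration.
CONFIG_XML = """
<boinc>
  <daemons>
    <daemon><cmd>feeder -d 3</cmd></daemon>
    <daemon><cmd>transitioner -d 3</cmd></daemon>
    <daemon><cmd>my_work_generator --app myapp</cmd></daemon>
    <daemon><cmd>my_validator --app myapp</cmd></daemon>
    <daemon><cmd>my_assimilator --app myapp</cmd></daemon>
  </daemons>
  <tasks>
    <task><cmd>db_dump -d 2</cmd><period>24 hours</period></task>
  </tasks>
</boinc>
"""


def list_commands(xml_text):
    """Return (daemon commands, [(task command, period), ...])."""
    root = ET.fromstring(xml_text)
    daemons = [d.findtext("cmd").strip() for d in root.iter("daemon")]
    tasks = [
        (t.findtext("cmd").strip(), t.findtext("period").strip())
        for t in root.iter("task")
    ]
    return daemons, tasks


daemons, tasks = list_commands(CONFIG_XML)
for cmd in daemons:
    print("daemon:", cmd)
for cmd, period in tasks:
    print("task:", cmd, "every", period)
```

In this sketch, swapping the work generator, validator, or assimilator entry for a given app corresponds to making a different choice at that point in the pipeline.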
You can set things up so that there are multiple job submitters, sharing (and competing for) computing power. Each job submitter is identified by a user account on the project.
This is typically used with remote or web-based job submission (see below). It's used in BOINC Central.
BOINC has an access control system that lets project admins limit which apps a user can submit jobs to.
Admins can also assign quotas that determine how much computing each user gets (this requires using a special feeder, feeder_user).
BOINC allows jobs to be collected into 'batches', which can be monitored and controlled as a unit. Job 'ownership' is at the level of batches (not individual jobs) so batches are needed for the multi-submitter scenario. BOINC's web interfaces for controlling batches and downloading outputs do access control based on batch ownership.
In the 'stream model', jobs don't belong to batches. In this case you're typically trying to keep the pipeline full but not overfull.
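One common way to keep the pipeline "full but not overfull" is a low/high water mark loop. The sketch below is only an illustration of that idea: `count_unsent`, `create_job`, and the threshold values are hypothetical stand-ins for whatever your work generator uses to query the database and create jobs.

```python
def jobs_to_create(unsent_count, low_water=100, high_water=500):
    """How many jobs to add so the queue refills to high_water.

    Nothing is created while at least low_water unsent jobs remain.
    """
    if unsent_count >= low_water:
        return 0
    return high_water - unsent_count


def run_generator(count_unsent, create_job, passes, low_water=100, high_water=500):
    """Run a fixed number of passes; a real generator would loop forever
    and sleep between passes."""
    created = 0
    for _ in range(passes):
        needed = jobs_to_create(count_unsent(), low_water, high_water)
        for _ in range(needed):
            create_job()
            created += 1
    return created


# In-memory stand-in for the job queue (hypothetical; a real generator
# would query the project database instead).
queue = {"unsent": 40}


def _count_unsent():
    return queue["unsent"]


def _create_job():
    queue["unsent"] += 1


total_created = run_generator(_count_unsent, _create_job, passes=2)
print(total_created, queue["unsent"])
```

The gap between the two thresholds avoids creating a handful of jobs on every pass; the values themselves would need tuning to the project's throughput.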
Choices:
- A user-supplied work generator program running on the server (stream model). This typically means that job submitters log into the server.
- Web RPCs: these use batches and ownership, and have PHP and Python bindings. This makes it possible to create systems that allow scientists to process jobs without logging into the BOINC server.
- Web interfaces (both use ownership and batches):
  - BUDA (BOINC-supplied).
  - App-specific (user-supplied).
- CLI programs (tools/submit_job, tools/submit_batch).
Job replication (running more than one instance of each job and comparing the outputs) provides protection against hardware errors and cheating (volunteers intentionally returning wrong results). But it adds complexity and reduces throughput.
Note: this should not be confused with failure retry. If an instance of a job fails, BOINC will automatically create and run another instance, up to a specified limit.
'Validators' examine the outputs of one or more job instances, and select one (the 'canonical instance') as being correct.
- BOINC-supplied validators:
  - trivial: all jobs are valid
  - bitwise: outputs must agree exactly
- App-specific validators: have thresholds for fuzzy comparison.
In general app-specific is needed for jobs with floating-point arithmetic; however, homogeneous redundancy may let you use bitwise validation.
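The difference between bitwise and fuzzy comparison, and the selection of a canonical instance, can be sketched as follows. This is a simplified illustration, not BOINC's validator framework: the function names, the `quorum` parameter, and the relative tolerance are assumptions made for the example.

```python
import math


def bitwise_equal(a, b):
    """Outputs agree only if they are exactly identical."""
    return a == b


def fuzzy_equal(a, b, rel_tol=1e-6):
    """Outputs agree if every pair of values is within a relative tolerance."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, b)
    )


def pick_canonical(outputs, equal, quorum=2):
    """Return the index of the first instance whose output agrees with at
    least `quorum` instances (counting itself), or None if there is none."""
    for i, out in enumerate(outputs):
        matches = sum(1 for other in outputs if equal(out, other))
        if matches >= quorum:
            return i
    return None


# Three instances of one job; the first two differ only by rounding noise.
outputs = [[1.0, 2.0], [1.0000000001, 2.0], [5.0, 2.0]]
print(pick_canonical(outputs, fuzzy_equal))    # index of the canonical instance
print(pick_canonical(outputs, bitwise_equal))  # None: no two outputs are identical
```

With exact comparison, small floating-point differences between hosts prevent any agreement, which is why fuzzy thresholds (or running instances on numerically equivalent hosts) are needed for such jobs.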
When a client finishes a job, it uploads the output files into an 'upload hierarchy' (4096 directories). There are three choices for how these files are accessed and eventually deleted:
- Stream-oriented: output files remain in the upload hierarchy and are deleted after the job is assimilated.
- Batch-collect: output files of the canonical result are moved ('collected') into a per-batch directory.
- Batch-static: output files remain in the upload hierarchy and are deleted after the batch is retired.
This is described in more detail here.
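To make the idea of an upload hierarchy concrete, the sketch below spreads files across a fixed number of subdirectories by hashing the filename. It is an illustration only: the use of MD5, the hexadecimal directory names, and the `hier_path` helper are assumptions, and BOINC's actual hash function and naming may differ.

```python
import hashlib
import os


def hier_path(root, filename, fanout=4096):
    """Map a filename to root/<bucket>/<filename>, where bucket is one of
    `fanout` subdirectories chosen by hashing the name (assumed scheme)."""
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % fanout
    return os.path.join(root, format(bucket, "x"), filename)


# Hypothetical output filename, used only to show the shape of the path.
example_path = hier_path("upload", "job_0001_r1_0")
print(example_path)
```

Because the bucket depends only on the filename, any server program can locate a file without a lookup table, and no single directory grows too large.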
Similarly, input files are stored in a 'download hierarchy', from which they are downloaded by clients. Choices for how the files get there and are deleted:
- Stream-oriented: files are 'staged' (placed in the download hierarchy) by the work generator.
- Job-file system: the DB stores a description of each input file, and the associations between batches and files. Files are deleted when there are no associations and an expiration time has passed. Implemented by a web RPC system.
- User file sandbox: using a web interface, job submitters upload files to a 'sandbox' on the server. Job submission systems copy these to the download hierarchy. This is used by BUDA. The other paradigms require file immutability (you can't reuse filenames); the sandbox paradigm does not.
This is described in more detail here.
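The deletion rule of the job-file system (no remaining batch associations and an expired lifetime) can be sketched as below. The `InputFile` record and its field names are hypothetical and do not reflect BOINC's actual database schema.

```python
from dataclasses import dataclass, field


@dataclass
class InputFile:
    """Hypothetical record for one input file tracked in the DB."""
    name: str
    expires_at: float                              # Unix time after which deletion is allowed
    batch_ids: set = field(default_factory=set)    # batches still using the file


def deletable(f, now):
    """A file may be deleted only when no batch refers to it AND it has expired."""
    return not f.batch_ids and now > f.expires_at


files = [
    InputFile("a.dat", expires_at=100.0),                   # unused and expired
    InputFile("b.dat", expires_at=100.0, batch_ids={7}),    # expired but still in use
    InputFile("c.dat", expires_at=900.0),                   # unused but not yet expired
]
to_delete = [f.name for f in files if deletable(f, now=500.0)]
print(to_delete)
```

Requiring both conditions means a file shared by several batches survives until the last batch releases it, and a freshly uploaded file is not removed before any batch has had a chance to reference it.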