# Remote input file management
This document describes a system in which input files are transferred to the BOINC server via Web RPCs.
In this system, you must supply physical names of files that are globally unique. The easiest way to do this is to include a hash of the file contents in the name.
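One way to build such a name is to prefix the original filename with a digest of the file's contents. The sketch below is illustrative only: the helper `physical_name` and the `<md5>_<basename>` layout are assumptions, not a naming scheme that BOINC requires.

```python
import hashlib
import os

def physical_name(path, chunk_size=1 << 20):
    """Return '<md5 of contents>_<basename>' for the file at `path`.

    Hypothetical helper: the only requirement stated here is that the
    name be globally unique; this layout is an assumption.
    """
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large input files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() + "_" + os.path.basename(path)
```

Because the digest depends only on the contents, two submitters who upload identical files under this scheme arrive at the same digest prefix.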
File cleanup is based on file/batch associations.
You must create a batch (typically with the create_batch() RPC)
before querying or uploading files.
Each file can be associated with one or more batches.
Files that are no longer associated with an active batch are
automatically deleted from the server.
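The cleanup rule can be pictured with a small in-memory model. This is only a sketch of the rule as described above, not the server's implementation; the function name, file names, and batch IDs are hypothetical.

```python
def deletable_files(associations, active_batches):
    """Return, sorted, the files with no association to an active batch.

    associations: dict mapping physical file name -> set of batch IDs
    active_batches: set of batch IDs that are still active
    """
    return sorted(
        name for name, batches in associations.items()
        if not (batches & active_batches)
    )

associations = {
    "a1b2_input.dat": {1, 2},   # shared by two batches
    "c3d4_params.txt": {1},     # used by batch 1 only
}
# Batch 1 has been retired; only batch 2 is still active.
print(deletable_files(associations, active_batches={2}))
# prints ['c3d4_params.txt']
```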
The system uses two Web RPCs.
These are implemented as XML sent via HTTP POST;
the RPC handler is html/user/job_files.php.
The Python binding is part of the BOINC_SERVER class described in
Remote job submission RPCs.
This class provides two functions:
- `query_files(phys_names, batch_id=0, delete_time=0)`: takes a list of physical filenames and returns a list of the indices of the files that are not currently on the server, e.g. `{'absent_files': {'file': '1'}}`. For the other files, it creates batch/file associations as needed.
- `upload_files(local_names, phys_names, batch_id=0, delete_time=0)`: first calls `query_files` to get a list of missing files, then uploads these files and creates batch/file associations.
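Since the reply identifies absent files by index, the caller has to map those indices back to its own list of names. The following is a minimal sketch, assuming the reply has the shape of the example reply above and that several absent files would arrive as a list under `'file'` (an assumption; only the single-file form is documented here). `absent_indices` is a hypothetical helper, not part of the BOINC_SERVER class.

```python
def absent_indices(reply):
    """Extract absent-file indices from a query_files-style reply dict.

    Assumptions (not confirmed by this document): a single absent file
    is a string such as '1', several are a list of such strings, and an
    empty or missing 'absent_files' element means nothing is absent.
    """
    absent = reply.get("absent_files") or {}
    files = absent.get("file", [])
    if isinstance(files, str):
        files = [files]
    return [int(i) for i in files]

phys_names = ["a1b2_input.dat", "c3d4_params.txt"]
reply = {"absent_files": {"file": "1"}}  # same shape as the example reply
missing = [phys_names[i] for i in absent_indices(reply)]
print(missing)  # prints ['c3d4_params.txt']
```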
## C++ interface
The following C++ functions are provided (in lib/remote_submit.cpp).
They are to be called on the job submission host;
the files must exist on that host.
```cpp
extern int query_files(
    const char* project_url,
    const char* authenticator,
    std::vector<std::string> &boinc_names,  // must be unique, e.g. by including content hash
    int batch_id,
    std::vector<int> &absent_files,         // output
    std::string& error_message
);
```

Inputs:

- `project_url`: the project's master URL.
- `authenticator`: the job submitter's authenticator.
- `boinc_names`: a duplicate-free list of the files' BOINC physical names. These typically include a hash (e.g. MD5) of the file contents.
- `batch_id`: the ID of a batch whose jobs will reference the files (these jobs need not exist yet). The operation will fail if the user is not authorized to submit jobs to the batch's application.

Action: for each file, check whether it exists on the server. If it does, create an association to the given batch.

Output:

- return value: nonzero on error.
- `absent_files`: a list of files not present on the server (represented as indices into the `boinc_names` vector).
- `error_message`: on error, an explanatory string.

```cpp
extern int upload_files(
    const char* project_url,
    const char* authenticator,
    std::vector<std::string> &paths,
    std::vector<std::string> &boinc_names,
    int batch_id,
    std::string& error_message
);
```

Inputs:

- `project_url`, `authenticator`: as above.
- `paths`: a list of paths of the files to be uploaded.
- `boinc_names`: a list of the BOINC names of these files (see above).
- `batch_id`: the ID of a batch with which the files are associated. The operation will fail if the user is not authorized to submit jobs to the batch's application.

Action: upload the files and create associations to the given batch.

Output:

- return value: nonzero on error.
- `error_message`: on error, an explanatory string.

If you use this system, periodically run the script
html/ops/delete_job_files.
This will delete files that are no longer associated with an active batch.
Note: this mechanism uploads files via a PHP script. PHP's default maximum file upload size is 2 MB. To increase it, edit /etc/php.ini and change, e.g.:
upload_max_filesize = 64M
post_max_size = 64M
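
Because the limit is enforced by PHP on the server, it can be worth checking file sizes on the submission host before uploading. The sketch below assumes you know the value configured on the server; `php_size_to_bytes` and `oversized` are hypothetical helpers, and the parser handles only the K, M and G suffixes of PHP's shorthand notation.

```python
import os

_UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

def php_size_to_bytes(value):
    """Convert a php.ini size such as '64M' or '2048' to bytes."""
    value = value.strip()
    suffix = value[-1:].upper()
    if suffix in _UNITS:
        return int(value[:-1]) * _UNITS[suffix]
    return int(value)

def oversized(paths, upload_max_filesize="64M"):
    """Return the paths whose size exceeds the given upload limit."""
    limit = php_size_to_bytes(upload_max_filesize)
    return [p for p in paths if os.path.getsize(p) > limit]
```

Note that `post_max_size` bounds the whole POST request, so it should be at least as large as `upload_max_filesize`.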