design doc ‐ job table
Design document about the introduction of a new table in the Oracle DB for the bookkeeping of the jobs.
- reduce the barrier to add new features
- I want to assume that the latest and most complete set of information about a job is in the DB, not in a cached file on a scheduler.
- all the job bookkeeping should be centralized in a single table, so that every portion of CRAB that needs to know the status of a job can query the Oracle DB
- after jobsplitting, TW creates a row for every job (retry 0).
- dagman, prejob, postjob only touch the DB. no need for status_cache, runs_and_lumis.tar.gz, ...
- crab status, crab report, ... should only read from the DB, not from status_cache, ...
option 1: one row for every CRAB job, with information about last retry only (if the value for the column is retrycount=4, then it means that the schedulers will have job_log.(0|1|2|3|4).txt). table primary key: (taskname, crab job id).
- pro: fewer rows.
- con: we lose a bit of history
option 2: one row for every job on the scheduler. table primary key: (taskname, crab job id, retry count)
- pro: better history tracking (site where the job ran, ...)
- con: need to check more than one row to look for the latest retry of a job.
Dario prefers option 2.
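
To make the "con" of option 2 concrete, here is a minimal sketch of what the table and the "latest retry" lookup could look like. It uses an in-memory SQLite database as a stand-in for Oracle. The table name `jobs`, all column names, the state values and the site names are assumptions made up for illustration, not the actual CRAB schema.

```python
import sqlite3

# Hypothetical option-2 schema: one row per (task, job, retry).
# Names and types are illustrative only; the job id is kept as text here.
SCHEMA = """
CREATE TABLE jobs (
    taskname    TEXT    NOT NULL,
    crab_job_id TEXT    NOT NULL,
    retry_count INTEGER NOT NULL,
    state       TEXT    NOT NULL,
    site        TEXT,
    PRIMARY KEY (taskname, crab_job_id, retry_count)
)
"""

# For each job of a task, keep only the row with the highest retry count.
LATEST_RETRY = """
SELECT j.crab_job_id, j.retry_count, j.state, j.site
FROM jobs j
WHERE j.taskname = ?
  AND j.retry_count = (
      SELECT MAX(r.retry_count)
      FROM jobs r
      WHERE r.taskname = j.taskname
        AND r.crab_job_id = j.crab_job_id
  )
ORDER BY j.crab_job_id
"""


def latest_retries(conn, taskname):
    """Return one (job id, retry, state, site) tuple per job of the task."""
    return conn.execute(LATEST_RETRY, (taskname,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?, ?, ?, ?)",
    [
        ("task_A", "1", 0, "failed", "T2_XX_SiteA"),    # first attempt of job 1
        ("task_A", "1", 1, "finished", "T2_XX_SiteB"),  # retry of job 1
        ("task_A", "2", 0, "running", "T2_XX_SiteA"),   # job 2, never retried
    ],
)
rows = latest_retries(conn, "task_A")
```

With option 1 the same lookup would be a plain `SELECT` on (taskname, crab job id), but the row for retry 0 of job 1 (and the site where it ran) would have been overwritten.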
Start adding information in the new table
- add the new table in the DB
- make the TW add one row per job after jobsplitting
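
A possible shape for the "one row per job after jobsplitting" step is sketched below, again with SQLite standing in for Oracle. The function name `create_job_rows`, the table layout and the initial state `idle` are hypothetical; the real TW code and schema may look quite different.

```python
import sqlite3


def create_job_rows(conn, taskname, job_ids, initial_state="idle"):
    """Insert one retry-0 row for every job produced by job splitting.

    All rows go in a single transaction, so a task ends up with either
    all of its job rows or none of them.
    """
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany(
            "INSERT INTO jobs (taskname, crab_job_id, retry_count, state) "
            "VALUES (?, ?, 0, ?)",
            [(taskname, job_id, initial_state) for job_id in job_ids],
        )


conn = sqlite3.connect(":memory:")
# Hypothetical table layout, assumed only for this example.
conn.execute(
    "CREATE TABLE jobs ("
    " taskname TEXT NOT NULL,"
    " crab_job_id TEXT NOT NULL,"
    " retry_count INTEGER NOT NULL,"
    " state TEXT NOT NULL,"
    " PRIMARY KEY (taskname, crab_job_id, retry_count))"
)
create_job_rows(conn, "task_A", ["1", "2", "3"])
created = conn.execute(
    "SELECT crab_job_id, retry_count, state FROM jobs ORDER BY crab_job_id"
).fetchall()
```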
identify all places where bookkeeping is done via editing the following files:
- status_cache.pkl
- runs_and_lumis.tar.gz
- ...
for example, make sure that bookkeeping is done both with these files and via updating the job row in the DB in:
- dagman
- postjob
- ...
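
During this transition phase, each bookkeeping update would be written twice: once to the old file and once to the DB. The sketch below illustrates the idea only. The pickle layout used here is invented and is not the real format of status_cache.pkl, and the function name `record_job_state` and the table layout are hypothetical as well. SQLite's `INSERT OR REPLACE` stands in for what would probably be a `MERGE` statement on Oracle.

```python
import os
import pickle
import sqlite3
import tempfile


def record_job_state(conn, cache_path, taskname, job_id, retry, state):
    """Write the same update to the legacy file and to the job table."""
    # Legacy path: load the cache file, update it, write it back.
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as fh:
            cache = pickle.load(fh)
    cache[job_id] = {"retry": retry, "state": state}
    with open(cache_path, "wb") as fh:
        pickle.dump(cache, fh)

    # New path: insert or overwrite the row for this (task, job, retry).
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO jobs "
            "(taskname, crab_job_id, retry_count, state) VALUES (?, ?, ?, ?)",
            (taskname, job_id, retry, state),
        )


conn = sqlite3.connect(":memory:")
# Hypothetical table layout, assumed only for this example.
conn.execute(
    "CREATE TABLE jobs ("
    " taskname TEXT NOT NULL,"
    " crab_job_id TEXT NOT NULL,"
    " retry_count INTEGER NOT NULL,"
    " state TEXT NOT NULL,"
    " PRIMARY KEY (taskname, crab_job_id, retry_count))"
)
cache_path = os.path.join(tempfile.mkdtemp(), "status_cache_demo.pkl")

record_job_state(conn, cache_path, "task_A", "1", 0, "running")
record_job_state(conn, cache_path, "task_A", "1", 0, "failed")
record_job_state(conn, cache_path, "task_A", "1", 1, "running")

with open(cache_path, "rb") as fh:
    cache_after = pickle.load(fh)
db_after = conn.execute(
    "SELECT retry_count, state FROM jobs ORDER BY retry_count"
).fetchall()
```

Writing both copies from a single helper keeps the two sources comparable, which helps when checking that the DB can be trusted before the files are dropped.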
Then, adapt existing code to use the new source of information, instead of status_cache.pkl, ...
- crab status should use information in the DB. Dario would like the summer student to get at least to this point.
- crab report should use information from the DB
- crab recovery should use information from the DB
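
As an illustration of a DB-backed crab status, the sketch below counts the jobs of a task by the state of their latest retry, assuming the option-2 layout (one row per retry). The function name `status_summary`, the table layout and the state values are hypothetical, and SQLite is used in place of Oracle.

```python
import sqlite3

# Join every job with its highest retry count, then count jobs per state.
SUMMARY = """
SELECT j.state, COUNT(*)
FROM jobs j
JOIN (
    SELECT crab_job_id, MAX(retry_count) AS last_retry
    FROM jobs
    WHERE taskname = ?
    GROUP BY crab_job_id
) m ON j.crab_job_id = m.crab_job_id AND j.retry_count = m.last_retry
WHERE j.taskname = ?
GROUP BY j.state
"""


def status_summary(conn, taskname):
    """Return {state: number of jobs whose latest retry is in that state}."""
    return dict(conn.execute(SUMMARY, (taskname, taskname)).fetchall())


conn = sqlite3.connect(":memory:")
# Hypothetical table layout, assumed only for this example.
conn.execute(
    "CREATE TABLE jobs ("
    " taskname TEXT NOT NULL,"
    " crab_job_id TEXT NOT NULL,"
    " retry_count INTEGER NOT NULL,"
    " state TEXT NOT NULL,"
    " PRIMARY KEY (taskname, crab_job_id, retry_count))"
)
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?, ?, ?)",
    [
        ("task_A", "1", 0, "failed"),
        ("task_A", "1", 1, "finished"),
        ("task_A", "2", 0, "running"),
        ("task_A", "3", 0, "failed"),
        ("task_A", "3", 1, "failed"),
        ("task_A", "3", 2, "finished"),
    ],
)
summary = status_summary(conn, "task_A")
```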
when we are confident that no portion of crab is using the “old” files for bookkeeping:
- remove the code that consumes the bookkeeping from files (crab status, ...)
- remove the code that produces the bookkeeping into the files (dagman, postjob, ...): no more status_cache, for example
may never do:
- do not remove runs_and_lumis.tar.gz: there may be users who rely on it! we may want to educate them, but we cannot drop it altogether