design doc ‐ job table

Design document about the introduction of a new table in the Oracle DB for the bookkeeping of the jobs.

goals

reduce the barrier to add new features
I want to assume that the latest and most complete set of information about a job is in the DB, not in a cached file on a scheduler.

design

all the job bookkeeping should be centralized in a single table, so that every portion of crab that needs to know the status of a job can query the Oracle DB.
after jobsplitting, TW creates a row for every job (retry 0).
dagman, prejob, postjob only touch the DB. no need for status_cache, runs_and_lumis.tar.gz, ...
crab status, crab report, ... should only read from the DB, not from status_cache, ...

option 1: one row for every CRAB job, with information about the last retry only. For example, if the retrycount column holds 4, the scheduler will have job_log.(0|1|2|3|4).txt, but the row describes retry 4 only. table primary key: (taskname, crab job id).

pro: fewer rows.
con: we lose the history of earlier retries (for example, the site where each retry ran).
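
A minimal sketch of what option 1 could look like. It uses sqlite3 only as a stand-in for the Oracle DB; the table name `jobs`, the columns `status` and `site`, and the helper `record_retry` are all hypothetical and not taken from the CRAB code base.

```python
import sqlite3

# Hypothetical schema for option 1. sqlite3 is only a stand-in for Oracle,
# and every column outside the primary key is an assumption.
OPTION1_DDL = """
CREATE TABLE jobs (
    taskname    TEXT    NOT NULL,
    crab_job_id TEXT    NOT NULL,
    retry_count INTEGER NOT NULL,
    status      TEXT    NOT NULL,
    site        TEXT,
    PRIMARY KEY (taskname, crab_job_id)
)
"""


def record_retry(conn, taskname, crab_job_id, retry_count, status, site):
    """Overwrite the job's single row with the data of its latest retry."""
    with conn:  # commit on success, roll back on error
        conn.execute(
            "INSERT OR REPLACE INTO jobs "
            "(taskname, crab_job_id, retry_count, status, site) "
            "VALUES (?, ?, ?, ?, ?)",
            (taskname, crab_job_id, retry_count, status, site),
        )


conn = sqlite3.connect(":memory:")
conn.execute(OPTION1_DDL)
record_retry(conn, "example_task", "1", 0, "failed", "T2_EXAMPLE_A")
record_retry(conn, "example_task", "1", 1, "finished", "T2_EXAMPLE_B")
rows = conn.execute("SELECT retry_count, status, site FROM jobs").fetchall()
print(rows)  # only the row for retry 1 is left
```

After the second call the information about retry 0 (its status and site) is gone, which is the loss of history listed as the con. In Oracle the overwrite would likely be written as a `MERGE` statement rather than `INSERT OR REPLACE`.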

option 2: one row for every job on the scheduler, i.e. one row per retry. table primary key: (taskname, crab job id, retry count)

pro: better history tracking (site where the job ran, ...)
con: need to check more than one row to look for the latest retry of a job.
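
To illustrate the con of option 2, here is a sketch of the query needed to find the latest retry of every job in a task. Again sqlite3 stands in for Oracle, and the table name `job_retries`, the columns `status` and `site`, and the function `latest_retries` are assumptions made for this example.

```python
import sqlite3

# Hypothetical schema for option 2: one row per (task, job, retry).
OPTION2_DDL = """
CREATE TABLE job_retries (
    taskname    TEXT    NOT NULL,
    crab_job_id TEXT    NOT NULL,
    retry_count INTEGER NOT NULL,
    status      TEXT    NOT NULL,
    site        TEXT,
    PRIMARY KEY (taskname, crab_job_id, retry_count)
)
"""

# Join each row against the highest retry_count of its job,
# so that only the latest retry of every job survives.
LATEST_RETRY_SQL = """
SELECT r.crab_job_id, r.retry_count, r.status, r.site
FROM job_retries r
JOIN (
    SELECT taskname, crab_job_id, MAX(retry_count) AS last_retry
    FROM job_retries
    WHERE taskname = ?
    GROUP BY taskname, crab_job_id
) latest
  ON  latest.taskname    = r.taskname
  AND latest.crab_job_id = r.crab_job_id
  AND latest.last_retry  = r.retry_count
ORDER BY r.crab_job_id
"""


def latest_retries(conn, taskname):
    """Return one (job id, retry, status, site) tuple per job of the task."""
    return conn.execute(LATEST_RETRY_SQL, (taskname,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(OPTION2_DDL)
conn.executemany(
    "INSERT INTO job_retries VALUES (?, ?, ?, ?, ?)",
    [
        ("example_task", "1", 0, "failed", "T2_EXAMPLE_A"),
        ("example_task", "1", 1, "finished", "T2_EXAMPLE_B"),
        ("example_task", "2", 0, "running", "T2_EXAMPLE_A"),
    ],
)
print(latest_retries(conn, "example_task"))
```

The failed retry 0 of job 1 stays in the table and can still be inspected, but every reader that only wants the current state has to go through a query of this shape (or a view built on it).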

Dario prefers option 2.

implementation

Start adding information in the new table

add the new table in the DB
make the TW add one row per job after jobsplitting
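
The second step could be sketched as follows, assuming the option 2 layout. The function `seed_job_rows`, the table `job_retries`, and the initial status value `"idle"` are hypothetical; sqlite3 is used only so that the example runs without an Oracle instance.

```python
import sqlite3

# Hypothetical option 2 table, reduced to the columns needed here.
DDL = """
CREATE TABLE job_retries (
    taskname    TEXT    NOT NULL,
    crab_job_id TEXT    NOT NULL,
    retry_count INTEGER NOT NULL,
    status      TEXT    NOT NULL,
    PRIMARY KEY (taskname, crab_job_id, retry_count)
)
"""


def seed_job_rows(conn, taskname, job_ids):
    """Insert the retry-0 row of every job produced by job splitting."""
    rows = [(taskname, str(job_id), 0, "idle") for job_id in job_ids]
    with conn:  # one transaction: either all rows are created or none
        conn.executemany(
            "INSERT INTO job_retries "
            "(taskname, crab_job_id, retry_count, status) "
            "VALUES (?, ?, ?, ?)",
            rows,
        )
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute(DDL)
created = seed_job_rows(conn, "example_task", ["1", "2", "3"])
print(created)
```

Inserting all rows in a single transaction means a failure halfway through does not leave the task with a partial set of jobs in the table.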

identify all places where bookkeeping is done by editing the following files:

status_cache.pkl
runs_and_lumis.tar.gz
...

for example, make sure that bookkeeping is done both with these files and by updating the job row in the DB in:

dagman
postjob
...
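
A possible shape for this transition period is a "dual write": the existing file-based bookkeeping keeps running unchanged, and the same update is also written to the DB. In the sketch below, `write_legacy` is a hypothetical callable that stands for whatever code currently updates the files (their real format is not modelled here), and `job_retries` is the assumed option 2 table, with sqlite3 as a stand-in for Oracle.

```python
import sqlite3

# Hypothetical option 2 table, reduced to the columns needed here.
DDL = """
CREATE TABLE job_retries (
    taskname    TEXT    NOT NULL,
    crab_job_id TEXT    NOT NULL,
    retry_count INTEGER NOT NULL,
    status      TEXT    NOT NULL,
    PRIMARY KEY (taskname, crab_job_id, retry_count)
)
"""


def update_job(conn, write_legacy, taskname, crab_job_id, retry_count, status):
    """Record a job update in both the old files and the new table."""
    # 1. Legacy path, left untouched during the transition.
    write_legacy(crab_job_id, retry_count, status)
    # 2. New path: create or overwrite the row of this specific retry.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO job_retries "
            "(taskname, crab_job_id, retry_count, status) "
            "VALUES (?, ?, ?, ?)",
            (taskname, crab_job_id, retry_count, status),
        )


legacy_log = []  # stands in for the files written by the existing code
conn = sqlite3.connect(":memory:")
conn.execute(DDL)
update_job(conn, lambda *args: legacy_log.append(args),
           "example_task", "1", 0, "running")
update_job(conn, lambda *args: legacy_log.append(args),
           "example_task", "1", 0, "finished")
```

While both paths are active, the two sources can be compared to check that the DB really holds the same information before any reader is switched over.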

Then, adapt existing code to use the new source of information, instead of status_cache.pkl, ...

crab status should use information in the DB. Dario would like the summer student to get at least to this point.
crab report should use information from the DB
crab recovery should use information from the DB
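
As an illustration of the first item, a summary of the kind shown by crab status could be computed from the DB alone. This is only a sketch under the option 2 layout: `status_summary`, the table `job_retries`, and the status values are assumptions, and sqlite3 replaces Oracle so that the example is runnable.

```python
import sqlite3

# Hypothetical option 2 table, reduced to the columns needed here.
DDL = """
CREATE TABLE job_retries (
    taskname    TEXT    NOT NULL,
    crab_job_id TEXT    NOT NULL,
    retry_count INTEGER NOT NULL,
    status      TEXT    NOT NULL,
    PRIMARY KEY (taskname, crab_job_id, retry_count)
)
"""

# Count jobs per status, looking only at the latest retry of each job.
SUMMARY_SQL = """
SELECT r.status, COUNT(*)
FROM job_retries r
WHERE r.taskname = ?
  AND r.retry_count = (
      SELECT MAX(x.retry_count)
      FROM job_retries x
      WHERE x.taskname = r.taskname
        AND x.crab_job_id = r.crab_job_id
  )
GROUP BY r.status
"""


def status_summary(conn, taskname):
    """Return {status: number of jobs currently in that status}."""
    return dict(conn.execute(SUMMARY_SQL, (taskname,)).fetchall())


conn = sqlite3.connect(":memory:")
conn.execute(DDL)
conn.executemany(
    "INSERT INTO job_retries VALUES (?, ?, ?, ?)",
    [
        ("example_task", "1", 0, "failed"),
        ("example_task", "1", 1, "finished"),
        ("example_task", "2", 0, "finished"),
        ("example_task", "3", 0, "running"),
    ],
)
summary = status_summary(conn, "example_task")
print(summary)
```

Job 1 is counted once, as finished, because its failed retry 0 is superseded by retry 1.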

when we are confident that no portion of crab is using the “old” files for bookkeeping:

remove the code that consumes the bookkeeping from files (crab status, ...)
remove the code that produces the bookkeeping into the files (dagman, postjob, ...): no more status_cache, for example

may never do:

do not remove runs_and_lumis.tar.gz: there may be users who rely on it! we may want to educate them, but we cannot drop it altogether
