
Apply check util functions in cron jobs #99

@mrceyhun

Description


Since our check_utils.sh functions are in place, we can start applying them in the cron jobs to get a solid, bullet-proof success status for each cron.

Candidate cron jobs for the first round

We can start with the cron jobs running on vocms092. Here are their static cron definitions, which do not exist in any repository:

0 */4 * * * /data/cms/CMSSpark/bin/cron4aggregation
0 */3 * * * /data/cms/CMSSpark/bin/cron4dbs_condor
0 20 * * * /data/cms/CMSSpark/bin/cron4dbs_condor_df /data/cms/pop-data
0 18 * * * /data/cms/CMSSpark/bin/cron4dbs_events /data/cms/pop-data
0 15 */5 * * /data/cms/CMSEOS/CMSSpark/bin/backfill_dbs_condor.sh 1>/data/cms/CMSEOS/CMSSpark/log/backfill.log 2>&1
1 1 * * * /data/cms/cmsmonitoringbackup/run.sh 2>&1 1>& /data/cms/cmsmonitoringbackup/log
07 08 * * * /data/cms/CMSSpark/bin/cron4rucio_daily.sh /cms/rucio_daily

How

Each cron job has its own definitions, output directory and output format. Most of the candidate cron jobs listed above write their output to HDFS, so we can use the check_hdfs function to check their success.

Let's take an example:
CMSSpark/bin/cron4rucio_daily.sh writes its output to the HDFS directory /cms/rucio_daily/rucio/2022/08/01, so the output format is /cms/rucio_daily/rucio/YYYY/MM/DD, which is defined in its Python code. Inside the cron script we only know "$HDFS_OUTPUT_DIR", given as /cms/rucio_daily, so we need to produce /cms/rucio_daily/rucio/YYYY/MM/DD from that variable.
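One possible way to build the dated path from the base directory is sketched below. The helper name `dated_hdfs_path` is hypothetical and not part of check_utils.sh, and the sketch assumes the date is available as YYYY-MM-DD (the actual format of "$CURRENT_DATE" in the cron script may differ).

```shell
#!/bin/bash
# Hypothetical helper (not part of check_utils.sh): build the dated output
# path <base>/rucio/YYYY/MM/DD from a base directory and a date.
# Assumption: the date is given as YYYY-MM-DD; it defaults to today.
dated_hdfs_path() {
    local base_dir="$1"
    local day="${2:-$(date +%Y-%m-%d)}"
    # Strip a trailing slash from the base, then turn 2022-08-01 into 2022/08/01.
    echo "${base_dir%/}/rucio/${day//-//}"
}

dated_hdfs_path /cms/rucio_daily 2022-08-01
# prints /cms/rucio_daily/rucio/2022/08/01
```

With such a helper, the check call could take the form `check_hdfs "$(dated_hdfs_path "$HDFS_OUTPUT_DIR")" 43200 50000000`, using the threshold and size values discussed in this issue.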

Example check code for CMSSpark/bin/cron4rucio_daily.sh:

......
/bin/bash "$SCRIPT_DIR/run_rucio_daily.sh" --verbose --output_folder "$HDFS_OUTPUT_DIR" --fdate "$CURRENT_DATE"

# It helps to add a comment line that separates the check commands from the actual cron job, for example:
# ----- CRON SUCCESS CHECK -----
. ./utils/check_utils.sh
# This cron job runs every day and the threshold should be at most 12 hours, i.e. 43200 seconds.
# To see the current output sizes: hadoop fs -du -h /cms/rucio_daily/rucio/2022/08
# The average directory size is 80 MB, so we can require at least 50 MB, i.e. 50000000 bytes.

check_hdfs "$HDFS_OUTPUT_DIR"/rucio/YYYY/MM/DD 43200 50000000
# !!ATTENTION!! no command should be run after this point

No command should run after the check function, so that the script exits with the check function's actual exit code instead of overwriting it.
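The reason is that a shell script exits with the status of the last command it ran. The sketch below illustrates this with `failing_check`, a hypothetical stand-in that always returns 3; it does not reproduce the real check_hdfs. The third variant, which saves the status and exits with it explicitly, is only a suggestion for cases where something must run after the check.

```shell
#!/bin/bash
# Hypothetical stand-in for a failing check; the real check_hdfs is not reproduced here.
failing_check() { return 3; }

# Each function body runs in a subshell, which mimics a separate cron script.

# The check is the last command: the subshell exits with the check's status (3).
check_last() ( failing_check )

# A command after the check hides the failure: the subshell exits with 0.
check_then_more() ( failing_check; echo "cleanup" >/dev/null )

# If something must run afterwards, save the status and exit with it explicitly.
check_saved() ( failing_check; rc=$?; echo "cleanup" >/dev/null; exit "$rc" )

check_last      || echo "check_last failed with status $?"
check_then_more && echo "check_then_more hid the failure, status $?"
check_saved     || echo "check_saved failed with status $?"
```

Only the second variant reports success, even though the check itself failed.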

In our tests, we can set $HDFS_OUTPUT_DIR (normally /cms/rucio_daily) to a personal temporary HDFS directory such as /tmp/username/rucio_daily.
