Description
Now that our check_utils.sh functions are in place, we can start applying them in the cron jobs so that each cron reports a solid, reliable success status.
Candidate cron jobs for first pitch
We may start with the cron jobs running on vocms092. Here are their static cron definitions, which do not exist in any repository:
0 */4 * * * /data/cms/CMSSpark/bin/cron4aggregation
0 */3 * * * /data/cms/CMSSpark/bin/cron4dbs_condor
0 20 * * * /data/cms/CMSSpark/bin/cron4dbs_condor_df /data/cms/pop-data
0 18 * * * /data/cms/CMSSpark/bin/cron4dbs_events /data/cms/pop-data
0 15 */5 * * /data/cms/CMSEOS/CMSSpark/bin/backfill_dbs_condor.sh 1>/data/cms/CMSEOS/CMSSpark/log/backfill.log 2>&1
1 1 * * * /data/cms/cmsmonitoringbackup/run.sh 2>&1 1>& /data/cms/cmsmonitoringbackup/log
07 08 * * * /data/cms/CMSSpark/bin/cron4rucio_daily.sh /cms/rucio_daily
How
Each cron job has its own definition, output directory and output format. Most of the candidate cron jobs listed above write their output to HDFS, so we can use the check_hdfs function to check their success.
Let's give an example:
CMSSpark/bin/cron4rucio_daily.sh writes its output to an HDFS directory such as /cms/rucio_daily/rucio/2022/08/01, so the output format is /cms/rucio_daily/rucio/YYYY/MM/DD, which is defined in its Python code. We only know "$HDFS_OUTPUT_DIR", given as /cms/rucio_daily, and we need to produce /cms/rucio_daily/rucio/YYYY/MM/DD from that variable.
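The path derivation described above could be sketched as follows. This is only a minimal sketch: build_rucio_output_path is a hypothetical helper name, not something that exists in check_utils.sh, and it assumes the date is available as a YYYY-MM-DD string, which may not match the format of "$CURRENT_DATE" in the real script.

```shell
#!/bin/bash
# Hypothetical helper (not part of check_utils.sh): build the dated HDFS
# output path <base>/rucio/YYYY/MM/DD from a base directory and a date.
# Assumption: the date is given as YYYY-MM-DD; it defaults to today.
build_rucio_output_path() {
    local base_dir="$1"
    local day="${2:-$(date +%Y-%m-%d)}"
    # Turn YYYY-MM-DD into YYYY/MM/DD
    local dated
    dated=$(printf '%s\n' "$day" | tr '-' '/')
    # Strip a trailing slash from the base directory, if any
    printf '%s\n' "${base_dir%/}/rucio/${dated}"
}

HDFS_OUTPUT_DIR="/cms/rucio_daily"
build_rucio_output_path "$HDFS_OUTPUT_DIR" "2022-08-01"
# prints /cms/rucio_daily/rucio/2022/08/01
```

The result can then be passed as the first argument of check_hdfs in place of the YYYY/MM/DD placeholder.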
Example check code for CMSSpark/bin/cron4rucio_daily.sh:
......
/bin/bash "$SCRIPT_DIR/run_rucio_daily.sh" --verbose --output_folder "$HDFS_OUTPUT_DIR" --fdate "$CURRENT_DATE"
# [It can be good to put a clear comment line to separate the check commands from the actual cron job, for example:]
# ----- CRON SUCCESS CHECK -----
. ./utils/check_utils.sh
# This cron job runs every day and the threshold should be at most 12 hours, so 43200 seconds
# Let's check the current output sizes: hadoop fs -du -h /cms/rucio_daily/rucio/2022/08
# On average the directory size is 80 MB, so we can use 50 MB, which is 50000000 bytes
# YYYY/MM/DD below is a placeholder for the actual date directory
check_hdfs "$HDFS_OUTPUT_DIR"/rucio/YYYY/MM/DD 43200 50000000
# !!ATTENTION!! no command should be run after this point
After the check function, we should not run any other command, so that the actual exit code of the check function is not overwritten.
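To illustrate why the check has to be the last command, here is a small sketch. fake_check is a hypothetical stand-in that always fails with status 3; it is not the real check_hdfs and only serves to show how a trailing command hides the failure.

```shell
#!/bin/bash
# Hypothetical stand-in for a failing check (NOT the real check_hdfs).
fake_check() { return 3; }

# Body that ends with the check: its status is the check's status.
run_check_last() { fake_check; }

# Body with a trailing command: the successful echo masks the failure.
run_check_then_echo() { fake_check; echo "done" >/dev/null; }

status_last=0
run_check_last || status_last=$?

status_masked=0
run_check_then_echo || status_masked=$?

echo "check last: $status_last, check then echo: $status_masked"
# prints: check last: 3, check then echo: 0
```

If a command really has to run after the check, one option is to store the status in a variable right after the check and finish the script with an explicit exit using that value.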
In our tests, we can set $HDFS_OUTPUT_DIR (/cms/rucio_daily) to a personal temporary HDFS directory such as /tmp/username/rucio_daily.
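The two numeric arguments in the example call (43200 and 50000000) can be sanity-checked without HDFS. The sketch below is not the implementation in check_utils.sh, which is not shown here. It assumes the first number is a maximum age in seconds and the second a minimum size in bytes, and output_looks_ok is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical pure helper, NOT the real check_hdfs from check_utils.sh.
# Assumption: the check passes only if the output is recent enough
# and large enough.
#   $1 modification time (epoch seconds)   $2 current time (epoch seconds)
#   $3 size in bytes   $4 max age in seconds   $5 min size in bytes
output_looks_ok() {
    local mtime="$1" now="$2" size="$3" max_age="$4" min_size="$5"
    local age=$(( now - mtime ))
    [ "$age" -le "$max_age" ] && [ "$size" -ge "$min_size" ]
}

MAX_AGE=$(( 12 * 60 * 60 ))        # 12 hours = 43200 seconds
MIN_SIZE=$(( 50 * 1000 * 1000 ))   # 50 MB = 50000000 bytes

# Output written 1 hour ago with 80 MB of data
if output_looks_ok 1000000 1003600 80000000 "$MAX_AGE" "$MIN_SIZE"; then
    echo "ok"
else
    echo "fail"
fi
# prints: ok
```

Feeding the helper an old timestamp or a tiny size makes it return non-zero, which is a quick way to confirm the chosen thresholds before wiring them into a real cron job.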