Random bash bits

Get time till the next maintenance

Add this to ~/.bashrc. This is likely very specific to our Franklin cluster.

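get_maint_time () {
    local maint_start difftime
    # Flatten the reservation listing to one record per line, keep the MAINT
    # reservation, and extract its StartTime.
    maint_start=$(scontrol show -u root reservation \
                | perl -pe 's/\n/#/g' \
                | perl -pe 's/#ReservationName=/\n/g' \
                | grep MAINT \
                | perl -ne '/.+StartTime=(.+)\sEndTime/; print $1;')
    # Without a MAINT reservation, `date -d ""` would silently mean "today".
    [ -n "$maint_start" ] || { echo "no MAINT reservation found" >&2; return 1; }
    # Seconds until maintenance starts, minus 1000 s of slack.
    difftime=$(($(date +%s -d "$maint_start") - $(date +%s) - 1000))
    # -u keeps the local UTC offset from shifting the HH:MM:SS part.
    date -u -d "@$difftime" "+$((difftime/86400))-%H:%M:%S"
}
alias salloc_l='salloc --time=$(get_maint_time)'
alias srun_l='srun --time=$(get_maint_time)'
alias sbatch_l='sbatch --time=$(get_maint_time)'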
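
The function above builds the days-hours:minutes:seconds string from a shell division plus a date format. As a rough sketch of the same conversion using only shell arithmetic (the helper name seconds_to_slurm_time is hypothetical and not part of the original snippet):

```bash
# Hypothetical helper: format a number of seconds as D-HH:MM:SS,
# the days-hours:minutes:seconds form accepted by --time.
seconds_to_slurm_time() {
    local total=$1
    # Clamp negative input (maintenance already started) to zero.
    if (( total < 0 )); then total=0; fi
    printf '%d-%02d:%02d:%02d\n' \
        $(( total / 86400 )) \
        $(( total % 86400 / 3600 )) \
        $(( total % 3600 / 60 )) \
        $(( total % 60 ))
}
```

For example, `seconds_to_slurm_time 90061` prints `1-01:01:01`. Doing everything with integer arithmetic avoids any dependence on the local time zone.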

Change time limit on a job:

scontrol update jobid=4272971 TimeLimit=2-00:00:00

Set high priority for a Slurm job

srun_l -c 5 -p himem --priority=TOP --x11 --pty bash

Get total number of processes per user

ps -U mvc002 -u mvc002 -o pid= | wc -l

The -o pid= format prints no header line, so the count is not inflated by one. Franklin has a 300-process limit.
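
To see how close a user is to that limit, the count can be wrapped in a function and subtracted from the limit. A small sketch, assuming hypothetical helper names count_user_procs and procs_remaining (neither is in the original notes):

```bash
# Count processes owned by a user (defaults to the current user).
# `-o pid=` prints only PIDs with no header, so each line is one process.
count_user_procs() {
    local user=${1:-$(id -un)}
    ps -U "$user" -u "$user" -o pid= | wc -l
}

# Hypothetical helper: print how many processes are left before a limit.
# The default of 300 is the Franklin limit noted above.
procs_remaining() {
    local used=$1 limit=${2:-300}
    echo $(( limit - used ))
}
```

Usage: `procs_remaining "$(count_user_procs)"` prints the remaining headroom for the current user.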

See OOM kill events

dmesg | grep "$(id -u)" | grep -i oom
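
A plain grep for the numeric uid also matches longer numbers that merely contain it. One possible refinement, sketched as a hypothetical filter named oom_lines_for_uid that reads kernel log text on stdin:

```bash
# Hypothetical helper: keep only lines that mention "oom" (any case) and
# contain the given uid as a whole word. Defaults to the current user's uid.
oom_lines_for_uid() {
    local uid=${1:-$(id -u)}
    grep -i 'oom' | grep -w -- "$uid"
}
```

Usage: `dmesg | oom_lines_for_uid`. The whole-word match stops uid 1000 from matching 11000, but a process ID equal to the uid would still match, so treat the output as a starting point.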

Run a script repeatedly in the background of another job and kill it when the parent job finishes or errors out

(
    while true; do
        # Some sort of call; typically use this to monitor the sizes of files associated with a job
        sleep 60
    done
) &
MONITOR_PID=$!

trap 'kill ${MONITOR_PID} 2>/dev/null || true; wait ${MONITOR_PID} 2>/dev/null || true' EXIT ERR SIGINT SIGTERM
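
The same pattern can be packaged as a reusable wrapper. This is only a sketch: the function name run_with_monitor, its arguments, and the timestamp written to the log file are placeholders standing in for a real monitoring call. It traps only EXIT; add the signals from the snippet above if the job may be terminated by a signal.

```bash
# Hypothetical wrapper: run_with_monitor INTERVAL LOGFILE COMMAND [ARGS...]
# The body is a subshell, so the variables and the trap stay local to it.
run_with_monitor() (
    interval=$1
    logfile=$2
    shift 2
    (
        while true; do
            # Placeholder for the real monitoring call (e.g. file sizes).
            date +%s >> "$logfile"
            sleep "$interval"
        done
    ) &
    monitor_pid=$!
    # Stop the monitor whenever this subshell exits, on success or failure.
    trap 'kill "$monitor_pid" 2>/dev/null; wait "$monitor_pid" 2>/dev/null' EXIT
    # Run the real work; its exit status becomes the function's status.
    "$@"
)
```

Usage: `run_with_monitor 60 monitor.log ./my_job.sh`, where my_job.sh is whatever the job actually runs. A sleep already in progress when the monitor is killed may linger until it finishes, which is harmless.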