Open
Description
The helper scripts (service node, prescript, and postscript) can churn rapidly when they fail repeatedly, because HTCondor / DAGMan lacks any sort of cooloff mechanism.
On failure, each script should:
- Log the exception or failure to stderr.
- Sleep for a random interval, between 30 and 60 seconds, at the end of the script before exiting.
Since we don't keep any state about how recently we've failed, we can't easily do an exponential backoff, so we will need to use the flat delay.
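
A minimal sketch of the behavior described above, assuming the helper scripts are Python and each has a single entry point. The names `run_with_cooloff` and `main` are hypothetical, not taken from the existing scripts, and the injectable `sleep` parameter exists only to make the wrapper easy to test.

```python
import random
import sys
import time
import traceback


def run_with_cooloff(main, min_delay=30.0, max_delay=60.0, sleep=time.sleep):
    """Run main(); on failure, log to stderr and sleep a flat random delay.

    Returns the exit code the script should use (0 on success).
    """
    try:
        rc = main()
    except Exception:
        # Log the full traceback to stderr so it lands in the script's error log.
        traceback.print_exc(file=sys.stderr)
        rc = 1
    if rc is None:
        # A main() that returns nothing is treated as success.
        rc = 0
    if rc != 0:
        # Flat randomized delay: no state is kept between runs,
        # so no exponential backoff is attempted.
        delay = random.uniform(min_delay, max_delay)
        print(
            f"Script failed (exit code {rc}); sleeping {delay:.0f}s before exiting",
            file=sys.stderr,
        )
        sleep(delay)
    return rc
```

A script would then end with something like `sys.exit(run_with_cooloff(main))`. Note that this sketch only catches `Exception`, so a direct `sys.exit()` call inside `main` bypasses the delay.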