Add delay to scripts on failure #2

Open
@bbockelm

Description

The helper scripts (service node, prescript, and postscript) can churn rapidly when they fail, because HTCondor / DAGMan lacks any sort of cool-off mechanism.

On failure,

  1. Ensure we log the exception or failure to stderr.
  2. Add a random sleep, between 30 and 60 seconds, to the end of the script.

Since we don't keep any state about how recently we've failed, we can't easily do an exponential backoff, so we'll need to use a flat delay instead.
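
A minimal sketch of what the two steps could look like, assuming the helper scripts are Python and each exposes a `main()` entry point that returns an exit code or raises. The wrapper name `run_with_cooloff`, the two delay constants, and the injectable `sleep` parameter are hypothetical and not taken from the existing scripts:

```python
import random
import sys
import time
import traceback

# Flat delay window from this issue: a random sleep of 30 to 60 seconds.
MIN_DELAY_SECONDS = 30
MAX_DELAY_SECONDS = 60


def run_with_cooloff(main, sleep=time.sleep):
    """Run main(); on failure, log to stderr and sleep before returning.

    `main` is a hypothetical script entry point that returns an exit code
    (None is treated as 0) or raises on failure. `sleep` is injectable so
    the delay can be stubbed out in tests.
    """
    try:
        exit_code = main() or 0
    except Exception:
        # Step 1: make sure the exception ends up on stderr.
        traceback.print_exc(file=sys.stderr)
        exit_code = 1

    if exit_code != 0:
        # Step 2: flat random delay at the end of the script. No state is
        # kept between runs, so this is not an exponential backoff.
        delay = random.uniform(MIN_DELAY_SECONDS, MAX_DELAY_SECONDS)
        print(
            f"Script failed with exit code {exit_code}; "
            f"sleeping {delay:.0f}s before exiting",
            file=sys.stderr,
        )
        sleep(delay)

    return exit_code
```

Each script would then end with something like `sys.exit(run_with_cooloff(main))`, so the exit code that DAGMan sees is unchanged and only the time to exit on failure grows.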
