Skip to content

Address play and vr timeouts: ping first#6909

Merged
hjoliver merged 9 commits intocylc:8.5.xfrom
hjoliver:cylc-play-vr-ping
Aug 26, 2025
Merged

Address play and vr timeouts: ping first#6909
hjoliver merged 9 commits intocylc:8.5.xfrom
hjoliver:cylc-play-vr-ping

Conversation

@hjoliver
Copy link
Copy Markdown
Member

@hjoliver hjoliver commented Aug 11, 2025

Close #6846
Partially address #6261 (by documenting exactly what cylc vr does now, rather than changing its behaviour) [off-topic, punted to #6354]

We currently perform the detect_old_contact_file check before we resume a workflow (i.e. cylc play) and additionally forcylc vr.
With all other commands, we only perform this check if a network timeout has occurred. But in these situations we just do it anyway.
I don't think there is any good reason to perform these checks, we should be able to replace it with a simple cylc ping type check

The detect_old_contact_file check runs cylc psutil over ssh on a hard-wired 10 second timeout. With NFS latency issues on HPC, that may not be nearly long enough.

This change:

  • do a (typically much faster) ping type check first, and only resort to detect_old_contact_file if that times out (resume: replace detect_old_contact_file check with a "ping" type check #6846)
  • make the timeout configurable - it's still needed for orphaned contact files, and 10 sec ain't enough
  • fix badly named exception ContactFileExists -> SchedulerAlive
    • (it is not raised if the contact file exists but the scheduler is not alive)
  • (somewhat relatedly) remove the ability of cylc ping to detect running tasks (as opposed to running workflows)
    • this very old feature was untested and broken, so I can conclude no one was using it

Check List

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • Applied any dependency changes to both setup.cfg (and conda-environment.yml if present).
  • Tests are included (or explain why tests are not needed).
  • Changelog entry included if this is a change that can affect users
  • Cylc-Doc pull request opened if required at cylc/cylc-doc/pull/XXXX.
  • If this is a bug fix, PR should be raised against the relevant ?.?.x branch.

@hjoliver hjoliver added this to the 8.5.x milestone Aug 11, 2025
@hjoliver hjoliver self-assigned this Aug 11, 2025
@hjoliver
Copy link
Copy Markdown
Member Author

hjoliver commented Aug 11, 2025

This is somewhat-arguably not a bug fix; I can retarget to 8.6 if necessary, but it's needed pretty urgently (else I'll need patch each interim version installed at ESNZ, where we're running with this already).

@hjoliver hjoliver marked this pull request as draft August 11, 2025 02:17
@hjoliver hjoliver force-pushed the cylc-play-vr-ping branch 2 times, most recently from c76317c to 1a6bb63 Compare August 11, 2025 04:36
@hjoliver hjoliver marked this pull request as ready for review August 14, 2025 05:36
@hjoliver
Copy link
Copy Markdown
Member Author

(Changed description to not close #6261)

@hjoliver
Copy link
Copy Markdown
Member Author

@oliver-sanders - any suggestions on how to test this?

Copy link
Copy Markdown
Member

@dwsutherland dwsutherland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍 (also used in ESNZ operations)

@dwsutherland
Copy link
Copy Markdown
Member

@oliver-sanders - any suggestions on how to test this?

The most you could do is unit test this, unless there's a way to make a schd/platform artificially non responsive?

@hjoliver
Copy link
Copy Markdown
Member Author

Actually a good 'ol functional test is the easiest way to do this.

I got 100% patch coverage with one new one, tests/f/cylc-play/00-contact.t - check it out if you like, but probably no need (it passes, it's quick, it's only a test).

Merging with two approvals...

@hjoliver hjoliver merged commit 9352f61 into cylc:8.5.x Aug 26, 2025
37 of 38 checks passed
@hjoliver hjoliver deleted the cylc-play-vr-ping branch August 26, 2025 01:37
@hjoliver hjoliver modified the milestones: 8.5.x, 8.5.2 Aug 26, 2025
@MetRonnie MetRonnie added the config change Involves a change to global or workflow config label Aug 26, 2025
@MetRonnie MetRonnie mentioned this pull request Aug 26, 2025
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

config change Involves a change to global or workflow config

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants