Address play and vr timeouts: ping first #6909
This is somewhat-arguably not a bug fix; I can retarget to 8.6 if necessary, but it's needed pretty urgently (else I'll need to patch each interim version installed at ESNZ, where we're running with this already).
c76317c to 1a6bb63
1a6bb63 to e5ee032
Also: remove ping task-id: it was broken and not used.
e5ee032 to 464a0ac
(Changed description to not close #6261)
7935c6d to b5e5835
@oliver-sanders - any suggestions on how to test this?
dwsutherland left a comment
LGTM 👍 (also used in ESNZ operations)
The most you could do is unit test this, unless there's a way to make a scheduler/platform artificially non-responsive?
c7d7ce4 to 3a73be7
Actually a good ol' functional test is the easiest way to do this. I got 100% patch coverage with one new one. Merging with two approvals...
Close #6846

Partially address #6261 (by documenting exactly what `cylc vr` does now, rather than changing its behaviour) [off-topic, punted to #6354]

The `detect_old_contact_file` check runs `cylc psutil` over ssh on a hard-wired 10 second timeout. With NFS latency issues on HPC, that may not be nearly long enough.

This change:
- […] `detect_old_contact_file` if that times out (resume: replace `detect_old_contact_file` check with a "ping" type check #6846)
- […] `ContactFileExists` -> `SchedulerAlive`
- […] `cylc ping` to detect running tasks (as opposed to running workflows)

Check List
- […] `CONTRIBUTING.md` and added my name as a Code Contributor.
- […] `setup.cfg` (and `conda-environment.yml` if present).
- […] `?.?.x` branch.
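
For readers skimming the thread, the "ping first" idea in the description can be sketched roughly as below. This is not the code from this PR: `run_check` and `scheduler_alive` are hypothetical names, the commands are placeholders rather than real `cylc` invocations, and treating a prompt negative ping as "not running" is an assumption made only for illustration. It shows a cheap check being tried first, with the slower process check consulted only if the first one times out.

```python
import subprocess
import sys
from typing import Callable, Optional, Sequence

# A check returns True/False when it gets an answer, or None if it timed out.
Check = Callable[[], Optional[bool]]


def run_check(cmd: Sequence[str], timeout: float) -> Optional[bool]:
    """Run a command; True/False from its exit status, None on timeout."""
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
            check=False,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing lingers.
        return None
    return proc.returncode == 0


def scheduler_alive(ping: Check, process_check: Check) -> bool:
    """Ping first; only fall back to the process check if the ping times out."""
    answer = ping()
    if answer is not None:
        # Assumption: a prompt answer from the ping is trusted either way.
        return answer
    answer = process_check()
    if answer is None:
        raise TimeoutError("both the ping and the process check timed out")
    return answer


if __name__ == "__main__":
    # Placeholder commands standing in for a hung ping and a working fallback.
    hung = lambda: run_check(
        [sys.executable, "-c", "import time; time.sleep(5)"], timeout=0.5
    )
    quick = lambda: run_check([sys.executable, "-c", "pass"], timeout=30)
    print(scheduler_alive(hung, quick))
```

Passing the two checks in as callables also makes the unresponsive case easy to simulate in a unit test, by substituting a stub that returns `None` for either check.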