host-select: fix compatibility with force-condemned hosts#6623
host-select: fix compatibility with force-condemned hosts#6623oliver-sanders wants to merge 3 commits intocylc:8.4.xfrom
Conversation
| blacklist = [] | ||
| for host in global_config.get(['scheduler', 'run hosts', 'condemned'], []): | ||
| if host.endswith('!'): | ||
| host = host[:-1] |
There was a problem hiding this comment.
Note, this line is covered by the test amended in this PR (confirm by running it on master), however, that test is not run in CI due to it's shared filesystem setup.
There was a problem hiding this comment.
Looks fine, just some minor verbosity-reduction suggestions.
A couple of big-picture questions though:
- It's weird that the documentation had been deliberately deleted - does that suggest there was an intention to remove this option??
- Maybe it's bedded in for historical reasons as this point but I'm not sure "force-condemn" is a very good name for this feature. Is it because there are some workflows that auto-migrate won't touch? (long-running background tasks?). But even then, shouldn't force-condemn try to auto-migrate first, then kill any remaining workflows that failed to migrate? Given that it simply kills all the workflows, something like "kill and condemn" would be a better name, to indicate that auto-migration is not even attempted.
I think this more slipped the net than was deliberately deleted.
Happy to change, suggest an alternative.
Some workflows cannot auto-migrate (e.g. workflows started in The idea was to condemn the host first (i.e. ask workflows to migrate nicely), then mop up any stragglers with the "force" mode. |
Co-authored-by: Hilary James Oliver <hilary.j.oliver@gmail.com>
OK that's fine. Leave the decision on when to force it up to the humans. We should document that though, as some will assume the options are migrate-all or kill-all, and not think of migrate-if-possible before changing the global config again to kill the rest. We should also document the circumstances under which workflows will not successfully auto-migrate. I think the best place for that is the general "pool of hosts" doc in the user guide, would you agree? |
|
Yes, a cylc-doc PR will follow (see OP), it shouldn't go here in the config docs. |
|
I put up a cylc-doc issue, just to record that we need to document why force-mode is needed in the first place (some workflows will not auto-migrate): cylc/cylc-doc#809 |
|
Regards the name "force condemn" ... A host is "condemned" as soon as it is listed as such, whether or not it still has workflows running on it. The word implies a host shutdown at some future time. (C.f. a condemned man hasn't been hung by the neck just yet; and a condemned building hasn't been demolished yet.) I think we can just say that workflows running on condemned hosts will be told to auto-migrate, unless the hostname is followed by "!" in which case they will be shut down. There's probably no need to call that forced condemnation, right? I'll re-review and provide new code suggestions along those lines ... |
| Auto restart: The "force condemn" option (that tells workflows running on a | ||
| server to shutdown as opposed to migrate) hasn't worked with the host-selection | ||
| mechanism since Cylc 8.0.0. This has now been fixed and the "force condemn" | ||
| option has been restored in the documentation. |
There was a problem hiding this comment.
| Auto restart: The "force condemn" option (that tells workflows running on a | |
| server to shutdown as opposed to migrate) hasn't worked with the host-selection | |
| mechanism since Cylc 8.0.0. This has now been fixed and the "force condemn" | |
| option has been restored in the documentation. | |
| Restored the option to shut down rather than auto-migrate workflows on a condemned | |
| host. This option hasn't worked with the host-selection mechanism since Cylc 8.0.0. |
There was a problem hiding this comment.
This isn't actually correct. The mode has not been restored, a bug in the host-selection mechanism has been fixed.
| * Workflows that are running on condemned hosts will attempt | ||
| to migrate to an available host (providing the |
There was a problem hiding this comment.
| * Workflows that are running on condemned hosts will attempt | |
| to migrate to an available host (providing the | |
| * By default, workflows running on condemned hosts will attempt | |
| to migrate to an available host (providing the |
There was a problem hiding this comment.
Superfluous in combo with next paragraph.
| If a hostname listed here is followed by a ``!`` character | ||
| ("force mode"), workflows running on it | ||
| will shutdown rather than attempting to | ||
| migrate (providing the |
There was a problem hiding this comment.
| If a hostname listed here is followed by a ``!`` character | |
| ("force mode"), workflows running on it | |
| will shutdown rather than attempting to | |
| migrate (providing the | |
| If a hostname listed here is followed by a ``!`` character | |
| workflows running on it will be told to shut down | |
| rather than migrate to another host (providing the |
| # however two have been taken out: | ||
| # * workflows running on "host1" will attempt to | ||
| # restart on "host3" | ||
| # * workflows running on "host2" will shutdown |
There was a problem hiding this comment.
| # * workflows running on "host2" will shutdown | |
| # * workflows running on "host2" will shut down |
| The force-condemn ("!") option caused issues at workflow | ||
| startup for Cylc versions between 8.0.0 and 8.4.1 | ||
| inclusive. |
There was a problem hiding this comment.
| The force-condemn ("!") option caused issues at workflow | |
| startup for Cylc versions between 8.0.0 and 8.4.1 | |
| inclusive. | |
| The option to shut down ("hostname!") rather than migrate | |
| ("hostname") workflows on condemned hosts caused problems | |
| at workflow startup for Cylc versions 8.0.0 through 8.4.1. |
| # ensure the workflow can start if a host is force-condemned | ||
| # see #6623 |
There was a problem hiding this comment.
| # ensure the workflow can start if a host is force-condemned | |
| # see #6623 | |
| # ensure the workflow can start if previously shut down on a | |
| # condemned host. See #6623 |
I think we need a name for this. It would be clunky to cross-reference this from the docs otherwise. Please suggest an alternative to "force". I will keep using "force" as a placeholder, happy to change, but need something to change it to. |
|
I have suggested an alternative. It's not simply a replacement for the word "force" in "force condemned", because - given the meaning of the word "condemned" - that just doesn't make sense. See my response on the cylc-doc issue: cylc/cylc-doc#809 (comment) |
|
Turns out this was already well documented, I just didn't look very hard. I'm going to close this PR and re-raise it with simpler docs pointing at the canonical documentation. |
The "force mode" option in the auto-restart feature hasn't been used for a while.
It still works at Cylc 8, but force-condemning a host causes workflow startup to fail.
This PR:
For info on the "force mode", see the Cylc 7 docs: https://cylc.github.io/cylc-doc/7.9.3/html/running-suites.html#auto-stop-restart
Check List
CONTRIBUTING.mdand added my name as a Code Contributor.setup.cfg(andconda-environment.ymlif present).?.?.xbranch.