Closed
4 changes: 4 additions & 0 deletions changes.d/6623.fix.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
Auto restart: The "force condemn" option (that tells workflows running on a
server to shutdown as opposed to migrate) hasn't worked with the host-selection
mechanism since Cylc 8.0.0. This has now been fixed and the "force condemn"
option has been restored in the documentation.
Comment on lines +1 to +4
Member
Suggested change
- Auto restart: The "force condemn" option (that tells workflows running on a
- server to shutdown as opposed to migrate) hasn't worked with the host-selection
- mechanism since Cylc 8.0.0. This has now been fixed and the "force condemn"
- option has been restored in the documentation.
+ Restored the option to shut down rather than auto-migrate workflows on a condemned
+ host. This option hasn't worked with the host-selection mechanism since Cylc 8.0.0.

Member Author
@oliver-sanders oliver-sanders Feb 28, 2025

This isn't actually correct. The mode has not been restored; rather, a bug in the host-selection mechanism has been fixed.

48 changes: 43 additions & 5 deletions cylc/flow/cfgspec/globalcfg.py
@@ -826,16 +826,54 @@ def default_for(
range.
''')
Conf('condemned', VDR.V_ABSOLUTE_HOST_LIST, desc=f'''
- These hosts will not be used to run jobs.
+ List run hosts that workflows should *not* run on.

- If workflows are already running on
- condemned hosts, Cylc will shut them down and
- restart them on different hosts.
+ These will be subtracted from the
+ `available <global.cylc[scheduler][run hosts]>`:
+
+ * Workflows will not start on condemned hosts.
+ * Workflows that are running on condemned hosts will attempt
+ to migrate to an available host (providing the
Comment on lines +835 to +836
Member
Suggested change
- * Workflows that are running on condemned hosts will attempt
- to migrate to an available host (providing the
+ * By default, workflows running on condemned hosts will attempt
+ to migrate to an available host (providing the

Member Author

Superfluous in combination with the next paragraph.

`auto restart
<global.cylc[scheduler][main loop][auto restart]>`
plugin is enabled).

This feature can be used to drain a host for patching, or
remove a host that is surplus to requirements.

If a hostname listed here is followed by a ``!`` character
("force mode"), workflows running on it
will shutdown rather than attempting to
migrate (providing the
Comment on lines +844 to +847
Member
Suggested change
- If a hostname listed here is followed by a ``!`` character
- ("force mode"), workflows running on it
- will shutdown rather than attempting to
- migrate (providing the
+ If a hostname listed here is followed by a ``!`` character
+ workflows running on it will be told to shut down
+ rather than migrate to another host (providing the

`auto restart
<global.cylc[scheduler][main loop][auto restart]>` plugin
is enabled).

.. rubric:: Example:

.. code-block:: cylc

[scheduler]
[[run hosts]]
# there are three hosts in the "pool"
available = host1, host2, host3

# however two have been taken out:
# * workflows running on "host1" will attempt to
# restart on "host3"
# * workflows running on "host2" will shutdown
Member
Suggested change
- # * workflows running on "host2" will shutdown
+ # * workflows running on "host2" will shut down

condemned = host1, host2!

.. seealso::

:ref:`auto-stop-restart`

.. versionchanged:: 8.4.2

The force-condemn ("!") option caused issues at workflow
startup for Cylc versions between 8.0.0 and 8.4.1
inclusive.
Comment on lines +873 to +875
Member
Suggested change
- The force-condemn ("!") option caused issues at workflow
- startup for Cylc versions between 8.0.0 and 8.4.1
- inclusive.
+ The option to shut down ("hostname!") rather than migrate
+ ("hostname") workflows on condemned hosts caused problems
+ at workflow startup for Cylc versions 8.0.0 through 8.4.1.


.. versionchanged:: 8.0.0

{REPLACES}``[suite servers]condemned hosts``.
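
The behaviour described in this docstring can be modelled in a few lines of plain Python. This is an illustrative sketch only, not Cylc's implementation: the function name `condemned_action` and its return values are hypothetical, and it assumes the auto restart plugin is enabled. What Cylc does when no migration target remains is not covered by the text above, so the sketch simply returns an empty target list in that case.

```python
def condemned_action(running_on, available, condemned):
    """Model the documented behaviour for a workflow running on `running_on`.

    Hypothetical helper for illustration only; not part of Cylc.
    """
    # Map bare hostname -> force flag; a trailing "!" selects force mode.
    force_flags = {}
    for entry in condemned:
        if entry.endswith('!'):
            force_flags[entry[:-1]] = True
        else:
            force_flags[entry] = False

    if running_on not in force_flags:
        return ('keep running', [])
    if force_flags[running_on]:
        return ('shut down', [])
    # Normal mode: candidate targets are the available hosts minus the
    # condemned ones (with or without "!").
    targets = [host for host in available if host not in force_flags]
    return ('migrate', targets)


available = ['host1', 'host2', 'host3']
condemned = ['host1', 'host2!']
for host in available:
    print(host, condemned_action(host, available, condemned))
```

With the values from the docstring example, a workflow on `host1` would migrate with `host3` as the only target, one on `host2` would shut down, and one on `host3` would keep running.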
@@ -1336,7 +1374,7 @@ def default_for(
The means by which task progress messages are reported back to
the running workflow.

- ..rubric:: Options:
+ .. rubric:: Options:

zmq
Direct client-server TCP communication via network ports
11 changes: 8 additions & 3 deletions cylc/flow/host_select.py
@@ -128,6 +128,13 @@
# be returned with the up-to-date configuration.
global_config = glbl_cfg(cached=cached)

# condemned hosts may be suffixed with an "!" to activate "force mode"
blacklist = []
for host in global_config.get(['scheduler', 'run hosts', 'condemned'], []):
    if host.endswith('!'):
        host = host[:-1]

Codecov / codecov/patch: Check warning on line 135 in cylc/flow/host_select.py. Added line #L135 was not covered by tests.
Member Author
@oliver-sanders oliver-sanders Feb 20, 2025

Note: this line is covered by the test amended in this PR (confirm by running it on master). However, that test is not run in CI because of its shared-filesystem setup.

    blacklist.append(host)

return select_host(
# list of workflow hosts
global_config.get([
@@ -138,9 +145,7 @@
'scheduler', 'run hosts', 'ranking'
]),
# list of condemned hosts
- blacklist=global_config.get(
- ['scheduler', 'run hosts', 'condemned']
- ),
+ blacklist=blacklist,
blacklist_name='condemned host'
)
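
A small standalone sketch may help show why the suffix has to be stripped before the list is used as a blacklist. Both functions here are hypothetical: `filter_hosts` is a simplified stand-in for blacklist filtering, not the real `select_host`, whose internals are not shown in this diff. The exact failure seen in Cylc is not stated here; the sketch only demonstrates that a raw entry such as `host2!` never matches the bare hostname `host2`.

```python
def strip_force_suffix(condemned):
    """Return condemned hostnames with any trailing "!" removed."""
    return [host[:-1] if host.endswith('!') else host for host in condemned]


def filter_hosts(available, blacklist):
    """Hypothetical stand-in: drop blacklisted hosts by exact name match."""
    return [host for host in available if host not in blacklist]


available = ['host1', 'host2', 'host3']
condemned = ['host1', 'host2!']

# Raw entries: "host2!" does not equal "host2", so host2 slips through.
unstripped = filter_hosts(available, condemned)

# Stripped entries: both condemned hosts are excluded.
stripped = filter_hosts(available, strip_force_suffix(condemned))

print(unstripped)
print(stripped)
```

Normalising the entries once, before they reach the selection step, keeps the force flag a concern of the auto restart logic rather than of host selection.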

@@ -50,7 +50,10 @@ create_test_global_config '' "
${BASE_GLOBAL_CONFIG}
[scheduler]
[[run hosts]]
- available = ${CYLC_TEST_HOST_1}
+ available = ${CYLC_TEST_HOST_1}, ${CYLC_TEST_HOST_2}
+ # ensure the workflow can start if a host is force-condemned
+ # see #6623
Comment on lines +54 to +55
Member
Suggested change
- # ensure the workflow can start if a host is force-condemned
- # see #6623
+ # ensure the workflow can start if previously shut down on a
+ # condemned host. See #6623

condemned = ${CYLC_TEST_HOST_2}!
"

set_test_number 8