Reimpliment suicide trigger as expire trigger by hjoliver · Pull Request #6835 · cylc/cylc-flow

hjoliver · 2025-07-01T05:00:25Z

This simplifies the system a bit by unifying suicide triggers with task expiry.

Also fixes a long-standing bug: tasks removed by suicide triggers can respawn by other dependencies.

Rationale explained at #6813 (comment). Summary:

On master, suicide triggers just delete tasks from the active pool (n=0)

(the bug is: no record of the removal, so suicided tasks can respawn by other dependencies)

On this branch, suicide triggers just expire tasks

expired tasks leave the active pool immediately (which is exactly what suicide is supposed to do)
but they're recorded as expired, which stops them respawning

every existing suicide test passes as-is

The only functional change is: if a suicide-triggered task also has an :expired trigger, that will now run on suicide - that is (a) very niche; and (b) actually sensible

Check List

I have read CONTRIBUTING.md and added my name as a Code Contributor.
Contains logically grouped changes (else tidy your branch by rebase).
Does not contain off-topic changes (use other PRs for other changes).
Applied any dependency changes to both setup.cfg (and conda-environment.yml if present).
Tests are included (or explain why tests are not needed).
Changelog entry included if this is a change that can affect users
Cylc-Doc issue
If this is a bug fix, PR should be raised against the relevant ?.?.x branch.

oliver-sanders · 2025-07-01T08:26:09Z

Interesting idea, I like it.

This is a functional change which could potentially have consequences in the nice situation where the same task is clock-expired and suicide-triggered. I don't think this is a problem, but we shouldn’t sneak it into a bugfix release.

hjoliver · 2025-07-01T08:41:42Z

I sort of commented on functional vs bug-fix above. I don't mind either way. I'll rebase on to master (actually where I originally did it). It certainly isn't just a bug fix if we document the conceptual difference straight off.

Functionally I don't think it makes any difference except where a task is both suicide triggered and clock-expired (as you note) with something triggered off the "expired" output. It's unlikely that anyone has ever done that IMO - but I could be wrong. In that case, the downstream "expire branch" will run if the task suicides OR if it clock-expires. That's probably fine, because the expire branch presumably is there to signify in some way the that expired task did not run because it does not need to run ... which is implied by both suicide and expiry.

hjoliver · 2025-07-01T23:12:24Z

Rebased to master, for 8.5

Extended an integration test to check for task expiry and avoidance of respawning.

cylc/flow/task_pool.py

hjoliver · 2025-07-15T22:35:23Z

Bumping back the milestone as per offline discussion - not on the critical path to 8.5.

I expect this to be bumped back again to 8.7, due to the urgent nature of nature of changes planned for 8.6.

However, we should squeeze this in to 8.6 if time allows. Reasons:

it's a very small code change, easy to review
it fixes a real bug referenced in an in-code TODO comment that has repeatedly been queried in recent code reviews
it's a conceptual more than functional change, which simplifies Cylc
to the extent that it is functional (see summary above) the logic is simple enough to be confident that there will be no unintended effects

oliver-sanders · 2025-08-21T10:04:45Z

As discussed on Element, time pressures are a great concern due to the tight timeframes we're working with at the moment. But additionally, we're especially concerned by potential risk of behaviour changes at this particular moment in time which was the big reason for pushing for 8.7.0.

When behaviour changes of this kind are introduced to other tools, they are often released as "opt-in" experimental features to allow the community to evaluate them before rolling out on-mass. If no issues are found, the "experimental" behaviour then becomes the default in a subsequent release. Helps reduce teething pains often experienced in such situations as well as raising awareness of upcoming changes.

I've put together a side PR which puts this change behind an "experimental" flag. This way you can push ahead with it on the 8.6.0 timescale (desired for reasons outlined above) and we can derisk our operational implementation (desired for reasons outlined on the PR): hjoliver#70

I think this is a good approach in general. Where possible (and of course it isn't possible for all changes), I think that this would be a good way to preview proposed changes to the community going forward. I have added an "all" option to allow users to easily opt-in to all experimental features and will encourage the owners of test workflows to use this (as they are excellent canary testers).

This also avoids the need to strip out all of the remaining "suicide" code in the project "right now". This is good as we haven't figured out how we want to handle the downstream implications of this feature yet (e.g. cylc graph, cylc/cylc-ui#1239). We might want to look at renaming / reusing the existing "suicide" field in the graph code going forward?

oliver-sanders · 2025-08-21T10:21:44Z

Note: I think this PR currently has a buggy interaction with the completion condition configuration as the logic added to make expiry optional wont apply if a completion condition is specified. I think this will force a validation error.

See the docs I added in the side PR for context.

hjoliver · 2025-08-24T20:44:03Z

I'm confident this change won't cause any real problems, but I think your "experimental" flag is a good idea in general, @oliver-sanders, so I'm keen to go forward with it.

(Also, I just saw your ops de-risking comment on the side PR - all good, makes perfect sense).

hjoliver · 2025-08-27T21:38:32Z

(Merged the side branch to make this an opt-in "experimental" feature.)
(And rebased).

* Allow experimental features to be previewed in a low-risk fashion. * Initially, add support for the "suicide trigger" -> "expire trigger" change.

Co-authored-by: Hilary James Oliver <hilary.j.oliver@gmail.com>

hjoliver · 2025-08-28T04:42:04Z

@oliver-sanders - sounds like you're happy to get this into 8.6 now that it's an opt-in "experimental" feature.

This has implications for documentation though.

Once it's no longer experimental, we need to replace "suicide trigger" with "expire trigger" throughout the documentation.

For now, I guess a new "Experimental Features" section might be useful, to highlight new features that users can choose to try out - what do you think?

The obvious place is perhaps between Changes and Configuration, under Reference. But perhaps that's too "hidden", so very few users will notice and bother to read it 🤔

oliver-sanders · 2025-08-28T08:20:46Z

For now, I guess a new "Experimental Features" section might be useful

The [scheduler][experimental] section will get auto documented in the workflow config section.

We can link to this from the changes entry when we announce this new feature.

oliver-sanders · 2025-08-28T08:29:10Z

cylc/flow/cfgspec/workflow.py

+                * The triggered task's
+                  `flow.cylc[runtime][<namespace>]completion condition`
+                  will be automatically modified so that expiry completes the
+                  task's outputs.


Note: I think this PR currently has a buggy interaction with the completion condition configuration as the logic added to make expiry optional wont apply if a completion condition is specified.

I've just tried this out with the following example:

[scheduler] [[experimental]] all = True [scheduling] [[graph]] R1 = """ a? => b a:fail? => !a """ [runtime] [[a]] completion = succeeded or failed [[b]]

The result is:

WorkflowConfigError: a:expired is permitted in the graph but is not referenced in the completion expression (so is not permitted by it). Try: completion = "succeeded or failed or expired"

Which is reasonable, the discrepancy comes out as a validation error rather than a runtime stall which is good.

I think this is ok, although, we should probably clarify the WorkflowConfigError message for this case to make it clear that the "expire trigger" is the cause of this (as the expired output is unlikely to appear in the graph).

Also this bullet point will need to be updated (sorry made an invalid assumption here):

Suggested change

* The triggered task's

`flow.cylc[runtime][<namespace>]completion condition`

will be automatically modified so that expiry completes the

task's outputs.

* The ``expired`` output will be marked as

:term:`optional` for the triggered task.

hjoliver · 2025-09-01T00:11:35Z

The [scheduler][experimental] section will get auto documented in the workflow config section.

We can link to this from the changes entry when we announce this new feature.

That's fine, but we should actively promote testing of experimental features as being in users' interests. Otherwise they'll all ignore it until it ceases to be experimental and they have no choice, which entirely defeats the purpose of doing it this way. Best done by the forum or locally, I guess.

hjoliver · 2025-09-01T02:18:26Z

The final commits add a warning about probable cause of the completion condition error.

$ cylc val .
INFO - 1 suicide trigger(s) detected. These are rarely needed in Cylc 8 - see https://cylc.github.io/cylc-doc/stable/html/7-to-8/major-changes/suicide-triggers.html
WorkflowConfigError: a:expired is permitted in the graph but is not referenced in the completion.
This may be due to use of an expire (formerly suicide) trigger.  # <------! 
Try: completion = "succeeded or failed or expired"

oliver-sanders · 2025-09-01T08:41:46Z

That's fine, but we should actively promote testing of experimental features as being in users' interests. Otherwise they'll all ignore it until it ceases to be experimental and they have no choice, which entirely defeats the purpose of doing it this way. Best done by the forum or locally, I guess.

We can use the forum and cylc-doc changelog entries. We should also consider a CLI flag to assist with one-off testing.

We could, potentially, consider logging that the behaviour is going to change in the future (for affected workflows), but would need to be careful not to make that too encumberment (especially as warnings result in orange triangles illuminating in the GUI.

changes.d/6835.feat.md

dwsutherland

Makes sense to me 👍 .. Just the one comment about also making experimental a global option (if it isn't already inherited)

dwsutherland · 2025-09-05T04:42:58Z

cylc/flow/cfgspec/workflow.py

               The default time zone is now ``Z`` instead of the local time of
               the first workflow start.
        ''')
+        with Conf('experimental', desc='''


Think we can have this as a global option too? That way we don't have to alter all our workflows twice (once to test, and once to undo when it goes live)

Experimental features are for canary testing. I'm not sure it's a good idea to turn on experimental features by default, that way users don't know what features they are opting into so aren't going to be attentive to any issues caused by them.

I did think we should add a CLI flag though to avoid the need to modify the workflows at all.

global configs can be user specific ~/.cylc/flow/8/global.cylc (as we do with out oper/test role users)

Fair point - a user with a lot of workflows might well want to opt in for all of them at once.

We could just recommend against doing it centrally in site config.

I also wouldn't recommend site wide.

Even if I implement it local to my user, I would have to stop and start (or reload?) my workflows for them to pick it up (I believe).
It would save me CLI or workflow mods to do it this way..

Follow on PRs I suppose

cylc reload -g, --global # also reload global configuration

(Since 8.5.0 I think)

oliver-sanders · 2025-09-05T12:42:39Z

I put together a workflow that implements the documented use patterns, running them in and out of experimental mode. Comparing the reflogs, all examples run the same both ways 👍

./flow.cylc

#!Jinja2

{% from "cylc.flow" import LOG %}

[scheduler]
  allow implicit tasks = True
{% if experimental %}
  {% do LOG.warning('[scheduler][experimental]all = True') %}
  [[experimental]]
    all = True
{% endif %}

[scheduling]
  cycling mode = integer
  initial cycle point = 1
  final cycle point = 2
  runahead limit = P0
  [[xtriggers]]
    fail1 = myxtrig(%(point)s, 1):PT1S
    fail2 = myxtrig(%(point)s, 2):PT1S
  [[graph]]
    P1 = """
{% if pattern == 'basic' %}
      a? => b
      a:fail? => r
      a? => !r
      b | r => c
{% elif pattern == 'flaky' %}
      a? => b? => c?
{% elif pattern == 'multiple' %}
      a? & b? => c
      a? | b? => !c
{% elif pattern == 'xtrigger' %}
      @fail1 => x => !y
      @fail2 => y => !x
{% endif %}
    """

[runtime]
  [[a]]
    script = [[ $CYLC_TASK_CYCLE_POINT == 1 ]]

./lib/python/myxtrig.py

def myxtrig(current_cycle, fail_at_cycle):
    return (int(current_cycle) != fail_at_cycle, {})

./run

#!/usr/bin/env bash

set -euo pipefail

exp_id () {
  pattern="$1"
  experimental="$2"
  local name="trig/$pattern"
  if [[ $experimental == True ]]; then
    name="$name/expire"
  else
    name="$name/suicide"
  fi
  echo "$name"
}

for pattern in basic flaky multiple xtrigger; do
  ids=()
  for experimental in True False; do
    id="$(exp_id "$pattern" "$experimental")"
    ids+=("$id")
    cylc vip . \
      -N \
      --no-run-name \
      --reference-log \
      -n "$id" \
      -s "pattern='$pattern'" \
      -s "experimental=$experimental"
  done
done

for pattern in basic flaky multiple xtrigger; do
  ids=()
  for experimental in True False; do
    id="$(exp_id "$pattern" "$experimental")"
    ids+=("$HOME/cylc-run/$id/reference.log")
  done
  echo diff "${ids[@]}"
  diff "${ids[@]}"
  echo -e '\n\n\n'
done

oliver-sanders · 2025-09-05T12:46:31Z

And used this test to check the new behaviour:

[scheduler]
  allow implicit tasks = True
  [[experimental]]
    all = True

[scheduling]
  initial cycle point = 1
  final cycle point = 2
  cycling mode = integer
  runahead limit = P0
  [[graph]]
      P1 = """
        a? => b
        a:fail? => !b
        b | b:expired? => c
      """

[runtime]
  [[a]]
    script = [[ $CYLC_TASK_CYCLE_POINT == 1 ]]

Works as expected 👍.

hjoliver added this to the 8.4.x milestone Jul 1, 2025

hjoliver self-assigned this Jul 1, 2025

hjoliver added the bug Something is wrong :( label Jul 1, 2025

hjoliver force-pushed the suicide-expire branch from 274e2df to afc054b Compare July 1, 2025 23:11

hjoliver changed the base branch from 8.4.x to master July 1, 2025 23:11

hjoliver modified the milestones: 8.4.x, 8.5.0 Jul 1, 2025

hjoliver force-pushed the suicide-expire branch 3 times, most recently from aa40578 to b85c875 Compare July 2, 2025 01:52

hjoliver commented Jul 2, 2025

View reviewed changes

cylc/flow/task_pool.py Outdated Show resolved Hide resolved

hjoliver requested a review from oliver-sanders July 2, 2025 02:01

This was referenced Jul 2, 2025

Redocument suicide triggers as expire triggers cylc/cylc-doc#841

Open

Group trigger #6395

Merged

hjoliver modified the milestones: 8.5.0, 8.7.0, 8.6.0 Jul 15, 2025

hjoliver added 4 commits August 28, 2025 15:45

Reimplement suicide as expire.

12dc748

Make expire optional for suicide-triggered tasks.

b58a630

Extend an integration test.

c4d6db6

Update change log.

4658eac

hjoliver and others added 4 commits August 28, 2025 15:46

Tweak log level conditional.

99de001

config: add experimental configuration section

cbfb7be

* Allow experimental features to be previewed in a low-risk fashion. * Initially, add support for the "suicide trigger" -> "expire trigger" change.

Apply suggestions from code review

531994b

Co-authored-by: Hilary James Oliver <hilary.j.oliver@gmail.com>

Style fix.

56ed221

hjoliver force-pushed the suicide-expire branch from c147f17 to 56ed221 Compare August 28, 2025 03:56

hjoliver requested a review from dwsutherland August 28, 2025 04:42

oliver-sanders reviewed Aug 28, 2025

View reviewed changes

hjoliver added 3 commits September 1, 2025 13:38

Give reason for expire trigger completion warning.

938fb6b

Expire trigger doc tweak.

ddb2410

Extend an integration test.

8ed68e7

oliver-sanders reviewed Sep 1, 2025

View reviewed changes

changes.d/6835.feat.md Outdated Show resolved Hide resolved

hjoliver added 2 commits September 2, 2025 09:37

Update the change log. [skip ci]

f4db58f

Tweak change log entry. [skip ci]

4a6572d

dwsutherland approved these changes Sep 5, 2025

View reviewed changes

oliver-sanders approved these changes Sep 5, 2025

View reviewed changes

oliver-sanders merged commit 7a58152 into cylc:master Sep 5, 2025

hjoliver deleted the suicide-expire branch September 8, 2025 23:27

hjoliver mentioned this pull request Oct 21, 2025

Avoid spurious logging on restart with waiting parentless tasks. #7051

Closed

8 tasks

Conversation

hjoliver commented Jul 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

oliver-sanders commented Jul 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hjoliver commented Jul 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hjoliver commented Jul 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

hjoliver commented Jul 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

oliver-sanders commented Aug 21, 2025

Uh oh!

oliver-sanders commented Aug 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hjoliver commented Aug 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hjoliver commented Aug 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hjoliver commented Aug 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

oliver-sanders commented Aug 28, 2025

Uh oh!

oliver-sanders Aug 28, 2025

Choose a reason for hiding this comment

Uh oh!

hjoliver commented Sep 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hjoliver commented Sep 1, 2025

Uh oh!

oliver-sanders commented Sep 1, 2025

Uh oh!

Uh oh!

dwsutherland left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dwsutherland Sep 5, 2025

Choose a reason for hiding this comment

Uh oh!

oliver-sanders Sep 5, 2025

Choose a reason for hiding this comment

Uh oh!

dwsutherland Sep 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

hjoliver Sep 8, 2025

Choose a reason for hiding this comment

Uh oh!

dwsutherland Sep 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

hjoliver Sep 8, 2025

Choose a reason for hiding this comment

Uh oh!

hjoliver Sep 8, 2025

Choose a reason for hiding this comment

Uh oh!

oliver-sanders commented Sep 5, 2025

./flow.cylc

./lib/python/myxtrig.py

./run

Uh oh!

oliver-sanders commented Sep 5, 2025

Uh oh!

Reviewers

Assignees

hjoliver commented Jul 1, 2025 •

edited

Loading

oliver-sanders commented Jul 1, 2025 •

edited

Loading

hjoliver commented Jul 1, 2025 •

edited

Loading

hjoliver commented Jul 1, 2025 •

edited

Loading

hjoliver commented Jul 15, 2025 •

edited

Loading

oliver-sanders commented Aug 21, 2025 •

edited

Loading

hjoliver commented Aug 24, 2025 •

edited

Loading

hjoliver commented Aug 27, 2025 •

edited

Loading

hjoliver commented Aug 28, 2025 •

edited

Loading

hjoliver commented Sep 1, 2025 •

edited

Loading

dwsutherland left a comment •

edited

Loading

dwsutherland Sep 8, 2025 •

edited

Loading

dwsutherland Sep 8, 2025 •

edited

Loading