Skip to content

Reimpliment suicide trigger as expire trigger#6835

Merged
oliver-sanders merged 13 commits intocylc:masterfrom
hjoliver:suicide-expire
Sep 5, 2025
Merged

Reimpliment suicide trigger as expire trigger#6835
oliver-sanders merged 13 commits intocylc:masterfrom
hjoliver:suicide-expire

Conversation

@hjoliver
Copy link
Copy Markdown
Member

@hjoliver hjoliver commented Jul 1, 2025

Close #6813

This simplifies the system a bit by unifying suicide triggers with task expiry.

Also fixes a long-standing bug: tasks removed by suicide triggers can respawn by other dependencies.

Rationale explained at #6813 (comment). Summary:

On master, suicide triggers just delete tasks from the active pool (n=0)

  • (the bug is: no record of the removal, so suicided tasks can respawn by other dependencies)

On this branch, suicide triggers just expire tasks

  • expired tasks leave the active pool immediately (which is exactly what suicide is supposed to do)
    but they're recorded as expired, which stops them respawning
  • every existing suicide test passes as-is

The only functional change is: if a suicide-triggered task also has an :expired trigger, that will now run on suicide - that is (a) very niche; and (b) actually sensible

Check List

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • Applied any dependency changes to both setup.cfg (and conda-environment.yml if present).
  • Tests are included (or explain why tests are not needed).
  • Changelog entry included if this is a change that can affect users
  • Cylc-Doc issue
  • If this is a bug fix, PR should be raised against the relevant ?.?.x branch.

@hjoliver hjoliver added this to the 8.4.x milestone Jul 1, 2025
@hjoliver hjoliver self-assigned this Jul 1, 2025
@hjoliver hjoliver added the bug Something is wrong :( label Jul 1, 2025
@oliver-sanders
Copy link
Copy Markdown
Member

oliver-sanders commented Jul 1, 2025

Interesting idea, I like it.

This is a functional change which could potentially have consequences in the nice situation where the same task is clock-expired and suicide-triggered. I don't think this is a problem, but we shouldn’t sneak it into a bugfix release.

@hjoliver
Copy link
Copy Markdown
Member Author

hjoliver commented Jul 1, 2025

I sort of commented on functional vs bug-fix above. I don't mind either way. I'll rebase on to master (actually where I originally did it). It certainly isn't just a bug fix if we document the conceptual difference straight off.

Functionally I don't think it makes any difference except where a task is both suicide triggered and clock-expired (as you note) with something triggered off the "expired" output. It's unlikely that anyone has ever done that IMO - but I could be wrong. In that case, the downstream "expire branch" will run if the task suicides OR if it clock-expires. That's probably fine, because the expire branch presumably is there to signify in some way the that expired task did not run because it does not need to run ... which is implied by both suicide and expiry.

@hjoliver hjoliver changed the base branch from 8.4.x to master July 1, 2025 23:11
@hjoliver
Copy link
Copy Markdown
Member Author

hjoliver commented Jul 1, 2025

Rebased to master, for 8.5

Extended an integration test to check for task expiry and avoidance of respawning.

@hjoliver hjoliver modified the milestones: 8.4.x, 8.5.0 Jul 1, 2025
@hjoliver hjoliver force-pushed the suicide-expire branch 3 times, most recently from aa40578 to b85c875 Compare July 2, 2025 01:52
@hjoliver
Copy link
Copy Markdown
Member Author

hjoliver commented Jul 15, 2025

Bumping back the milestone as per offline discussion - not on the critical path to 8.5.

I expect this to be bumped back again to 8.7, due to the urgent nature of nature of changes planned for 8.6.

However, we should squeeze this in to 8.6 if time allows. Reasons:

  • it's a very small code change, easy to review
  • it fixes a real bug referenced in an in-code TODO comment that has repeatedly been queried in recent code reviews
  • it's a conceptual more than functional change, which simplifies Cylc
  • to the extent that it is functional (see summary above) the logic is simple enough to be confident that there will be no unintended effects

@hjoliver hjoliver modified the milestones: 8.5.0, 8.7.0, 8.6.0 Jul 15, 2025
@oliver-sanders
Copy link
Copy Markdown
Member

As discussed on Element, time pressures are a great concern due to the tight timeframes we're working with at the moment. But additionally, we're especially concerned by potential risk of behaviour changes at this particular moment in time which was the big reason for pushing for 8.7.0.

When behaviour changes of this kind are introduced to other tools, they are often released as "opt-in" experimental features to allow the community to evaluate them before rolling out on-mass. If no issues are found, the "experimental" behaviour then becomes the default in a subsequent release. Helps reduce teething pains often experienced in such situations as well as raising awareness of upcoming changes.

I've put together a side PR which puts this change behind an "experimental" flag. This way you can push ahead with it on the 8.6.0 timescale (desired for reasons outlined above) and we can derisk our operational implementation (desired for reasons outlined on the PR): hjoliver#70

I think this is a good approach in general. Where possible (and of course it isn't possible for all changes), I think that this would be a good way to preview proposed changes to the community going forward. I have added an "all" option to allow users to easily opt-in to all experimental features and will encourage the owners of test workflows to use this (as they are excellent canary testers).

This also avoids the need to strip out all of the remaining "suicide" code in the project "right now". This is good as we haven't figured out how we want to handle the downstream implications of this feature yet (e.g. cylc graph, cylc/cylc-ui#1239). We might want to look at renaming / reusing the existing "suicide" field in the graph code going forward?

@oliver-sanders
Copy link
Copy Markdown
Member

oliver-sanders commented Aug 21, 2025

Note: I think this PR currently has a buggy interaction with the completion condition configuration as the logic added to make expiry optional wont apply if a completion condition is specified. I think this will force a validation error.

See the docs I added in the side PR for context.

@hjoliver
Copy link
Copy Markdown
Member Author

hjoliver commented Aug 24, 2025

I'm confident this change won't cause any real problems, but I think your "experimental" flag is a good idea in general, @oliver-sanders, so I'm keen to go forward with it.

(Also, I just saw your ops de-risking comment on the side PR - all good, makes perfect sense).

@hjoliver
Copy link
Copy Markdown
Member Author

hjoliver commented Aug 27, 2025

(Merged the side branch to make this an opt-in "experimental" feature.)
(And rebased).

hjoliver and others added 4 commits August 28, 2025 15:46
* Allow experimental features to be previewed in a low-risk fashion.
* Initially, add support for the "suicide trigger" -> "expire trigger"
  change.
Co-authored-by: Hilary James Oliver <hilary.j.oliver@gmail.com>
@hjoliver
Copy link
Copy Markdown
Member Author

hjoliver commented Aug 28, 2025

@oliver-sanders - sounds like you're happy to get this into 8.6 now that it's an opt-in "experimental" feature.

This has implications for documentation though.

Once it's no longer experimental, we need to replace "suicide trigger" with "expire trigger" throughout the documentation.

For now, I guess a new "Experimental Features" section might be useful, to highlight new features that users can choose to try out - what do you think?

The obvious place is perhaps between Changes and Configuration, under Reference. But perhaps that's too "hidden", so very few users will notice and bother to read it 🤔

image

@hjoliver hjoliver requested a review from dwsutherland August 28, 2025 04:42
@oliver-sanders
Copy link
Copy Markdown
Member

For now, I guess a new "Experimental Features" section might be useful

The [scheduler][experimental] section will get auto documented in the workflow config section.

We can link to this from the changes entry when we announce this new feature.

Comment on lines +439 to +442
* The triggered task's
`flow.cylc[runtime][<namespace>]completion condition`
will be automatically modified so that expiry completes the
task's outputs.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I think this PR currently has a buggy interaction with the completion condition configuration as the logic added to make expiry optional wont apply if a completion condition is specified.

I've just tried this out with the following example:

[scheduler]
  [[experimental]]
    all = True

[scheduling]
  [[graph]]
    R1 = """
      a? => b
      a:fail? => !a
    """

[runtime]
  [[a]]
    completion = succeeded or failed
  [[b]]

The result is:

WorkflowConfigError: a:expired is permitted in the graph but is not referenced in the completion expression (so is not permitted by it).
Try: completion = "succeeded or failed or expired"

Which is reasonable, the discrepancy comes out as a validation error rather than a runtime stall which is good.

I think this is ok, although, we should probably clarify the WorkflowConfigError message for this case to make it clear that the "expire trigger" is the cause of this (as the expired output is unlikely to appear in the graph).

Also this bullet point will need to be updated (sorry made an invalid assumption here):

Suggested change
* The triggered task's
`flow.cylc[runtime][<namespace>]completion condition`
will be automatically modified so that expiry completes the
task's outputs.
* The ``expired`` output will be marked as
:term:`optional` for the triggered task.

@hjoliver
Copy link
Copy Markdown
Member Author

hjoliver commented Sep 1, 2025

The [scheduler][experimental] section will get auto documented in the workflow config section.

We can link to this from the changes entry when we announce this new feature.

That's fine, but we should actively promote testing of experimental features as being in users' interests. Otherwise they'll all ignore it until it ceases to be experimental and they have no choice, which entirely defeats the purpose of doing it this way. Best done by the forum or locally, I guess.

@hjoliver
Copy link
Copy Markdown
Member Author

hjoliver commented Sep 1, 2025

The final commits add a warning about probable cause of the completion condition error.

$ cylc val .
INFO - 1 suicide trigger(s) detected. These are rarely needed in Cylc 8 - see https://cylc.github.io/cylc-doc/stable/html/7-to-8/major-changes/suicide-triggers.html
WorkflowConfigError: a:expired is permitted in the graph but is not referenced in the completion.
This may be due to use of an expire (formerly suicide) trigger.  # <------! 
Try: completion = "succeeded or failed or expired"

@oliver-sanders
Copy link
Copy Markdown
Member

That's fine, but we should actively promote testing of experimental features as being in users' interests. Otherwise they'll all ignore it until it ceases to be experimental and they have no choice, which entirely defeats the purpose of doing it this way. Best done by the forum or locally, I guess.

We can use the forum and cylc-doc changelog entries. We should also consider a CLI flag to assist with one-off testing.

We could, potentially, consider logging that the behaviour is going to change in the future (for affected workflows), but would need to be careful not to make that too encumberment (especially as warnings result in orange triangles illuminating in the GUI.

Copy link
Copy Markdown
Member

@dwsutherland dwsutherland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me 👍 .. Just the one comment about also making experimental a global option (if it isn't already inherited)

The default time zone is now ``Z`` instead of the local time of
the first workflow start.
''')
with Conf('experimental', desc='''
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think we can have this as a global option too? That way we don't have to alter all our workflows twice (once to test, and once to undo when it goes live)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Experimental features are for canary testing. I'm not sure it's a good idea to turn on experimental features by default, that way users don't know what features they are opting into so aren't going to be attentive to any issues caused by them.

I did think we should add a CLI flag though to avoid the need to modify the workflows at all.

Copy link
Copy Markdown
Member

@dwsutherland dwsutherland Sep 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

global configs can be user specific ~/.cylc/flow/8/global.cylc (as we do with out oper/test role users)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point - a user with a lot of workflows might well want to opt in for all of them at once.

We could just recommend against doing it centrally in site config.

Copy link
Copy Markdown
Member

@dwsutherland dwsutherland Sep 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also wouldn't recommend site wide.

Even if I implement it local to my user, I would have to stop and start (or reload?) my workflows for them to pick it up (I believe).
It would save me CLI or workflow mods to do it this way..

Follow on PRs I suppose

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cylc reload -g, --global # also reload global configuration

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Since 8.5.0 I think)

@oliver-sanders
Copy link
Copy Markdown
Member

I put together a workflow that implements the documented use patterns, running them in and out of experimental mode. Comparing the reflogs, all examples run the same both ways 👍

./flow.cylc

#!Jinja2

{% from "cylc.flow" import LOG %}

[scheduler]
  allow implicit tasks = True
{% if experimental %}
  {% do LOG.warning('[scheduler][experimental]all = True') %}
  [[experimental]]
    all = True
{% endif %}

[scheduling]
  cycling mode = integer
  initial cycle point = 1
  final cycle point = 2
  runahead limit = P0
  [[xtriggers]]
    fail1 = myxtrig(%(point)s, 1):PT1S
    fail2 = myxtrig(%(point)s, 2):PT1S
  [[graph]]
    P1 = """
{% if pattern == 'basic' %}
      a? => b
      a:fail? => r
      a? => !r
      b | r => c
{% elif pattern == 'flaky' %}
      a? => b? => c?
{% elif pattern == 'multiple' %}
      a? & b? => c
      a? | b? => !c
{% elif pattern == 'xtrigger' %}
      @fail1 => x => !y
      @fail2 => y => !x
{% endif %}
    """

[runtime]
  [[a]]
    script = [[ $CYLC_TASK_CYCLE_POINT == 1 ]]

./lib/python/myxtrig.py

def myxtrig(current_cycle, fail_at_cycle):
    return (int(current_cycle) != fail_at_cycle, {})

./run

#!/usr/bin/env bash

set -euo pipefail

exp_id () {
  pattern="$1"
  experimental="$2"
  local name="trig/$pattern"
  if [[ $experimental == True ]]; then
    name="$name/expire"
  else
    name="$name/suicide"
  fi
  echo "$name"
}

for pattern in basic flaky multiple xtrigger; do
  ids=()
  for experimental in True False; do
    id="$(exp_id "$pattern" "$experimental")"
    ids+=("$id")
    cylc vip . \
      -N \
      --no-run-name \
      --reference-log \
      -n "$id" \
      -s "pattern='$pattern'" \
      -s "experimental=$experimental"
  done
done

for pattern in basic flaky multiple xtrigger; do
  ids=()
  for experimental in True False; do
    id="$(exp_id "$pattern" "$experimental")"
    ids+=("$HOME/cylc-run/$id/reference.log")
  done
  echo diff "${ids[@]}"
  diff "${ids[@]}"
  echo -e '\n\n\n'
done

@oliver-sanders
Copy link
Copy Markdown
Member

And used this test to check the new behaviour:

[scheduler]
  allow implicit tasks = True
  [[experimental]]
    all = True

[scheduling]
  initial cycle point = 1
  final cycle point = 2
  cycling mode = integer
  runahead limit = P0
  [[graph]]
      P1 = """
        a? => b
        a:fail? => !b
        b | b:expired? => c
      """

[runtime]
  [[a]]
    script = [[ $CYLC_TASK_CYCLE_POINT == 1 ]]

Works as expected 👍.

@oliver-sanders oliver-sanders merged commit 7a58152 into cylc:master Sep 5, 2025
@hjoliver hjoliver deleted the suicide-expire branch September 8, 2025 23:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something is wrong :(

Projects

None yet

Development

Successfully merging this pull request may close these issues.

TODO: this logic fails if task removed after some outputs completed

3 participants