Reimplement suicide trigger as expire trigger #6835
oliver-sanders merged 13 commits into cylc:master from
Interesting idea, I like it. This is a functional change which could potentially have consequences in the niche situation where the same task is clock-expired and suicide-triggered. I don't think this is a problem, but we shouldn't sneak it into a bugfix release.
I sort of commented on functional vs bug-fix above. I don't mind either way. I'll rebase onto master (which is actually where I originally did it). It certainly isn't just a bug fix if we document the conceptual difference straight off. Functionally, I don't think it makes any difference except where a task is both suicide-triggered and clock-expired (as you note) with something triggered off the "expired" output. It's unlikely that anyone has ever done that IMO, but I could be wrong. In that case, the downstream "expire branch" will run if the task suicides OR if it clock-expires. That's probably fine, because the expire branch is presumably there to signify in some way that the expired task did not run because it does not need to run ... which is implied by both suicide and expiry.
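To make the "suicides OR clock-expires" point concrete, here is a toy Python model. The `Task`, `Status` and `expire_branch_ready` names are hypothetical and this is not Cylc's internals: the suicide trigger is treated as just another way of expiring a waiting task, so anything triggered off `expired` cannot tell the two routes apart.

```python
# Toy model only (hypothetical classes, not Cylc's internals): with suicide
# triggers reimplemented as forced expiry, both routes yield "expired".
from enum import Enum


class Status(Enum):
    WAITING = "waiting"
    RUNNING = "running"
    EXPIRED = "expired"


class Task:
    def __init__(self, name: str) -> None:
        self.name = name
        self.status = Status.WAITING
        self.outputs: set = set()
        self.reason = ""

    def expire(self, reason: str) -> None:
        # Clock-expiry and the expire (formerly suicide) trigger both end up
        # here. In this toy, only a waiting task can expire.
        if self.status is Status.WAITING:
            self.status = Status.EXPIRED
            self.outputs.add("expired")
            self.reason = reason


def expire_branch_ready(task: Task) -> bool:
    # Whatever triggers off "task:expired?" cannot tell the two routes apart.
    return "expired" in task.outputs


by_suicide = Task("a")
by_suicide.expire("expire (formerly suicide) trigger")

by_clock = Task("a")
by_clock.expire("clock expiry")

# A task hit by both routes expires once and keeps the first reason.
both = Task("a")
both.expire("clock expiry")
both.expire("expire (formerly suicide) trigger")

print(expire_branch_ready(by_suicide), expire_branch_ready(by_clock), both.reason)
# prints: True True clock expiry
```

In this model, the "both" case is harmless: the expire branch runs once, whichever route got there first.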
Rebased onto master, for 8.5. Extended an integration test to check for task expiry and avoidance of respawning.
Bumping back the milestone as per offline discussion: not on the critical path to 8.5. I expect this to be bumped back again to 8.7, due to the urgent nature of changes planned for 8.6. However, we should squeeze this into 8.6 if time allows. Reasons:
As discussed on Element, time pressures are a great concern due to the tight timeframes we're working with at the moment. Additionally, we're especially concerned by the potential risk of behaviour changes at this particular moment in time, which was the big reason for pushing for 8.7.0.

When behaviour changes of this kind are introduced to other tools, they are often released as "opt-in" experimental features to allow the community to evaluate them before rolling them out en masse. If no issues are found, the "experimental" behaviour then becomes the default in a subsequent release. This helps reduce the teething pains often experienced in such situations, as well as raising awareness of upcoming changes.

I've put together a side PR which puts this change behind an "experimental" flag. This way you can push ahead with it on the 8.6.0 timescale (desired for reasons outlined above) and we can de-risk our operational implementation (desired for reasons outlined on the PR): hjoliver#70

I think this is a good approach in general. Where possible (and of course it isn't possible for all changes), I think this would be a good way to preview proposed changes to the community going forward. I have added an "all" option to allow users to easily opt in to all experimental features, and will encourage the owners of test workflows to use it (as they are excellent canary testers).

This also avoids the need to strip out all of the remaining "suicide" code in the project right now. This is good, as we haven't figured out how we want to handle the downstream implications of this feature yet (e.g. cylc graph, cylc/cylc-ui#1239). We might want to look at renaming or reusing the existing "suicide" field in the graph code going forward?
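A minimal sketch of how opt-in flags with an "all" switch might resolve. Only the `all` option comes from this thread; the `ExperimentalConfig` class, the `KNOWN_FEATURES` registry and the "expire triggers" key are hypothetical names for illustration, not the side PR's code.

```python
# Hypothetical sketch of opt-in experimental feature flags with an "all"
# switch. Only "all" comes from this discussion; the feature key is made up.
from dataclasses import dataclass, field
from typing import Dict

KNOWN_FEATURES = frozenset({"expire triggers"})  # hypothetical registry


@dataclass
class ExperimentalConfig:
    all: bool = False
    features: Dict[str, bool] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject typos early rather than silently ignoring them.
        unknown = set(self.features) - KNOWN_FEATURES
        if unknown:
            raise ValueError(f"unknown experimental feature(s): {sorted(unknown)}")

    def enabled(self, feature: str) -> bool:
        if feature not in KNOWN_FEATURES:
            raise ValueError(f"unknown experimental feature: {feature}")
        # "all = True" opts in to every experimental feature at once;
        # otherwise each feature is off unless explicitly switched on.
        return self.all or self.features.get(feature, False)


default = ExperimentalConfig()
canary = ExperimentalConfig(all=True)
selective = ExperimentalConfig(features={"expire triggers": True})

print(default.enabled("expire triggers"))    # False
print(canary.enabled("expire triggers"))     # True
print(selective.enabled("expire triggers"))  # True
```

Rejecting unknown keys is a deliberate choice in this sketch: a misspelt feature name fails loudly instead of silently leaving the old behaviour in place.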
Note: I think this PR currently has a buggy interaction with the completion condition configuration. See the docs I added in the side PR for context.
I'm confident this change won't cause any real problems, but I think your "experimental" flag is a good idea in general, @oliver-sanders, so I'm keen to go forward with it. (Also, I just saw your ops de-risking comment on the side PR: all good, makes perfect sense.)
(Merged the side branch to make this an opt-in "experimental" feature.)
* Allow experimental features to be previewed in a low-risk fashion.
* Initially, add support for the "suicide trigger" -> "expire trigger" change.
Co-authored-by: Hilary James Oliver <hilary.j.oliver@gmail.com>
@oliver-sanders - sounds like you're happy to get this into 8.6 now that it's an opt-in "experimental" feature. This has implications for documentation though. Once it's no longer experimental, we need to replace "suicide trigger" with "expire trigger" throughout the documentation. For now, I guess a new "Experimental Features" section might be useful, to highlight new features that users can choose to try out - what do you think? The obvious place is perhaps between Changes and Configuration, under Reference. But perhaps that's too "hidden", so very few users will notice and bother to read it 🤔
The We can link to this from the changes entry when we announce this new feature.
cylc/flow/cfgspec/workflow.py
* The triggered task's
`flow.cylc[runtime][<namespace>]completion condition`
will be automatically modified so that expiry completes the
task's outputs.
Note: I think this PR currently has a buggy interaction with the completion condition configuration, as the logic added to make expiry optional won't apply if a completion condition is specified.
I've just tried this out with the following example:
[scheduler]
[[experimental]]
all = True
[scheduling]
[[graph]]
R1 = """
a? => b
a:fail? => !a
"""
[runtime]
[[a]]
completion = succeeded or failed
[[b]]
The result is:
WorkflowConfigError: a:expired is permitted in the graph but is not referenced in the completion expression (so is not permitted by it).
Try: completion = "succeeded or failed or expired"
This is reasonable: the discrepancy comes out as a validation error rather than a runtime stall, which is good.
I think this is OK, although we should probably clarify the WorkflowConfigError message for this case to make it clear that the "expire trigger" is the cause (as the expired output is unlikely to appear in the graph).
Also, this bullet point will need to be updated (sorry, I made an invalid assumption here):
* The triggered task's
`flow.cylc[runtime][<namespace>]completion condition`
will be automatically modified so that expiry completes the
task's outputs.
* The ``expired`` output will be marked as
:term:`optional` for the triggered task.
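As the review notes, an explicitly configured completion condition is not modified, so only the *default* completion picks up expiry once `expired` is optional. A hypothetical sketch of that effect (the `default_completion` helper is invented for illustration, only final outputs are modelled, and this is not Cylc's algorithm):

```python
# Hypothetical sketch of the documented effect, not Cylc's algorithm: an
# expire trigger makes "expired" an optional output of the target task, so a
# *default* completion expression gains "or expired". Only final outputs
# (succeeded/failed/expired) plus simple required outputs are modelled.
from typing import List


def default_completion(required: List[str], optional_final: List[str],
                       expire_triggered: bool) -> str:
    alternatives = list(optional_final)
    if expire_triggered and "expired" not in alternatives:
        alternatives.append("expired")
    terms = []
    if required:
        joined = " and ".join(required)
        # Parenthesise a multi-output conjunction for readability.
        terms.append(f"({joined})" if len(required) > 1 else joined)
    terms.extend(alternatives)
    return " or ".join(terms)


print(default_completion([], ["succeeded", "failed"], True))
# succeeded or failed or expired
print(default_completion(["succeeded"], [], True))
# succeeded or expired
print(default_completion(["succeeded"], [], False))
# succeeded
```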
That's fine, but we should actively promote testing of experimental features as being in users' interests. Otherwise they'll all ignore it until it ceases to be experimental and they have no choice, which entirely defeats the purpose of doing it this way. Best done via the forum or locally, I guess.
The final commits add a warning about the probable cause of the completion condition error:
$ cylc val .
INFO - 1 suicide trigger(s) detected. These are rarely needed in Cylc 8 - see https://cylc.github.io/cylc-doc/stable/html/7-to-8/major-changes/suicide-triggers.html
WorkflowConfigError: a:expired is permitted in the graph but is not referenced in the completion.
This may be due to use of an expire (formerly suicide) trigger. # <------!
Try: completion = "succeeded or failed or expired"
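For illustration, a simplified and hypothetical version of this check that reproduces the message above. `check_completion` is an invented name, `ValueError` stands in for Cylc's `WorkflowConfigError`, and the expression is tokenised on words rather than properly parsed as Cylc does.

```python
# Simplified, hypothetical validator; ValueError stands in for Cylc's
# WorkflowConfigError. Real Cylc parses the expression rather than
# tokenising it on words as done here.
import re
from typing import Set


def check_completion(task: str, optional_outputs: Set[str], completion: str,
                     expire_triggered: bool) -> None:
    # Output names referenced by the completion expression.
    referenced = set(re.findall(r"\w+", completion)) - {"and", "or"}
    missing = sorted(optional_outputs - referenced)
    if not missing:
        return
    lines = [
        f"{task}:{output} is permitted in the graph but is not referenced "
        "in the completion."
        for output in missing
    ]
    if expire_triggered and "expired" in missing:
        # Point at the likely cause: "expired" was made optional by an
        # expire trigger, not by anything visible in the graph.
        lines.append(
            "This may be due to use of an expire (formerly suicide) trigger.")
    suggestion = " or ".join([completion, *missing])
    lines.append(f'Try: completion = "{suggestion}"')
    raise ValueError("\n".join(lines))


try:
    # a? => b ; a:fail? => !a  (succeeded, failed and expired are optional)
    check_completion("a", {"succeeded", "failed", "expired"},
                     "succeeded or failed", expire_triggered=True)
except ValueError as exc:
    print(exc)
```

Appending the missing outputs with `or` mirrors the "Try:" hint. It is only a suggestion, since the user may have intended a stricter condition.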
We can use the forum and cylc-doc changelog entries. We should also consider a CLI flag to assist with one-off testing. We could potentially consider logging that the behaviour is going to change in the future (for affected workflows), but would need to be careful not to make that too intrusive (especially as warnings result in orange triangles illuminating in the GUI).
The default time zone is now ``Z`` instead of the local time of
the first workflow start.
''')
with Conf('experimental', desc='''
Could we have this as a global option too? That way we don't have to alter all our workflows twice (once to test, and once to undo when it goes live).
Experimental features are for canary testing. I'm not sure it's a good idea to turn on experimental features by default; that way users don't know what features they are opting into, so they aren't going to be attentive to any issues caused by them.
I did think we should add a CLI flag though to avoid the need to modify the workflows at all.
Global configs can be user-specific: ~/.cylc/flow/8/global.cylc (as we do with our oper/test role users).
Fair point - a user with a lot of workflows might well want to opt in for all of them at once.
We could just recommend against doing it centrally in site config.
I also wouldn't recommend it site-wide.
Even if I implement it local to my user, I would have to stop and start (or reload?) my workflows for them to pick it up (I believe).
It would save me CLI or workflow mods to do it this way.
Follow-on PRs, I suppose.
cylc reload -g, --global # also reload global configuration
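If the proposed user-level global option and CLI flag were added in follow-on PRs, one possible precedence rule is sketched below. Both sources are only proposals in this thread, and `resolve_flag` is a hypothetical helper, not part of Cylc.

```python
# Hypothetical precedence rule for the proposals in this thread (a user
# global option and a CLI flag); neither is part of this PR. None = "unset".
from typing import Optional


def resolve_flag(*, cli: Optional[bool] = None,
                 workflow: Optional[bool] = None,
                 user_global: Optional[bool] = None,
                 default: bool = False) -> bool:
    # The most specific source that sets a value wins, so a single workflow
    # can still opt out when the user's global config opts in.
    for value in (cli, workflow, user_global):
        if value is not None:
            return value
    return default


print(resolve_flag())                                  # False
print(resolve_flag(user_global=True))                  # True
print(resolve_flag(user_global=True, workflow=False))  # False
print(resolve_flag(workflow=False, cli=True))          # True
```

A plain OR across sources would be simpler, but it would stop an individual workflow from opting out once the user's global config opts in.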
I put together a workflow that implements the documented use patterns, running them in and out of experimental mode. Comparing the reflogs, all examples run the same both ways 👍
./flow.cylc
#!Jinja2
{% from "cylc.flow" import LOG %}
[scheduler]
allow implicit tasks = True
{% if experimental %}
{% do LOG.warning('[scheduler][experimental]all = True') %}
[[experimental]]
all = True
{% endif %}
[scheduling]
cycling mode = integer
initial cycle point = 1
final cycle point = 2
runahead limit = P0
[[xtriggers]]
fail1 = myxtrig(%(point)s, 1):PT1S
fail2 = myxtrig(%(point)s, 2):PT1S
[[graph]]
P1 = """
{% if pattern == 'basic' %}
a? => b
a:fail? => r
a? => !r
b | r => c
{% elif pattern == 'flaky' %}
a? => b? => c?
{% elif pattern == 'multiple' %}
a? & b? => c
a? | b? => !c
{% elif pattern == 'xtrigger' %}
@fail1 => x => !y
@fail2 => y => !x
{% endif %}
"""
[runtime]
[[a]]
script = [[ $CYLC_TASK_CYCLE_POINT == 1 ]]
./lib/python/myxtrig.py
def myxtrig(current_cycle, fail_at_cycle):
    return (int(current_cycle) != fail_at_cycle, {})
./run
#!/usr/bin/env bash
set -euo pipefail
exp_id () {
pattern="$1"
experimental="$2"
local name="trig/$pattern"
if [[ $experimental == True ]]; then
name="$name/expire"
else
name="$name/suicide"
fi
echo "$name"
}
for pattern in basic flaky multiple xtrigger; do
ids=()
for experimental in True False; do
id="$(exp_id "$pattern" "$experimental")"
ids+=("$id")
cylc vip . \
-N \
--no-run-name \
--reference-log \
-n "$id" \
-s "pattern='$pattern'" \
-s "experimental=$experimental"
done
done
for pattern in basic flaky multiple xtrigger; do
ids=()
for experimental in True False; do
id="$(exp_id "$pattern" "$experimental")"
ids+=("$HOME/cylc-run/$id/reference.log")
done
echo diff "${ids[@]}"
diff "${ids[@]}"
echo -e '\n\n\n'
done
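A quick standalone check of the `myxtrig` xtrigger used by the `xtrigger` pattern above. The function is copied from `lib/python/myxtrig.py` so the snippet is self-contained. Passing the cycle point as a string is an assumption that matches the original's `int()` call. Cycle 1 leaves `@fail1` unsatisfied (so `x` waits while `y` runs and triggers `!x`); cycle 2 is the mirror image.

```python
# Copy of lib/python/myxtrig.py from the test workflow above, so this check
# is self-contained. The cycle point is passed as a string (an assumption
# consistent with the int() call in the original).
def myxtrig(current_cycle, fail_at_cycle):
    # xtrigger functions return (satisfied, results); this one is never
    # satisfied at the cycle given by fail_at_cycle.
    return (int(current_cycle) != fail_at_cycle, {})


for point in ("1", "2"):
    fail1, _ = myxtrig(point, 1)
    fail2, _ = myxtrig(point, 2)
    print(f"cycle {point}: @fail1={fail1} @fail2={fail2}")
# cycle 1: @fail1=False @fail2=True
# cycle 2: @fail1=True @fail2=False
```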
And I used this test to check the new behaviour:
[scheduler]
allow implicit tasks = True
[[experimental]]
all = True
[scheduling]
initial cycle point = 1
final cycle point = 2
cycling mode = integer
runahead limit = P0
[[graph]]
P1 = """
a? => b
a:fail? => !b
b | b:expired? => c
"""
[runtime]
[[a]]
script = [[ $CYLC_TASK_CYCLE_POINT == 1 ]]
Works as expected 👍.
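For reference, here is a hand trace of what this test should produce under the experimental behaviour. It is plain Python, not a Cylc simulation, and it assumes the implicit tasks `b` and `c` succeed whenever they run. In cycle 2, `c` depends on `b:expired?` being produced, which would not happen if the suicide trigger simply removed `b` as before.

```python
# Hand trace of the test graph above under the experimental behaviour; a
# plain-Python illustration, not a Cylc simulation:
#     a? => b
#     a:fail? => !b
#     b | b:expired? => c
# Assumes a succeeds only at cycle 1 (its script) and that the implicit
# tasks b and c succeed whenever they run.
from typing import Dict


def run_cycle(a_succeeds: bool) -> Dict[str, str]:
    states = {"a": "succeeded" if a_succeeds else "failed"}
    if a_succeeds:
        states["b"] = "succeeded"  # a? => b
    else:
        states["b"] = "expired"    # a:fail? => !b now expires b
    if states["b"] in ("succeeded", "expired"):
        states["c"] = "succeeded"  # b | b:expired? => c
    return states


for point in (1, 2):
    print(point, run_cycle(a_succeeds=(point == 1)))
# 1 {'a': 'succeeded', 'b': 'succeeded', 'c': 'succeeded'}
# 2 {'a': 'failed', 'b': 'expired', 'c': 'succeeded'}
```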

Close #6813
This simplifies the system a bit by unifying suicide triggers with task expiry.
Also fixes a long-standing bug: tasks removed by suicide triggers can respawn by other dependencies.
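A toy illustration of why expiry avoids the respawn bug. The `Pool` class and its methods are hypothetical and deliberately much simpler than Cylc's task pool: a removed task leaves no record, so a later dependency can spawn it again, whereas an expired task has a recorded outcome.

```python
# Toy illustration with hypothetical data structures (not Cylc's task pool):
# a removed task leaves no record, so a later dependency can respawn it,
# whereas an expired task has a recorded outcome that blocks respawning.
from typing import Dict


class Pool:
    def __init__(self) -> None:
        self.active: Dict[str, str] = {}   # task id -> status
        self.history: Dict[str, str] = {}  # task id -> final status

    def spawn(self, task_id: str) -> bool:
        # Only spawn a task that is neither active nor already finished.
        if task_id in self.active or task_id in self.history:
            return False
        self.active[task_id] = "waiting"
        return True

    def remove(self, task_id: str) -> None:
        # Old-style suicide: the task vanishes without a recorded outcome.
        self.active.pop(task_id, None)

    def expire(self, task_id: str) -> None:
        # New-style: the task finishes as "expired" and that is recorded.
        self.active.pop(task_id, None)
        self.history[task_id] = "expired"


old, new = Pool(), Pool()
old.spawn("1/b")
new.spawn("1/b")
old.remove("1/b")   # suicide trigger, old implementation
new.expire("1/b")   # suicide trigger reimplemented as expire trigger

# Another upstream output now tries to spawn 1/b again.
respawned_old = old.spawn("1/b")
respawned_new = new.spawn("1/b")
print(respawned_old, respawned_new)  # True False
```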
Rationale explained at #6813 (comment). Summary:
Check List
CONTRIBUTING.md and added my name as a Code Contributor.
setup.cfg (and conda-environment.yml if present).
?.?.x branch.