Fix spawn order when group triggering tasks before the start cycle point by MetRonnie · Pull Request #7101 · cylc/cylc-flow

MetRonnie · 2025-11-26T13:14:08Z

Following a warm start, group triggering tasks that exist in the initial cycle point only, such as so-called "install cold" tasks, has a bug where the prerequisites within the group are not obeyed - they end up force-satisfied and the whole group submits at the same time.

Repro

[scheduler]
    allow implicit tasks = True

[scheduling]
    cycling mode = integer
    initial cycle point = 1
    runahead limit = P2
    [[graph]]
        R1 = herring => cold1 => cold2 => foo
        P1 = foo[-P1] => foo

[runtime]
    [[COLD]]
    [[cold1, cold2]]
        inherit = COLD

$ cylc play wflow --startcp 5
$ cylc trigger wflow//^/COLD
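
As a rough illustration of the behaviour this repro is expected to show, here is a toy model in Python. It is not cylc-flow code: `GRAPH` and `unsatisfied_after_group_trigger` are names made up for this sketch, and it assumes that a group trigger force-satisfies prerequisites on tasks outside the group while keeping those inside it.

```python
# Toy model only: these names are illustrative, not cylc-flow APIs.
GRAPH = {
    # task: set of upstream tasks it depends on (from the R1 line above)
    "herring": set(),
    "cold1": {"herring"},
    "cold2": {"cold1"},
    "foo": {"cold2"},
}


def unsatisfied_after_group_trigger(graph, group):
    """Return the prerequisites each triggered task should still wait on.

    Assumption: prerequisites on tasks outside the group are treated as
    force-satisfied; prerequisites on tasks inside the group are kept.
    """
    return {task: graph[task] & group for task in group}


waiting_on = unsatisfied_after_group_trigger(GRAPH, {"cold1", "cold2"})
# Tasks with nothing left to wait on can submit straight away.
ready_now = sorted(task for task, deps in waiting_on.items() if not deps)
print(waiting_on)
print(ready_now)
```

In this model only cold1 is ready immediately and cold2 still waits on cold1. The bug described above corresponds to cold2's in-group prerequisite also being force-satisfied, so both submit together.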

Check List

I have read CONTRIBUTING.md and added my name as a Code Contributor.
Contains logically grouped changes (else tidy your branch by rebase).
Does not contain off-topic changes (use other PRs for other changes).
No dependency changes
Tests are included
Changelog entry included if this is a change that can affect users
No docs needed
If this is a bug fix, PR should be raised against the relevant ?.?.x branch.

MetRonnie · 2025-12-01T17:44:12Z

Kicking tests

cylc/flow/commands.py

oliver-sanders · 2025-12-02T13:04:49Z

(coverage failing on uncovered repr methods)

MetRonnie · 2025-12-05T12:18:55Z

(coverage failing on uncovered repr methods)

Added doctests

oliver-sanders

This fixes the issue where a group of tasks triggered before the start cycle point does not run in order.

I have found that the workflow will "flow on" from the triggered tasks in a strange way.

E.g., take this workflow:

[scheduler]
    allow implicit tasks = True
[scheduling]
    initial cycle point = 1
    cycling mode = integer
    [[graph]]
        R1 = cold
        P1 = """
            cold[^] => start
            start => run => end
            run[-P1] => run
        """

$ cylc play --start-cycle-point=10  # warm start the workflow from cycle 10
$ cylc hold '*'  # hold all active cycles (10+) to reduce noise
$ cylc trigger '^/cold'  # trigger a R1 task

What happens next:

1/cold runs (expected).
1/start, 1/run, 1/end run (not expected).
2/run spawns (not expected).
However, 2/run does not run as it is dependent on 2/start which has not spawned (very strange)
Workflow will go on to enter a runahead stall (not expected).

I think we need to work out how to handle the downstream impacts as part of this work in order to satisfy the use case.

MetRonnie · 2025-12-05T14:10:23Z

The workflow flowing on from the triggered tasks is expected (but problematic) at the moment; I have not found a fix for that yet. However, the runahead stall in your example is probably preferable to it not stalling, as it gives a chance to remove the unwanted tasks!

oliver-sanders · 2025-12-05T15:33:46Z

Understood, but unfortunately, I think that we need to come up with a solution to the flow-on problem before the intervention for this use case can be used in anger as it's too caveat-prone without this.

No idea what that solution would be however!

wxtim · 2025-12-08T13:02:53Z

Does this not create the behaviour we require?

$ cylc trigger '^/cold'  --flow=none  # trigger a R1 task

(however we lose the group trigger interdependence if we trigger a set of tasks like this)

MetRonnie · 2025-12-08T13:11:46Z

(however we lose the group trigger interdependence if we trigger a set of tasks like this)

That is the problem. Fortunately only a handful of operational workflows have cold start tasks with prereqs between them at the Met Office

hjoliver · 2025-12-09T02:32:36Z

(however we lose the group trigger interdependence if we trigger a set of tasks like this)

That is the problem. Fortunately only a handful of operational workflows have cold start tasks with prereqs between them at the Met Office

Unfortunately these are near-ubiquitous here at ESNZ, right @dwsutherland ? (Probably because for years now we've had tasks that deploy code into the run-dir from git repos).
Which is one of the main reasons we've wanted this forever:

Isolated graphs (startup, shutdown, ...) #7020
(in fact we commented on this flow-on problem there: "it makes retriggering the startup graph difficult or confusing (will it "flow on" again?)" - the comment probably dates back to when we had to use a new flow for retriggering, but moving the workflow start point forward brings the problem back again)

Otherwise, we've had several discussions in the past about how to stop flows from flowing on:

Possibilities include:

starting a flow with a defined end cycle-point
a second command to tell an existing flow when or where to stop

(These require more understanding of flows from the user).

Maybe a short-cut variation on the flow-end-point idea?

cylc trigger --single-cycle-point=^
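
To make the flow-end-point idea concrete, here is a minimal sketch, assuming a flow carries an optional end cycle point that is checked before spawning children. Both `may_spawn` and `flow_end_point` are hypothetical names for illustration; no such option or function exists in cylc-flow as far as this thread shows.

```python
from typing import Optional


def may_spawn(child_point: int, flow_end_point: Optional[int]) -> bool:
    """Return True if a flow may spawn a child task at child_point.

    Hypothetical rule: a flow with no end point is unbounded; otherwise
    children beyond the end point are not spawned.
    """
    if flow_end_point is None:
        return True
    return child_point <= flow_end_point


# A flow limited to the initial cycle point (1 in an integer workflow):
print(may_spawn(1, flow_end_point=1))
print(may_spawn(2, flow_end_point=1))
```

Under this rule, a flow limited to cycle 1 could still run tasks within cycle 1 but would never spawn anything in cycle 2.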

oliver-sanders · 2025-12-09T14:01:58Z

@hjoliver, the issue we're discussing here is specific to warm starts only, but isn't strictly specific to R1 tasks (though the use case we're focusing on is).

Generalisation of the problem: The triggering of tasks before the start cycle point in a warm started workflow [1].

Problems with this ATM:

In-group dependencies are ignored (fixed by this PR).
Cylc flows on from the triggered tasks (the remainder of this discussion).

Re-triggering R1 tasks is normally no issue: Cylc does not flow on because it looks in the DB and discovers that the downstream tasks have already run. This mechanism works just fine for cold starts...

However, with warm starts [1], we delete the workflow database and restart from a specified "start cycle point". As part of the workflow startup logic, Cylc assumes that everything before the start cycle point has succeeded [2]; however, this assumption is confined to the startup logic. When those R1 tasks run, their outputs cause downstreams to run because the downstreams do not exist in the database (because we deleted the database, this is a warm start!). Note that with a warm start, there is only one flow (note also that we do not use new flows at the MO), it's not the starting and stopping of flows that's an issue, it's the lack of workflow history (because it's a warm start!).

This is really an internal consistency issue, one part of Cylc (startup logic) is saying that everything before the start cycle point has succeeded, whereas another part of Cylc (pre-spawn check) is saying that they didn't. As a result, the behaviour is unhelpful and defies user expectations.

However, I think we can resolve this issue by patching the pre-spawn check to match/reflect the warm start logic. If a task is before the start cycle point, we would simply assume it to have succeeded in the absence of a DB entry to the contrary. This would make the startup and task pool logic consistent, the resulting behaviours would match the cold-start scenario:

Under this approach:

If you trigger a task before the start cycle point, it will run, but not spawn downstreams.
If you trigger a group of tasks before the start cycle point, they will run in order (with this PR), but not spawn downstreams.
If you remove, set or trigger a task, then a DB entry would be created, overriding the default "assume this task succeeded" logic.
No new commands, options or semantics required, warm-start behaviours match cold-start ones for default options.
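
A minimal sketch of the proposed pre-spawn rule, under stated assumptions: integer cycling, and the DB history reduced to a dict from (cycle point, task name) to a completed flag. `considered_complete`, `start_point` and `db_history` are hypothetical names, not the cylc-flow implementation.

```python
# Hypothetical sketch: none of these names come from cylc-flow.
from typing import Dict, Tuple

TaskKey = Tuple[int, str]  # (integer cycle point, task name)


def considered_complete(
    task: TaskKey,
    start_point: int,
    db_history: Dict[TaskKey, bool],
) -> bool:
    """Decide whether a downstream task should be treated as already run.

    db_history maps a task to True (completed) or False (e.g. removed, so
    it may run again). A missing key means "no DB entry".
    """
    recorded = db_history.get(task)
    if recorded is not None:
        # An explicit DB entry always wins over the warm start assumption.
        return recorded
    # No history: assume completion only for tasks before the start point.
    point, _name = task
    return point < start_point


# Warm start at cycle 10 with an empty DB:
print(considered_complete((1, "start"), 10, {}))
print(considered_complete((10, "start"), 10, {}))
# An explicit "not complete" entry overrides the assumption:
print(considered_complete((1, "start"), 10, {(1, "start"): False}))
```

With this rule, triggering 1/cold after a warm start at cycle 10 would not spawn 1/start, because 1/start is before the start point and has no DB entry saying otherwise.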

WDYT

Notes:
[1] Warm start meaning: shut down the workflow, delete the DB, start the workflow from a specified start cycle point. This is not an everyday intervention. Warm starts are mostly just a useful backstop for emergency situations.
[2] By "succeeded" I mean "final completed".

hjoliver · 2025-12-09T20:21:57Z

[UPDATE: read my follow-up before responding to these comments - they're not wrong, but I like the proposed alternative!]

@oliver-sanders - I understand what a warm-start is - I think we just have a different take on the proper generalization.

Note that with a warm start, there is only one flow ..., it's not the starting and stopping of flows that's an issue, it's the lack of workflow history (because it's a warm start!).

Well, given the lack of history to stop the flow (due to deliberate deletion of it!) stopping the flow is the problem! So, this is (or at least can be viewed as) a particular example of the more general need for control over flow termination. (Note I mean "the flow" in a generic sense - it doesn't have to be a new flow).

I think you're really saying that users expect there to be history to stop flow 1 from continuing, even though they deliberately deleted the history! (Or that they shouldn't even have to understand that deleting the DB does that - meh, that's a pretty violent action, I don't think it's unreasonable to have to think about the consequences!).

Also, note that users would not necessarily have to use the low level flow control capability directly (c.f. group trigger vs lower level manual remove, set, and trigger). E.g. for this sort of use case we could provide something like this:

 cylc trigger --no-flow-on <task-ids>

to mean run <task-ids> with a new flow configured to stop after that group.

This is really an internal consistency issue, one part of Cylc (startup logic) is saying that everything before the start cycle point has succeeded, whereas another part of Cylc (pre-spawn check) is saying that they didn't.

I do see this perspective as well, although arguably the real problem is that the startup logic is an imperfect bodge that affects more of the graph than it should do - it was a convenience, to bootstrap initial inter-cycle triggers so that users don't have to define the real start-up dependencies explicitly, but in addition it wipes out all dependencies back to the start of the graph.

hjoliver · 2025-12-09T21:23:59Z

However, I think we can resolve this issue by patching the pre-spawn check to match/reflect the warm start logic. If a task is before the start cycle point, we would simply assume it to have succeeded in the absence of a DB entry to the contrary. This would make the startup and task pool logic consistent, the resulting behaviours would match the cold-start scenario:
WDYT

Nice.

I stand by my comments above - if we had general flow termination capability we could use that to solve this specific problem.

However, I like your suggestion! It makes sense, it's easy to implement, and it solves this specific problem quickly. I guess sometimes a less general solution wins out...

(BTW this is also a means of flow termination, just not a general one in that it is specific to pre-start points: we will assume flow history prior to the start point even if it does not exist in the DB, which will terminate the flow downstream of triggered tasks.)

So, should @MetRonnie do that on this PR, or shall we have a follow-up PR (which had better be released at the same time)?

wxtim · 2025-12-10T11:11:25Z

I think I've implemented @oliver-sanders' suggestion in MetRonnie/cylc-flow@group-trigger-warm...wxtim:cylc:group-trigger-warm, (not suitable for a PR, contains manual test), but the test case workflow only runs cold1. Can I invite @oliver-sanders to check my change and @MetRonnie to check the combination?

oliver-sanders · 2025-12-10T14:30:31Z

@wxtim, yep, that's the right start and it works for triggering a single task, correctly suppressing flow-on.

However, it doesn't work for triggering a group of tasks, as all downstreams are considered complete. This is where the flow=None bit comes in. We additionally need to make it look like these tasks exist to cylc remove, so that the in-group tasks get removed, resulting in flow=None entries in the DB.

wxtim · 2025-12-11T13:11:29Z

Leaving this with @MetRonnie

MetRonnie · 2025-12-12T16:17:18Z

(I have finished working on this PR, it's ready for review. The bug fix for the flow-on problem is on a separate branch)

cylc/flow/task_pool.py

oliver-sanders · 2025-12-18T11:18:57Z

Reviewing this in combination with the flow-on suppression in #7148

MetRonnie added this to the 8.6.2 milestone Nov 26, 2025

MetRonnie requested a review from oliver-sanders November 26, 2025 13:14

MetRonnie self-assigned this Nov 26, 2025

MetRonnie added the bug Something is wrong :( label Nov 26, 2025

MetRonnie force-pushed the group-trigger-warm branch from 26868fa to 61adbce Compare November 26, 2025 17:02

MetRonnie requested a review from wxtim November 28, 2025 16:23

MetRonnie changed the title ~~Fix group triggering of tasks before the start cycle point~~ Fix spawn order when group triggering of tasks before the start cycle point Dec 1, 2025

MetRonnie changed the title ~~Fix spawn order when group triggering of tasks before the start cycle point~~ Fix spawn order when group triggering tasks before the start cycle point Dec 1, 2025

MetRonnie closed this Dec 1, 2025

MetRonnie reopened this Dec 1, 2025

oliver-sanders reviewed Dec 2, 2025

cylc/flow/commands.py

wxtim reviewed Dec 3, 2025

tests/integration/test_force_trigger.py (outdated)

wxtim reviewed Dec 3, 2025

tests/integration/test_force_trigger.py (outdated)

wxtim reviewed Dec 3, 2025

cylc/flow/task_trigger.py (outdated)

MetRonnie added 6 commits December 5, 2025 12:17

Add __repr__ methods for easier debugging

0d4d84f

Fix group triggering of tasks before the startcp

06b6965

They were not obeying prerequisites within the group

Changelog

b5fb37f

Fix missing log warning for failed attempts at removing tasks

e755ec9

Partially reverts e6c4adf

Do not spawn to RH limit after triggering pre-startcp task in warm start

fd46fa3

Fix pytest discovery error

d4c8263

MetRonnie force-pushed the group-trigger-warm branch from 74fa554 to d4c8263 Compare December 5, 2025 12:17

oliver-sanders reviewed Dec 5, 2025

MetRonnie mentioned this pull request Dec 5, 2025

Logging of unremovable tasks #7124

Open

MetRonnie requested review from oliver-sanders and wxtim December 9, 2025 14:42

hjoliver mentioned this pull request Dec 11, 2025

Group trigger with no flow-on #7140

Open

wxtim reviewed Dec 15, 2025

cylc/flow/task_pool.py

MetRonnie mentioned this pull request Dec 15, 2025

Prevent flow-on after triggering pre-startcp tasks after a warm start #7148

Merged

8 tasks

wxtim approved these changes Dec 16, 2025

oliver-sanders merged commit d4c8263 into cylc:8.6.x Dec 18, 2025
22 of 23 checks passed

MetRonnie deleted the group-trigger-warm branch December 18, 2025 11:53
