Skip to content

Fix spawn order when group triggering tasks before the start cycle point#7101

Merged
oliver-sanders merged 6 commits intocylc:8.6.xfrom
MetRonnie:group-trigger-warm
Dec 18, 2025
Merged

Fix spawn order when group triggering tasks before the start cycle point#7101
oliver-sanders merged 6 commits intocylc:8.6.xfrom
MetRonnie:group-trigger-warm

Conversation

@MetRonnie
Copy link
Copy Markdown
Member

@MetRonnie MetRonnie commented Nov 26, 2025

Following a warm start, group triggering tasks that exist in the initial cycle point only, such as so-called "install cold" tasks, has a bug where the prerequisites within the group are not obeyed - they end up force-satisfied and the whole group submits at the same time.

Repro

[scheduler]
    allow implicit tasks = True

[scheduling]
    cycling mode = integer
    initial cycle point = 1
    runahead limit = P2
    [[graph]]
        R1 = herring => cold1 => cold2 => foo
        P1 = foo[-P1] => foo

[runtime]
    [[COLD]]
    [[cold1, cold2]]
        inherit = COLD
$ cylc play wflow --startcp 5
$ cylc trigger wflow//^/COLD

Check List

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • No dependency changes
  • Tests are included
  • Changelog entry included if this is a change that can affect users
  • No docs needed
  • If this is a bug fix, PR should be raised against the relevant ?.?.x branch.

@MetRonnie MetRonnie added this to the 8.6.2 milestone Nov 26, 2025
@MetRonnie MetRonnie self-assigned this Nov 26, 2025
@MetRonnie MetRonnie added the bug Something is wrong :( label Nov 26, 2025
@MetRonnie MetRonnie requested a review from wxtim November 28, 2025 16:23
@MetRonnie

This comment was marked as resolved.

@MetRonnie MetRonnie changed the title Fix group triggering of tasks before the start cycle point Fix spawn order when group triggering of tasks before the start cycle point Dec 1, 2025
@MetRonnie MetRonnie changed the title Fix spawn order when group triggering of tasks before the start cycle point Fix spawn order when group triggering tasks before the start cycle point Dec 1, 2025
@MetRonnie
Copy link
Copy Markdown
Member Author

Kicking tests

@MetRonnie MetRonnie closed this Dec 1, 2025
@MetRonnie MetRonnie reopened this Dec 1, 2025
@oliver-sanders
Copy link
Copy Markdown
Member

(coverage failing on uncovered repr methods)

@MetRonnie
Copy link
Copy Markdown
Member Author

(coverage failing on uncovered repr methods)

Added doctests

Copy link
Copy Markdown
Member

@oliver-sanders oliver-sanders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fixes the issue where a group of tasks triggered before the start cycle point does not run in order.

I have fond that the workflow will "flow on" from the triggered tasks in a strange way.

E.G, take this workflow:

[scheduler]
    allow implicit tasks = True
[scheduling]
    initial cycle point = 1
    cycling mode = integer
    [[graph]]
        R1 = cold
        P1 = """
            cold[^] => start
            start => run => end
            run[-P1] => run
        """
$ cylc play --start-cycle-point=10  # warm start the workflow from cycle 10
$ cylc hold '*'  # hold all active cycles (10+) to reduce noise
$ cylc trigger '^/cold'  # trigger a R1 task

What happens next:

  • 1/cold runs (expected).
  • 1/start, 1/run, 1/end run (not expected).
  • 2/run spawns (not expected).
  • However, 2/run does not run as it is dependent on 2/start which has not spawned (very strange)
  • Workflow will go on to enter a runahead stall (not expected).

I think we need to work out how to handle the downstream impacts as part of this work in order to satisfy the use case.

@MetRonnie
Copy link
Copy Markdown
Member Author

MetRonnie commented Dec 5, 2025

The workflow flowing on from the triggered tasks is expected (but problematic) at the moment; I have not found a fix for that yet. However the runahead stall in your example is probably preferential to it not stalling, as it gives a chance to remove the unwanted tasks!

@oliver-sanders
Copy link
Copy Markdown
Member

oliver-sanders commented Dec 5, 2025

Understood, but unfortunately, I think that we need to come up with a solution to the flow-on problem before the intervention for this use case can be used in anger as it's too caveat-prone without this.

No idea what that solution would be however!

@wxtim
Copy link
Copy Markdown
Member

wxtim commented Dec 8, 2025

Does this not create the behaviour we require?

$ cylc trigger '^/cold'  --flow=none  # trigger a R1 task

(however we lose the group trigger interdependence if we trigger a set of tasks like this)

@MetRonnie
Copy link
Copy Markdown
Member Author

(however we lose the group trigger interdependence if we trigger a set of tasks like this)

That is the problem. Fortunately only a handful of operational workflows have cold start tasks with prereqs between them at the Met Office

@hjoliver
Copy link
Copy Markdown
Member

hjoliver commented Dec 9, 2025

(however we lose the group trigger interdependence if we trigger a set of tasks like this)

That is the problem. Fortunately only a handful of operational workflows have cold start tasks with prereqs between them at the Met Office

Unfortunately these are near-ubiquitous here at ESNZ, right @dwsutherland ? (Probably because for years now we've had tasks that deploy code into the run-dir from git repos).
Which is one of the main reasons we've wanted this forever:

  • Isolated graphs (startup, shutdown, ...) #7020
  • (in fact we commented on this flow-on problem there: "it makes retriggering the startup graph difficult or confusing (will it "flow on" again?)" - the comment probably dates back to when we had to use a new flow for retriggering, but moving the workflow start point forward brings the problem back again)

Otherwise, we've had several discussions in the past about how to stop flows from flowing on:

Possibilities include:

  • starting a flow with a defined end cycle-point
  • a second command to tell an existing flow when or where to stop

(These require more understanding of flows from the user).

Maybe a short-cut variation on the flow-end-point idea?

  • cylc trigger --single-cycle-point=^

@oliver-sanders
Copy link
Copy Markdown
Member

oliver-sanders commented Dec 9, 2025

@hjoliver, the issue we're discussing here is specific to warm starts only, but isn't strictly specific to R1 tasks (though the use case we're focusing on is).

Generalisation of the problem: The triggering of tasks before the start cycle point in a warm started workflow [1].

Problems with this ATM:

  1. In-group dependencies are ignored (fixed by this PR).
  2. Cylc flows on from the triggered tasks (the remainder of this discussion).

Re-triggering R1 tasks is normally no issue, Cylc does not flow-on because it looks in the DB and discovers that the downstream tasks have already run. This mechanism works just fine for cold starts...

However, with warm starts [1], we delete the workflow database and restart from a specified "start cycle point". Cylc assumes that everything before the start cycle point has succeeded [2] as part of the workflow startup logic, however, this assumption is constrained to the startup logic. When those R1 tasks run, their outputs cause downstreams to run because they do not exist in the database (because we deleted the database, this is a warm start!). Note that with a warm start, there is only one flow (also also that we do not use new flows at the MO), it's not the starting and stopping of flows that's an issue, it's the lack of workflow history (because it's a warm start!).

This is really an internal consistency issue, one part of Cylc (startup logic) is saying that everything before the start cycle point has succeeded, whereas another part of Cylc (pre-spawn check) is saying that they didn't. As a result, the behaviour is unhelpful and defying user expectations.

However, I think we can resolve this issue by patching the pre-spawn check to match/reflect the warm start logic. If a task is before the start cycle point, we would simply assume it to have succeeded in the absence of a DB entry to the contrary. This would make the startup and task pool logic consistent, the resulting behaviours would match the cold-start scenario:

Under this approach:

  • If you trigger a task before the start cycle point, it will run, but not spawn downstreams.
  • If you trigger a group of tasks before the start cycle point, they will run in order (with this PR), but not spawn downstreams.
  • If you remove, set or trigger a task, then an entry would be created, overwriting the defaut "assume this task succeeded" logic.
  • No new commands, options or semantics required, warm-start behaviours match cold-start ones for default options.

WDYT

Notes:
[1] Warm start meaning, shut down the workflow, delete the DB, start the workflow from a specified start cycle point. This is not an everyday intervention. Warm starts are a mostly just a useful backstop for emergency situations.
[2] By "succeeded" I mean "final completed".

@hjoliver
Copy link
Copy Markdown
Member

hjoliver commented Dec 9, 2025

[UPDATE: read my follow-up before responding to these comments - they're not wrong, but I like the proposed alternative!]

@oliver-sanders - I understand what a warm-start is - I think we just have a different take on the proper generalization.

Note that with a warm start, there is only one flow ..., it's not the starting and stopping of flows that's an issue, it's the lack of workflow history (because it's a warm start!).

Well, given the lack of history to stop the flow (due to deliberate deletion of it!) stopping the flow is the problem! So, this is (or at least can be viewed as) a particular example of the more general need for control over flow termination. (Note I mean "the flow" in a generic sense - it doesn't have to be a new flow).

I think you're really saying that users expect there to be history to stop flow 1 from continuing, even though they deliberately deleted the history! (Or that they shouldn't even have to understand that deleting the DB does that - meh, that's a pretty violent action, I don't think it's unreasonable to have to think about the consequences!).

Also, note that users would not necessarily have to use the low level flow control capability directly (c.f. group trigger vs lower level manual remove, set, and trigger). E.g. for this sort of use case we could provide something like this:

 cylc trigger --no-flow-on <task-ids>

to mean run <task-ids> with a new flow configured to stop after that group.


This is really an internal consistency issue, one part of Cylc (startup logic) is saying that everything before the start cycle point has succeeded, whereas another part of Cylc (pre-spawn check) is saying that they didn't.

I do see this perspective as well, although arguably the real problem is that the startup logic is an imperfect bodge that affects more graph than it should do - it was a convenience, to bootstrap initial inter-cycle triggers so that users don't have to define the real start-up dependencies explicitly, but in addition it wipes out all dependencies back to the start of the graph

@hjoliver
Copy link
Copy Markdown
Member

hjoliver commented Dec 9, 2025

However, I think we can resolve this issue by patching the pre-spawn check to match/reflect the warm start logic. If a task is before the start cycle point, we would simply assume it to have succeeded in the absence of a DB entry to the contrary. This would make the startup and task pool logic consistent, the resulting behaviours would match the cold-start scenario:
WDYT

Nice.

I stand by my comments above - if we had general flow termination capability we could use that to solve this specific problem.

However, I like your suggestion! It makes sense, it's easy to implement, and it solves this specific problem quickly. I guess sometimes a less general solution wins out...

(BTW this is also a means of flow termination, just not a general one in that it is specific to pre-start points: we will assume flow history prior to the start point even if it does not exist in the DB, which will terminate the flow downstream of triggered tasks.)


So, should @MetRonnie do that on this PR, or shall we have a follow-up PR (which had better be released at the same time)?

@wxtim
Copy link
Copy Markdown
Member

wxtim commented Dec 10, 2025

I think I've implemented @oliver-sanders' suggestion in MetRonnie/cylc-flow@group-trigger-warm...wxtim:cylc:group-trigger-warm, (not suitable for a PR, contains manual test), but the test case workflow only runs cold1. Can I invite @oliver-sanders to check my change and @MetRonnie to check the combination?

@oliver-sanders
Copy link
Copy Markdown
Member

@wxtim, yep, that's the right start and it works for triggering a single task, correctly suppressing flow-on.

However, it doesn't work for triggering a group of tasks as all downstreams are considered complete. This is where the flow=None bit comes in. To additionally need to make it look like these tasks exist to cylc remove so that the in-group tasks get removed resulting in flow=None entries in the DB.

@wxtim
Copy link
Copy Markdown
Member

wxtim commented Dec 11, 2025

Leaving this with @MetRonnie

@MetRonnie
Copy link
Copy Markdown
Member Author

(I have finished working on this PR, it's ready for review. The bug fix for the flow-on problem is on a separate branch)

@oliver-sanders
Copy link
Copy Markdown
Member

Reviewing this in combination with the flow-on suppression in #7148

@oliver-sanders oliver-sanders merged commit d4c8263 into cylc:8.6.x Dec 18, 2025
22 of 23 checks passed
@MetRonnie MetRonnie deleted the group-trigger-warm branch December 18, 2025 11:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something is wrong :(

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants