Fix spawn order when group triggering tasks before the start cycle point#7101
Fix spawn order when group triggering tasks before the start cycle point#7101oliver-sanders merged 6 commits intocylc:8.6.xfrom
Conversation
26868fa to
61adbce
Compare
This comment was marked as resolved.
This comment was marked as resolved.
|
Kicking tests |
|
(coverage failing on uncovered repr methods) |
They were not obeying prerequisites within the group
Partially reverts e6c4adf
74fa554 to
d4c8263
Compare
Added doctests |
There was a problem hiding this comment.
This fixes the issue where a group of tasks triggered before the start cycle point does not run in order.
I have fond that the workflow will "flow on" from the triggered tasks in a strange way.
E.G, take this workflow:
[scheduler]
allow implicit tasks = True
[scheduling]
initial cycle point = 1
cycling mode = integer
[[graph]]
R1 = cold
P1 = """
cold[^] => start
start => run => end
run[-P1] => run
"""$ cylc play --start-cycle-point=10 # warm start the workflow from cycle 10
$ cylc hold '*' # hold all active cycles (10+) to reduce noise
$ cylc trigger '^/cold' # trigger a R1 taskWhat happens next:
1/coldruns (expected).1/start,1/run,1/endrun (not expected).2/runspawns (not expected).- However,
2/rundoes not run as it is dependent on2/startwhich has not spawned (very strange) - Workflow will go on to enter a runahead stall (not expected).
I think we need to work out how to handle the downstream impacts as part of this work in order to satisfy the use case.
|
The workflow flowing on from the triggered tasks is expected (but problematic) at the moment; I have not found a fix for that yet. However the runahead stall in your example is probably preferential to it not stalling, as it gives a chance to remove the unwanted tasks! |
|
Understood, but unfortunately, I think that we need to come up with a solution to the flow-on problem before the intervention for this use case can be used in anger as it's too caveat-prone without this. No idea what that solution would be however! |
|
Does this not create the behaviour we require? $ cylc trigger '^/cold' --flow=none # trigger a R1 task(however we lose the group trigger interdependence if we trigger a set of tasks like this) |
That is the problem. Fortunately only a handful of operational workflows have cold start tasks with prereqs between them at the Met Office |
Unfortunately these are near-ubiquitous here at ESNZ, right @dwsutherland ? (Probably because for years now we've had tasks that deploy code into the run-dir from git repos).
Otherwise, we've had several discussions in the past about how to stop flows from flowing on: Possibilities include:
(These require more understanding of flows from the user). Maybe a short-cut variation on the flow-end-point idea?
|
|
@hjoliver, the issue we're discussing here is specific to warm starts only, but isn't strictly specific to R1 tasks (though the use case we're focusing on is). Generalisation of the problem: The triggering of tasks before the start cycle point in a warm started workflow [1]. Problems with this ATM:
Re-triggering R1 tasks is normally no issue, Cylc does not flow-on because it looks in the DB and discovers that the downstream tasks have already run. This mechanism works just fine for cold starts... However, with warm starts [1], we delete the workflow database and restart from a specified "start cycle point". Cylc assumes that everything before the start cycle point has succeeded [2] as part of the workflow startup logic, however, this assumption is constrained to the startup logic. When those R1 tasks run, their outputs cause downstreams to run because they do not exist in the database (because we deleted the database, this is a warm start!). Note that with a warm start, there is only one flow (also also that we do not use new flows at the MO), it's not the starting and stopping of flows that's an issue, it's the lack of workflow history (because it's a warm start!). This is really an internal consistency issue, one part of Cylc (startup logic) is saying that everything before the start cycle point has succeeded, whereas another part of Cylc (pre-spawn check) is saying that they didn't. As a result, the behaviour is unhelpful and defying user expectations. However, I think we can resolve this issue by patching the pre-spawn check to match/reflect the warm start logic. If a task is before the start cycle point, we would simply assume it to have succeeded in the absence of a DB entry to the contrary. This would make the startup and task pool logic consistent, the resulting behaviours would match the cold-start scenario: Under this approach:
WDYT
|
|
[UPDATE: read my follow-up before responding to these comments - they're not wrong, but I like the proposed alternative!] @oliver-sanders - I understand what a warm-start is - I think we just have a different take on the proper generalization.
Well, given the lack of history to stop the flow (due to deliberate deletion of it!) stopping the flow is the problem! So, this is (or at least can be viewed as) a particular example of the more general need for control over flow termination. (Note I mean "the flow" in a generic sense - it doesn't have to be a new flow). I think you're really saying that users expect there to be history to stop flow 1 from continuing, even though they deliberately deleted the history! (Or that they shouldn't even have to understand that deleting the DB does that - meh, that's a pretty violent action, I don't think it's unreasonable to have to think about the consequences!). Also, note that users would not necessarily have to use the low level flow control capability directly (c.f. group trigger vs lower level manual remove, set, and trigger). E.g. for this sort of use case we could provide something like this: to mean run
I do see this perspective as well, although arguably the real problem is that the startup logic is an imperfect bodge that affects more graph than it should do - it was a convenience, to bootstrap initial inter-cycle triggers so that users don't have to define the real start-up dependencies explicitly, but in addition it wipes out all dependencies back to the start of the graph |
Nice. I stand by my comments above - if we had general flow termination capability we could use that to solve this specific problem. However, I like your suggestion! It makes sense, it's easy to implement, and it solves this specific problem quickly. I guess sometimes a less general solution wins out... (BTW this is also a means of flow termination, just not a general one in that it is specific to pre-start points: we will assume flow history prior to the start point even if it does not exist in the DB, which will terminate the flow downstream of triggered tasks.) So, should @MetRonnie do that on this PR, or shall we have a follow-up PR (which had better be released at the same time)? |
|
I think I've implemented @oliver-sanders' suggestion in MetRonnie/cylc-flow@group-trigger-warm...wxtim:cylc:group-trigger-warm, (not suitable for a PR, contains manual test), but the test case workflow only runs |
|
@wxtim, yep, that's the right start and it works for triggering a single task, correctly suppressing flow-on. However, it doesn't work for triggering a group of tasks as all downstreams are considered complete. This is where the |
|
Leaving this with @MetRonnie |
|
(I have finished working on this PR, it's ready for review. The bug fix for the flow-on problem is on a separate branch) |
|
Reviewing this in combination with the flow-on suppression in #7148 |
Following a warm start, group triggering tasks that exist in the initial cycle point only, such as so-called "install cold" tasks, has a bug where the prerequisites within the group are not obeyed - they end up force-satisfied and the whole group submits at the same time.
Repro
Check List
CONTRIBUTING.mdand added my name as a Code Contributor.?.?.xbranch.