warm start: handle interfaces which request pre-initial flow history #7153
Description
So-called "warm starts" are an emergency fallback which allows us to restart a workflow from any point in the graph.
To achieve this:
- Delete (or archive) the workflow DB.
- Restart the workflow using the `cylc play --start-cycle-point` flag to specify the point in the graph that the workflow should continue from.
- Cylc will assume that everything before this cycle has completed and continue running the workflow as normal.
This worked fine under Cylc 7, where all graph state was preserved in memory in the task pool. However, with Cylc 8, the state resides (partially) in the database. As a result, the pre-initial logic may say that a task has completed, whereas a database interface may say that it has not been spawned.
This inconsistency may result in undefined behaviours. So far we have encountered one example, where tasks triggered before the start cycle point will "flow-on" against user expectations (the user has told Cylc to assume that these tasks have completed, so would expect only the triggered tasks to run, not their downstreams).
Cylc should behave much the same in the warm start case as it would in the cold start case (where the prior cycles had all completed).
As usage of the database to inform the task pool data window increases (#6143), there is potential for further conflicts.
As a temporary workaround for the issue of triggering pre-initial tasks, task spawning was restricted to cover only manually triggered tasks (#7148). However, this is an imperfect solution: because it targets only trigger, it breaks other expected behaviours.
Brute-force solution
On startup, fill in DB entries for all tasks between the initial and start cycle points (as if they had been completed in skip mode?).
Perfectly possible, perhaps the simplest and most reliable solution.
But also a tad ugly.
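A minimal sketch of the brute-force idea, assuming integer cycling and a simplified, hypothetical table (`task_states_sketch`) rather than Cylc's real database schema. The function name `backfill_pre_start_tasks` and the `"succeeded"` status label are illustrative assumptions, not existing Cylc code.

```python
import sqlite3


def backfill_pre_start_tasks(conn, task_names, initial_point, start_point):
    """Insert a 'succeeded' row for every task before the start cycle point.

    Hypothetical schema and integer cycling; this is not Cylc's real DB layout.
    Rows that already exist are left untouched (INSERT OR IGNORE).
    Returns the number of (cycle, task) pairs considered.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS task_states_sketch ("
        " cycle INTEGER, name TEXT, status TEXT,"
        " PRIMARY KEY (cycle, name))"
    )
    # Every cycle from the initial point up to, but excluding, the start point.
    rows = [
        (cycle, name, "succeeded")
        for cycle in range(initial_point, start_point)
        for name in task_names
    ]
    with conn:  # commit on success, roll back on error
        conn.executemany(
            "INSERT OR IGNORE INTO task_states_sketch VALUES (?, ?, ?)", rows
        )
    return len(rows)
```

Under these assumptions, calling the function once at startup would leave the DB and the pre-initial logic in agreement about which tasks have completed, at the cost of writing one row per task per skipped cycle.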
Wrap DB interfacing logic
E.g.:
- If we try to spawn a task before the start cycle point AND there is no record of it in the database, then assume it has completed (pre-initial logic).
- Allow `cylc remove` to operate on no-record entries, demoting them to `flow=None`.
- Use the presence of a `flow=None` entry in the DB to indicate that a task should not be considered to have completed (allowing `cylc remove` and group-trigger to do their thing).
- Disallow `cylc trigger --flow=none` for pre-initial tasks (not needed since flow-on will be suppressed by default, avoids consistency issues).
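The first three rules above can be sketched as a small decision function. This is only an illustration under stated assumptions: the DB is modelled as a plain dict keyed by `(task, point)`, cycle points are integers, and the names `NO_FLOW`, `is_assumed_complete` and `remove_task` are hypothetical rather than part of Cylc.

```python
NO_FLOW = "none"  # hypothetical stand-in for a flow=None DB entry


def is_assumed_complete(db, task, point, start_point):
    """Return True if pre-initial logic should assume the task completed.

    The assumption only applies before the start cycle point AND when the
    DB holds no record of the task. Any record, including a flow=None
    entry, means the DB takes precedence over the assumption.
    """
    if point >= start_point:
        return False  # not pre-initial: normal logic applies
    return (task, point) not in db


def remove_task(db, task, point):
    """Model of removal: demote the task to a flow=None entry.

    This also works on tasks with no record, creating the entry, so that
    the task is no longer assumed to have completed.
    """
    db[(task, point)] = NO_FLOW
```

For example, with an empty `db` and `start_point=3`, `is_assumed_complete(db, "a", 1, 3)` is true until `remove_task(db, "a", 1)` is called, after which the `flow=None` entry suppresses the assumption.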