warm start: handle interfaces which request pre-initial flow history #7153

@oliver-sanders

So-called "warm starts" are an emergency fallback which allows us to restart a workflow from any point in the graph.

To achieve this:

  1. Delete (or archive) the workflow DB.
  2. Restart the workflow using the cylc play --start-cycle-point flag to specify the point in the graph that the workflow should continue from.
  3. Cylc will assume that everything before this cycle has completed and continue running the workflow as normal.

This worked fine under Cylc 7, where all graph state was preserved in memory in the task pool. However, with Cylc 8, the state resides (partially) in the database. As a result, the pre-initial logic may say that a task has completed, whereas a database interface may say that it has not been spawned.

This inconsistency may result in undefined behaviours. So far we have encountered one example, where tasks triggered before the start cycle point will "flow-on" against user expectations (the user has told Cylc to assume that these tasks have completed, so would expect only the triggered tasks to run, not their downstreams).

Cylc should behave much the same in the warm start case as it would in the cold start case (where the prior cycles had all completed).

As usage of the database to inform the task pool data window increases (#6143), there is potential for further conflicts.

As a temporary workaround for the issue of triggering pre-initial tasks, task spawning was restricted to cover only manually triggered tasks (#7148). However, this is an imperfect solution: because it targets only trigger, it breaks other expected behaviours.


Brute-force solution

On startup, fill in DB entries for all tasks between the initial and start cycle points (as if they had been completed in skip mode?).

Perfectly possible, perhaps the simplest and most reliable solution.

But also a tad ugly.
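
A minimal sketch of the brute-force idea, for illustration only. It assumes integer cycling and uses a hypothetical `task_history` table with made-up columns; this is not Cylc's actual database schema, and `backfill_pre_start_history` is not an existing Cylc function.

```python
import sqlite3
from typing import Iterable


def backfill_pre_start_history(
    conn: sqlite3.Connection,
    task_names: Iterable[str],
    initial_point: int,
    start_point: int,
) -> int:
    """Record every task in [initial_point, start_point) as done in skip mode.

    Hypothetical schema and names; integer cycle points are assumed.
    Returns the number of rows actually inserted.
    """
    names = list(task_names)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS task_history ("
        "cycle INTEGER, name TEXT, status TEXT, run_mode TEXT, "
        "PRIMARY KEY (cycle, name))"
    )
    rows = [
        (cycle, name, "succeeded", "skip")
        for cycle in range(initial_point, start_point)
        for name in names
    ]
    before = conn.total_changes
    with conn:  # commit on success, roll back on error
        # OR IGNORE: never overwrite a record that already exists.
        conn.executemany(
            "INSERT OR IGNORE INTO task_history VALUES (?, ?, ?, ?)", rows
        )
    return conn.total_changes - before
```

Using `INSERT OR IGNORE` keeps the backfill idempotent, so repeating it on a later restart would not clobber genuine history.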


Wrap DB interfacing logic

E.g.:

  • If we try to spawn a task before the start cycle point AND there is no record of it in the database, then assume it has completed (pre-initial logic).
  • Allow cylc remove to operate on no-record entries, demoting them to flow=None.
  • Use the presence of a flow=None entry in the DB to indicate that a task should not be considered to have completed (allowing cylc remove and group-trigger to do their thing).
  • Disallow cylc trigger --flow=none for pre-initial tasks (not needed since flow-on will be suppressed by default, avoids consistency issues).
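
The rules above could be modelled roughly as follows. This is a sketch under stated assumptions, not Cylc code: `FlowHistory`, its methods, and the use of an empty set to represent a flow=None entry are all hypothetical, and integer cycle points are assumed.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

TaskKey = Tuple[int, str]  # (cycle point, task name); integer cycling assumed


@dataclass
class FlowHistory:
    """Hypothetical wrapper around the DB's record of spawned tasks."""

    start_point: int
    # task -> flow numbers; an empty set models a flow=None entry
    records: Dict[TaskKey, Set[int]] = field(default_factory=dict)

    def is_assumed_complete(self, point: int, name: str) -> bool:
        """Pre-initial logic: before the start point AND no DB record."""
        if point >= self.start_point:
            return False
        # Any record (including flow=None) overrides the assumption.
        return (point, name) not in self.records

    def remove(self, point: int, name: str) -> None:
        """Model `cylc remove`: demote to flow=None, even with no record."""
        self.records[(point, name)] = set()

    def check_trigger(self, point: int, name: str, flow_none: bool) -> None:
        """Reject `--flow=none` for tasks before the start cycle point."""
        if flow_none and point < self.start_point:
            raise ValueError(
                f"--flow=none is not allowed for pre-initial task "
                f"{point}/{name}"
            )
```

In this model, removing a pre-initial task writes a flow=None entry, after which `is_assumed_complete` returns `False` for it, which is what would let a later group trigger re-run it.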

Labels: bug? (Not sure if this is a bug or not)