late tasks management on server recovery #18213
Hello, I'm validating Prefect 3 for our use cases. The desired architecture is one Prefect server with one worker per work pool. I would like to know the best approach to handling a server outage. In particular, I don't want tasks in "Late" status to be replayed by the workers when the server restarts, since this could create unnecessary processes. I tried to manage this with automations, but that doesn't completely solve the problem. Thanks in advance.
Replies: 1 comment 1 reply
hi @larrieu-olivier - thanks for the discussion and good question! your proposed setup makes sense to me.

my initial reaction was to use an automation to delete Late flow runs after some time interval, but I realize you might undesirably replay Late runs at startup if the automation doesn't fire before your workers pick up some runs, which is maybe what you were referring to.

my first (heavy-handed) instinct would be to have a setup script that runs before
DELETE FROM flow_runs
WHERE state_type = 'SCHEDULED'
AND state_name = 'Late'
AND next_scheduled_start_time < NOW() - INTERVAL '3 hours';

but perhaps we ought to allow some hook at server startup time for cleanup like this in a more first-class way
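
for illustration, here is a rough Python sketch of that kind of cleanup, run against a throwaway in-memory SQLite table rather than a real Prefect database. the table and column names (flow_runs, state_type, state_name, next_scheduled_start_time) are copied from the query in this reply and are not checked against the actual Prefect schema, so treat them as assumptions. the helper names delete_stale_late_runs and make_demo_db are hypothetical. the cutoff is computed in Python and passed as a parameter, since the NOW() - INTERVAL form is PostgreSQL syntax.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def delete_stale_late_runs(conn, max_age, now=None):
    """Delete Late scheduled runs older than max_age and return how many were removed.

    Table and column names are assumptions taken from the query in the thread.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = (now - max_age).isoformat(timespec="seconds")
    cur = conn.execute(
        "DELETE FROM flow_runs "
        "WHERE state_type = 'SCHEDULED' "
        "AND state_name = 'Late' "
        "AND next_scheduled_start_time < ?",
        (cutoff,),
    )
    conn.commit()
    return cur.rowcount


def make_demo_db(now):
    """Build a toy in-memory table (not the real Prefect schema) with a few runs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE flow_runs ("
        "id TEXT PRIMARY KEY, state_type TEXT, state_name TEXT, "
        "next_scheduled_start_time TEXT)"
    )
    rows = [
        ("old-late", "SCHEDULED", "Late", now - timedelta(hours=5)),
        ("recent-late", "SCHEDULED", "Late", now - timedelta(hours=1)),
        ("future", "SCHEDULED", "Scheduled", now + timedelta(hours=1)),
    ]
    conn.executemany(
        "INSERT INTO flow_runs VALUES (?, ?, ?, ?)",
        # Store timestamps as same-format ISO strings so string comparison
        # matches chronological order.
        [(i, t, n, ts.isoformat(timespec="seconds")) for i, t, n, ts in rows],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    conn = make_demo_db(now)
    removed = delete_stale_late_runs(conn, timedelta(hours=3), now=now)
    print(f"removed {removed} stale Late run(s)")
```

a script like this would have to finish before any worker starts polling, otherwise the same race described above applies. back up the database before running a raw delete against it.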