expire: fix interaction with automatic retries #7056
(rebased)
a43265f to fcf54ea
OK, on master, the On manually setting an output, we call
Taking a closer look at the bug, the loop in #6717 is caused by this:
In principle this bug affects other outputs too (e.g. The (+) line above explains why I think this fix ( HOWEVER there is at least one minor undesirable side effect though: the DB And there could be others too - we'd need to check the consequences everywhere the handling of natural vs forced messages is different, and it makes me a bit nervous because the expiry is actually natural not forced.
Given my analysis of the problem above, I think ideally forced=True should be reserved for manual output completion (i.e. cylc set) and a better fix for the bug is to tweak task_message_check() to stop it ignoring natural expired messages for tasks waiting on retries.
I've checked that this diff works on master:
diff --git a/cylc/flow/task_events_mgr.py b/cylc/flow/task_events_mgr.py
index 91d0ec0c6..44e82d8b4 100644
--- a/cylc/flow/task_events_mgr.py
+++ b/cylc/flow/task_events_mgr.py
@@ -932,6 +932,7 @@ class TaskEventsManager():
         if (
             itask.state(TASK_STATUS_WAITING)
+            and message != TASK_OUTPUT_EXPIRED
             # Polling in live mode only:
             and itask.run_mode == RunMode.LIVE
             and (
The comment in this code block is:
# Ignore messages if task has a retry lined up
# (caused by polling overlapping with task failure)
I guess that means e.g. the task polls as running, then fails and goes to waiting on retry before the poll result comes in, in which case the poll result should be ignored.
That scenario doesn't apply to automatic expiry, which is not like a delayed poll result.
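For future reference, here is a minimal standalone sketch of the guard as tweaked by the diff above. It is not the real cylc-flow code: `FakeTask`, `retry_pending`, `live_mode` and `should_ignore_message()` are hypothetical stand-ins, the constant values are assumed, and the retry-timer checks that follow the `and (` in the real block are collapsed into a single boolean.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for this sketch; values are assumed, not taken
# from cylc-flow.
TASK_STATUS_WAITING = "waiting"
TASK_OUTPUT_EXPIRED = "expired"


@dataclass
class FakeTask:
    """Toy task proxy holding only the fields the guard looks at."""
    status: str
    retry_pending: bool  # assumption: stands in for the real retry-timer checks
    live_mode: bool = True


def should_ignore_message(task: FakeTask, message: str) -> bool:
    """Return True if an incoming message should be dropped.

    Messages arriving for a waiting task with a retry lined up are treated
    as late arrivals (e.g. a delayed poll result) and ignored, except for
    the expired message, which is generated internally.
    """
    return (
        task.status == TASK_STATUS_WAITING
        and message != TASK_OUTPUT_EXPIRED
        and task.live_mode
        and task.retry_pending
    )
```

With this shape, a delayed "started" poll result for a task waiting on a retry is dropped, while "expired" is still processed for the same task.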
Ok, thanks for the analysis, will take a look...
hjoliver
left a comment
Maybe just expand the comment in the tweaked code block for future reference, as this stuff isn't particularly easy to follow. Something like this:
# Ignore non-expire messages if task is waiting with a retry lined up.
# Waiting tasks normally advance to a new state due to any message,
# but the retry implies late arrival after task failure (e.g. a delayed poll
# result). Task expire messages are internal, so excluded from this.
(added comment)
Co-authored-by: Tim Pillinger <26465611+wxtim@users.noreply.github.com>
(All tests passed; just merged the in-comment typo fix with skip-ci).
(Note, this issue needs to be resolved before expire triggers can be used in anger)
Check List
CONTRIBUTING.md and added my name as a Code Contributor.
setup.cfg (and conda-environment.yml if present).
?.?.x branch.