Skip to content

fix: schedule lifecycle hooks for non-leaf DAG tasks on failure. Fixes #16057 #16058

Open
fcolombo7 wants to merge 1 commit into argoproj:main from fcolombo7:fix/dag-hook-non-leaf-omit-cascade

fcolombo7 commented Apr 30, 2026

Fixes #16057

Motivation

When a non-leaf DAG task (one whose template is itself a DAG or steps) fails, its lifecycle hooks are never scheduled if the failure causes downstream tasks to be marked Omitted.

The root cause: hook evaluation in executeDAGTask runs at the top of the function while the node is still Running. For non-leaf tasks, the phase transition Running → Failed happens later inside executeTemplate. By then hook evaluation has already passed and no further sweep is triggered for that task, so the hook is permanently lost.
Leaf tasks (bare container pods) are unaffected because their phase is updated by podReconciliation before executeDAGTask runs.
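
The ordering problem can be illustrated with a toy model. This is only a sketch: `node`, `evaluateHooks`, `executeNonLeaf`, and `executeLeaf` are hypothetical stand-ins invented for illustration, not the controller's actual types or functions.

```go
package main

import "fmt"

// Phase is a simplified stand-in for a workflow node phase.
type Phase string

const (
	Running Phase = "Running"
	Failed  Phase = "Failed"
)

// node is a hypothetical, stripped-down node; it is not Argo's NodeStatus.
type node struct {
	phase         Phase
	hookScheduled bool
}

// evaluateHooks schedules an on-failure hook only if the node has already failed.
func evaluateHooks(n *node) {
	if n.phase == Failed {
		n.hookScheduled = true
	}
}

// executeNonLeaf models the pre-fix ordering for a non-leaf task:
// hooks are checked first, then the nested template runs and fails.
func executeNonLeaf(n *node) {
	evaluateHooks(n) // node is still Running, so nothing is scheduled
	n.phase = Failed // stand-in for the transition inside executeTemplate
}

// executeLeaf models a leaf task: the phase has already been updated
// (stand-in for pod reconciliation) before the hook check runs.
func executeLeaf(n *node) {
	n.phase = Failed
	evaluateHooks(n)
}

func main() {
	nonLeaf := &node{phase: Running}
	executeNonLeaf(nonLeaf)

	leaf := &node{phase: Running}
	executeLeaf(leaf)

	fmt.Println("non-leaf:", nonLeaf.phase, nonLeaf.hookScheduled)
	fmt.Println("leaf:    ", leaf.phase, leaf.hookScheduled)
}
```

In this model the non-leaf node ends up Failed with no hook scheduled, while the leaf node gets its hook, which mirrors the asymmetry described above.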

Modifications

Merged the post-executeTemplate hook re-evaluation into the existing if node.Completed() block at the bottom of executeDAGTask's inner loop. Hooks are evaluated first; if they are not yet complete, the function returns early, which blocks the Omit cascade via the existing CheckAllHooksFullfilled gate in evaluateDependsLogic.
Once fulfilled, runOnExitNode proceeds as before.
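
The intended control flow can be sketched roughly as follows. This is a simplified model under stated assumptions, not the code in this PR: `task`, `reconcileTask`, and `maybeOmit` are hypothetical names, and the gate is reduced to a single field check instead of the real depends logic.

```go
package main

import "fmt"

// Phase is a simplified stand-in for a workflow node phase.
type Phase string

const (
	Pending   Phase = "Pending"
	Running   Phase = "Running"
	Succeeded Phase = "Succeeded"
	Failed    Phase = "Failed"
	Omitted   Phase = "Omitted"
)

// task is a hypothetical model of a DAG task with a single lifecycle hook.
type task struct {
	phase     Phase
	hookPhase Phase // empty until the hook has been scheduled
	onExitRan bool
}

func (t *task) completed() bool {
	return t.phase == Succeeded || t.phase == Failed
}

func (t *task) hooksFulfilled() bool {
	return t.hookPhase == Succeeded || t.hookPhase == Failed
}

// reconcileTask models one controller pass over a non-leaf task with the
// post-fix ordering. It returns true once the task and its hook are done.
func reconcileTask(t *task) bool {
	if t.phase == Running {
		t.phase = Failed // stand-in for the nested template failing
	}
	if !t.completed() {
		return false
	}
	// Hooks are evaluated now that the final phase is known.
	if t.hookPhase == "" {
		t.hookPhase = Running // schedule the hook
	}
	if !t.hooksFulfilled() {
		return false // return early; the exit handler waits for the hook
	}
	t.onExitRan = true // stand-in for the exit handler step
	return true
}

// maybeOmit models the downstream gate: a dependent is omitted only after
// the failed upstream task's hook has been fulfilled.
func maybeOmit(upstream, downstream *task) {
	if upstream.phase == Failed && upstream.hooksFulfilled() {
		downstream.phase = Omitted
	}
}

func main() {
	up := &task{phase: Running}
	down := &task{phase: Pending}

	// First pass: the task fails, the hook is scheduled, the cascade is held back.
	done := reconcileTask(up)
	maybeOmit(up, down)
	fmt.Println(done, up.hookPhase, down.phase)

	// The hook finishes between passes (simulated here).
	up.hookPhase = Succeeded

	// Second pass: the hook is fulfilled, so the exit step runs and the cascade proceeds.
	done = reconcileTask(up)
	maybeOmit(up, down)
	fmt.Println(done, up.onExitRan, down.phase)
}
```

The point of the model is the two-pass behaviour: the downstream task stays Pending while the hook is in flight and is only marked Omitted once the hook has been fulfilled.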

Verification

Added two regression tests to dag_test.go using nested inner DAG templates (required to reproduce; bare container tasks are reconciled before executeDAGTask runs):

  • TestDAGHookNonLeafSerialHeadFails: non-leaf task fails with one downstream dependent
  • TestDAGHookNonLeafFanInBothFail: two parallel non-leaf tasks both fail with a shared downstream

go test ./workflow/controller/ -run TestDAGHookNonLeaf passes. Full go test ./workflow/controller/ passes with no regressions.

Documentation

No documentation change needed.

AI

Claude (Anthropic) was used to navigate and understand the codebase, identify the root cause, and assist in writing the fix and regression tests. All code was reviewed and validated by the author per the Argo project Generative AI policy.

…sks. Fixes argoproj#16057

Signed-off-by: fcolombo7 <colombofilippo.fc@gmail.com>
@fcolombo7 fcolombo7 changed the title fix: re-evaluate DAG task hooks after executeTemplate for non-leaf ta… fix: schedule lifecycle hooks for non-leaf DAG tasks on failure. Fixes #16057 Apr 30, 2026
@fcolombo7 fcolombo7 marked this pull request as ready for review April 30, 2026 10:09

Development

Successfully merging this pull request may close these issues.

DAG task lifecycle hooks silently skipped when task failure triggers Omit cascade on downstream tasks
