From cfed6f3fd7f0fa7394b5f3be79af8a8138a54650 Mon Sep 17 00:00:00 2001
From: Patryk Bundyra
Date: Wed, 26 Feb 2025 18:00:31 +0100
Subject: [PATCH] Update KEP with new PodsReady reasons (#4419)

---
 keps/349-all-or-nothing/README.md | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/keps/349-all-or-nothing/README.md b/keps/349-all-or-nothing/README.md
index 05a4cd7b33..e9cf04d23a 100644
--- a/keps/349-all-or-nothing/README.md
+++ b/keps/349-all-or-nothing/README.md
@@ -392,9 +392,13 @@ flowchart TD;
     id4 --"timeout exceeded"--> id5
 ```
 
-We introduce new `WorkloadWaitForPodsStart` and `WorkloadWaitForPodsRecovery` reasons to distinguish the reasons of setting the `PodsReady=false` condition.
-`WorkloadWaitForPodsStart` will be set before the job started and is replacement for the old `PodsReady` reason.
-`WorkloadWaitForPodsRecovery` will be set after the job started.
+We introduce new `WorkloadWaitForStart` and `WorkloadWaitForRecovery` reasons to distinguish the reasons of setting the `PodsReady=false` condition.
+`WorkloadWaitForStart` will be set before the job started and is replacement for the old `PodsReady` reason.
+`WorkloadWaitForRecovery` will be set after the job started.
+
+Additionally we introduce new `WorkloadStarted` and `WorkloadRecovered` reasons to distinguish the reasons of setting the `PodsReady=true` condition.
+`WorkloadStarted` will be set when all Pods reach readiness after Kueue admitted the job
+`WorkloadRecovered` will be set after the job has recovered after a Pod's failure.
 
 When any of the timeouts is exceeded, the Kueue's Job Controller suspends the Job
 corresponding to the workload and puts into the