fix: auto promotion blocked using requiredSoakTime#6359
Open
justin0u0 wants to merge 2 commits into
Open
Conversation
✅ Deploy Preview for docs-kargo-io ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
requiredSoakTime
Author
|
Hi @krancour, @EronWright, could someone please take a look at this pull request? Thank you. |
…tages Auto-promotion could either incorrectly pass or permanently stall the soak time gate (Stage.spec.requestedFreight[].sources.requiredSoakTime), depending on the upstream Stage topology. Two distinct problems are addressed: 1. Soak accrued in unrelated Stages satisfied the gate. ListFreightFromWarehouse iterated over every Stage in the Freight's status.verifiedIn map when checking whether the required soak time had elapsed. Soak time accumulated in Stages that were NOT named as upstream sources could therefore satisfy the requirement, letting Freight promote before it had actually soaked in the configured upstream Stage(s). Restrict the check to the Stages named in the source's `stages` list (opts.VerifiedIn). 2. A control-flow upstream Stage could never satisfy the gate. Control-flow Stages verify Freight without ever holding it: the Freight is never added to status.currentlyIn and no completed soak is ever recorded. GetLongestSoak only consulted those two fields, so it returned 0 for a control-flow Stage no matter how long ago the Freight was verified there. With problem 1 fixed, a downstream Stage whose upstream source is a control-flow Stage could then never pass its requiredSoakTime, and auto-promotion stalled indefinitely. When neither a current residency nor a completed soak is recorded for a Stage in which the Freight is verified -- which uniquely identifies a control-flow Stage -- measure the soak from the Freight's verifiedAt timestamp instead. Regular Stages always have one of the two records, so their behavior is unchanged. Refs: akuity#4586 Signed-off-by: Justin Chen <mail@justin0u0.com>
Auto-promotion silently stalled whenever a downstream Stage's source
specified requiredSoakTime:
- Regular Stages requeued on a fixed 5 minute cadence, so short soak
windows (or soak deadlines that did not land on a 5-minute boundary)
could be missed until a later tick happened to fall past the deadline.
- Control-flow Stages returned ctrl.Result{} with no RequeueAfter at
all, relying entirely on watch events. No watch fires when 'soak time
has elapsed', so a control-flow Stage with requiredSoakTime set never
woke up on its own to re-evaluate the soak and verify its Freight,
blocking every downstream Stage indefinitely.
Add a calculateNextSoakCheck helper that lists candidate Freight per
upstream source (with the soak filter disabled), computes the remaining
soak per upstream Stage according to the configured
AvailabilityStrategy, and returns the soonest deadline plus a 1 second
buffer.
Wire the helper into both reconcilers:
- The regular reconciler now requeues at min(5m, soakDeadline).
- The control-flow reconciler requeues at the soak deadline when one
exists, and otherwise keeps the previous watch-driven behavior.
Refs: akuity#4586
Signed-off-by: Justin Chen <mail@justin0u0.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
All pull requests must reference an existing issue with no blocking labels.
PRs that do not meet this requirement will be automatically closed. See the
Contributor Guide for details.
Issue Reference
Closes #4586
Description
Fixes an issue where auto-promotion stalls (or fires too early) when a Stage source sets
requiredSoakTime, affectingAll,OneOf, and control-flow stages. Three root causes:Stages weren't requeued to re-check the soak deadline. A soak elapsing isn't a watch event, so an idle Stage never woke to promote once soak completed. Regular stages only requeued every 5m; control-flow stages never requeued at all, blocking downstream stages indefinitely. A new
calculateNextSoakCheckhelper now requeues both at the soonest pending deadline (min(5m, deadline)for regular stages).Soak accrued in unrelated stages satisfied the gate.
ListFreightFromWarehousechecked soak against every stage inverifiedIn, not just the source's upstreamstages. With a 1h soak on dev→staging and staging→prod, prod promoted immediately instead of waiting for its own soak. The check is now scoped to the configured upstream stages.Control-flow upstream stages could never satisfy soak. They verify Freight without holding it, so
currentlyIn/LongestCompletedSoakare never set andGetLongestSoakalways returned 0. Soak is now measured fromverifiedAtwhen neither record exists; regular stages are unaffected.Verification
I have run the manifests provided in #4586 and confirmed that stages using both "All" and "OneOf" strategies can be promoted correctly.
I have also tested control-flow stages to ensure that promotion progresses as expected (the soak time is set to 1 hour for this case):
Checklist
kind/proposal,needs discussion,needs research,maintainer only,area/security,size/large,size/x-large,size/xx-large).AI Use Disclosure
Select one:
Sign-Off
git commit -s) (required)git commit -S) (encouraged)