fix: auto promotion blocked using requiredSoakTime#6359

Open
justin0u0 wants to merge 2 commits into akuity:main from justin0u0:main

@justin0u0 justin0u0 commented May 28, 2026

All pull requests must reference an existing issue with no blocking labels.
PRs that do not meet this requirement will be automatically closed. See the
Contributor Guide for details.

Issue Reference

Closes #4586

Description

Fixes an issue where auto-promotion stalls (or fires too early) when a Stage source sets requiredSoakTime, affecting All, OneOf, and control-flow stages. Three root causes:

  1. Stages weren't requeued to re-check the soak deadline. A soak elapsing isn't a watch event, so an idle Stage never woke to promote once soak completed. Regular stages only requeued every 5m; control-flow stages never requeued at all, blocking downstream stages indefinitely. A new calculateNextSoakCheck helper now requeues both at the soonest pending deadline (min(5m, deadline) for regular stages).

  2. Soak accrued in unrelated stages satisfied the gate. ListFreightFromWarehouse checked soak against every stage in verifiedIn, not just the source's upstream stages. With a 1h soak on dev→staging and staging→prod, prod promoted immediately instead of waiting for its own soak. The check is now scoped to the configured upstream stages.

  3. Control-flow upstream stages could never satisfy soak. They verify Freight without holding it, so currentlyIn/LongestCompletedSoak are never set and GetLongestSoak always returned 0. Soak is now measured from verifiedAt when neither record exists; regular stages are unaffected.

Verification

I have run the manifests provided in #4586 and confirmed that stages using both "All" and "OneOf" strategies can be promoted correctly.

[Screenshot]

I have also tested control-flow stages to ensure that promotion progresses as expected (the soak time is set to 1 hour for this case):

[Screenshot]

Checklist

  • The PR is linked to an existing issue.
  • The linked issue has no blocking labels (kind/proposal,
    needs discussion, needs research, maintainer only, area/security,
    size/large, size/x-large, size/xx-large).
  • I have added or updated tests as appropriate.
  • I have added or updated documentation as appropriate.

AI Use Disclosure

Select one:

  • This PR was written by a human without AI assistance.
  • This PR was written by a human with AI assistance. A human has reviewed every line prior to opening the PR.
  • This PR was written by an AI with human supervision. A human has reviewed every line prior to opening the PR.
  • This PR was written entirely by AI. No human has reviewed this prior to opening the PR.

Sign-Off

  • All commits are signed off (git commit -s) (required)
  • All commits are cryptographically signed (git commit -S) (encouraged)

@justin0u0 justin0u0 requested a review from a team as a code owner May 28, 2026 07:46
netlify Bot commented May 28, 2026

Deploy Preview for docs-kargo-io ready!

| Name | Link |
|---|---|
| 🔨 Latest commit | 3a28ceb |
| 🔍 Latest deploy log | https://app.netlify.com/projects/docs-kargo-io/deploys/6a25437804582d00082fbca7 |
| 😎 Deploy Preview | https://deploy-preview-6359.docs.kargo.io |

@kargo-governance-bot kargo-governance-bot Bot added the kind/bug, needs/area, and needs/priority labels May 28, 2026
@justin0u0 justin0u0 changed the title Fixes Auto Promotion Blocked Using requiredSoakTime fix: auto promotion blocked using requiredSoakTime May 28, 2026
@justin0u0 (Author)

Hi @krancour, @EronWright,

could someone please take a look at this pull request? Thank you.

justin0u0 added 2 commits June 7, 2026 17:57
…tages

Auto-promotion could either incorrectly pass or permanently stall the
soak time gate (Stage.spec.requestedFreight[].sources.requiredSoakTime),
depending on the upstream Stage topology. Two distinct problems are
addressed:

1. Soak accrued in unrelated Stages satisfied the gate.

   ListFreightFromWarehouse iterated over every Stage in the Freight's
   status.verifiedIn map when checking whether the required soak time
   had elapsed. Soak time accumulated in Stages that were NOT named as
   upstream sources could therefore satisfy the requirement, letting
   Freight promote before it had actually soaked in the configured
   upstream Stage(s). Restrict the check to the Stages named in the
   source's `stages` list (opts.VerifiedIn).

2. A control-flow upstream Stage could never satisfy the gate.

   Control-flow Stages verify Freight without ever holding it: the
   Freight is never added to status.currentlyIn and no completed soak is
   ever recorded. GetLongestSoak only consulted those two fields, so it
   returned 0 for a control-flow Stage no matter how long ago the Freight
   was verified there. With problem 1 fixed, a downstream Stage whose
   upstream source is a control-flow Stage could then never pass its
   requiredSoakTime, and auto-promotion stalled indefinitely.

   When neither a current residency nor a completed soak is recorded for
   a Stage in which the Freight is verified -- which uniquely identifies
   a control-flow Stage -- measure the soak from the Freight's verifiedAt
   timestamp instead. Regular Stages always have one of the two records,
   so their behavior is unchanged.

Refs: akuity#4586
Signed-off-by: Justin Chen <mail@justin0u0.com>

Auto-promotion silently stalled whenever a downstream Stage's source
specified requiredSoakTime:

- Regular Stages requeued on a fixed 5 minute cadence, so short soak
  windows (or soak deadlines that did not land on a 5-minute boundary)
  could be missed until a later tick happened to fall past the deadline.
- Control-flow Stages returned ctrl.Result{} with no RequeueAfter at
  all, relying entirely on watch events. No watch fires when 'soak time
  has elapsed', so a control-flow Stage with requiredSoakTime set never
  woke up on its own to re-evaluate the soak and verify its Freight,
  blocking every downstream Stage indefinitely.

Add a calculateNextSoakCheck helper that lists candidate Freight per
upstream source (with the soak filter disabled), computes the remaining
soak per upstream Stage according to the configured
AvailabilityStrategy, and returns the soonest deadline plus a 1 second
buffer.

Wire the helper into both reconcilers:

- The regular reconciler now requeues at min(5m, soakDeadline).
- The control-flow reconciler requeues at the soak deadline when one
  exists, and otherwise keeps the previous watch-driven behavior.

Refs: akuity#4586
Signed-off-by: Justin Chen <mail@justin0u0.com>
Development

Successfully merging this pull request may close these issues.

requiredSoakTime Breaks Auto Promotion In Different Ways (All and OneOf)