feat(cji): add expected-image-tag annotation to wait for correct app image #1779
Open
rodrigonull wants to merge 1 commit into
…image When a ClowdJobInvocation carries the annotation clowder.redhat.com/expected-image-tag, the controller verifies that the ClowdApp's job images contain the expected tag before proceeding. If the images don't match, the CJI is requeued until the ClowdApp is updated by the deployment pipeline. This solves a race condition where the CJI is applied before the ClowdApp has been updated with the new image, causing the job to run with an outdated container image.
a11f9f3 to 62f52d5
Summary
Adds support for a clowder.redhat.com/expected-image-tag annotation on ClowdJobInvocation resources. When present, the CJI controller verifies that the ClowdApp's job images contain the expected tag before creating the Job. If the images don't match, the CJI is requeued until the ClowdApp is updated.
Problem
When deploying via App-Interface, the ClowdJobInvocation and ClowdApp are applied as separate resourceTemplates. There is no guaranteed ordering between them, which causes a race condition: the CJI can be reconciled before the ClowdApp has been updated with the new image, resulting in the migration job running with an outdated container image.
Solution
The deployment pipeline (e.g., App-Interface saas-deploy) can now annotate the CJI with the expected image tag:
The controller will:
- Verify that the ClowdApp's job images carry the tag :<expected-tag>
- On mismatch, emit a Warning event (ImageTagMismatch), set the ReconciliationFailed condition, and requeue
The CJI will requeue indefinitely until the image matches. This is intentional, as running with the wrong image is worse than not running at all.
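The check described above can be sketched in plain Go. This is a minimal illustration under stated assumptions, not the code in this PR: the helper names imageHasTag, allImagesHaveTag, and shouldRequeue are hypothetical, the match is assumed to be a ":<tag>" suffix comparison on plain "repo:tag" references (no digests), every job image is assumed to need the tag, and an empty annotation value is treated the same as a missing one. Only the annotation key comes from the PR description.

```go
package main

import (
	"fmt"
	"strings"
)

// Annotation key as named in the PR description.
const expectedImageTagAnnotation = "clowder.redhat.com/expected-image-tag"

// imageHasTag reports whether image ends in ":<tag>".
// Hypothetical helper; assumes "repo:tag" references without a digest.
func imageHasTag(image, tag string) bool {
	return tag != "" && strings.HasSuffix(image, ":"+tag)
}

// allImagesHaveTag reports whether every image carries the tag.
// Assumption: an empty image list counts as "not yet matching".
func allImagesHaveTag(images []string, tag string) bool {
	if len(images) == 0 {
		return false
	}
	for _, img := range images {
		if !imageHasTag(img, tag) {
			return false
		}
	}
	return true
}

// shouldRequeue mirrors the three scenarios in the PR description:
// no annotation (skip the check), match (proceed), mismatch (requeue).
func shouldRequeue(annotations map[string]string, jobImages []string) bool {
	tag, ok := annotations[expectedImageTagAnnotation]
	if !ok || tag == "" {
		return false // no expectation declared, so nothing to wait for
	}
	return !allImagesHaveTag(jobImages, tag)
}

func main() {
	// Illustrative values only; the image name and tags are made up.
	ann := map[string]string{expectedImageTagAnnotation: "abc1234"}
	fmt.Println(shouldRequeue(ann, []string{"quay.io/example/app:old0000"}))
	fmt.Println(shouldRequeue(ann, []string{"quay.io/example/app:abc1234"}))
	fmt.Println(shouldRequeue(nil, []string{"quay.io/example/app:old0000"}))
}
```

In a real controller the requeue branch would also emit the Warning event and set the condition before returning; those calls depend on Clowder's own types and are left out of this sketch.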
Changes
- clowdjobinvocation_controller.go: Added annotation constant, image tag check logic in Reconcile(), and helper function appJobImagesContainTag()
- clowdjobinvocation_controller_test.go: Added unit tests covering mismatch (requeue), match (proceed), and no-annotation (skip) scenarios
Usage
In the saas-deploy file, add the annotation to the CJI resource template:
Test Plan
Notes
This is a third approach to solving the race condition where the DB migration CJI runs with an outdated image. Previous attempts:
- Generation check (PR #1742, "fix: prevent CJI with runOnNotReady from using stale job image"): Added a metadata.generation/status.generation comparison to detect stale ClowdApp state. Did not solve the issue.
- APIReader bypass (PR #1760, "fix: bypass informer cache for ClowdApp read in CJI controller"): Used APIReader to read the ClowdApp directly from the API server instead of the informer cache. Did not solve the issue.
Root cause: The ClowdApp and the migration CJI are objects in the same template, applied together in a single pipeline step. However, the CJI controller can reconcile the newly created CJI before the ClowdApp update has been fully applied or propagated, so the job uses the previous image.
This approach attempts to solve it at the Clowder level by letting the CJI declare what image tag it expects via an annotation, and refusing to proceed until the ClowdApp's job image matches.