
feat(cji): add expected-image-tag annotation to wait for correct app image #1779

Open
rodrigonull wants to merge 1 commit into
RedHatInsights:master from
rodrigonull:fix/cji-expected-image-tag

rodrigonull commented May 21, 2026

Summary

Adds support for a clowder.redhat.com/expected-image-tag annotation on ClowdJobInvocation resources. When present, the CJI controller verifies that the ClowdApp's job images contain the expected tag before creating the Job. If the images don't match, the CJI is requeued until the ClowdApp is updated.

Problem

When deploying via App-Interface, the ClowdJobInvocation and ClowdApp are applied as separate resourceTemplates. There is no guaranteed ordering between them, which causes a race condition: the CJI can be reconciled before the ClowdApp has been updated with the new image, resulting in the migration job running with an outdated container image.

Solution

The deployment pipeline (e.g., App-Interface saas-deploy) can now annotate the CJI with the expected image tag:

metadata:
  annotations:
    clowder.redhat.com/expected-image-tag: "<commit-sha-or-tag>"

The controller will:

  1. Check if the annotation is present on the CJI
  2. Verify that the ClowdApp's job images end with :<expected-tag>
  3. If they don't match, emit a Warning event (ImageTagMismatch), set the ReconciliationFailed condition, and requeue the CJI
  4. Once the ClowdApp is updated with the correct image, reconciliation proceeds normally

The CJI requeues indefinitely until the image matches. This is intentional: running with the wrong image is worse than not running at all.
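The tag comparison in step 2 can be illustrated with a minimal, standard-library-only sketch. This is not the PR's code: `imageHasTag` and `jobImagesHaveTag` are hypothetical names, the image reference is a made-up example, and the sketch assumes that every job image must end with `:<expected-tag>` and that an empty image list or empty tag counts as a mismatch.

```go
package main

import (
	"fmt"
	"strings"
)

// imageHasTag reports whether image ends with ":<tag>".
// Assumption: an empty tag never matches, so a blank annotation
// value cannot pass the check by accident.
func imageHasTag(image, tag string) bool {
	return tag != "" && strings.HasSuffix(image, ":"+tag)
}

// jobImagesHaveTag is a hypothetical stand-in for the helper described
// in this PR. Assumption: all job images must carry the expected tag,
// and an empty list is treated as "not matching".
func jobImagesHaveTag(images []string, tag string) bool {
	if len(images) == 0 {
		return false
	}
	for _, img := range images {
		if !imageHasTag(img, tag) {
			return false
		}
	}
	return true
}

func main() {
	// Made-up image reference for illustration only.
	images := []string{"quay.io/example/host-inventory:abc1234"}

	fmt.Println(jobImagesHaveTag(images, "abc1234")) // true
	fmt.Println(jobImagesHaveTag(images, "def5678")) // false
}
```

Because the colon is part of the compared suffix, a partial tag such as `1234` does not match `:abc1234`. A digest-pinned reference (`repo@sha256:<hex>`) also ends with `:<hex>`, so a pure suffix check like this one would accept the digest hex as a tag; whether that matters depends on how the pipeline references images.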

Changes

  • clowdjobinvocation_controller.go: Added annotation constant, image tag check logic in Reconcile(), and helper function appJobImagesContainTag()
  • clowdjobinvocation_controller_test.go: Added unit tests covering mismatch (requeue), match (proceed), and no-annotation (skip) scenarios

Usage

In the saas-deploy file, add the annotation to the CJI resource template:

- apiVersion: cloud.redhat.com/v1alpha1
  kind: ClowdJobInvocation
  metadata:
    annotations:
      clowder.redhat.com/expected-image-tag: ${IMAGE_TAG}
    labels:
      app: host-inventory
    name: run-db-migrations-${IMAGE_TAG}
  spec:
    appName: host-inventory
    jobs:
      - run-db-migrations
    runOnNotReady: true

Test Plan

  • Unit tests for image tag mismatch (requeue behavior)
  • Unit tests for image tag match (proceeds to create job)
  • Unit tests for missing annotation (skips check entirely)
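The three scenarios above can be modeled as a small table-driven check. The sketch below is illustrative only and does not reproduce the PR's tests or the controller's Reconcile(): `decide`, the `decision` type, and the constant name `expectedImageTagAnnotation` are hypothetical, the image references are made up, and only the annotation key itself comes from this PR. It assumes that an annotated CJI with no job images to inspect should requeue.

```go
package main

import (
	"fmt"
	"strings"
)

// Annotation key from this PR; the constant name is hypothetical.
const expectedImageTagAnnotation = "clowder.redhat.com/expected-image-tag"

// decision models the three outcomes listed in the test plan.
type decision string

const (
	skipCheck decision = "skip-check" // no annotation: behave as before
	proceed   decision = "proceed"    // all images match: create the Job
	requeue   decision = "requeue"    // mismatch: wait for the ClowdApp update
)

// decide is a hypothetical, simplified model of the controller's choice.
func decide(annotations map[string]string, jobImages []string) decision {
	tag, ok := annotations[expectedImageTagAnnotation]
	if !ok {
		return skipCheck
	}
	// Assumption: with nothing to verify, waiting is the safer outcome.
	if len(jobImages) == 0 {
		return requeue
	}
	for _, img := range jobImages {
		if !strings.HasSuffix(img, ":"+tag) {
			return requeue
		}
	}
	return proceed
}

func main() {
	cases := []struct {
		name        string
		annotations map[string]string
		images      []string
		want        decision
	}{
		{"mismatch", map[string]string{expectedImageTagAnnotation: "new"}, []string{"quay.io/example/app:old"}, requeue},
		{"match", map[string]string{expectedImageTagAnnotation: "new"}, []string{"quay.io/example/app:new"}, proceed},
		{"no-annotation", nil, []string{"quay.io/example/app:old"}, skipCheck},
	}
	for _, tc := range cases {
		got := decide(tc.annotations, tc.images)
		fmt.Printf("%s: got=%s want=%s\n", tc.name, got, tc.want)
	}
}
```

Keeping the decision in a pure function like this makes each scenario testable without a cluster or fake client, although the real controller tests may be structured differently.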

Notes

This is a third approach to solving the race condition where the DB migration CJI runs with an outdated image. Previous attempts:

  1. Generation check (PR #1742, "fix: prevent CJI with runOnNotReady from using stale job image"): Added metadata.generation / status.generation comparison to detect stale ClowdApp state. Did not solve the issue.

  2. APIReader bypass (PR #1760, "fix: bypass informer cache for ClowdApp read in CJI controller"): Used APIReader to read the ClowdApp directly from the API server instead of the informer cache. Did not solve the issue.

Root cause: The ClowdApp and the migration CJI are objects in the same template, applied together in a single pipeline step. However, the CJI controller can reconcile the newly created CJI before the ClowdApp update has been fully applied or propagated, so the job runs with the previous image.

This approach attempts to solve it at the Clowder level by letting the CJI declare what image tag it expects via an annotation, and refusing to proceed until the ClowdApp's job image matches.

…image

When a ClowdJobInvocation carries the annotation
clowder.redhat.com/expected-image-tag, the controller verifies that the
ClowdApp's job images contain the expected tag before proceeding. If the
images don't match, the CJI is requeued until the ClowdApp is updated
by the deployment pipeline.

This solves a race condition where the CJI is applied before the
ClowdApp has been updated with the new image, causing the job to run
with an outdated container image.

rodrigonull force-pushed the fix/cji-expected-image-tag branch from a11f9f3 to 62f52d5 on May 21, 2026 12:29