@ANJANA-A-R-K
Contributor

@ANJANA-A-R-K ANJANA-A-R-K commented Jan 8, 2026

The extension image contains the Kata RPM. Extension images vary with the OCP version, which means the Kata version can change across releases. To ensure correctness, we must verify that the Kata version specified in the YAML matches the version bundled in the extension image.

This change adds an explicit Kata version field to the controller YAML and validates it against the Kata RPM version present in the extension image.

  • Introduces a kataVersion field in the controller YAML.
  • During reconciliation, compares the YAML-defined Kata version with the Kata RPM version found in the extension image.
  • Fails reconciliation if the versions do not match.
  • Relies on the existing ConfigMap reconcile loop (every 10 minutes) to re-check and pick up any updated ConfigMaps automatically.
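
The check described in the list above could look roughly like the following. This is a minimal sketch, not the code in this PR: the function names checkKataVersion and reconcileKataVersion and the constant recheckInterval are hypothetical, and the two versions are assumed to be plain strings already read from the YAML and from the extension image.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// recheckInterval mirrors the 10-minute ConfigMap reconcile loop
// mentioned in the description (the constant name is hypothetical).
const recheckInterval = 10 * time.Minute

// checkKataVersion returns an error when the version declared in the YAML
// differs from the version found in the extension image.
func checkKataVersion(declared, found string) error {
	declared = strings.TrimSpace(declared)
	found = strings.TrimSpace(found)
	if declared == "" {
		return fmt.Errorf("kataVersion is not set")
	}
	if declared != found {
		return fmt.Errorf("kata version mismatch: YAML declares %q, extension image provides %q",
			declared, found)
	}
	return nil
}

// reconcileKataVersion is a hypothetical reconcile step: on mismatch it
// returns the error together with the delay before the next re-check.
func reconcileKataVersion(declared, found string) (time.Duration, error) {
	if err := checkKataVersion(declared, found); err != nil {
		return recheckInterval, err
	}
	return 0, nil
}

func main() {
	// Versions taken from the upgrade scenario in the description.
	if _, err := reconcileKataVersion("3.21", "3.22"); err != nil {
		fmt.Println("reconcile failed:", err)
	}
}
```

An exact string comparison is the strictest choice; whether a looser match (for example on major.minor only) is acceptable is not stated in the description.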

Current behavior

  • The OSC DaemonSet always consumes the current extension image provided by the cluster.
  • When an OCP upgrade happens, the extension image may change.

If the addon artifact image is not updated and the OSC version remains the same, this can introduce a new Kata RPM version even though:

  • The OSC version has not changed
  • The addon artifact image was built and tested against a specific Kata version

As a result, when customers upgrade OCP but continue using an older addon artifact image, OSC may silently install a Kata version that:

  • Was never validated with the running OSC version
  • Does not match the Kata version expected by the addon artifact image

Upgrade scenario

  • OSC 1.11 is installed and tested with Kata 3.21
  • The addon artifact image includes Kata 3.21
  • The cluster is upgraded from OCP 4.20.6 → 4.21.8 and the extension image now contains, for example, Kata 3.22. OSC is not upgraded, and the addon artifact image is not updated.
  • During reconciliation, OSC installs Kata 3.22 from the new extension image
  • This creates a version mismatch between the addon artifact image and the tested OSC stack.
    Currently, this mismatch happens silently, with no validation or guardrails.

This PR adds an explicit validation step to prevent unintended Kata version drift:

  • Adds a kataVersion field to the add-on YAML

During reconciliation:

  • Extracts the Kata RPM version from the extension image
  • Compares it against the YAML-defined kataVersion
  • Fails reconciliation if the versions do not match.
  • Uses the existing ConfigMap reconciliation loop (every 10 minutes) to automatically re-check and apply updates when the ConfigMap is corrected

https://issues.redhat.com/browse/KATA-4557

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 8, 2026
@openshift-ci openshift-ci bot requested review from gkurz and vvoronko January 8, 2026 10:07
@openshift-ci openshift-ci bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jan 8, 2026
@openshift-ci

openshift-ci bot commented Jan 8, 2026

Hi @ANJANA-A-R-K. Thanks for your PR.

I'm waiting for an openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 8, 2026
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 8, 2026
…nt in the extension image.

The extension image contains the Kata RPM. Extension images vary with the OCP version, which means the Kata version can change across releases. The Kata version specified in the YAML must match the version bundled in the extension image.

Signed-off-by: ANJANA-A-R-K <[email protected]>
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 8, 2026
@gkurz
Member

gkurz commented Jan 15, 2026

After discussing with @ajaypvictor on Slack, let's revisit the problem.

The current flow mandates that the user provide the addons image, which contains both the guest environment and the IBM SE kernel. This includes the kata agent, which should match the kata shim on the host; otherwise, undefined and hard-to-debug behavior is likely to happen.

This is different from all other cases where OSC and the kata-containers RPM already ensure that kata is de facto the same.

My concern with this PR is that it adds an explicit version of kata that will need to be properly maintained just for this case, and it doesn't even actually ensure that the user-provided image is safe to use. The only way would be to have the user provide the kata version along with the addons image. The image format from genprotimg doesn't allow adding arbitrary metadata, so this could be a sidecar JSON file, ideally created from the podvm binaries that went into the initrd.

Another option could be to revisit the design so that the user isn't responsible for providing the appropriate kata agent. Roughly, something like:

  • a pre-built initrd is added to the RPM for s390x, like for the other TEEs
  • OSC learns how to create the image
  • user just needs to pass the HPCC kernel

There are probably some challenges but it would definitely reduce the risk of inconsistency.

@ANJANA-A-R-K
Contributor Author

ANJANA-A-R-K commented Jan 20, 2026

For the s390x case, the addon image will include a version file, for example version.json, which contains:

{
"kata_version": "3.21.1"
}

This version.json is generated by the same build pipeline that produces the kata agent/initrd and the addon image, so it reflects the exact Kata version the image was built against.

Early in the OSC install flow (while installing extension image), if an addon image is present, OSC will extract just this version.json from the addon image and read the declared Kata version. It will then compare that value with the kata-containers RPM version installed on the host.

If the versions match, installation proceeds as today. If they don't, OSC fails early with a clear error instead of allowing an incompatible kata agent.
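
As a rough illustration of this flow, the sketch below parses the version.json shown above and compares it with the host RPM version. It is a sketch under assumptions, not the code in this PR: the names addonVersion, parseAddonVersion and verifyAddon are hypothetical, extracting the file from the addon image is left out, and the host version is assumed to arrive as a "version-release" string such as the made-up "3.22.0-1.el9", of which only the part before the first "-" is compared.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// addonVersion matches the version.json layout shown in the comment above.
type addonVersion struct {
	KataVersion string `json:"kata_version"`
}

// parseAddonVersion decodes the contents of version.json and returns the
// declared Kata version.
func parseAddonVersion(data []byte) (string, error) {
	var v addonVersion
	if err := json.Unmarshal(data, &v); err != nil {
		return "", fmt.Errorf("parsing version.json: %w", err)
	}
	if v.KataVersion == "" {
		return "", fmt.Errorf("version.json has no kata_version")
	}
	return v.KataVersion, nil
}

// verifyAddon compares the version declared by the addon image with the
// kata-containers RPM version on the host. Assumption: hostRPM may carry a
// "-release" suffix, which is dropped before comparing.
func verifyAddon(versionJSON []byte, hostRPM string) error {
	declared, err := parseAddonVersion(versionJSON)
	if err != nil {
		return err
	}
	hostVersion, _, _ := strings.Cut(strings.TrimSpace(hostRPM), "-")
	if declared != hostVersion {
		return fmt.Errorf("addon image was built for kata %s but the host has kata %s",
			declared, hostVersion)
	}
	return nil
}

func main() {
	data := []byte(`{"kata_version": "3.21.1"}`)
	// "3.22.0-1.el9" is a made-up host RPM version used only for illustration.
	if err := verifyAddon(data, "3.22.0-1.el9"); err != nil {
		fmt.Println("install aborted:", err)
	}
}
```

Failing on a missing or empty kata_version, rather than skipping the check, keeps an addon image without version metadata from being installed unverified.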

For the s390x case, the addon image will include a version JSON file that contains the Kata version. Compare the Kata version present in the extension image with the Kata version declared in the addon image.

Signed-off-by: ANJANA-A-R-K <[email protected]>
@gkurz
Member

gkurz commented Jan 29, 2026

I see you kept the original patch and added another one on top to introduce the version JSON file.
This isn't the way to go for several reasons:

  • we want the commits in the PR to match what will be merged in the end and we certainly don't want to merge the first patch since we nacked it
  • second commit is based on the first one that we don't want so we're spending time reviewing code changes that won't get merged as is

For now, please squash the two commits into a single one. I'll start reviewing when it is done.

Labels

needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test.

3 participants