Skip to content

AGENT-1449: Add single-phase IRI registry credential rotation#5810

Open
rwsu wants to merge 4 commits intoopenshift:mainfrom
rwsu:AGENT-1449-auth-rotation-simple
Open

AGENT-1449: Add single-phase IRI registry credential rotation#5810
rwsu wants to merge 4 commits intoopenshift:mainfrom
rwsu:AGENT-1449-auth-rotation-simple

Conversation

@rwsu
Copy link

@rwsu rwsu commented Mar 26, 2026

- What I did

Implement credential rotation that accepts brief registry downtime.

When an admin updates iriAuthSecret.Data["password"], the controller:

  1. Detects the mismatch between password and htpasswd (via bcrypt compare)
  2. Generates a new bcrypt hash and updates iriAuthSecret.Data["htpasswd"]
  3. Re-renders the master MachineConfig with the new htpasswd
  4. Updates the global pull secret with the new credentials
  5. MCD rolls out the updated MC; brief downtime for IRI registry during
    rollout is accepted

Key changes:

  • Add NoneStatusAction for /etc/iri-registry/auth/htpasswd in NodeDisruptionPolicy
    (Distribution registry re-reads htpasswd on mtime change, no restart needed)
  • Add unit tests for helpers and reconcileAuthSecret
  • Add e2e tests: unauthenticated 401, authenticated 200, and full rotation flow
    (tests use ExecCmdOnNode via MCD pod to reach api-int:22625 in CI)

- How to verify it

Update the password to trigger the rotation to start:

oc -n openshift-machine-config-operator patch secret internal-release-image-registry-auth
--type merge -p '{"data":{"password":"'$(echo -n "new-password" | base64)'"}}'
Verify the /etc/iri-registry/auth/htpasswd has been updated.
Verify iri-registry works new credentials after rollout is complete.
Verify global pull-secret contains the new credentials after rollout is complete.

- Description for the changelog

Add credential rotation support for the IRI registry.

rwsu added 4 commits March 23, 2026 10:04
Add htpasswd-based authentication to the IRI registry. The installer
generates credentials and provides them via a bootstrap secret. The MCO
mounts the htpasswd file into the registry container and configures
registry auth environment variables. The registry password is merged
into the node pull secret so kubelet can authenticate when pulling
the release image.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
The IRI controller merges registry auth credentials into the global
pull secret after bootstrap. This triggers the template controller to
re-render template MCs (00-master, etc.) with the updated pull secret,
producing a different rendered MC hash than what bootstrap created.

The mismatch causes the MCD DaemonSet pod to fail during bootstrap:
it reads the bootstrap-rendered MC name from the node annotation, but
that MC no longer exists in-cluster (replaced by the re-rendered one).
The MCD falls back to reading /etc/machine-config-daemon/currentconfig,
which was never written because the firstboot MCD detected "no changes"
and skipped it. Both master nodes go Degraded and never recover.

Fix by merging IRI auth into the pull secret during bootstrap before
template MC rendering, so both bootstrap and in-cluster produce
identical rendered MC hashes.

Extract the pull secret merge logic into a shared MergeIRIAuthIntoPullSecret
function used by both the bootstrap path and the in-cluster IRI controller.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
- bootstrap.go: gate pull secret merge on iri != nil && iriAuthSecret != nil
  (restore the IRI CR check that was lost in the hash mismatch fix, drop the
  unnecessary cconfig.Spec.DNS != nil guard); treat merge failure as an error
- controller: return an error when the IRI auth secret is missing instead of
  silently ignoring it; auth is expected to always be present
- controller: remove the iriAuthSecret != nil guard around mergeIRIAuthIntoPullSecret
- mergeIRIAuthIntoPullSecret: error on empty password; remove DNS nil check
  (DNS is always set); error on pull secret not found
- load-registry-image.sh: always pass --authfile, kubelet config.json is
  always available by the time the remote pull fallback runs
- tests: assume auth is always present -- add iriAuthSecret/pullSecret/withDNS
  to all non-deletion test cases; collapse WithAuth variants into single
  functions; remove no-auth test case

Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Implement credential rotation that accepts brief registry downtime.

When an admin updates iriAuthSecret.Data["password"], the controller:
1. Detects the mismatch between password and htpasswd (via bcrypt compare)
2. Generates a new bcrypt hash and updates iriAuthSecret.Data["htpasswd"]
3. Re-renders the master MachineConfig with the new htpasswd
4. Updates the global pull secret with the new credentials
5. MCD rolls out the updated MC; brief downtime for IRI registry during
   rollout is accepted

Key changes:
- Add NoneStatusAction for /etc/iri-registry/auth/htpasswd in NodeDisruptionPolicy
  (Distribution registry re-reads htpasswd on mtime change, no restart needed)
- Add unit tests for helpers and reconcileAuthSecret
- Add e2e tests: unauthenticated 401, authenticated 200, and full rotation flow
  (tests use ExecCmdOnNode via MCD pod to reach api-int:22625 in CI)

Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com>
@coderabbitai
Copy link

coderabbitai bot commented Mar 26, 2026

Walkthrough

This PR implements InternalReleaseImage (IRI) registry authentication via htpasswd credentials. It introduces credential generation, reconciliation, and distribution across cluster nodes through pull secret merging, systemd unit configuration, and corresponding e2e tests.

Changes

Cohort / File(s) Summary
API and Node Disruption Policy
pkg/apihelpers/apihelpers.go, pkg/controller/common/constants.go
Added IRI htpasswd file to default cluster node disruption policy; introduced new exported constant InternalReleaseImageAuthSecretName for secret identification.
Bootstrap Integration
pkg/controller/bootstrap/bootstrap.go, pkg/controller/internalreleaseimage/internalreleaseimage_bootstrap.go, pkg/controller/internalreleaseimage/internalreleaseimage_bootstrap_test.go
Detects IRI auth secret during bootstrap and passes it through the release image initialization pipeline; updated function signatures and tests to accommodate the additional credential parameter.
IRI Controller and Reconciliation
pkg/controller/internalreleaseimage/internalreleaseimage_controller.go
Extended controller with kubeClient field; added secret filtering for auth changes; introduced reconcileAuthSecret() and mergeIRIAuthIntoPullSecret() methods to synchronize htpasswd entries and update global pull secrets with IRI registry credentials.
Renderer and Template Support
pkg/controller/internalreleaseimage/internalreleaseimage_renderer.go, pkg/controller/internalreleaseimage/templates/master/.../*
Updated renderer to accept and expose IRI auth secret; added new htpasswd template file; updated registry service systemd unit to conditionally mount auth directory and configure htpasswd realm when credentials are present; modified podman pull invocation to use authfile.
Pull Secret and Htpasswd Utilities
pkg/controller/internalreleaseimage/pullsecret.go, pkg/controller/internalreleaseimage/pullsecret_test.go
New module providing functions for htpasswd entry generation/validation and merging IRI registry credentials into dockerconfig JSON pull secrets; includes comprehensive unit tests for all control paths.
Test Helpers and Controller Tests
pkg/controller/internalreleaseimage/internalreleaseimage_helpers_test.go, pkg/controller/internalreleaseimage/internalreleaseimage_controller_test.go
Added test utilities for auth secret creation, htpasswd validation, and pull secret verification; extended controller test suite with cases for auth secret reconciliation and pull secret merging; added TestReconcileAuthSecret with multiple scenarios.
E2E Authentication Tests
test/e2e-iri/iri_test.go
New e2e test suite validating IRI registry authentication: unauthenticated requests return 401, authenticated requests succeed, and credential rotation workflow updates both secrets and service configurations correctly.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@rwsu rwsu changed the title Agent 1449 auth rotation simple AGENT-1449: Add single-phase IRI registry credential rotation Mar 26, 2026
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Mar 26, 2026
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Mar 26, 2026

@rwsu: This pull request references AGENT-1449 which is a valid jira issue.

Details

In response to this:

- What I did

Implement credential rotation that accepts brief registry downtime.

When an admin updates iriAuthSecret.Data["password"], the controller:

  1. Detects the mismatch between password and htpasswd (via bcrypt compare)
  2. Generates a new bcrypt hash and updates iriAuthSecret.Data["htpasswd"]
  3. Re-renders the master MachineConfig with the new htpasswd
  4. Updates the global pull secret with the new credentials
  5. MCD rolls out the updated MC; brief downtime for IRI registry during
    rollout is accepted

Key changes:

  • Add NoneStatusAction for /etc/iri-registry/auth/htpasswd in NodeDisruptionPolicy
    (Distribution registry re-reads htpasswd on mtime change, no restart needed)
  • Add unit tests for helpers and reconcileAuthSecret
  • Add e2e tests: unauthenticated 401, authenticated 200, and full rotation flow
    (tests use ExecCmdOnNode via MCD pod to reach api-int:22625 in CI)

- How to verify it

Update the password to trigger the rotation to start:

oc -n openshift-machine-config-operator patch secret internal-release-image-registry-auth
--type merge -p '{"data":{"password":"'$(echo -n "new-password" | base64)'"}}'
Verify the /etc/iri-registry/auth/htpasswd has been updated.
Verify iri-registry works new credentials after rollout is complete.
Verify global pull-secret contains the new credentials after rollout is complete.

- Description for the changelog

Add credential rotation support for the IRI registry.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 26, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: rwsu
Once this PR has been reviewed and has the lgtm label, please assign pablintino for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/controller/internalreleaseimage/internalreleaseimage_controller.go`:
- Around line 478-510: The code in mergeIRIAuthIntoPullSecret reads
cconfig.Spec.DNS.Spec.BaseDomain without guarding for a nil DNS pointer; add a
nil check at the start of mergeIRIAuthIntoPullSecret to handle missing DNS
(e.g., if cconfig.Spec.DNS == nil) and return a clear error (or fallback
behavior) instead of dereferencing; update references to baseDomain to use the
validated value so the function never panics when ControllerConfig has no DNS
configured.

In `@pkg/controller/internalreleaseimage/internalreleaseimage_renderer.go`:
- Around line 121-122: r.iriAuthSecret may be nil but the code unconditionally
reads r.iriAuthSecret.Data["htpasswd"] into iriHtpasswd; add a nil guard before
that access (e.g., check if r.iriAuthSecret != nil) and handle the nil case
explicitly — either return an error from the renderer function or set
iriHtpasswd to a safe default and log/propagate the missing secret; update the
struct comment only if you change the invariant to make iriAuthSecret required.
Ensure you modify the code paths that use iriHtpasswd to handle the new
nil/empty case consistently.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 33ab74c5-20ae-4aee-8f24-35db4c8537b4

📥 Commits

Reviewing files that changed from the base of the PR and between 33e9c33 and 5bf9b3c.

⛔ Files ignored due to path filters (3)
  • vendor/golang.org/x/crypto/bcrypt/base64.go is excluded by !vendor/**, !**/vendor/**
  • vendor/golang.org/x/crypto/bcrypt/bcrypt.go is excluded by !vendor/**, !**/vendor/**
  • vendor/modules.txt is excluded by !vendor/**, !**/vendor/**
📒 Files selected for processing (15)
  • pkg/apihelpers/apihelpers.go
  • pkg/controller/bootstrap/bootstrap.go
  • pkg/controller/common/constants.go
  • pkg/controller/internalreleaseimage/internalreleaseimage_bootstrap.go
  • pkg/controller/internalreleaseimage/internalreleaseimage_bootstrap_test.go
  • pkg/controller/internalreleaseimage/internalreleaseimage_controller.go
  • pkg/controller/internalreleaseimage/internalreleaseimage_controller_test.go
  • pkg/controller/internalreleaseimage/internalreleaseimage_helpers_test.go
  • pkg/controller/internalreleaseimage/internalreleaseimage_renderer.go
  • pkg/controller/internalreleaseimage/pullsecret.go
  • pkg/controller/internalreleaseimage/pullsecret_test.go
  • pkg/controller/internalreleaseimage/templates/master/files/iri-registry-auth-htpasswd.yaml
  • pkg/controller/internalreleaseimage/templates/master/files/usr-local-bin-load-registry-image-sh.yaml
  • pkg/controller/internalreleaseimage/templates/master/units/iri-registry.service.yaml
  • test/e2e-iri/iri_test.go

Comment on lines +478 to +510
func (ctrl *Controller) mergeIRIAuthIntoPullSecret(cconfig *mcfgv1.ControllerConfig, authSecret *corev1.Secret) error {
password := string(authSecret.Data["password"])
if password == "" {
return fmt.Errorf("IRI auth secret %s/%s has empty password", authSecret.Namespace, authSecret.Name)
}

baseDomain := cconfig.Spec.DNS.Spec.BaseDomain

// Fetch current pull secret from openshift-config
pullSecret, err := ctrl.kubeClient.CoreV1().Secrets(ctrlcommon.OpenshiftConfigNamespace).Get(
context.TODO(), ctrlcommon.GlobalPullSecretName, metav1.GetOptions{})
if err != nil {
return fmt.Errorf("could not get pull-secret: %w", err)
}

mergedBytes, err := MergeIRIAuthIntoPullSecret(pullSecret.Data[corev1.DockerConfigJsonKey], password, baseDomain)
if err != nil {
return err
}

// No change needed
if bytes.Equal(mergedBytes, pullSecret.Data[corev1.DockerConfigJsonKey]) {
return nil
}

pullSecret.Data[corev1.DockerConfigJsonKey] = mergedBytes
_, err = ctrl.kubeClient.CoreV1().Secrets(ctrlcommon.OpenshiftConfigNamespace).Update(
context.TODO(), pullSecret, metav1.UpdateOptions{})
if err == nil {
klog.Infof("Updated pull secret with IRI registry auth credentials from secret %s/%s (uid=%s, resourceVersion=%s)", authSecret.Namespace, authSecret.Name, authSecret.UID, authSecret.ResourceVersion)
}
return err
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Potential nil pointer dereference on cconfig.Spec.DNS.

Line 484 accesses cconfig.Spec.DNS.Spec.BaseDomain without checking if DNS is nil. If the ControllerConfig doesn't have DNS configured, this will panic.

🛡️ Proposed fix: add nil guard
 func (ctrl *Controller) mergeIRIAuthIntoPullSecret(cconfig *mcfgv1.ControllerConfig, authSecret *corev1.Secret) error {
 	password := string(authSecret.Data["password"])
 	if password == "" {
 		return fmt.Errorf("IRI auth secret %s/%s has empty password", authSecret.Namespace, authSecret.Name)
 	}

+	if cconfig.Spec.DNS == nil {
+		return fmt.Errorf("ControllerConfig DNS not configured, cannot determine IRI registry host")
+	}
 	baseDomain := cconfig.Spec.DNS.Spec.BaseDomain
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func (ctrl *Controller) mergeIRIAuthIntoPullSecret(cconfig *mcfgv1.ControllerConfig, authSecret *corev1.Secret) error {
password := string(authSecret.Data["password"])
if password == "" {
return fmt.Errorf("IRI auth secret %s/%s has empty password", authSecret.Namespace, authSecret.Name)
}
baseDomain := cconfig.Spec.DNS.Spec.BaseDomain
// Fetch current pull secret from openshift-config
pullSecret, err := ctrl.kubeClient.CoreV1().Secrets(ctrlcommon.OpenshiftConfigNamespace).Get(
context.TODO(), ctrlcommon.GlobalPullSecretName, metav1.GetOptions{})
if err != nil {
return fmt.Errorf("could not get pull-secret: %w", err)
}
mergedBytes, err := MergeIRIAuthIntoPullSecret(pullSecret.Data[corev1.DockerConfigJsonKey], password, baseDomain)
if err != nil {
return err
}
// No change needed
if bytes.Equal(mergedBytes, pullSecret.Data[corev1.DockerConfigJsonKey]) {
return nil
}
pullSecret.Data[corev1.DockerConfigJsonKey] = mergedBytes
_, err = ctrl.kubeClient.CoreV1().Secrets(ctrlcommon.OpenshiftConfigNamespace).Update(
context.TODO(), pullSecret, metav1.UpdateOptions{})
if err == nil {
klog.Infof("Updated pull secret with IRI registry auth credentials from secret %s/%s (uid=%s, resourceVersion=%s)", authSecret.Namespace, authSecret.Name, authSecret.UID, authSecret.ResourceVersion)
}
return err
}
func (ctrl *Controller) mergeIRIAuthIntoPullSecret(cconfig *mcfgv1.ControllerConfig, authSecret *corev1.Secret) error {
password := string(authSecret.Data["password"])
if password == "" {
return fmt.Errorf("IRI auth secret %s/%s has empty password", authSecret.Namespace, authSecret.Name)
}
if cconfig.Spec.DNS == nil {
return fmt.Errorf("ControllerConfig DNS not configured, cannot determine IRI registry host")
}
baseDomain := cconfig.Spec.DNS.Spec.BaseDomain
// Fetch current pull secret from openshift-config
pullSecret, err := ctrl.kubeClient.CoreV1().Secrets(ctrlcommon.OpenshiftConfigNamespace).Get(
context.TODO(), ctrlcommon.GlobalPullSecretName, metav1.GetOptions{})
if err != nil {
return fmt.Errorf("could not get pull-secret: %w", err)
}
mergedBytes, err := MergeIRIAuthIntoPullSecret(pullSecret.Data[corev1.DockerConfigJsonKey], password, baseDomain)
if err != nil {
return err
}
// No change needed
if bytes.Equal(mergedBytes, pullSecret.Data[corev1.DockerConfigJsonKey]) {
return nil
}
pullSecret.Data[corev1.DockerConfigJsonKey] = mergedBytes
_, err = ctrl.kubeClient.CoreV1().Secrets(ctrlcommon.OpenshiftConfigNamespace).Update(
context.TODO(), pullSecret, metav1.UpdateOptions{})
if err == nil {
klog.Infof("Updated pull secret with IRI registry auth credentials from secret %s/%s (uid=%s, resourceVersion=%s)", authSecret.Namespace, authSecret.Name, authSecret.UID, authSecret.ResourceVersion)
}
return err
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/internalreleaseimage/internalreleaseimage_controller.go`
around lines 478 - 510, The code in mergeIRIAuthIntoPullSecret reads
cconfig.Spec.DNS.Spec.BaseDomain without guarding for a nil DNS pointer; add a
nil check at the start of mergeIRIAuthIntoPullSecret to handle missing DNS
(e.g., if cconfig.Spec.DNS == nil) and return a clear error (or fallback
behavior) instead of dereferencing; update references to baseDomain to use the
validated value so the function never panics when ControllerConfig has no DNS
configured.

Comment on lines +121 to +122
iriHtpasswd := string(r.iriAuthSecret.Data["htpasswd"])

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for calls to NewRendererByRole to see if nil is ever passed
rg -n -A3 'NewRendererByRole\(' --type go

Repository: openshift/machine-config-operator

Length of output: 1507


🏁 Script executed:

# Check bootstrap.go to see how iriAuthSecret is obtained and passed
sed -n '1,25p' pkg/controller/internalreleaseimage/internalreleaseimage_bootstrap.go

# Check controller.go around line 362 to see how iriAuthSecret is obtained
sed -n '340,375p' pkg/controller/internalreleaseimage/internalreleaseimage_controller.go

Repository: openshift/machine-config-operator

Length of output: 2329


Nil pointer dereference risk if iriAuthSecret is nil.

Line 43 documents iriAuthSecret as "may be nil", but line 121 unconditionally accesses r.iriAuthSecret.Data["htpasswd"]. The bootstrap code path can pass nil for iriAuthSecret, which would cause a panic at this line.

Either add a nil guard here, or update the struct comment to indicate the secret is required (not optional).

🛡️ Proposed fix: add nil guard
+	var iriHtpasswd string
+	if r.iriAuthSecret != nil {
+		iriHtpasswd = string(r.iriAuthSecret.Data["htpasswd"])
+	}
-	iriHtpasswd := string(r.iriAuthSecret.Data["htpasswd"])
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
iriHtpasswd := string(r.iriAuthSecret.Data["htpasswd"])
var iriHtpasswd string
if r.iriAuthSecret != nil {
iriHtpasswd = string(r.iriAuthSecret.Data["htpasswd"])
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/internalreleaseimage/internalreleaseimage_renderer.go` around
lines 121 - 122, r.iriAuthSecret may be nil but the code unconditionally reads
r.iriAuthSecret.Data["htpasswd"] into iriHtpasswd; add a nil guard before that
access (e.g., check if r.iriAuthSecret != nil) and handle the nil case
explicitly — either return an error from the renderer function or set
iriHtpasswd to a safe default and log/propagate the missing secret; update the
struct comment only if you change the invariant to make iriAuthSecret required.
Ensure you modify the code paths that use iriHtpasswd to handle the new
nil/empty case consistently.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 26, 2026

@rwsu: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-gcp-op-single-node 5bf9b3c link true /test e2e-gcp-op-single-node

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants