AGENT-1449: Add single-phase IRI registry credential rotation#5810
AGENT-1449: Add single-phase IRI registry credential rotation#5810rwsu wants to merge 4 commits intoopenshift:mainfrom
Conversation
Add htpasswd-based authentication to the IRI registry. The installer generates credentials and provides them via a bootstrap secret. The MCO mounts the htpasswd file into the registry container and configures registry auth environment variables. The registry password is merged into the node pull secret so kubelet can authenticate when pulling the release image. Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
The IRI controller merges registry auth credentials into the global pull secret after bootstrap. This triggers the template controller to re-render template MCs (00-master, etc.) with the updated pull secret, producing a different rendered MC hash than what bootstrap created. The mismatch causes the MCD DaemonSet pod to fail during bootstrap: it reads the bootstrap-rendered MC name from the node annotation, but that MC no longer exists in-cluster (replaced by the re-rendered one). The MCD falls back to reading /etc/machine-config-daemon/currentconfig, which was never written because the firstboot MCD detected "no changes" and skipped it. Both master nodes go Degraded and never recover. Fix by merging IRI auth into the pull secret during bootstrap before template MC rendering, so both bootstrap and in-cluster produce identical rendered MC hashes. Extract the pull secret merge logic into a shared MergeIRIAuthIntoPullSecret function used by both the bootstrap path and the in-cluster IRI controller. Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
- bootstrap.go: gate pull secret merge on iri != nil && iriAuthSecret != nil (restore the IRI CR check that was lost in the hash mismatch fix, drop the unnecessary cconfig.Spec.DNS != nil guard); treat merge failure as an error - controller: return an error when the IRI auth secret is missing instead of silently ignoring it; auth is expected to always be present - controller: remove the iriAuthSecret != nil guard around mergeIRIAuthIntoPullSecret - mergeIRIAuthIntoPullSecret: error on empty password; remove DNS nil check (DNS is always set); error on pull secret not found - load-registry-image.sh: always pass --authfile, kubelet config.json is always available by the time the remote pull fallback runs - tests: assume auth is always present -- add iriAuthSecret/pullSecret/withDNS to all non-deletion test cases; collapse WithAuth variants into single functions; remove no-auth test case Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Implement credential rotation that accepts brief registry downtime. When an admin updates iriAuthSecret.Data["password"], the controller: 1. Detects the mismatch between password and htpasswd (via bcrypt compare) 2. Generates a new bcrypt hash and updates iriAuthSecret.Data["htpasswd"] 3. Re-renders the master MachineConfig with the new htpasswd 4. Updates the global pull secret with the new credentials 5. MCD rolls out the updated MC; brief downtime for IRI registry during rollout is accepted Key changes: - Add NoneStatusAction for /etc/iri-registry/auth/htpasswd in NodeDisruptionPolicy (Distribution registry re-reads htpasswd on mtime change, no restart needed) - Add unit tests for helpers and reconcileAuthSecret - Add e2e tests: unauthenticated 401, authenticated 200, and full rotation flow (tests use ExecCmdOnNode via MCD pod to reach api-int:22625 in CI) Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com>
WalkthroughThis PR implements InternalReleaseImage (IRI) registry authentication via htpasswd credentials. It introduces credential generation, reconciliation, and distribution across cluster nodes through pull secret merging, systemd unit configuration, and corresponding e2e tests. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
@rwsu: This pull request references AGENT-1449 which is a valid jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: rwsu The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/controller/internalreleaseimage/internalreleaseimage_controller.go`:
- Around line 478-510: The code in mergeIRIAuthIntoPullSecret reads
cconfig.Spec.DNS.Spec.BaseDomain without guarding for a nil DNS pointer; add a
nil check at the start of mergeIRIAuthIntoPullSecret to handle missing DNS
(e.g., if cconfig.Spec.DNS == nil) and return a clear error (or fallback
behavior) instead of dereferencing; update references to baseDomain to use the
validated value so the function never panics when ControllerConfig has no DNS
configured.
In `@pkg/controller/internalreleaseimage/internalreleaseimage_renderer.go`:
- Around line 121-122: r.iriAuthSecret may be nil but the code unconditionally
reads r.iriAuthSecret.Data["htpasswd"] into iriHtpasswd; add a nil guard before
that access (e.g., check if r.iriAuthSecret != nil) and handle the nil case
explicitly — either return an error from the renderer function or set
iriHtpasswd to a safe default and log/propagate the missing secret; update the
struct comment only if you change the invariant to make iriAuthSecret required.
Ensure you modify the code paths that use iriHtpasswd to handle the new
nil/empty case consistently.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 33ab74c5-20ae-4aee-8f24-35db4c8537b4
⛔ Files ignored due to path filters (3)
vendor/golang.org/x/crypto/bcrypt/base64.gois excluded by!vendor/**,!**/vendor/**vendor/golang.org/x/crypto/bcrypt/bcrypt.gois excluded by!vendor/**,!**/vendor/**vendor/modules.txtis excluded by!vendor/**,!**/vendor/**
📒 Files selected for processing (15)
pkg/apihelpers/apihelpers.gopkg/controller/bootstrap/bootstrap.gopkg/controller/common/constants.gopkg/controller/internalreleaseimage/internalreleaseimage_bootstrap.gopkg/controller/internalreleaseimage/internalreleaseimage_bootstrap_test.gopkg/controller/internalreleaseimage/internalreleaseimage_controller.gopkg/controller/internalreleaseimage/internalreleaseimage_controller_test.gopkg/controller/internalreleaseimage/internalreleaseimage_helpers_test.gopkg/controller/internalreleaseimage/internalreleaseimage_renderer.gopkg/controller/internalreleaseimage/pullsecret.gopkg/controller/internalreleaseimage/pullsecret_test.gopkg/controller/internalreleaseimage/templates/master/files/iri-registry-auth-htpasswd.yamlpkg/controller/internalreleaseimage/templates/master/files/usr-local-bin-load-registry-image-sh.yamlpkg/controller/internalreleaseimage/templates/master/units/iri-registry.service.yamltest/e2e-iri/iri_test.go
| func (ctrl *Controller) mergeIRIAuthIntoPullSecret(cconfig *mcfgv1.ControllerConfig, authSecret *corev1.Secret) error { | ||
| password := string(authSecret.Data["password"]) | ||
| if password == "" { | ||
| return fmt.Errorf("IRI auth secret %s/%s has empty password", authSecret.Namespace, authSecret.Name) | ||
| } | ||
|
|
||
| baseDomain := cconfig.Spec.DNS.Spec.BaseDomain | ||
|
|
||
| // Fetch current pull secret from openshift-config | ||
| pullSecret, err := ctrl.kubeClient.CoreV1().Secrets(ctrlcommon.OpenshiftConfigNamespace).Get( | ||
| context.TODO(), ctrlcommon.GlobalPullSecretName, metav1.GetOptions{}) | ||
| if err != nil { | ||
| return fmt.Errorf("could not get pull-secret: %w", err) | ||
| } | ||
|
|
||
| mergedBytes, err := MergeIRIAuthIntoPullSecret(pullSecret.Data[corev1.DockerConfigJsonKey], password, baseDomain) | ||
| if err != nil { | ||
| return err | ||
| } | ||
|
|
||
| // No change needed | ||
| if bytes.Equal(mergedBytes, pullSecret.Data[corev1.DockerConfigJsonKey]) { | ||
| return nil | ||
| } | ||
|
|
||
| pullSecret.Data[corev1.DockerConfigJsonKey] = mergedBytes | ||
| _, err = ctrl.kubeClient.CoreV1().Secrets(ctrlcommon.OpenshiftConfigNamespace).Update( | ||
| context.TODO(), pullSecret, metav1.UpdateOptions{}) | ||
| if err == nil { | ||
| klog.Infof("Updated pull secret with IRI registry auth credentials from secret %s/%s (uid=%s, resourceVersion=%s)", authSecret.Namespace, authSecret.Name, authSecret.UID, authSecret.ResourceVersion) | ||
| } | ||
| return err | ||
| } |
There was a problem hiding this comment.
Potential nil pointer dereference on cconfig.Spec.DNS.
Line 484 accesses cconfig.Spec.DNS.Spec.BaseDomain without checking if DNS is nil. If the ControllerConfig doesn't have DNS configured, this will panic.
🛡️ Proposed fix: add nil guard
func (ctrl *Controller) mergeIRIAuthIntoPullSecret(cconfig *mcfgv1.ControllerConfig, authSecret *corev1.Secret) error {
password := string(authSecret.Data["password"])
if password == "" {
return fmt.Errorf("IRI auth secret %s/%s has empty password", authSecret.Namespace, authSecret.Name)
}
+ if cconfig.Spec.DNS == nil {
+ return fmt.Errorf("ControllerConfig DNS not configured, cannot determine IRI registry host")
+ }
baseDomain := cconfig.Spec.DNS.Spec.BaseDomain📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| func (ctrl *Controller) mergeIRIAuthIntoPullSecret(cconfig *mcfgv1.ControllerConfig, authSecret *corev1.Secret) error { | |
| password := string(authSecret.Data["password"]) | |
| if password == "" { | |
| return fmt.Errorf("IRI auth secret %s/%s has empty password", authSecret.Namespace, authSecret.Name) | |
| } | |
| baseDomain := cconfig.Spec.DNS.Spec.BaseDomain | |
| // Fetch current pull secret from openshift-config | |
| pullSecret, err := ctrl.kubeClient.CoreV1().Secrets(ctrlcommon.OpenshiftConfigNamespace).Get( | |
| context.TODO(), ctrlcommon.GlobalPullSecretName, metav1.GetOptions{}) | |
| if err != nil { | |
| return fmt.Errorf("could not get pull-secret: %w", err) | |
| } | |
| mergedBytes, err := MergeIRIAuthIntoPullSecret(pullSecret.Data[corev1.DockerConfigJsonKey], password, baseDomain) | |
| if err != nil { | |
| return err | |
| } | |
| // No change needed | |
| if bytes.Equal(mergedBytes, pullSecret.Data[corev1.DockerConfigJsonKey]) { | |
| return nil | |
| } | |
| pullSecret.Data[corev1.DockerConfigJsonKey] = mergedBytes | |
| _, err = ctrl.kubeClient.CoreV1().Secrets(ctrlcommon.OpenshiftConfigNamespace).Update( | |
| context.TODO(), pullSecret, metav1.UpdateOptions{}) | |
| if err == nil { | |
| klog.Infof("Updated pull secret with IRI registry auth credentials from secret %s/%s (uid=%s, resourceVersion=%s)", authSecret.Namespace, authSecret.Name, authSecret.UID, authSecret.ResourceVersion) | |
| } | |
| return err | |
| } | |
| func (ctrl *Controller) mergeIRIAuthIntoPullSecret(cconfig *mcfgv1.ControllerConfig, authSecret *corev1.Secret) error { | |
| password := string(authSecret.Data["password"]) | |
| if password == "" { | |
| return fmt.Errorf("IRI auth secret %s/%s has empty password", authSecret.Namespace, authSecret.Name) | |
| } | |
| if cconfig.Spec.DNS == nil { | |
| return fmt.Errorf("ControllerConfig DNS not configured, cannot determine IRI registry host") | |
| } | |
| baseDomain := cconfig.Spec.DNS.Spec.BaseDomain | |
| // Fetch current pull secret from openshift-config | |
| pullSecret, err := ctrl.kubeClient.CoreV1().Secrets(ctrlcommon.OpenshiftConfigNamespace).Get( | |
| context.TODO(), ctrlcommon.GlobalPullSecretName, metav1.GetOptions{}) | |
| if err != nil { | |
| return fmt.Errorf("could not get pull-secret: %w", err) | |
| } | |
| mergedBytes, err := MergeIRIAuthIntoPullSecret(pullSecret.Data[corev1.DockerConfigJsonKey], password, baseDomain) | |
| if err != nil { | |
| return err | |
| } | |
| // No change needed | |
| if bytes.Equal(mergedBytes, pullSecret.Data[corev1.DockerConfigJsonKey]) { | |
| return nil | |
| } | |
| pullSecret.Data[corev1.DockerConfigJsonKey] = mergedBytes | |
| _, err = ctrl.kubeClient.CoreV1().Secrets(ctrlcommon.OpenshiftConfigNamespace).Update( | |
| context.TODO(), pullSecret, metav1.UpdateOptions{}) | |
| if err == nil { | |
| klog.Infof("Updated pull secret with IRI registry auth credentials from secret %s/%s (uid=%s, resourceVersion=%s)", authSecret.Namespace, authSecret.Name, authSecret.UID, authSecret.ResourceVersion) | |
| } | |
| return err | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/controller/internalreleaseimage/internalreleaseimage_controller.go`
around lines 478 - 510, The code in mergeIRIAuthIntoPullSecret reads
cconfig.Spec.DNS.Spec.BaseDomain without guarding for a nil DNS pointer; add a
nil check at the start of mergeIRIAuthIntoPullSecret to handle missing DNS
(e.g., if cconfig.Spec.DNS == nil) and return a clear error (or fallback
behavior) instead of dereferencing; update references to baseDomain to use the
validated value so the function never panics when ControllerConfig has no DNS
configured.
| iriHtpasswd := string(r.iriAuthSecret.Data["htpasswd"]) | ||
|
|
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Search for calls to NewRendererByRole to see if nil is ever passed
rg -n -A3 'NewRendererByRole\(' --type goRepository: openshift/machine-config-operator
Length of output: 1507
🏁 Script executed:
# Check bootstrap.go to see how iriAuthSecret is obtained and passed
sed -n '1,25p' pkg/controller/internalreleaseimage/internalreleaseimage_bootstrap.go
# Check controller.go around line 362 to see how iriAuthSecret is obtained
sed -n '340,375p' pkg/controller/internalreleaseimage/internalreleaseimage_controller.goRepository: openshift/machine-config-operator
Length of output: 2329
Nil pointer dereference risk if iriAuthSecret is nil.
Line 43 documents iriAuthSecret as "may be nil", but line 121 unconditionally accesses r.iriAuthSecret.Data["htpasswd"]. The bootstrap code path can pass nil for iriAuthSecret, which would cause a panic at this line.
Either add a nil guard here, or update the struct comment to indicate the secret is required (not optional).
🛡️ Proposed fix: add nil guard
+ var iriHtpasswd string
+ if r.iriAuthSecret != nil {
+ iriHtpasswd = string(r.iriAuthSecret.Data["htpasswd"])
+ }
- iriHtpasswd := string(r.iriAuthSecret.Data["htpasswd"])📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| iriHtpasswd := string(r.iriAuthSecret.Data["htpasswd"]) | |
| var iriHtpasswd string | |
| if r.iriAuthSecret != nil { | |
| iriHtpasswd = string(r.iriAuthSecret.Data["htpasswd"]) | |
| } | |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/controller/internalreleaseimage/internalreleaseimage_renderer.go` around
lines 121 - 122, r.iriAuthSecret may be nil but the code unconditionally reads
r.iriAuthSecret.Data["htpasswd"] into iriHtpasswd; add a nil guard before that
access (e.g., check if r.iriAuthSecret != nil) and handle the nil case
explicitly — either return an error from the renderer function or set
iriHtpasswd to a safe default and log/propagate the missing secret; update the
struct comment only if you change the invariant to make iriAuthSecret required.
Ensure you modify the code paths that use iriHtpasswd to handle the new
nil/empty case consistently.
|
@rwsu: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
- What I did
Implement credential rotation that accepts brief registry downtime.
When an admin updates iriAuthSecret.Data["password"], the controller:
rollout is accepted
Key changes:
(Distribution registry re-reads htpasswd on mtime change, no restart needed)
(tests use ExecCmdOnNode via MCD pod to reach api-int:22625 in CI)
- How to verify it
Update the password to trigger the rotation to start:
oc -n openshift-machine-config-operator patch secret internal-release-image-registry-auth
--type merge -p '{"data":{"password":"'$(echo -n "new-password" | base64)'"}}'
Verify the /etc/iri-registry/auth/htpasswd has been updated.
Verify iri-registry works new credentials after rollout is complete.
Verify global pull-secret contains the new credentials after rollout is complete.
- Description for the changelog
Add credential rotation support for the IRI registry.