AGENT-1449: Add IRI registry authentication support to MCO #5765

Open
rwsu wants to merge 3 commits into openshift:main from rwsu:AGENT-1449-auth

@rwsu rwsu commented Mar 13, 2026

- What I did

Add htpasswd-based authentication to the IRI registry. The installer generates credentials and provides them via a bootstrap secret. The MCO mounts the htpasswd file into the registry container and configures registry auth environment variables. The registry password is merged into the node pull secret so kubelet can authenticate when pulling the release image.

- How to verify it

  • Verify that the IRI registry rejects requests that do not provide credentials
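
One way to script this check is sketched below. It assumes the registry answers `GET /v2/` with 401 Unauthorized when no credentials are sent, which is how Basic Auth is normally enforced by a registry that implements the v2 HTTP API. The names `newFakeRegistry` and `probeStatus` are hypothetical, and the fake server only stands in for the real IRI registry so the probe can be exercised locally.

```go
package main

import (
	"net/http"
	"net/http/httptest"
)

// newFakeRegistry starts a local stand-in server that demands Basic Auth.
// It is hypothetical scaffolding, not the IRI registry itself.
func newFakeRegistry(user, pass string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		u, p, ok := r.BasicAuth()
		if !ok || u != user || p != pass {
			w.Header().Set("WWW-Authenticate", `Basic realm="registry"`)
			w.WriteHeader(http.StatusUnauthorized)
			return
		}
		w.WriteHeader(http.StatusOK)
	}))
}

// probeStatus issues GET <baseURL>/v2/ and returns the HTTP status code.
// An empty user means the request is sent without credentials.
func probeStatus(client *http.Client, baseURL, user, pass string) (int, error) {
	req, err := http.NewRequest(http.MethodGet, baseURL+"/v2/", nil)
	if err != nil {
		return 0, err
	}
	if user != "" {
		req.SetBasicAuth(user, pass)
	}
	resp, err := client.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}
```

Against a real cluster, the base URL and TLS settings of the client would need to match the registry endpoint; those details are not covered by this sketch.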

- Description for the changelog

Add htpasswd authentication to the Internal Release Image (IRI) registry. The registry now requires Basic Auth credentials, with the password stored in the internal-release-image-registry-auth secret and automatically merged into the global pull secret so kubelet can authenticate when pulling images. The master MachineConfig is updated to configure the registry's htpasswd file and enable auth environment variables in the systemd unit.

@openshift-ci-robot

@rwsu: An error was encountered searching for bug AGENT-1449 on the Jira server at https://issues.redhat.com. No known errors were detected, please see the full error message for details.

Full error message. No response returned: Get "https://issues.redhat.com/rest/api/2/issue/AGENT-1449": GET https://issues.redhat.com/rest/api/2/issue/AGENT-1449 giving up after 5 attempt(s)

Please contact an administrator to resolve this issue, then request a bug refresh with /jira refresh.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

coderabbitai bot commented Mar 13, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Tracks an Internal Release Image auth Secret through bootstrap, controller, and renderer; exposes htpasswd to templates; merges IRI registry credentials into the cluster pull secret when appropriate; updates templates and adds unit tests for auth handling and pull-secret merging.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Bootstrap & entrypoints: pkg/controller/bootstrap/bootstrap.go, pkg/controller/internalreleaseimage/internalreleaseimage_bootstrap.go | Capture InternalReleaseImageAuthSecret during bootstrap, conditionally merge IRI auth into bootstrap pull secret, and pass iriAuthSecret into RunInternalReleaseImageBootstrap. |
| Controller core & helper: pkg/controller/internalreleaseimage/internalreleaseimage_controller.go | Controller stores kubeClient; watches InternalReleaseImageAuthSecretName; reads optional auth Secret during sync, threads it into renderers, and merges IRI auth into the global pull secret via new mergeIRIAuthIntoPullSecret. |
| Renderer & context: pkg/controller/internalreleaseimage/internalreleaseimage_renderer.go | Renderer gains iriAuthSecret *corev1.Secret and IriHtpasswd in render context; NewRendererByRole signature updated to accept iriAuthSecret; htpasswd is read into context for templates. |
| Pull secret logic & tests: pkg/controller/internalreleaseimage/pullsecret.go, pkg/controller/internalreleaseimage/pullsecret_test.go | Add MergeIRIAuthIntoPullSecret to merge/replace IRI registry auth in dockerconfigjson; comprehensive tests for add/update/no-op and invalid-input cases. |
| Constants: pkg/controller/common/constants.go | Add exported constant InternalReleaseImageAuthSecretName = "internal-release-image-registry-auth". |
| Unit tests (internalreleaseimage): pkg/controller/internalreleaseimage/internalreleaseimage_bootstrap_test.go, pkg/controller/internalreleaseimage/internalreleaseimage_controller_test.go, pkg/controller/internalreleaseimage/internalreleaseimage_helpers_test.go | Tests updated to pass and verify iriAuthSecret, assert MachineConfigs include htpasswd file and script changes, add pullSecret() and iriAuthSecret() test helpers, and add ControllerConfig DNS helper. |
| Templates & units: pkg/controller/internalreleaseimage/templates/master/files/iri-registry-auth-htpasswd.yaml, pkg/controller/internalreleaseimage/templates/master/files/usr-local-bin-load-registry-image-sh.yaml, pkg/controller/internalreleaseimage/templates/master/units/iri-registry.service.yaml | Add htpasswd file template; make load-registry-image script pass --authfile and expand image var; conditionally mount auth dir and set registry auth env vars in the unit when htpasswd present. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci bot requested review from bfournie and umohnani8 March 13, 2026 21:45

openshift-ci bot commented Mar 13, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: rwsu
Once this PR has been reviewed and has the lgtm label, please assign yuqi-zhang for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
pkg/controller/internalreleaseimage/internalreleaseimage_controller.go (1)

378-383: Consider whether merge failure should be fatal.

If the IRI auth secret exists, it indicates auth is configured for the registry. When merging credentials into the pull secret fails, nodes may be unable to pull from the authenticated registry. The current warning-only approach could lead to silent authentication failures at runtime.

Consider whether this should return an error to trigger a retry, or at least emit an event for visibility.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/internalreleaseimage/internalreleaseimage_controller.go`
around lines 378 - 383, Currently mergeIRIAuthIntoPullSecret failure is only
logged with klog.Warningf which can hide critical pull-auth problems; change
this to return the error from the surrounding reconcile path so the controller
retries (i.e., replace the klog.Warningf branch with a return fmt.Errorf(...) or
wrapped err from the Reconcile method when
ctrl.mergeIRIAuthIntoPullSecret(cconfig, iriAuthSecret) fails), and additionally
emit a Kubernetes event for visibility using the controller's event recorder
(e.g., ctrl.recorder.Eventf or ctrl.eventRecorder.Eventf) mentioning the merge
failure and the secret name (iriAuthSecret) so operators see the issue in
events.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/controller/internalreleaseimage/pullsecret.go`:
- Around line 21-31: The code currently constructs iriRegistryHost using
baseDomain without validation, which allows empty/whitespace domains (producing
"api-int.:22625"); before creating iriRegistryHost validate baseDomain (e.g.,
use strings.TrimSpace(baseDomain) and check for empty), and if invalid return an
error (e.g., fmt.Errorf("empty baseDomain") ) so the function fails fast; update
the section that defines iriRegistryHost to perform this check and only build
iriRegistryHost after validation.

In
`@pkg/controller/internalreleaseimage/templates/master/files/usr-local-bin-load-registry-image-sh.yaml`:
- Line 40: The podman pull call unconditionally uses --authfile
/var/lib/kubelet/config.json which fails if that file is missing; change the
logic around the podman pull with registryImage so you first test for the
authfile's existence (e.g. [ -f /var/lib/kubelet/config.json ] and non-empty)
and only add --authfile /var/lib/kubelet/config.json when present, otherwise
perform an unauthenticated podman pull "${registryImage}" fallback; ensure the
conditional preserves current error handling and quoting of registryImage.
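
The first inline comment (validating baseDomain before building the registry host) could look roughly like the sketch below. The `api-int.<baseDomain>:22625` shape is inferred from the malformed example quoted in that comment; the function and constant names are hypothetical and not taken from pullsecret.go.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// iriRegistryPort is an assumption based on the "api-int.:22625" example
// in the review comment.
const iriRegistryPort = 22625

// buildIRIRegistryHost fails fast on an empty or whitespace-only base
// domain instead of producing a host such as "api-int.:22625".
func buildIRIRegistryHost(baseDomain string) (string, error) {
	domain := strings.TrimSpace(baseDomain)
	if domain == "" {
		return "", errors.New("empty baseDomain")
	}
	return fmt.Sprintf("api-int.%s:%d", domain, iriRegistryPort), nil
}
```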


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 28340cd8-6caf-4ee6-8c24-9145c611edd0

📥 Commits

Reviewing files that changed from the base of the PR and between 5f0d9d7 and a5a65dc.

📒 Files selected for processing (13)
  • pkg/controller/bootstrap/bootstrap.go
  • pkg/controller/common/constants.go
  • pkg/controller/internalreleaseimage/internalreleaseimage_bootstrap.go
  • pkg/controller/internalreleaseimage/internalreleaseimage_bootstrap_test.go
  • pkg/controller/internalreleaseimage/internalreleaseimage_controller.go
  • pkg/controller/internalreleaseimage/internalreleaseimage_controller_test.go
  • pkg/controller/internalreleaseimage/internalreleaseimage_helpers_test.go
  • pkg/controller/internalreleaseimage/internalreleaseimage_renderer.go
  • pkg/controller/internalreleaseimage/pullsecret.go
  • pkg/controller/internalreleaseimage/pullsecret_test.go
  • pkg/controller/internalreleaseimage/templates/master/files/iri-registry-auth-htpasswd.yaml
  • pkg/controller/internalreleaseimage/templates/master/files/usr-local-bin-load-registry-image-sh.yaml
  • pkg/controller/internalreleaseimage/templates/master/units/iri-registry.service.yaml

rwsu commented Mar 13, 2026

/cc @andfasano

@openshift-ci openshift-ci bot requested a review from andfasano March 13, 2026 22:07
@rwsu rwsu force-pushed the AGENT-1449-auth branch from a5a65dc to 77df1fb March 16, 2026 21:51

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
pkg/controller/internalreleaseimage/internalreleaseimage_controller.go (1)

348-350: Silent error suppression is intentional but consider logging.

The comment explains this is for upgrade compatibility, but silently discarding errors could mask issues beyond "not found" (e.g., permission errors). Consider at least logging unexpected errors.

💡 Suggested improvement
 	// Auth secret may not exist during upgrades from non-auth clusters
-	iriAuthSecret, _ := ctrl.secretLister.Secrets(ctrlcommon.MCONamespace).Get(ctrlcommon.InternalReleaseImageAuthSecretName)
+	iriAuthSecret, err := ctrl.secretLister.Secrets(ctrlcommon.MCONamespace).Get(ctrlcommon.InternalReleaseImageAuthSecretName)
+	if err != nil && !errors.IsNotFound(err) {
+		klog.V(4).Infof("Could not get IRI auth secret: %v", err)
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/internalreleaseimage/internalreleaseimage_controller.go`
around lines 348 - 350, The code currently discards the error returned by
ctrl.secretLister.Secrets(...).Get when retrieving iriAuthSecret; change it to
capture the error (e.g., iriAuthSecret, err :=
ctrl.secretLister.Secrets(ctrlcommon.MCONamespace).Get(ctrlcommon.InternalReleaseImageAuthSecretName)
) and if err != nil and the error is not a NotFound (use
apierrors.IsNotFound(err)) log the unexpected error with the controller logger
(e.g., ctrl.Log or the controller's logger instance) including context like
"getting InternalReleaseImageAuthSecret" and the err; keep the existing behavior
of silently continuing on NotFound to preserve upgrade compatibility.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3bcaa665-1174-478d-bf7c-2e0eae80cbf1

📥 Commits

Reviewing files that changed from the base of the PR and between a5a65dc and 77df1fb.

🚧 Files skipped from review as they are similar to previous changes (6)
  • pkg/controller/common/constants.go
  • pkg/controller/internalreleaseimage/templates/master/files/usr-local-bin-load-registry-image-sh.yaml
  • pkg/controller/internalreleaseimage/pullsecret.go
  • pkg/controller/internalreleaseimage/internalreleaseimage_renderer.go
  • pkg/controller/internalreleaseimage/templates/master/files/iri-registry-auth-htpasswd.yaml
  • pkg/controller/internalreleaseimage/pullsecret_test.go

rwsu commented Mar 16, 2026

/jira refresh

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Mar 16, 2026

openshift-ci-robot commented Mar 16, 2026

@rwsu: This pull request references AGENT-1449 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

@rwsu rwsu force-pushed the AGENT-1449-auth branch from 77df1fb to 1498ed4 March 17, 2026 03:22

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (2)
pkg/controller/internalreleaseimage/pullsecret_test.go (1)

53-66: Consider adding a test case for empty baseDomain.

The MergeIRIAuthIntoPullSecret function returns an error when baseDomain is empty (see pullsecret.go lines 21-23), but this behavior isn't tested. Adding a test case would ensure this validation remains intact.

Suggested test case
 		{
 			name:        "missing auths field returns error",
 			pullSecret:  `{"registry":"quay.io"}`,
 			password:    "testpassword",
 			baseDomain:  "example.com",
 			expectError: true,
 		},
+		{
+			name:        "empty baseDomain returns error",
+			pullSecret:  basePullSecret,
+			password:    "testpassword",
+			baseDomain:  "",
+			expectError: true,
+		},
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/internalreleaseimage/pullsecret_test.go` around lines 53 - 66,
Add a unit test in pullsecret_test.go that verifies MergeIRIAuthIntoPullSecret
returns an error when baseDomain is an empty string; locate the test table in
the existing tests (near the cases named "invalid JSON returns error" and
"missing auths field returns error") and add a new case with name like "empty
baseDomain returns error", pullSecret set to a valid JSON auth object, password
set to a non-empty value, baseDomain set to "", and expectError true to assert
the function MergeIRIAuthIntoPullSecret enforces the non-empty baseDomain
validation.

pkg/controller/internalreleaseimage/internalreleaseimage_controller.go (1)

348-350: Consider distinguishing NotFound from unexpected errors when fetching auth secret.

Line 349 silently discards all errors, which is intentional for upgrade compatibility (auth secret may not exist). However, this also masks unexpected errors (e.g., network issues, RBAC problems) that might warrant logging or different handling.

Suggested improvement
 	// Auth secret may not exist during upgrades from non-auth clusters
-	iriAuthSecret, _ := ctrl.secretLister.Secrets(ctrlcommon.MCONamespace).Get(ctrlcommon.InternalReleaseImageAuthSecretName)
+	iriAuthSecret, err := ctrl.secretLister.Secrets(ctrlcommon.MCONamespace).Get(ctrlcommon.InternalReleaseImageAuthSecretName)
+	if err != nil && !errors.IsNotFound(err) {
+		klog.V(4).Infof("Could not get IRI auth secret %s: %v", ctrlcommon.InternalReleaseImageAuthSecretName, err)
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/internalreleaseimage/internalreleaseimage_controller.go`
around lines 348 - 350, When fetching the auth secret with
ctrl.secretLister.Secrets(ctrlcommon.MCONamespace).Get(ctrlcommon.InternalReleaseImageAuthSecretName)
do not ignore all errors: check the returned error and use
apierrors.IsNotFound(err) to treat a missing secret as OK for upgrades, but for
any other error log it (via the controller logger) and return/requeue the
reconcile with the error so transient RBAC/network issues are surfaced; update
the code around iriAuthSecret acquisition in internalreleaseimage_controller.go
to handle these two cases explicitly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 347c1736-456a-4dc9-a7a5-da14b0b1b0b1

📥 Commits

Reviewing files that changed from the base of the PR and between 77df1fb and 1498ed4.

🚧 Files skipped from review as they are similar to previous changes (5)
  • pkg/controller/common/constants.go
  • pkg/controller/internalreleaseimage/pullsecret.go
  • pkg/controller/internalreleaseimage/templates/master/files/iri-registry-auth-htpasswd.yaml
  • pkg/controller/internalreleaseimage/templates/master/units/iri-registry.service.yaml
  • pkg/controller/internalreleaseimage/internalreleaseimage_helpers_test.go

rwsu commented Mar 18, 2026

/retest-required

@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 20, 2026
rwsu added 2 commits March 23, 2026 10:04
Add htpasswd-based authentication to the IRI registry. The installer
generates credentials and provides them via a bootstrap secret. The MCO
mounts the htpasswd file into the registry container and configures
registry auth environment variables. The registry password is merged
into the node pull secret so kubelet can authenticate when pulling
the release image.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>

The IRI controller merges registry auth credentials into the global
pull secret after bootstrap. This triggers the template controller to
re-render template MCs (00-master, etc.) with the updated pull secret,
producing a different rendered MC hash than what bootstrap created.

The mismatch causes the MCD DaemonSet pod to fail during bootstrap:
it reads the bootstrap-rendered MC name from the node annotation, but
that MC no longer exists in-cluster (replaced by the re-rendered one).
The MCD falls back to reading /etc/machine-config-daemon/currentconfig,
which was never written because the firstboot MCD detected "no changes"
and skipped it. Both master nodes go Degraded and never recover.

Fix by merging IRI auth into the pull secret during bootstrap before
template MC rendering, so both bootstrap and in-cluster produce
identical rendered MC hashes.

Extract the pull secret merge logic into a shared MergeIRIAuthIntoPullSecret
function used by both the bootstrap path and the in-cluster IRI controller.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
@rwsu rwsu force-pushed the AGENT-1449-auth branch from 1498ed4 to 6f3b7f7 March 23, 2026 18:33
@openshift-ci openshift-ci bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 23, 2026

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (2)
pkg/controller/internalreleaseimage/internalreleaseimage_controller.go (2)

348-349: Silently ignoring all errors when fetching auth secret.

The error is completely discarded, which means transient failures (network issues, RBAC problems) are silently ignored and may delay auth propagation without any indication. Consider logging non-NotFound errors.

♻️ Proposed fix
 	// Auth secret may not exist during upgrades from non-auth clusters
-	iriAuthSecret, _ := ctrl.secretLister.Secrets(ctrlcommon.MCONamespace).Get(ctrlcommon.InternalReleaseImageAuthSecretName)
+	iriAuthSecret, err := ctrl.secretLister.Secrets(ctrlcommon.MCONamespace).Get(ctrlcommon.InternalReleaseImageAuthSecretName)
+	if err != nil && !errors.IsNotFound(err) {
+		klog.V(4).Infof("Failed to get auth secret %s: %v", ctrlcommon.InternalReleaseImageAuthSecretName, err)
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/internalreleaseimage/internalreleaseimage_controller.go`
around lines 348 - 349, When retrieving the auth secret with
ctrl.secretLister.Secrets(...).Get(...), don't ignore the returned error; check
the error and if it's non-nil and not a NotFound error (use
apierrors.IsNotFound), log or return it so transient failures are visible.
Locate the call that assigns iriAuthSecret (the Get against
ctrlcommon.MCONamespace and ctrlcommon.InternalReleaseImageAuthSecretName) and
add an error check: if err != nil && !apierrors.IsNotFound(err) then emit a
descriptive log via the controller logger (or return the error) so RBAC/network
issues are surfaced; keep treating NotFound as acceptable. Ensure you reference
iriAuthSecret and the secretLister.Secrets(...) call when making this change.

502-508: Consider adding retry logic for pull secret update.

Other updates in this controller use retry.RetryOnConflict (e.g., createOrUpdateMachineConfig, initializeInternalReleaseImageStatus). The global pull secret could be modified by other controllers concurrently, and a conflict would cause the entire sync to fail and requeue.

♻️ Proposed fix
-	pullSecret.Data[corev1.DockerConfigJsonKey] = mergedBytes
-	_, err = ctrl.kubeClient.CoreV1().Secrets(ctrlcommon.OpenshiftConfigNamespace).Update(
-		context.TODO(), pullSecret, metav1.UpdateOptions{})
-	if err == nil {
-		klog.Infof("Updated pull secret with IRI registry auth credentials from secret %s/%s (uid=%s, resourceVersion=%s)", authSecret.Namespace, authSecret.Name, authSecret.UID, authSecret.ResourceVersion)
-	}
-	return err
+	return retry.RetryOnConflict(updateBackoff, func() error {
+		// Re-fetch to get latest resourceVersion on retry
+		pullSecret, err = ctrl.kubeClient.CoreV1().Secrets(ctrlcommon.OpenshiftConfigNamespace).Get(
+			context.TODO(), ctrlcommon.GlobalPullSecretName, metav1.GetOptions{})
+		if err != nil {
+			return err
+		}
+		mergedBytes, err = MergeIRIAuthIntoPullSecret(pullSecret.Data[corev1.DockerConfigJsonKey], password, baseDomain)
+		if err != nil {
+			return err
+		}
+		if bytes.Equal(mergedBytes, pullSecret.Data[corev1.DockerConfigJsonKey]) {
+			return nil
+		}
+		pullSecret.Data[corev1.DockerConfigJsonKey] = mergedBytes
+		_, err = ctrl.kubeClient.CoreV1().Secrets(ctrlcommon.OpenshiftConfigNamespace).Update(
+			context.TODO(), pullSecret, metav1.UpdateOptions{})
+		if err == nil {
+			klog.Infof("Updated pull secret with IRI registry auth credentials from secret %s/%s (uid=%s, resourceVersion=%s)", authSecret.Namespace, authSecret.Name, authSecret.UID, authSecret.ResourceVersion)
+		}
+		return err
+	})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/internalreleaseimage/internalreleaseimage_controller.go`
around lines 502 - 508, Wrap the pull-secret update in a retry.RetryOnConflict
loop similar to
createOrUpdateMachineConfig/initializeInternalReleaseImageStatus: on conflict
re-fetch the latest pullSecret via
ctrl.kubeClient.CoreV1().Secrets(...).Get(...), re-apply the mergedBytes to
pullSecret.Data[corev1.DockerConfigJsonKey], and call Update again until success
or non-conflict error; preserve the existing klog.Infof on success and return
the final error. Use the same metav1.UpdateOptions and context.TODO() already
used so the logic integrates with the current Update call.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@pkg/controller/internalreleaseimage/internalreleaseimage_controller.go`:
- Around line 348-349: When retrieving the auth secret with
ctrl.secretLister.Secrets(...).Get(...), don't ignore the returned error; check
the error and if it's non-nil and not a NotFound error (use
apierrors.IsNotFound), log or return it so transient failures are visible.
Locate the call that assigns iriAuthSecret (the Get against
ctrlcommon.MCONamespace and ctrlcommon.InternalReleaseImageAuthSecretName) and
add an error check: if err != nil && !apierrors.IsNotFound(err) then emit a
descriptive log via the controller logger (or return the error) so RBAC/network
issues are surfaced; keep treating NotFound as acceptable. Ensure you reference
iriAuthSecret and the secretLister.Secrets(...) call when making this change.
- Around line 502-508: Wrap the pull-secret update in a retry.RetryOnConflict
loop similar to
createOrUpdateMachineConfig/initializeInternalReleaseImageStatus: on conflict
re-fetch the latest pullSecret via
ctrl.kubeClient.CoreV1().Secrets(...).Get(...), re-apply the mergedBytes to
pullSecret.Data[corev1.DockerConfigJsonKey], and call Update again until success
or non-conflict error; preserve the existing klog.Infof on success and return
the final error. Use the same metav1.UpdateOptions and context.TODO() already
used so the logic integrates with the current Update call.
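
For the first nitpick, the error handling could look roughly like the stdlib-only sketch below. It is an illustration, not the controller's code: the real check would use `apierrors.IsNotFound` as named in the comment, while `errNotFound`, `errForbidden`, `authSecret`, `secretGetter` and `lookupAuthSecret` are hypothetical names standing in for the lister and its errors.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	// errNotFound stands in for an error that apierrors.IsNotFound reports as true.
	errNotFound = errors.New("not found")
	// errForbidden stands in for an RBAC or network failure.
	errForbidden = errors.New("forbidden")
)

// authSecret is a hypothetical stand-in for the IRI auth secret.
type authSecret struct{ Password string }

// secretGetter is a hypothetical stand-in for secretLister.Secrets(ns).Get(name).
type secretGetter func(namespace, name string) (*authSecret, error)

// lookupAuthSecret treats NotFound as acceptable (nil secret, nil error)
// and surfaces every other error to the caller.
func lookupAuthSecret(get secretGetter, namespace, name string) (*authSecret, error) {
	s, err := get(namespace, name)
	switch {
	case err == nil:
		return s, nil
	case errors.Is(err, errNotFound):
		return nil, nil
	default:
		return nil, fmt.Errorf("getting secret %s/%s: %w", namespace, name, err)
	}
}

func main() {
	missing := func(_, _ string) (*authSecret, error) { return nil, errNotFound }
	denied := func(_, _ string) (*authSecret, error) { return nil, errForbidden }

	s, err := lookupAuthSecret(missing, "example-namespace", "example-auth-secret")
	fmt.Println("missing secret tolerated:", s == nil && err == nil)

	_, err = lookupAuthSecret(denied, "example-namespace", "example-auth-secret")
	fmt.Println("other failure surfaced:", err)
}
```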

ℹ️ Review info
📥 Commits

Reviewing files that changed from the base of the PR and between 1498ed4 and 6f3b7f7.

📒 Files selected for processing (13)
  • pkg/controller/bootstrap/bootstrap.go
  • pkg/controller/common/constants.go
  • pkg/controller/internalreleaseimage/internalreleaseimage_bootstrap.go
  • pkg/controller/internalreleaseimage/internalreleaseimage_bootstrap_test.go
  • pkg/controller/internalreleaseimage/internalreleaseimage_controller.go
  • pkg/controller/internalreleaseimage/internalreleaseimage_controller_test.go
  • pkg/controller/internalreleaseimage/internalreleaseimage_helpers_test.go
  • pkg/controller/internalreleaseimage/internalreleaseimage_renderer.go
  • pkg/controller/internalreleaseimage/pullsecret.go
  • pkg/controller/internalreleaseimage/pullsecret_test.go
  • pkg/controller/internalreleaseimage/templates/master/files/iri-registry-auth-htpasswd.yaml
  • pkg/controller/internalreleaseimage/templates/master/files/usr-local-bin-load-registry-image-sh.yaml
  • pkg/controller/internalreleaseimage/templates/master/units/iri-registry.service.yaml
✅ Files skipped from review due to trivial changes (3)
  • pkg/controller/internalreleaseimage/templates/master/files/iri-registry-auth-htpasswd.yaml
  • pkg/controller/common/constants.go
  • pkg/controller/internalreleaseimage/pullsecret_test.go
🚧 Files skipped from review as they are similar to previous changes (7)
  • pkg/controller/internalreleaseimage/internalreleaseimage_bootstrap_test.go
  • pkg/controller/internalreleaseimage/templates/master/files/usr-local-bin-load-registry-image-sh.yaml
  • pkg/controller/internalreleaseimage/templates/master/units/iri-registry.service.yaml
  • pkg/controller/internalreleaseimage/internalreleaseimage_bootstrap.go
  • pkg/controller/internalreleaseimage/internalreleaseimage_renderer.go
  • pkg/controller/internalreleaseimage/internalreleaseimage_helpers_test.go
  • pkg/controller/bootstrap/bootstrap.go

@rwsu

rwsu commented Mar 24, 2026

/retest-required

@andfasano

Depends on #5765


// Merge IRI auth credentials into the global pull secret
if iriAuthSecret != nil {
	if err := ctrl.mergeIRIAuthIntoPullSecret(cconfig, iriAuthSecret); err != nil {

I wonder whether this reconciliation step deserves its own sync method. AFAIU, we should react to the following events:

  • The global pull secret is created/updated/deleted
  • The IRI registry auth secret is created/updated/deleted

Currently the controller watches only update events (initially it was meant to catch only updates to the TLS certs, since they are required for generating the MC).
So, from the point of view of generating the IRI MC, it's OK to also listen for IRI registry auth updates (they are consumed earlier by the Renderer).

But from the point of view of keeping the global pull secret in sync (merged) with the IRI creds, I feel this is not the best place to handle that. Not sure whether someone from the MCO team (cc @yuqi-zhang @djoshy) has a better view on that.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/controller/internalreleaseimage/internalreleaseimage_controller_test.go`:
- Around line 84-94: In verifyPullSecret (the verifyPullSecret func in
internalreleaseimage_controller_test.go) replace the unchecked type assertion
auths := dockerConfig["auths"].(map[string]interface{}) and subsequent assert
calls with safe, checked type assertions using require for setup-critical
failures; specifically, use a type-assert-with-ok pattern for
dockerConfig["auths"] and for the iriEntry value (ensure iriEntry is a
map[string]interface{}), call require.NoError/require.True to fail fast on
parsing/type issues, then extract the "auth" (or expected credential field) from
the iriEntry map and assert its value equals the expected merged credential
string instead of only asserting presence. Ensure you reference
metav1.GetOptions, corev1.DockerConfigJsonKey, ctrlcommon.GlobalPullSecretName
and the verifyPullSecret function when making the changes.
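
The checked-assertion pattern requested here can be sketched with the standard library alone. This is an illustration under assumptions: `authFor` and the registry host in `main` are hypothetical, and in the actual test each `ok` check would presumably be a `require.True` call so the test fails fast instead of panicking.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// authFor extracts auths[registry].auth from a dockerconfigjson payload,
// using the comma-ok form at every step so malformed input yields an
// error instead of a panic.
func authFor(dockerConfigJSON []byte, registry string) (string, error) {
	var cfg map[string]interface{}
	if err := json.Unmarshal(dockerConfigJSON, &cfg); err != nil {
		return "", fmt.Errorf("parsing docker config: %w", err)
	}
	auths, ok := cfg["auths"].(map[string]interface{})
	if !ok {
		return "", errors.New(`"auths" is missing or not an object`)
	}
	entry, ok := auths[registry].(map[string]interface{})
	if !ok {
		return "", fmt.Errorf("no object entry for registry %q", registry)
	}
	auth, ok := entry["auth"].(string)
	if !ok {
		return "", fmt.Errorf("entry for %q has no string \"auth\" field", registry)
	}
	return auth, nil
}

func main() {
	// The registry host and credential below are made-up example values.
	cfg := []byte(`{"auths":{"registry.example.test:5000":{"auth":"dXNlcjpzZWNyZXQ="}}}`)
	auth, err := authFor(cfg, "registry.example.test:5000")
	fmt.Println(auth, err)

	// An unchecked assertion would panic on this input; the checked one reports an error.
	_, err = authFor([]byte(`{"auths":"oops"}`), "registry.example.test:5000")
	fmt.Println(err)
}
```

Returning the credential string also lets the caller compare it against the expected merged value, which is the stronger assertion the comment asks for.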

ℹ️ Review info
📥 Commits

Reviewing files that changed from the base of the PR and between 6f3b7f7 and eac7c26.

📒 Files selected for processing (6)
  • pkg/controller/bootstrap/bootstrap.go
  • pkg/controller/internalreleaseimage/internalreleaseimage_bootstrap_test.go
  • pkg/controller/internalreleaseimage/internalreleaseimage_controller.go
  • pkg/controller/internalreleaseimage/internalreleaseimage_controller_test.go
  • pkg/controller/internalreleaseimage/internalreleaseimage_helpers_test.go
  • pkg/controller/internalreleaseimage/templates/master/files/usr-local-bin-load-registry-image-sh.yaml
🚧 Files skipped from review as they are similar to previous changes (4)
  • pkg/controller/internalreleaseimage/internalreleaseimage_bootstrap_test.go
  • pkg/controller/internalreleaseimage/templates/master/files/usr-local-bin-load-registry-image-sh.yaml
  • pkg/controller/bootstrap/bootstrap.go
  • pkg/controller/internalreleaseimage/internalreleaseimage_controller.go

- bootstrap.go: gate pull secret merge on iri != nil && iriAuthSecret != nil
  (restore the IRI CR check that was lost in the hash mismatch fix, drop the
  unnecessary cconfig.Spec.DNS != nil guard); treat merge failure as an error
- controller: return an error when the IRI auth secret is missing instead of
  silently ignoring it; auth is expected to always be present
- controller: remove the iriAuthSecret != nil guard around mergeIRIAuthIntoPullSecret
- mergeIRIAuthIntoPullSecret: error on empty password; remove DNS nil check
  (DNS is always set); error on pull secret not found
- load-registry-image.sh: always pass --authfile, kubelet config.json is
  always available by the time the remote pull fallback runs
- tests: assume auth is always present -- add iriAuthSecret/pullSecret/withDNS
  to all non-deletion test cases; collapse WithAuth variants into single
  functions; remove no-auth test case

Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com>
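
For readers following the commit message, the merge step it refers to can be pictured with the sketch below. It is not the PR's `mergeIRIAuthIntoPullSecret`; `mergeRegistryAuth`, its parameters, and the registry host and user name are assumptions made for illustration. It only shows the general shape: reject an empty password, then add a Basic Auth entry for the registry under `auths`.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
)

// mergeRegistryAuth adds (or replaces) the entry for registry in a
// dockerconfigjson payload and returns the re-encoded document.
func mergeRegistryAuth(pullSecret []byte, registry, username, password string) ([]byte, error) {
	if password == "" {
		return nil, errors.New("registry password is empty")
	}
	var cfg map[string]interface{}
	if err := json.Unmarshal(pullSecret, &cfg); err != nil {
		return nil, fmt.Errorf("parsing pull secret: %w", err)
	}
	auths, ok := cfg["auths"].(map[string]interface{})
	if !ok {
		return nil, errors.New(`pull secret has no "auths" object`)
	}
	// Basic Auth credentials are stored as base64("username:password").
	token := base64.StdEncoding.EncodeToString([]byte(username + ":" + password))
	auths[registry] = map[string]interface{}{"auth": token}
	return json.Marshal(cfg)
}

func main() {
	// All values here are made-up examples.
	in := []byte(`{"auths":{"quay.io":{"auth":"abc"}}}`)
	out, err := mergeRegistryAuth(in, "registry.example.test:5000", "user", "secret")
	if err != nil {
		fmt.Println("merge failed:", err)
		return
	}
	fmt.Println(string(out))
}
```

Existing entries are left untouched, so the merge stays idempotent when the same credentials are applied again.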
@rwsu rwsu force-pushed the AGENT-1449-auth branch from eac7c26 to c4d6d5e Compare March 26, 2026 03:13
@openshift-ci

openshift-ci bot commented Mar 26, 2026

@rwsu: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-gcp-op-ocl-part2 | c4d6d5e | link | false | /test e2e-gcp-op-ocl-part2 |
| ci/prow/unit | c4d6d5e | link | true | /test unit |
| ci/prow/e2e-hypershift | c4d6d5e | link | true | /test e2e-hypershift |

@andfasano

andfasano commented Mar 26, 2026

Thank you @rwsu for the update, it looks good to me. My only remaining concern is about this comment, related to the update of the global pull secret (we can follow up in another PR eventually).

/approve

Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

3 participants