✨ Add BareMetalSwitch CRD and controller for switch config generation#3046
✨ Add BareMetalSwitch CRD and controller for switch config generation#3046alegacy wants to merge 1 commit intometal3-io:mainfrom
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Hi @alegacy. Thanks for your PR. I'm waiting for a metal3-io member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
I am starting to wonder if all of this networking should be moved to a dedicated controller (like we have for IPAM) instead of gluing it into BMO. |
|
/ok-to-test |
f2993a8 to
b04e473
Compare
| @@ -1 +1,5 @@ | |||
| IRONIC_ENDPOINT=https://192.168.222.2:6385/v1/ | |||
| IRONIC_NETWORKING_ENABLED=true | |||
There was a problem hiding this comment.
Follow-up request: BMO is capable of reading an IrSO Ironic object to populate its configuration. Ideally, we should support the switch configuration in this way as well (once the IrSO change lands, of course).
ef6012c to
51543f7
Compare
51543f7 to
d878bf1
Compare
|
/retest |
There was a problem hiding this comment.
Pull request overview
Adds a new BareMetalSwitch API and controller to generate Ironic Networking switch configuration + SSH key material from BareMetalSwitch resources and referenced credential Secrets, and wires it into manifests and E2E coverage.
Changes:
- Introduces
BareMetalSwitchCRD/API types (with status conditions) and registers it in rendered/base manifests and RBAC. - Adds a
BareMetalSwitchReconcilerplus config/secret generation helpers and unit tests. - Adds E2E test suite + overlay/config updates to enable the controller and run the new scenario.
Reviewed changes
Copilot reviewed 22 out of 22 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
apis/metal3.io/v1alpha1/baremetalswitch_types.go |
New API type for BareMetalSwitch, credentials ref, and status conditions. |
apis/metal3.io/v1alpha1/zz_generated.deepcopy.go |
Generated deepcopy implementations for new API types. |
internal/controller/metal3.io/baremetalswitch_controller.go |
New controller reconcile loop and manager registration. |
internal/controller/metal3.io/baremetalswitch_config.go |
Switch INI config + credentials-secret generation and secret update helpers. |
internal/controller/metal3.io/baremetalswitch_config_test.go |
Unit tests for config generation and controller condition behavior. |
main.go |
Conditionally registers the BareMetalSwitch controller based on env vars. |
config/base/crds/kustomization.yaml |
Adds the new CRD into the base CRD kustomization. |
config/base/crds/bases/metal3.io_baremetalswitches.yaml |
New CRD manifest for BareMetalSwitch. |
config/base/rbac/role.yaml |
Grants manager permissions for baremetalswitches and status. |
config/render/capm3.yaml |
Rendered CRD + RBAC updates for BareMetalSwitch in CAPM3 render output. |
config/samples/baremetalswitch_sample.yaml |
Sample BareMetalSwitch + credentials Secret manifest. |
docs/api.md |
Adds API documentation section for BareMetalSwitch. |
config/overlays/fixture/kustomization.yaml |
Enables Ironic networking env vars in fixture overlay. |
config/overlays/e2e/kustomization.yaml |
Adds the baremetalswitch overlay to the E2E stack. |
config/overlays/e2e/baremetalswitch/kustomization.yaml |
New E2E overlay to create namespace + reuse roles/rolebindings. |
config/overlays/e2e/baremetalswitch/namespace.yaml |
Namespace manifest for baremetalswitch E2E tests. |
config/overlays/e2e/ironic.env |
Enables networking + switch secret env vars for E2E runs. |
config/overlays/e2e/namespaced-manager-patch.yaml |
Adds baremetalswitch to WATCH_NAMESPACE list for namespaced E2E runs. |
config/overlays/e2e/roles-rolebindings/roles-rolebindings.yaml |
Grants E2E roles access to baremetalswitches and status. |
test/e2e/baremetalswitch_test.go |
New E2E test validating config generation, credential types, and rotation. |
test/e2e/config/fixture.yaml |
Adds interval config for baremetalswitch secret update waits. |
test/e2e/config/ironic.yaml |
Adds interval config for baremetalswitch secret update waits. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
d878bf1 to
de8b4cd
Compare
tuminoid
left a comment
There was a problem hiding this comment.
Generally LGTM, asking some clarifying questions.
| } | ||
|
|
||
| // Update the switch credentials secret | ||
| if err := updateSecretData(ctx, c, sm, namespace, credentialSecretName, result.keyFiles); err != nil { |
There was a problem hiding this comment.
keyFiles only contains keys that were successfully reconciled this cycle. So if there is some (transient) error in switch A, then those keys will be dropped until next reconcile and access is lost meanwhile. That intentional?
There was a problem hiding this comment.
Yes, that was my intent. I figured it would be better to have some data written to the config secret rather than no data to the config secret (i.e., 1 bad switch config shouldn't block all of the good ones). I can change that if there is consensus, but I do think this is the best approach.
| // User error: do not return an error to avoid exponential retries. | ||
| // Requeue after a delay so that if the credential secret is created | ||
| // later we will eventually pick it up (the Owns() watch cannot detect | ||
| // creation of a secret that has no owner reference yet and no label). | ||
| logger.Info("BareMetalSwitch has credential error, requeueing", "error", credErr) | ||
| return ctrl.Result{RequeueAfter: credentialErrorRequeueDelay}, nil | ||
| } |
There was a problem hiding this comment.
It has a comment, but 10s requeue means for user misconfugration we run reconcile 8640 times a day (360 times an hour). Do we REALLY need 10s requeues?
There was a problem hiding this comment.
@tuminoid I modeled this after the BMH handling of BMC secrets. It also requeues every 10 secs (hostErrorRetryDelay) when the BMC secret cannot be found. The difference being that I treated all errors the same... I'll change this so that a missing secret is requeued while some sort of user error on processing the secret is not.
Given that the BMH requeue timer is also 10 seconds should I leave it as-is or do you want me to back it off? I was aiming for consistency, but do agree that 10 seconds is too fast. 60secs? 5mins?
Introduces the BareMetalSwitch (BMS) CRD for defining Top-of-Rack switches managed by the Ironic Networking Service. The controller watches BMS resources and credential secrets, then generates two output secrets: 1. Switch configs secret (IRONIC_SWITCH_CONFIGS_SECRET): a single INI-format file with per-switch sections including address, credentials, driver/device type, and optional fields. 2. Switch credentials secret (IRONIC_SWITCH_CREDENTIALS_SECRET): SSH private key files (keyed by <mac-address>.key) for switches using publickey authentication. Password-authenticated switches store credentials inline in the INI config instead. The controller is only registered when IRONIC_NETWORKING_ENABLED=true and requires IRONIC_SWITCH_CONFIGS_SECRET, IRONIC_SWITCH_CREDENTIALS_SECRET, and IRONIC_SWITCH_CREDENTIALS_PATH to be set at startup. Credential secret changes (e.g. password rotation, key replacement) trigger config regeneration via a secret watch with a mapper that filters to only secrets referenced by BMS resources. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Allain Legacy <alegacy@redhat.com>
de8b4cd to
dc1c7cc
Compare
| @@ -1 +1,5 @@ | |||
| IRONIC_ENDPOINT=https://192.168.222.2:6385/v1/ | |||
| IRONIC_NETWORKING_ENABLED=true | |||
There was a problem hiding this comment.
@dtantsur @tuminoid basic question about the e2e tests... i started adding other tests for my other PR that will come in the future. In doing so I've been wondering if the way I did it in this PR is the right approach.
I've basically set IRONIC_NETWORKING_ENABLED=true always so the e2e tests always run this way regardless of whether the test is networking related or not. Should I split this out into its own overlay and not disturb the existing e2e tests with this variable addition?
On the other hand, it would be good for the networking code to be exercised in some of those other scenarios too.
Not sure.
There was a problem hiding this comment.
Sorry, gave this some more thought today... I guess there wouldn't be much different in the code execution if there were no BareMetalSwitch CRs, and in the future no HostNetworkAttachment CRs or references to them in BMH.spec.networkInterfaces.
...so maybe I'm overthinking this a bit and always having IRONIC_NETWORKING_ENABLED=true isn't problematic.
Introduces the BareMetalSwitch (BMS) CRD for defining Top-of-Rack switches managed by the Ironic Networking Service. The controller watches BMS resources and credential secrets, then generates two output secrets:
Switch configs secret (IRONIC_SWITCH_CONFIGS_SECRET): a single INI-format file with per-switch sections including address, credentials, driver/device type, and optional fields.
Switch credentials secret (IRONIC_SWITCH_CREDENTIALS_SECRET): SSH private key files (keyed by .key) for switches using publickey authentication. Password-authenticated switches store credentials inline in the INI config instead.
The controller is only registered when IRONIC_NETWORKING_ENABLED=true and requires IRONIC_SWITCH_CONFIGS_SECRET, IRONIC_SWITCH_CREDENTIALS_SECRET, and IRONIC_SWITCH_CREDENTIALS_PATH to be set at startup.
Credential secret changes (e.g. password rotation, key replacement) trigger config regeneration via a secret watch with a mapper that filters to only secrets referenced by BMS resources.
What this PR does / why we need it:
Partially implements: metal3-io/metal3-docs#586
Checklist:
note: api docs updated here, but metal3-docs repo will be updated once feature is complete