ServerMaintenance: optionally turn on Locator LED#948

Open
stefanhipfel wants to merge 3 commits into main from worktree-issue421
@stefanhipfel stefanhipfel commented Jun 12, 2026

Contributor

Closes #421

Summary by CodeRabbit

Release Notes

  • New Features
    • Added locator LED control for server maintenance operations. Administrators can now set the server's indicator LED during maintenance; it automatically turns off when maintenance concludes.

Closes #421

Signed-off-by: Stefan Hipfel <stefan.hipfel@sap.com>
Signed-off-by: Stefan Hipfel <stefan.hipfel@sap.com>
@github-actions github-actions Bot added the documentation Improvements or additions to documentation label Jun 12, 2026
coderabbitai Bot commented Jun 12, 2026

Contributor

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1728d2d9-ffa9-44c1-9b51-115e145ea248

📥 Commits

Reviewing files that changed from the base of the PR and between bf2d47c and ee254d2.

⛔ Files ignored due to path filters (1)
  • dist/chart/templates/crd/metal.ironcore.dev_servermaintenances.yaml is excluded by !**/dist/**
📒 Files selected for processing (2)
  • docs/api-reference/api.md
  • internal/controller/servermaintenance_controller_test.go
📝 Walkthrough

Walkthrough

This PR adds locator LED control to server maintenance operations. It introduces a new LocatorLED field to ServerMaintenanceSpec, implements BMC-backed LED state management through the server reconciler, and wires the maintenance reconciler to turn the LED on during maintenance and off upon completion.

Changes

Locator LED Feature

| Layer / File(s) | Summary |
| --- | --- |
| API schema and configuration: api/v1alpha1/servermaintenance_types.go, api/v1alpha1/applyconfiguration/api/v1alpha1/servermaintenancespec.go, api/v1alpha1/applyconfiguration/internal/internal.go, config/crd/bases/metal.ironcore.dev_servermaintenances.yaml | ServerMaintenanceSpec gains optional LocatorLED field. Generated apply-configuration struct and CRD schema updated to include the new field with omitempty serialization. |
| BMC interface and Redfish implementation: bmc/bmc.go, bmc/redfish.go | BMC interface adds SetIndicatorLED method. RedfishBaseBMC implements it by fetching the target system, setting IndicatorLED, and persisting via system.Update(). |
| Server controller LED state synchronization: internal/controller/server_controller.go | ServerReconciler.ensureIndicatorLED compares desired vs. current LED state and calls bmcClient.SetIndicatorLED when needed. Both Available and Reserved state paths invoke the helper with the active BMC client. |
| ServerMaintenance LED control and lifecycle: internal/controller/servermaintenance_controller.go | ServerMaintenanceReconciler sets server LED to LitIndicatorLED when entering maintenance via setAndPatchServerState helper. Cleanup method clears LED to OffIndicatorLED on deletion when LocatorLED was configured. |
| Integration tests: internal/controller/servermaintenance_controller_test.go | Test verifies ServerMaintenance with LocatorLED sets Server.Spec.IndicatorLED to LitIndicatorLED during maintenance and clears it to OffIndicatorLED after deletion. |
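
To make the "optional field with omitempty serialization" row concrete, here is a minimal sketch of how such a field behaves when marshalled. The struct is a trimmed stand-in rather than the PR's actual type, and the string values "Lit" and "Off" are assumptions chosen for illustration; only the field name and JSON tag come from the review.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// IndicatorLED mirrors the string-typed LED state named in the walkthrough.
// The concrete values "Lit" and "Off" are assumptions for this sketch.
type IndicatorLED string

const (
	LitIndicatorLED IndicatorLED = "Lit"
	OffIndicatorLED IndicatorLED = "Off"
)

// ServerMaintenanceSpec is a trimmed stand-in for the real spec; only the
// new optional field is shown.
type ServerMaintenanceSpec struct {
	LocatorLED IndicatorLED `json:"locatorLED,omitempty"`
}

// encode marshals a spec to JSON. Marshalling cannot fail for this simple
// struct, so an error is treated as a programming mistake.
func encode(spec ServerMaintenanceSpec) string {
	b, err := json.Marshal(spec)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(encode(ServerMaintenanceSpec{}))                            // {}
	fmt.Println(encode(ServerMaintenanceSpec{LocatorLED: LitIndicatorLED})) // {"locatorLED":"Lit"}
}
```

Because the zero value is the empty string, an unset LocatorLED disappears from the serialized object, which is what lets existing ServerMaintenance manifests stay valid without the new field.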

Sequence Diagram

sequenceDiagram
  participant User
  participant ServerMaintenance Reconciler
  participant Server Reconciler
  participant BMC Client
  participant Redfish System

  User->>ServerMaintenance Reconciler: Create ServerMaintenance with LocatorLED
  ServerMaintenance Reconciler->>Server Reconciler: Patch Server.Spec.IndicatorLED to LitIndicatorLED
  Server Reconciler->>Server Reconciler: ensureIndicatorLED compares desired vs current
  Server Reconciler->>BMC Client: SetIndicatorLED(systemURI, LitIndicatorLED)
  BMC Client->>Redfish System: Update system.IndicatorLED
  Redfish System-->>BMC Client: System updated
  BMC Client-->>Server Reconciler: Success

  User->>ServerMaintenance Reconciler: Delete ServerMaintenance
  ServerMaintenance Reconciler->>Server Reconciler: Patch Server.Spec.IndicatorLED to OffIndicatorLED
  Server Reconciler->>BMC Client: SetIndicatorLED(systemURI, OffIndicatorLED)
  BMC Client->>Redfish System: Update system.IndicatorLED
  Redfish System-->>BMC Client: System updated
  BMC Client-->>Server Reconciler: Success
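
The sequence above can be sketched as a small in-memory simulation. Everything here is hypothetical scaffolding: Server, fakeBMC, startMaintenance, endMaintenance, and reconcileServer are illustrative names, not the operator's types, and the LED values "Lit" and "Off" are assumed. The sketch only shows the division of labour: the maintenance side changes desired state, and the server side writes to the BMC when desired and observed state differ.

```go
package main

import "fmt"

type IndicatorLED string

const (
	LitIndicatorLED IndicatorLED = "Lit"
	OffIndicatorLED IndicatorLED = "Off"
)

// Server is a trimmed stand-in holding desired (spec) and observed (status) LED state.
type Server struct {
	SpecLED   IndicatorLED
	StatusLED IndicatorLED
}

// fakeBMC records every LED write so the flow can be inspected afterwards.
type fakeBMC struct {
	writes []IndicatorLED
}

func (b *fakeBMC) SetIndicatorLED(state IndicatorLED) error {
	b.writes = append(b.writes, state)
	return nil
}

// startMaintenance plays the maintenance reconciler on creation: it only
// touches the server's desired LED state when a locator LED was requested.
func startMaintenance(s *Server, locatorLED IndicatorLED) {
	if locatorLED != "" {
		s.SpecLED = locatorLED
	}
}

// endMaintenance plays the maintenance reconciler on deletion.
func endMaintenance(s *Server, locatorLED IndicatorLED) {
	if locatorLED != "" {
		s.SpecLED = OffIndicatorLED
	}
}

// reconcileServer plays the server reconciler: write to the BMC only on drift.
func reconcileServer(s *Server, bmc *fakeBMC) error {
	if s.SpecLED == "" || s.SpecLED == s.StatusLED {
		return nil
	}
	if err := bmc.SetIndicatorLED(s.SpecLED); err != nil {
		return fmt.Errorf("failed to set indicator LED: %w", err)
	}
	// Simplification: a real controller would refresh status from the BMC.
	s.StatusLED = s.SpecLED
	return nil
}

// runFlow walks through create, two reconciles, delete, and a final reconcile.
func runFlow() ([]IndicatorLED, error) {
	s := &Server{StatusLED: OffIndicatorLED}
	bmc := &fakeBMC{}

	startMaintenance(s, LitIndicatorLED)
	for i := 0; i < 2; i++ { // the second pass finds no drift and writes nothing
		if err := reconcileServer(s, bmc); err != nil {
			return nil, err
		}
	}

	endMaintenance(s, LitIndicatorLED)
	if err := reconcileServer(s, bmc); err != nil {
		return nil, err
	}
	return bmc.writes, nil
}

func main() {
	writes, err := runFlow()
	if err != nil {
		panic(err)
	}
	fmt.Println(writes) // [Lit Off]
}
```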

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ⚠️ Warning | The PR description is minimal (only 'Closes #421') and does not follow the template structure requiring 'Proposed Changes' section with bullet points and issue reference. | Expand description to include a 'Proposed Changes' section outlining the key changes (API field, BMC interface method, controller logic, etc.). |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'ServerMaintenance: optionally turn on Locator LED' clearly and concisely summarizes the main change: adding optional locator LED control to ServerMaintenance resources. |
| Linked Issues check | ✅ Passed | The PR implements all requirements from issue #421: added locatorLED field to ServerMaintenance spec, implemented LED control via BMC during maintenance, and clears LED when maintenance ends. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to implementing locator LED control for ServerMaintenance as specified in issue #421, with no unrelated modifications detected. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


@coderabbitai coderabbitai Bot left a comment

Contributor


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
internal/controller/servermaintenance_controller.go (1)

38-40: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add RBAC markers for Server resource.

The ServerMaintenanceReconciler patches and updates Server objects (lines 254, 275, 311, 337, 385, etc.), but RBAC markers for the metal.ironcore.dev servers resource are missing. Without these permissions, the controller ServiceAccount won't be authorized to modify servers. As per coding guidelines, include RBAC markers specifying all required permissions.

🔐 Add missing RBAC markers
 // +kubebuilder:rbac:groups=metal.ironcore.dev,resources=servermaintenances,verbs=get;list;watch;create;update;patch;delete
 // +kubebuilder:rbac:groups=metal.ironcore.dev,resources=servermaintenances/status,verbs=get;update;patch
 // +kubebuilder:rbac:groups=metal.ironcore.dev,resources=servermaintenances/finalizers,verbs=update
+// +kubebuilder:rbac:groups=metal.ironcore.dev,resources=servers,verbs=get;list;watch;update;patch
+// +kubebuilder:rbac:groups=metal.ironcore.dev,resources=serverclaims,verbs=get;list;patch
+// +kubebuilder:rbac:groups=metal.ironcore.dev,resources=serverbootconfigurations,verbs=get;create;delete;patch

Note: Also add markers for serverclaims and serverbootconfigurations which this controller modifies.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/controller/servermaintenance_controller.go` around lines 38 - 40,
Add RBAC markers granting this controller permissions to manage the Server,
ServerClaim, and ServerBootConfiguration resources: update the top of
servermaintenance_controller.go to include +kubebuilder:rbac lines for
group=metal.ironcore.dev resources=servers,servers/status,servers/finalizers and
likewise for serverclaims and serverbootconfigurations, specifying the verbs the
reconciler uses (e.g., get;list;watch;create;update;patch;delete for resources,
and get;update;patch for /status, update for /finalizers) so
ServerMaintenanceReconciler can patch/update those objects at runtime.

Source: Coding guidelines

internal/controller/server_controller.go (1)

583-612: ⚠️ Potential issue | 🟠 Major

Call ensureIndicatorLED during ServerStateMaintenance to keep BMC locator LED in sync

internal/controller/server_controller.go only calls r.ensureIndicatorLED(...) from handleAvailableState and handleReservedState; handleMaintenanceState does not, so server.Spec.IndicatorLED can change without the BMC LED being reasserted during Maintenance. Add an ensureIndicatorLED call in handleMaintenanceState (keep it alongside the other per-reconcile BMC operations).

🔧 Proposed fix
 func (r *ServerReconciler) handleMaintenanceState(ctx context.Context, bmcClient bmc.BMC, server *metalv1alpha1.Server) (bool, error) {
 	log := ctrl.LoggerFrom(ctx)
 	if server.Spec.ServerMaintenanceRef == nil {
 		log.V(1).Info("Server is in Maintenance state, but no ServerMaintenanceRef is set, transitioning back to previous state")
 		// update system info in case the server was changed during Maintenance state (hardware changes, biosVersion etc.)
 		if err := r.updateServerStatusFromSystemInfo(ctx, bmcClient, server); err != nil {
 			return false, fmt.Errorf("failed to update server status system info: %w", err)
 		}
 		if err := bmcClient.ClearBootOverride(ctx, server.Spec.SystemURI); err != nil {
 			return false, fmt.Errorf("failed to clear boot override on maintenance exit: %w", err)
 		}
 		if server.Spec.ServerClaimRef == nil {
 			return r.patchServerState(ctx, server, metalv1alpha1.ServerStateInitial)
 		}
 		return r.patchServerState(ctx, server, metalv1alpha1.ServerStateReserved)
 	}
 	// Re-assert the persistent network-boot override on every reconcile while in Maintenance.
 	// This protects against reboots not driven by metal-operator (e.g. a vendor BIOS
 	// upgrade task rebooting the system itself) falling through to disk and starting
 	// the production OS while the host is still being worked on.
 	if err := bmcClient.SetBootOverride(ctx, server.Spec.SystemURI, true); err != nil {
 		return false, fmt.Errorf("failed to set persistent network boot for maintenance: %w", err)
 	}
 	if err := r.ensureServerPowerState(ctx, bmcClient, server); err != nil {
 		return false, fmt.Errorf("failed to ensure server power state: %w", err)
 	}
+	
+	if err := r.ensureIndicatorLED(ctx, bmcClient, server); err != nil {
+		return false, fmt.Errorf("failed to ensure server indicator led: %w", err)
+	}
 
 	log.V(1).Info("Reconciled maintenance state")
 	return false, nil
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/controller/server_controller.go` around lines 583 - 612,
handleMaintenanceState is missing a call to r.ensureIndicatorLED, so changes to
server.Spec.IndicatorLED are not applied while in maintenance; add a call to
r.ensureIndicatorLED(ctx, bmcClient, server) in handleMaintenanceState alongside
the existing per-reconcile BMC ops (e.g., after SetBootOverride and before/after
ensureServerPowerState), and handle its returned error like the others (wrap and
return an error on failure) so the BMC locator LED is kept in sync during
ServerStateMaintenance.
🧹 Nitpick comments (4)
api/v1alpha1/servermaintenance_types.go (1)

48-51: ⚡ Quick win

Clarify the LED cleanup behavior in the documentation.

The comment "When maintenance ends, the locator LED is turned off" could be interpreted as "the server's LED is always turned off" or "the LED set by this field is turned off." The implementation only clears the LED when LocatorLED is set (not empty). Consider rephrasing for clarity.

📝 Suggested documentation improvement
 // LocatorLED specifies the desired state of the server's locator LED during maintenance.
-// When maintenance ends, the locator LED is turned off.
+// If set, the locator LED is cleared (turned off) when maintenance ends.
 // +optional
 LocatorLED IndicatorLED `json:"locatorLED,omitempty"`
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@api/v1alpha1/servermaintenance_types.go` around lines 48 - 51, Update the
comment for LocatorLED (type IndicatorLED) to clarify that only the LED state
explicitly set by the LocatorLED field is cleared at the end of maintenance;
specifically, rephrase "When maintenance ends, the locator LED is turned off" to
something like "When maintenance ends, the LED state specified by LocatorLED is
cleared/turned off — if LocatorLED is unset/empty no change is made." This
change should be made next to the LocatorLED field declaration to accurately
reflect the behavior implemented for LocatorLED.
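
The distinction this nitpick draws can be written down as a tiny decision function. This is a sketch under assumptions: ledAfterCleanup is a hypothetical helper, not code from the PR, and the LED values "Lit" and "Off" are illustrative. It encodes the behavior described above, where the LED is cleared only if the maintenance object configured it.

```go
package main

import "fmt"

type IndicatorLED string

const (
	LitIndicatorLED IndicatorLED = "Lit"
	OffIndicatorLED IndicatorLED = "Off"
)

// ledAfterCleanup returns the server's desired LED state once maintenance ends.
// locatorLED is the value from the maintenance spec; serverLED is the server's
// desired state at the time of cleanup.
func ledAfterCleanup(locatorLED, serverLED IndicatorLED) IndicatorLED {
	if locatorLED == "" {
		// Maintenance never managed the LED, so leave it untouched.
		return serverLED
	}
	return OffIndicatorLED
}

func main() {
	// LED was lit by something other than maintenance: stays lit.
	fmt.Println(ledAfterCleanup("", LitIndicatorLED)) // Lit
	// LED was configured by maintenance: cleared on exit.
	fmt.Println(ledAfterCleanup(LitIndicatorLED, LitIndicatorLED)) // Off
}
```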
internal/controller/servermaintenance_controller.go (1)

254-258: ⚡ Quick win

Update log message to reflect LED state changes.

The method setAndPatchServerState now patches both power and (optionally) LED state, but the log message at line 257 only mentions power state. Update the message to reflect that LED might also be set.

💬 Suggested log message improvement
 		if err := r.setAndPatchServerState(ctx, server, maintenance); err != nil {
 			return ctrl.Result{}, err
 		}
-		log.V(1).Info("Patched server power state", "Server", server.Name, "Power", maintenance.Spec.ServerPower)
+		log.V(1).Info("Patched server state", "Server", server.Name, "Power", maintenance.Spec.ServerPower, "IndicatorLED", maintenance.Spec.LocatorLED)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/controller/servermaintenance_controller.go` around lines 254 - 258,
The log message after calling setAndPatchServerState should mention both power
and LED changes; update the log.V(1).Info call (the one that currently logs
"Patched server power state") to include the LED state as well by referencing
maintenance.Spec.ServerPower and maintenance.Spec.LocatorLED along with
server.Name so the log reflects that setAndPatchServerState may have patched
power and/or LED state.
internal/controller/server_controller.go (1)

1167-1177: ⚡ Quick win

Consider adding structured logging for LED synchronization.

Similar to the BMC layer, logging LED state changes at the controller level would aid troubleshooting when onsite technicians report LED issues. This is particularly valuable for ServerMaintenance scenarios where the LED is used to locate servers requiring physical intervention.

📝 Suggested logging addition
 func (r *ServerReconciler) ensureIndicatorLED(ctx context.Context, bmcClient bmc.BMC, server *metalv1alpha1.Server) error {
+	log := ctrl.LoggerFrom(ctx)
 	if server.Spec.IndicatorLED == "" {
 		return nil
 	}
 	desired := schemas.IndicatorLED(server.Spec.IndicatorLED)   //nolint:staticcheck
 	current := schemas.IndicatorLED(server.Status.IndicatorLED) //nolint:staticcheck
 	if desired == current {
 		return nil
 	}
+	log.V(1).Info("Synchronizing indicator LED", "desired", desired, "current", current)
 	return bmcClient.SetIndicatorLED(ctx, server.Spec.SystemURI, desired)
 }

As per coding guidelines, use structured logging with key-value pairs following Kubernetes conventions.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/controller/server_controller.go` around lines 1167 - 1177, Add
structured logging in ensureIndicatorLED: log when IndicatorLED is empty
(debug), when desired != current (Info) with key-values server name, namespace,
systemURI, desired and current, and after calling bmcClient.SetIndicatorLED log
success or failure; on error include the returned error in the log. Use the
context logger (ctrl.LoggerFrom(ctx), as in the suggested diff above) and
reference the function ensureIndicatorLED, variables server, desired, current,
and the call to bmcClient.SetIndicatorLED to locate where to insert these logs.

Source: Coding guidelines
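
As a rough illustration of what key-value logging around the LED sync could look like, the sketch below uses the standard library's log/slog as a stand-in for the controller-runtime logger, so the example runs without external dependencies. That substitution is an assumption, as are the names ensureLED, ledSetter, captureLogs, and hasPair; Debug is used here as an approximate equivalent of log.V(1).

```go
package main

import (
	"bytes"
	"fmt"
	"log/slog"
	"strings"
)

type IndicatorLED string

// ledSetter is a hypothetical one-method slice of the BMC interface.
type ledSetter func(state IndicatorLED) error

// ensureLED logs and writes only when desired and current state differ.
func ensureLED(log *slog.Logger, desired, current IndicatorLED, set ledSetter) error {
	if desired == "" || desired == current {
		return nil
	}
	log.Debug("Synchronizing indicator LED", "desired", string(desired), "current", string(current))
	if err := set(desired); err != nil {
		return fmt.Errorf("failed to set indicator LED: %w", err)
	}
	return nil
}

// captureLogs runs ensureLED against a no-op setter and returns the log text.
func captureLogs(desired, current IndicatorLED) string {
	var buf bytes.Buffer
	opts := &slog.HandlerOptions{Level: slog.LevelDebug}
	log := slog.New(slog.NewTextHandler(&buf, opts))
	if err := ensureLED(log, desired, current, func(IndicatorLED) error { return nil }); err != nil {
		panic(err)
	}
	return buf.String()
}

// hasPair reports whether the log text contains the given key=value pair.
func hasPair(logs, key, value string) bool {
	return strings.Contains(logs, key+"="+value)
}

func main() {
	fmt.Print(captureLogs("Lit", "Off"))           // one line with desired=Lit current=Off
	fmt.Println(captureLogs("Lit", "Lit") == "")   // no drift, so nothing is logged
}
```

Logging only on drift keeps steady-state reconciles quiet while still leaving a trace whenever the controller actually asks the BMC to change the LED.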

bmc/redfish.go (1)

205-213: ⚡ Quick win

Consider adding structured logging for LED state changes.

The locator LED is specifically meant to help onsite technicians locate physical hardware. Logging when the LED state is changed would help troubleshoot cases where technicians report the LED is not lit as expected.

📝 Suggested logging addition
 func (r *RedfishBaseBMC) SetIndicatorLED(ctx context.Context, systemURI string, state schemas.IndicatorLED) error { //nolint:staticcheck
+	log := ctrl.LoggerFrom(ctx)
 	system, err := r.getSystemFromUri(ctx, systemURI)
 	if err != nil {
 		return fmt.Errorf("failed to get system: %w", err)
 	}
+	log.V(1).Info("Setting indicator LED", "SystemURI", systemURI, "State", state)
 	system.IndicatorLED = state //nolint:staticcheck
 	return system.Update()
 }

As per coding guidelines, follow Kubernetes logging conventions: use structured logging with key-value pairs, capitalize the message, and use active voice.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@bmc/redfish.go` around lines 205 - 213, Add structured, Kubernetes-style
logging to SetIndicatorLED: after retrieving the system via getSystemFromUri and
before calling system.Update(), log the LED change with a capitalized,
active-voice message and key-value pairs (e.g., "Changing Indicator LED", system
ID/URI, previous state, desired state). Use the existing receiver logger on
RedfishBaseBMC (e.g., r.log or the project's logger instance) and ensure the log
is emitted on both success path and on error returns (include the error when
system.Update() fails) so callers can correlate failures to the attempted state
change.

Source: Coding guidelines
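
One way to let callers correlate a failure with the attempted state, independent of logging, is to carry the state in the wrapped error. The sketch below is a hypothetical stand-in for the Redfish path: system, lookup, missing, and isNotFound are illustrative names, and the real getSystemFromUri and system.Update() are replaced by in-memory fakes.

```go
package main

import (
	"errors"
	"fmt"
)

type IndicatorLED string

// system is a hypothetical stand-in for the Redfish computer system object.
type system struct {
	IndicatorLED IndicatorLED
	updated      bool
}

// Update pretends to persist the change to the BMC.
func (s *system) Update() error {
	s.updated = true
	return nil
}

var errNotFound = errors.New("system not found")

// lookup is a hypothetical replacement for getSystemFromUri.
type lookup func(uri string) (*system, error)

// setIndicatorLED fetches the system, sets the LED, and persists the change,
// wrapping each failure with enough context to identify the attempted state.
func setIndicatorLED(get lookup, uri string, state IndicatorLED) error {
	sys, err := get(uri)
	if err != nil {
		return fmt.Errorf("failed to get system: %w", err)
	}
	sys.IndicatorLED = state
	if err := sys.Update(); err != nil {
		return fmt.Errorf("failed to update indicator LED to %q: %w", state, err)
	}
	return nil
}

// missing is a lookup that always fails, used to exercise the error path.
func missing(string) (*system, error) { return nil, errNotFound }

// isNotFound reports whether err wraps errNotFound.
func isNotFound(err error) bool { return errors.Is(err, errNotFound) }

func main() {
	sys := &system{}
	found := func(string) (*system, error) { return sys, nil }

	err := setIndicatorLED(found, "/redfish/v1/Systems/1", "Lit")
	fmt.Println(err, sys.IndicatorLED, sys.updated)

	err = setIndicatorLED(missing, "/redfish/v1/Systems/1", "Lit")
	fmt.Println(err, isNotFound(err))
}
```

Wrapping with %w keeps the underlying cause inspectable through errors.Is, so a caller can still distinguish a missing system from a failed update.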


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e078b18a-4b99-4144-afac-1cc0cd1c8a09

📥 Commits

Reviewing files that changed from the base of the PR and between 1636209 and bf2d47c.

📒 Files selected for processing (9)
  • api/v1alpha1/applyconfiguration/api/v1alpha1/servermaintenancespec.go
  • api/v1alpha1/applyconfiguration/internal/internal.go
  • api/v1alpha1/servermaintenance_types.go
  • bmc/bmc.go
  • bmc/redfish.go
  • config/crd/bases/metal.ironcore.dev_servermaintenances.yaml
  • internal/controller/server_controller.go
  • internal/controller/servermaintenance_controller.go
  • internal/controller/servermaintenance_controller_test.go

Signed-off-by: Stefan Hipfel <stefan.hipfel@sap.com>

Labels

api-change documentation Improvements or additions to documentation size/L

Development

Successfully merging this pull request may close these issues.

ServerMaintenance: optionally turn on Locator LED

1 participant