Cleanup stale metrics and event subscriptions#889

Open
asergeant01 wants to merge 1 commit into main from feat/improve-metric-report-subscription-handling

Conversation

@asergeant01

@asergeant01 asergeant01 commented May 13, 2026

Contributor

Fixes #886

Summary by CodeRabbit

  • New Features

    • Controller now detects missing Redfish event and metrics subscriptions and recreates them automatically.
  • Bug Fixes

    • Stale subscription links are removed from status when the remote subscription no longer exists; only missing links trigger recreation.
  • Tests

    • Added an integration-style test that simulates external deletion and verifies subscriptions are recreated while unaffected links remain stable.

@asergeant01 asergeant01 requested a review from a team as a code owner May 13, 2026 12:46
@asergeant01 asergeant01 force-pushed the feat/improve-metric-report-subscription-handling branch from 41bec4e to 1892ef5 Compare May 13, 2026 12:46
@xkonni xkonni changed the title handle stale metrics and event subscritpions handle stale metrics and event subscriptions May 13, 2026
@coderabbitai

coderabbitai Bot commented May 13, 2026

Contributor

Review Change Stack

Warning

Review limit reached

@asergeant01, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 52 minutes and 50 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1c2d67f5-833b-4104-b989-a5c716f8d16b

📥 Commits

Reviewing files that changed from the base of the PR and between 543c877 and 53f3be8.

📒 Files selected for processing (5)
  • bmc/bmc.go
  • bmc/mock/server/server.go
  • bmc/redfish.go
  • internal/controller/bmc_controller.go
  • internal/controller/bmc_controller_test.go
📝 Walkthrough

Walkthrough

Adds BMC.GetEventSubscription and a Redfish implementation that treats HTTP 404 as “not present”; controller validates stored metrics/events subscription links and clears stale status links so missing subscriptions are recreated; mock-server delete helper and an integration test simulate and verify external deletion recovery.
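The "404 means not present" contract described in the walkthrough can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `statusError`, `fetchFunc`, `fixedFetch`, and `subscriptionExists` are hypothetical names standing in for the Redfish client's error type and event-service call.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
)

// statusError is a hypothetical stand-in for the Redfish client's error type.
type statusError struct {
	HTTPReturnedStatusCode int
}

func (e *statusError) Error() string {
	return fmt.Sprintf("redfish request failed with status %d", e.HTTPReturnedStatusCode)
}

// fetchFunc is a hypothetical hook standing in for the event-service lookup.
type fetchFunc func(ctx context.Context, uri string) (any, error)

// fixedFetch returns a fetchFunc that always yields the given result.
func fixedFetch(sub any, err error) fetchFunc {
	return func(context.Context, string) (any, error) { return sub, err }
}

// subscriptionExists reports whether the subscription at uri is present.
// A 404 is translated to (false, nil); every other error is propagated.
func subscriptionExists(ctx context.Context, fetch fetchFunc, uri string) (bool, error) {
	sub, err := fetch(ctx, uri)
	if err != nil {
		var se *statusError
		if errors.As(err, &se) && se.HTTPReturnedStatusCode == http.StatusNotFound {
			return false, nil
		}
		return false, fmt.Errorf("failed to get event subscription %q: %w", uri, err)
	}
	return sub != nil, nil
}

func main() {
	notFound := fixedFetch(nil, &statusError{HTTPReturnedStatusCode: http.StatusNotFound})
	exists, err := subscriptionExists(context.Background(), notFound, "/redfish/v1/EventService/Subscriptions/1")
	fmt.Println(exists, err)
}
```

Returning `(false, nil)` for a 404 lets the caller distinguish "the subscription is gone" from "the BMC could not be queried", which matters because only the first case should trigger recreation.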

Changes

Event Subscription Staleness Detection

| Layer / File(s) | Summary |
| --- | --- |
| Event Subscription Query Contract: bmc/bmc.go | Adds GetEventSubscription(ctx, uri) (bool, error) to the BMC interface with documented (exists, err) semantics. |
| Redfish Event Subscription Query Implementation: bmc/redfish.go | RedfishBaseBMC.GetEventSubscription verifies the event service and calls the Redfish event API; Redfish 404 yields (false, nil), other errors are propagated, success returns event != nil. |
| BMC Controller Stale Link Validation: internal/controller/bmc_controller.go | handleEventSubscriptions pre-checks Status.MetricsReportSubscriptionLink and Status.EventsSubscriptionLink with GetEventSubscription; clears stale links via status patches so the controller can recreate missing subscriptions. Also updates an events-patch error message string. |
| Mock Server Subscription Deletion Helper: bmc/mock/server/server.go | MockServer.DeleteSubscription removes a subscription override and updates the parent subscription collection to simulate external deletion on the mock BMC. |
| Stale Subscription Link Reconciliation Test: internal/controller/bmc_controller_test.go | Integration test deletes the metrics subscription on the mock server, forces reconcile, asserts the metrics link is recreated with a new URI while the events link is unchanged, and performs cleanup. |
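To make the mock-server row above more concrete, here is a rough sketch of what "remove the override and update the parent collection" might look like. The types `mockStore`, `collection`, and `member` and the method `deleteSubscription` are invented for illustration; the actual MockServer in bmc/mock/server/server.go may be structured differently.

```go
package main

import (
	"fmt"
	"sync"
)

// member and collection loosely mirror a Redfish collection document.
type member struct {
	OdataID string
}

type collection struct {
	Members []member
}

// mockStore is a hypothetical in-memory store of resource overrides.
type mockStore struct {
	mu          sync.Mutex
	overrides   map[string]any         // per-resource documents keyed by URI
	collections map[string]*collection // collection documents keyed by URI
}

// deleteSubscription drops the subscription document and removes its entry
// from the parent collection, simulating a deletion made outside the operator.
func (s *mockStore) deleteSubscription(collectionURI, subURI string) error {
	s.mu.Lock()
	defer s.mu.Unlock()

	col, ok := s.collections[collectionURI]
	if !ok {
		return fmt.Errorf("collection %q not found", collectionURI)
	}
	delete(s.overrides, subURI)

	// Filter in place, keeping every member except the deleted one.
	kept := col.Members[:0]
	for _, m := range col.Members {
		if m.OdataID != subURI {
			kept = append(kept, m)
		}
	}
	col.Members = kept
	return nil
}

func main() {
	s := &mockStore{
		overrides:   map[string]any{"/subs/1": "metrics", "/subs/2": "events"},
		collections: map[string]*collection{"/subs": {Members: []member{{OdataID: "/subs/1"}, {OdataID: "/subs/2"}}}},
	}
	if err := s.deleteSubscription("/subs", "/subs/1"); err != nil {
		fmt.Println("delete failed:", err)
		return
	}
	fmt.Println(len(s.collections["/subs"].Members), "member(s) left")
}
```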

Sequence Diagram

sequenceDiagram
  participant Controller as BMCReconciler
  participant BMCClient as RedfishBaseBMC
  participant Redfish as RedfishEventService
  participant Mock as MockServer

  Controller->>BMCClient: GetEventSubscription(ctx, metricsURI)
  BMCClient->>Redfish: ev.GetEventSubscription(metricsURI)
  Redfish-->>BMCClient: 404 (not found) / subscription object
  BMCClient-->>Controller: (false,nil) or (true,nil)
  alt subscription not found
    Controller->>Controller: clear status link (status patch)
    Controller->>BMCClient: CreateEventSubscription(...)
    BMCClient->>Redfish: CreateEventSubscription(...)
    Redfish-->>BMCClient: new subscription URI
    BMCClient-->>Controller: new URI
  else subscription exists
    Controller-->>Controller: keep status link
  end

  Note over Mock,Redfish: MockServer.DeleteSubscription removes member from subscription collection to simulate external deletion
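
The flow in the diagram above could be approximated in Go as shown here. This is only a sketch under stated assumptions: `subscriptionClient`, `ensureSubscription`, and `fakeBMC` are hypothetical, the real `CreateEventSubscription` signature is not shown in this conversation, and the real controller clears the stale link through a status patch rather than a return value.

```go
package main

import (
	"context"
	"fmt"
)

const subsPath = "/redfish/v1/EventService/Subscriptions/"

// subscriptionClient is a reduced, hypothetical view of the BMC interface.
type subscriptionClient interface {
	GetEventSubscription(ctx context.Context, uri string) (bool, error)
	CreateEventSubscription(ctx context.Context) (string, error)
}

// ensureSubscription keeps a link that still resolves and creates a new
// subscription when the link is empty or points at a deleted subscription.
func ensureSubscription(ctx context.Context, c subscriptionClient, link string) (string, error) {
	if link != "" {
		exists, err := c.GetEventSubscription(ctx, link)
		if err != nil {
			return link, fmt.Errorf("failed to check subscription %q: %w", link, err)
		}
		if exists {
			return link, nil
		}
		// Stale link: a real controller would clear it in status here.
	}
	newLink, err := c.CreateEventSubscription(ctx)
	if err != nil {
		return "", fmt.Errorf("failed to create subscription: %w", err)
	}
	return newLink, nil
}

// fakeBMC is an in-memory test double, not part of the PR.
type fakeBMC struct {
	live   map[string]bool
	nextID int
}

func (f *fakeBMC) GetEventSubscription(_ context.Context, uri string) (bool, error) {
	return f.live[uri], nil
}

func (f *fakeBMC) CreateEventSubscription(_ context.Context) (string, error) {
	uri := fmt.Sprintf("%s%d", subsPath, f.nextID)
	f.nextID++
	f.live[uri] = true
	return uri, nil
}

func main() {
	ctx := context.Background()
	// Subscription 1 was deleted externally; subscription 2 still exists.
	bmc := &fakeBMC{live: map[string]bool{subsPath + "2": true}, nextID: 3}
	for _, link := range []string{subsPath + "1", subsPath + "2"} {
		got, err := ensureSubscription(ctx, bmc, link)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(link, "->", got)
	}
}
```

Note that an error from the existence check leaves the link untouched, so a temporarily unreachable BMC does not cause duplicate subscriptions to be created.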

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • ironcore-dev/metal-operator#494: Introduced subscription collection mutation logic and initial create/delete subscription handling that this PR builds on.

Suggested labels

api-change

Suggested reviewers

  • afritzler
  • stefanhipfel
  • xkonni
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ⚠️ Warning | The PR description is minimal (only 'Fixes #886') and does not follow the template structure with 'Proposed Changes' and detailed bullets. | Expand the description to include a summary of proposed changes, key implementation details (e.g., new GetEventSubscription method, subscription validation logic), and link to issue #886. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'Cleanup stale metrics and event subscriptions' is fully related to and clearly summarizes the main change: adding logic to detect and clean up stale subscription links. |
| Linked Issues check | ✅ Passed | All code changes implement the core requirements from issue #886: GetEventSubscription [886] validates subscription existence, handleEventSubscriptions [886] checks stale links before creating new ones, and the test [886] verifies subscriptions are recreated after external deletion. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to subscription validation and recreation logic directly related to issue #886; no unrelated modifications were introduced. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


@asergeant01 asergeant01 added this to the v0.6.0 milestone May 13, 2026

@xkonni xkonni left a comment

Contributor

good, fixes issue, minor optimizations possible.

Comment thread internal/controller/bmc_controller.go Outdated
Comment thread internal/controller/bmc_controller.go Outdated
@asergeant01 asergeant01 force-pushed the feat/improve-metric-report-subscription-handling branch from 1892ef5 to 5c3f881 Compare May 13, 2026 13:25
@asergeant01

Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented May 13, 2026

Contributor
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@bmc/mock/server/server.go`:
- Around line 1313-1321: The code can silently overwrite an existing collection
with an empty one when type assertions fail or dataFS.ReadFile/json.Unmarshal
fail; update the loading logic around the variables col, cached, Collection,
dataFS.ReadFile, json.Unmarshal and collectionKey so that: (1) when hasOverride
is true but cached is not a Collection, do NOT replace col (leave original
cached value or bail); (2) when ReadFile returns an error or json.Unmarshal
fails, do not overwrite col with the empty default—propagate/handle the error or
keep the prior value; and (3) before the persistence step that writes col (the
save at the persistence call around collection handling), add a guard to only
persist when col is valid (e.g., col.ID is non-empty or len(col.Members) > 0);
if invalid, skip saving and return/log an error so you cannot accidentally wipe
members.

In `@internal/controller/bmc_controller.go`:
- Around line 567-580: Update the error string in the patch failure to reference
the BMC status rather than "server status": when handling
bmcObj.Status.MetricsReportSubscriptionLink and calling r.Status().Patch(ctx,
bmcObj, client.MergeFrom(bmcBase)), change the fmt.Errorf message used on error
return to something like "failed to patch BMC status to clear stale metrics
subscription link: %w" so it correctly references bmcObj (BMC) and its status.
- Around line 595-608: The error message used when the status patch fails
incorrectly references "server status"; update the fmt.Errorf call in the block
handling bmcObj.Status.EventsSubscriptionLink (around GetEventSubscription and
r.Status().Patch) to say "failed to patch BMC status to clear stale events
subscription link" so it correctly names the BMC status and use the existing
variables (bmcObj, bmcBase) and error wrapping as-is.
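
The first inline comment above (not overwriting a collection with an empty default) might be addressed along these lines. This is a sketch only: `Collection`, `Member`, `loadCollection`, and `validate` are illustrative names, and the real loading code in bmc/mock/server/server.go is not shown in this conversation.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Member and Collection loosely mirror a Redfish collection document.
type Member struct {
	OdataID string `json:"@odata.id"`
}

type Collection struct {
	ID      string   `json:"Id"`
	Members []Member `json:"Members"`
}

var errInvalidCollection = errors.New("collection has neither an ID nor members")

// validate rejects the zero-value collection so it is never persisted.
func validate(col Collection) error {
	if col.ID == "" && len(col.Members) == 0 {
		return errInvalidCollection
	}
	return nil
}

// loadCollection returns the cached override when one exists, otherwise the
// document produced by read. Every failure is returned to the caller instead
// of being replaced by an empty default.
func loadCollection(cached any, hasOverride bool, read func() ([]byte, error)) (Collection, error) {
	if hasOverride {
		col, ok := cached.(Collection)
		if !ok {
			return Collection{}, fmt.Errorf("cached override has unexpected type %T", cached)
		}
		if err := validate(col); err != nil {
			return Collection{}, err
		}
		return col, nil
	}
	data, err := read()
	if err != nil {
		return Collection{}, fmt.Errorf("failed to read collection: %w", err)
	}
	var col Collection
	if err := json.Unmarshal(data, &col); err != nil {
		return Collection{}, fmt.Errorf("failed to decode collection: %w", err)
	}
	if err := validate(col); err != nil {
		return Collection{}, err
	}
	return col, nil
}

func main() {
	// A truncated document must surface an error, not an empty collection.
	_, err := loadCollection(nil, false, func() ([]byte, error) { return []byte("{"), nil })
	fmt.Println("load error:", err)
}
```

The caller would then skip the persistence step whenever `loadCollection` returns an error, which is what keeps existing members from being wiped.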
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 11823367-8cf1-4fb0-bc23-9df7749fba1a

📥 Commits

Reviewing files that changed from the base of the PR and between bf8e075 and 5c3f881.

📒 Files selected for processing (5)
  • bmc/bmc.go
  • bmc/mock/server/server.go
  • bmc/redfish.go
  • internal/controller/bmc_controller.go
  • internal/controller/bmc_controller_test.go

Comment thread bmc/mock/server/server.go
Comment thread internal/controller/bmc_controller.go
Comment thread internal/controller/bmc_controller.go
@asergeant01 asergeant01 force-pushed the feat/improve-metric-report-subscription-handling branch from 5c3f881 to bfda8c6 Compare May 13, 2026 15:13
@asergeant01 asergeant01 self-assigned this May 14, 2026
@afritzler afritzler changed the title handle stale metrics and event subscriptions Cleanup stale metrics and event subscriptions May 19, 2026
@afritzler afritzler added the enhancement New feature or request label May 19, 2026
@afritzler

Member

@coderabbitai review

@coderabbitai

coderabbitai Bot commented May 19, 2026

Contributor
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Comment thread internal/controller/bmc_controller.go Outdated
Comment thread internal/controller/bmc_controller.go Outdated
Comment thread internal/controller/bmc_controller.go Outdated
Comment thread internal/controller/bmc_controller.go Outdated
@asergeant01 asergeant01 force-pushed the feat/improve-metric-report-subscription-handling branch from bfda8c6 to 543c877 Compare June 2, 2026 14:25

@coderabbitai coderabbitai Bot left a comment

Contributor

🧹 Nitpick comments (1)
bmc/redfish.go (1)

1182-1201: 💤 Low value

Optional: deduplicate the event-service preamble.

GetEventSubscription repeats the GetService → EventService → ServiceEnabled → ev.GetEventSubscription(uri) sequence already present in DeleteEventSubscription (lines 1203-1218). A small helper returning (*schemas.EventDestination, error) would centralize the ServiceEnabled guard and 404 handling.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@bmc/redfish.go` around lines 1182 - 1201, Extract the repeated GetService →
EventService → ServiceEnabled check and the ev.GetEventSubscription(uri) call
(including the 404 handling that checks for *schemas.Error with
HTTPReturnedStatusCode == http.StatusNotFound) into a new helper (e.g.
getEventDestination or fetchEventSubscription) that returns
(*schemas.EventDestination, error); then update GetEventSubscription and
DeleteEventSubscription to call that helper and act on the returned destination
(nil vs non-nil) or error, preserving the existing error wrapping semantics and
the special-case 404 -> nil behavior.
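
A possible shape for the helper suggested in this nitpick is sketched below. All types here (`httpError`, `eventDestination`, `eventService`) are simplified stand-ins rather than the real schemas package, and treating a missing subscription as a no-op in the delete path is an assumption of this sketch, not confirmed behavior of the PR.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// httpError and eventDestination are simplified, hypothetical stand-ins.
type httpError struct{ Status int }

func (e *httpError) Error() string { return fmt.Sprintf("HTTP status %d", e.Status) }

type eventDestination struct{ URI string }

// eventService is an in-memory stand-in for the Redfish event service.
type eventService struct {
	ServiceEnabled bool
	subs           map[string]*eventDestination
}

func (s *eventService) get(uri string) (*eventDestination, error) {
	if d, ok := s.subs[uri]; ok {
		return d, nil
	}
	return nil, &httpError{Status: http.StatusNotFound}
}

// fetchEventSubscription centralizes the ServiceEnabled guard and the
// 404 handling: a missing subscription yields (nil, nil).
func fetchEventSubscription(ev *eventService, uri string) (*eventDestination, error) {
	if ev == nil || !ev.ServiceEnabled {
		return nil, errors.New("event service is not enabled")
	}
	dest, err := ev.get(uri)
	if err != nil {
		var he *httpError
		if errors.As(err, &he) && he.Status == http.StatusNotFound {
			return nil, nil
		}
		return nil, fmt.Errorf("failed to get event subscription %q: %w", uri, err)
	}
	return dest, nil
}

// getEventSubscription reports existence using the shared helper.
func getEventSubscription(ev *eventService, uri string) (bool, error) {
	dest, err := fetchEventSubscription(ev, uri)
	if err != nil {
		return false, err
	}
	return dest != nil, nil
}

// deleteEventSubscription reuses the helper; deleting a subscription that
// is already gone is treated as success (an assumption of this sketch).
func deleteEventSubscription(ev *eventService, uri string) error {
	dest, err := fetchEventSubscription(ev, uri)
	if err != nil || dest == nil {
		return err
	}
	delete(ev.subs, dest.URI)
	return nil
}

func main() {
	ev := &eventService{ServiceEnabled: true, subs: map[string]*eventDestination{"/subs/1": {URI: "/subs/1"}}}
	fmt.Println(getEventSubscription(ev, "/subs/1"))
	fmt.Println(deleteEventSubscription(ev, "/subs/1"))
	fmt.Println(getEventSubscription(ev, "/subs/1"))
}
```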
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: cb25bc9b-02a0-4779-8aed-e43e050cb151

📥 Commits

Reviewing files that changed from the base of the PR and between bfda8c6 and 543c877.

📒 Files selected for processing (5)
  • bmc/bmc.go
  • bmc/mock/server/server.go
  • bmc/redfish.go
  • internal/controller/bmc_controller.go
  • internal/controller/bmc_controller_test.go
🚧 Files skipped from review as they are similar to previous changes (4)
  • bmc/bmc.go
  • internal/controller/bmc_controller_test.go
  • internal/controller/bmc_controller.go
  • bmc/mock/server/server.go

Signed-off-by: Alan Sergeant <alan.sergeant@sap.com>
@asergeant01 asergeant01 force-pushed the feat/improve-metric-report-subscription-handling branch from 543c877 to 53f3be8 Compare June 2, 2026 14:32
@asergeant01 asergeant01 requested a review from afritzler June 2, 2026 14:43
@afritzler

Member

@asergeant01 can you please rebase this PR?

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

BMC event/metrics subscriptions not recreated after external deletion

4 participants