Add Prometheus metrics and alerts for server monitoring#718
Conversation
WalkthroughThis PR introduces Prometheus observability for the metal-operator by adding custom metrics collection for server state and reconciliation events. It includes a new ServerStateCollector that aggregates server status metrics, integrates reconciliation tracking into the controller loop, adds alert rules, promotes Prometheus dependencies to direct requirements, and provides comprehensive documentation. Changes
Sequence DiagramsequenceDiagram
participant Prometheus
participant Collector as ServerStateCollector
participant Registry as Metrics Registry
participant API as Kubernetes API
Prometheus->>Registry: Scrape metrics endpoint
Registry->>Collector: Collect() invoked
Collector->>API: List all Server objects (5s timeout)
API-->>Collector: Server list
Collector->>Collector: Aggregate by State, PowerState, Conditions
Collector-->>Registry: Emit gauge metrics<br/>(metal_server_state,<br/>metal_server_power_state,<br/>metal_server_condition_status)
Registry-->>Prometheus: Return metrics
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~30 minutes Suggested reviewers
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Tip Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs). Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
🧹 Nitpick comments (1)
internal/metrics/server_metrics_test.go (1)
93-131: Collector tests should assert metric labels and values, not just metric presence.
metricCount > 0allows incorrect aggregations to pass unnoticed. Please assert exact expected values for key label sets (e.g.,state=Available,state=Reserved,power_state=On/Off, and condition status counts).Also applies to: 133-180, 182-220
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/metrics/server_metrics_test.go` around lines 93 - 131, The test currently only checks that collector.Collect(ch) emitted some metrics; update the It block so it decodes the emitted prometheus.Metric objects and asserts exact label/value pairs for the server-state and power_state metrics: after calling collector.Collect(ch) iterate the channel, convert each prometheus.Metric to a dto.Metric (or use prometheus/testutil to gather metrics), and assert the counts for labels state="Available" == 2 and state="Reserved" == 1, and similarly assert expected power_state and condition label/value combinations; reference the existing variables collector, ch, servers and fakeClient when locating where to add these assertions so the test verifies specific label sets/values rather than only metricCount > 0.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@config/prometheus/server_alerts.yaml`:
- Around line 67-73: The ServerReconciliationFailureSpike alert computes a
per-error-type ratio because both numerator and denominator keep the result
label; fix it by aggregating away the result label so you get an overall failure
ratio: use sum(rate(metal_server_reconciliation_total{result=~"error_.*"}[5m]))
divided by sum(rate(metal_server_reconciliation_total[5m])) (optionally with a
by(<labels>) grouping if you want per-instance/job scope) — update the alert
expression for alert: ServerReconciliationFailureSpike and ensure
metal_server_reconciliation_total's result label is not preserved in the
division.
- Around line 13-15: Replace the fragile equality check that yields empty
vectors by treating a missing metric as zero or explicitly alerting on absence:
update the NoAvailableServers and LowAvailableServerCapacity alerts (alert names
NoAvailableServers and LowAvailableServerCapacity) to use either
sum(metal_server_state{state="Available"} or on() vector(0)) < 1 (or similar
threshold) so absent series are treated as 0, or use
absent_over_time(metal_server_state{state="Available"}[5m]) if you want to fire
when the metric is missing; change the expr fields accordingly for both alerts.
In `@docs/metrics.md`:
- Around line 51-56: The fenced code blocks showing Prometheus-style metrics
(e.g., metal_server_state, metal_server_power_state,
metal_server_condition_status, metal_server_reconciliation_total) are missing
language tags and trigger MD040; update each block by adding a language
identifier (use "text") to the opening backticks for the blocks containing those
metric examples so they read ```text ... ``` to satisfy the linter at the blocks
around the examples for metal_server_state, metal_server_power_state,
metal_server_condition_status, and metal_server_reconciliation_total.
In `@internal/metrics/server_metrics.go`:
- Around line 81-86: In Collect(), replace the unbounded context.Background()
used for the Kubernetes API call with a cancellable context that has a short
timeout (e.g., 3–10s) to avoid blocking Prometheus scrapes; create the context
via context.WithTimeout(context.Background(), timeout), defer the cancel(), and
pass that ctx into c.Client.List(...) (update imports to include time if needed)
so List is always bounded and will return promptly on API stalls.
- Around line 139-157: The loop emitting condition metrics is using `for status
:= range statusMap` which iterates keys only (dropping counts) and compares
`strings.ToLower(status)` against
`string(metav1.ConditionTrue)`/`ConditionFalse` (capitalized), causing the
switch to never match and value to remain -1; fix by iterating both key and
value (e.g., `for status, count := range statusMap`) and use a case-insensitive
comparison or normalize both sides consistently (either compare
`strings.ToLower(status)` to
`strings.ToLower(string(metav1.ConditionTrue))`/`strings.ToLower(string(metav1.ConditionFalse))`
or compare `status` directly), then emit `count` (as the metric value) via
`prometheus.MustNewConstMetric` with `c.conditionDesc` so the actual server
counts and correct numeric condition values (1/0) are reported instead of -1.
---
Nitpick comments:
In `@internal/metrics/server_metrics_test.go`:
- Around line 93-131: The test currently only checks that collector.Collect(ch)
emitted some metrics; update the It block so it decodes the emitted
prometheus.Metric objects and asserts exact label/value pairs for the
server-state and power_state metrics: after calling collector.Collect(ch)
iterate the channel, convert each prometheus.Metric to a dto.Metric (or use
prometheus/testutil to gather metrics), and assert the counts for labels
state="Available" == 2 and state="Reserved" == 1, and similarly assert expected
power_state and condition label/value combinations; reference the existing
variables collector, ch, servers and fakeClient when locating where to add these
assertions so the test verifies specific label sets/values rather than only
metricCount > 0.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
README.mdcmd/main.goconfig/prometheus/server_alerts.yamldocs/metrics.mdgo.modinternal/controller/server_controller.gointernal/metrics/server_metrics.gointernal/metrics/server_metrics_test.go
- Update alert expressions for NoAvailableServers and LowAvailableServerCapacity to use sum for better accuracy. - Add new alert for missing server metrics. - Improve documentation for server condition status and example values. - Implement a timeout for metrics collection to prevent hanging scrapes.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@config/prometheus/server_alerts.yaml`:
- Around line 67-69: The ServerMetricsMissing alert currently uses the
expression absent(metal_server_state{state="Available"}) which triggers when no
"Available" time series exist (including valid zero-available states); update
the alert's expr to absent(metal_server_state) so it only fires when the entire
metal_server_state metric is missing; modify the ServerMetricsMissing alert
definition (alert: ServerMetricsMissing) to replace the expr accordingly and
keep the existing for: 5m and other labels/annotations intact.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
config/prometheus/server_alerts.yamldocs/metrics.mdinternal/metrics/server_metrics.go
| ## Alerting Rules | ||
|
|
||
| Example PrometheusRule resource: | ||
|
|
||
| ```yaml | ||
| apiVersion: monitoring.coreos.com/v1 | ||
| kind: PrometheusRule | ||
| metadata: | ||
| name: metal-operator-server-alerts | ||
| namespace: metal-operator-system | ||
| spec: | ||
| groups: | ||
| - name: metal_operator_servers | ||
| interval: 30s | ||
| rules: | ||
| - alert: NoAvailableServers | ||
| expr: metal_server_state{state="Available"} == 0 | ||
| for: 5m | ||
| annotations: | ||
| summary: "No available servers in the fleet" | ||
| description: "All servers are either Reserved, in Maintenance, or in Error state" | ||
| labels: | ||
| severity: warning | ||
|
|
||
| - alert: ServersInErrorState | ||
| expr: metal_server_state{state="Error"} > 0 | ||
| for: 2m | ||
| annotations: | ||
| summary: "Servers are in Error state" | ||
| description: "{{ $value }} server(s) are in Error state and require attention" | ||
| labels: | ||
| severity: critical | ||
|
|
||
| - alert: ServersPoweringOnTooLong | ||
| expr: metal_server_power_state{power_state="PoweringOn"} > 0 | ||
| for: 10m | ||
| annotations: | ||
| summary: "Servers stuck in PoweringOn state" | ||
| description: "{{ $value }} server(s) have been in PoweringOn state for over 10 minutes" | ||
| labels: | ||
| severity: warning | ||
|
|
||
| - alert: HighReconciliationErrorRate | ||
| expr: rate(metal_server_reconciliation_total{result=~"error_.*"}[5m]) > 0.1 | ||
| for: 5m | ||
| annotations: | ||
| summary: "High server reconciliation error rate" | ||
| description: "Server reconciliation errors are occurring at {{ $value | humanize }} per second" | ||
| labels: | ||
| severity: warning | ||
|
|
||
| - alert: LowAvailableServerCapacity | ||
| expr: metal_server_state{state="Available"} < 2 | ||
| for: 5m | ||
| annotations: | ||
| summary: "Low available server capacity" | ||
| description: "Only {{ $value }} server(s) are available" | ||
| labels: | ||
| severity: warning | ||
| ``` |
There was a problem hiding this comment.
Alerting example is stale versus the shipped rules.
The example still shows the old NoAvailableServers/LowAvailableServerCapacity expressions and omits newer alerts, so copy/paste users will get outdated behavior.
🛠️ Suggested doc sync
- - alert: NoAvailableServers
- expr: metal_server_state{state="Available"} == 0
+ - alert: NoAvailableServers
+ expr: sum(metal_server_state{state="Available"} or on() vector(0)) < 1
@@
- - alert: LowAvailableServerCapacity
- expr: metal_server_state{state="Available"} < 2
+ - alert: LowAvailableServerCapacity
+ expr: sum(metal_server_state{state="Available"} or on() vector(0)) < 2| }) | ||
|
|
||
| It("should count servers by state", func() { | ||
| ctx := context.Background() |
There was a problem hiding this comment.
Please use SpecContext in the func().
| - name: metal_operator_servers | ||
| interval: 30s | ||
| rules: | ||
| - alert: NoAvailableServers |
There was a problem hiding this comment.
Do we really want this rule? As the goal will be to use all available servers, this alert will always fire. @coderabbitai what do you think?
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
cat config/prometheus/server_alerts.yaml
Length of output: 2978
@afritzler, you raise a very valid point. The NoAvailableServers alert as currently defined will indeed fire whenever the fleet is fully utilized — which is the desired operational state. The alert's own description already betrays the ambiguity: it lists "Reserved" (healthy, in-use) alongside "Maintenance" and "Error" (genuinely problematic) as possible causes.
A few specific issues:
- False positives in healthy operation: When all servers are
Reserved, this alert fires, yet nothing is wrong. - Redundancy with
LowAvailableServerCapacity: That alert already fires at< 2available servers, which is a strict superset of the zero case.NoAvailableServersadds no new signal. - No distinction between "all used" vs. "all broken": The genuinely alarming scenario — zero available servers because they are all in Error/Maintenance — is already partially covered by
ServersInErrorState.
Suggested options:
-
Option A (Simplest): Remove
NoAvailableServersentirely and rely onLowAvailableServerCapacity+ServersInErrorStateto cover the relevant failure modes. -
Option B: Reformulate it to only fire when zero servers are available and zero are reserved (meaning the fleet isn't productively in use at all):
expr: | sum(metal_server_state{state="Available"} or on() vector(0)) < 1 AND sum(metal_server_state{state="Reserved"} or on() vector(0)) == 0
This would correctly distinguish between "full utilization" and "fleet is completely idle but nothing is available," which is the more actionable signal.
Option A is likely sufficient given the existing alert coverage, but Option B could be useful if operators specifically want to detect a completely idle and empty fleet.
There was a problem hiding this comment.
@stefanhipfel I think option B is actually what we want.
There was a problem hiding this comment.
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
| - name: metal_operator_servers | ||
| interval: 30s | ||
| rules: | ||
| - alert: NoAvailableServers |
| @@ -0,0 +1,354 @@ | |||
| # Prometheus Metrics | |||
There was a problem hiding this comment.
Can we open top level in the docs a section for observability and have a subsection for metrics.
| ## References | ||
|
|
||
| - [Prometheus Best Practices](https://prometheus.io/docs/practices/naming/) | ||
| - [Kubernetes Monitoring Guide](https://kubernetes.io/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/) | ||
| - [Prometheus Operator Documentation](https://prometheus-operator.dev/) | ||
| - [Controller-Runtime Metrics](https://book.kubebuilder.io/reference/metrics.html) |
There was a problem hiding this comment.
I would remove those references here.
| }) | ||
|
|
||
| It("should count servers by power state", func() { | ||
| ctx := context.Background() |
| kubectl apply -f https://raw.githubusercontent.com/<org>/metal-operator/<tag or branch>/dist/install.yaml | ||
| ``` | ||
|
|
||
| ## Monitoring |
There was a problem hiding this comment.
I would keep the monitoring part out of the README and only rely on the information provided in the docs folder.
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (2)
docs/.vitepress/config.mts (1)
88-94:⚠️ Potential issue | 🟡 MinorExpose the overview page from the new Observability group.
This PR adds
docs/observability/README.md, but the sidebar only links to/observability/metrics. That leaves the section landing page without a sidebar entry.🛠️ Suggested sidebar tweak
{ text: 'Observability', collapsed: false, items: [ + { text: 'Overview', link: '/observability/' }, { text: 'Metrics', link: '/observability/metrics' }, ] },🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/.vitepress/config.mts` around lines 88 - 94, The Observability section in the sidebar object (the object with text 'Observability' and collapsed: false) currently only lists the Metrics item; add an entry for the overview/landing page by inserting an item with text 'Overview' (or 'Observability') and link '/observability/' (or '/observability') into the items array before the Metrics entry so the newly added docs/observability/README.md is reachable from the sidebar.docs/observability/metrics.md (1)
191-229:⚠️ Potential issue | 🟡 MinorMake the availability alerts zero-safe.
These expressions stop matching when
metal_server_state{state="Available"}is absent, so the example misses the exact “zero available servers” case.internal/metrics/server_metrics_test.goLines 80-91 already shows the collector can emit no series at all.🛠️ Suggested doc fix
- - alert: NoAvailableServers - expr: metal_server_state{state="Available"} == 0 + - alert: NoAvailableServers + expr: sum(metal_server_state{state="Available"} or on() vector(0)) < 1 @@ - - alert: LowAvailableServerCapacity - expr: metal_server_state{state="Available"} < 2 + - alert: LowAvailableServerCapacity + expr: sum(metal_server_state{state="Available"} or on() vector(0)) < 2In PromQL alerting, why does `metric{label="value"} == 0` not fire when the series is absent, and what is the recommended pattern to treat a missing series as zero?🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/observability/metrics.md` around lines 191 - 229, The alerts that check metal_server_state{state="Available"} (e.g., NoAvailableServers and LowAvailableServerCapacity) are not zero-safe because the expression stops matching when the series is absent; update those rules to treat a missing series as zero by OR-ing with absent(...) (for example, change the expr to something like: (metal_server_state{state="Available"} == 0) or absent(metal_server_state{state="Available"})) so the alert fires when the metric is missing as well.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/metrics/server_metrics_test.go`:
- Around line 259-275: The test currently only asserts no panic when calling
collector.Collect, but must also verify that no metric series is emitted with
empty state labels; after calling collector.Collect(ch) and closing ch, drain
the channel and inspect each prometheus.Metric produced by the collector (using
the existing ch variable and collector.Collect call) to convert metrics to their
dto.MetricFamily or use prometheus/testutil.ToFloat64 where appropriate, then
assert that none of the metrics contain label pairs with Name "state" or
"power_state" with an empty Value (i.e., ensure label.Value != ""). Keep the
existing created server (server with empty Status) and add expectations that no
metric has state=="" and no metric has power_state=="" to enforce bounded-label
behavior.
---
Duplicate comments:
In `@docs/.vitepress/config.mts`:
- Around line 88-94: The Observability section in the sidebar object (the object
with text 'Observability' and collapsed: false) currently only lists the Metrics
item; add an entry for the overview/landing page by inserting an item with text
'Overview' (or 'Observability') and link '/observability/' (or '/observability')
into the items array before the Metrics entry so the newly added
docs/observability/README.md is reachable from the sidebar.
In `@docs/observability/metrics.md`:
- Around line 191-229: The alerts that check
metal_server_state{state="Available"} (e.g., NoAvailableServers and
LowAvailableServerCapacity) are not zero-safe because the expression stops
matching when the series is absent; update those rules to treat a missing series
as zero by OR-ing with absent(...) (for example, change the expr to something
like: (metal_server_state{state="Available"} == 0) or
absent(metal_server_state{state="Available"})) so the alert fires when the
metric is missing as well.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 38ce2386-8ddb-4088-9dc2-4527a612e560
📒 Files selected for processing (6)
config/prometheus/server_alerts.yamldocs/.vitepress/config.mtsdocs/observability/README.mddocs/observability/metrics.mdgo.modinternal/metrics/server_metrics_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
- config/prometheus/server_alerts.yaml
- go.mod
| It("should handle servers without state gracefully", func(ctx SpecContext) { | ||
| // Create server without state set | ||
| server := metalv1alpha1.Server{ | ||
| ObjectMeta: metav1.ObjectMeta{Name: "server1"}, | ||
| Spec: metalv1alpha1.ServerSpec{SystemUUID: "uuid1"}, | ||
| Status: metalv1alpha1.ServerStatus{}, | ||
| } | ||
|
|
||
| Expect(fakeClient.Create(ctx, &server)).To(Succeed()) | ||
|
|
||
| // Collect metrics - should not panic | ||
| ch := make(chan prometheus.Metric, 100) | ||
| Expect(func() { | ||
| collector.Collect(ch) | ||
| close(ch) | ||
| }).NotTo(Panic()) | ||
| }) |
There was a problem hiding this comment.
Assert the empty-label case, not just the panic case.
This spec still passes if the collector starts exporting state="" or power_state="". Please inspect the collected metrics and assert that empty status fields do not create any series, otherwise the bounded-label behavior stays untested.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/metrics/server_metrics_test.go` around lines 259 - 275, The test
currently only asserts no panic when calling collector.Collect, but must also
verify that no metric series is emitted with empty state labels; after calling
collector.Collect(ch) and closing ch, drain the channel and inspect each
prometheus.Metric produced by the collector (using the existing ch variable and
collector.Collect call) to convert metrics to their dto.MetricFamily or use
prometheus/testutil.ToFloat64 where appropriate, then assert that none of the
metrics contain label pairs with Name "state" or "power_state" with an empty
Value (i.e., ensure label.Value != ""). Keep the existing created server (server
with empty Status) and add expectations that no metric has state=="" and no
metric has power_state=="" to enforce bounded-label behavior.
xkonni
left a comment
There was a problem hiding this comment.
approved with minor question
| interval: 30s | ||
| rules: | ||
| - alert: NoAvailableServers | ||
| expr: sum(metal_server_state{state="Available"} or on() vector(0)) < 1 AND sum(metal_server_state{state="Reserved"} or on() vector(0)) == 0 |
There was a problem hiding this comment.
why < 1 on the first and == 0 on the second?
There was a problem hiding this comment.
it is the same in the end
all required changes were implemented.
* Add Prometheus metrics and alerts for server monitoring * Enhance Prometheus alerts and metrics for server monitoring - Update alert expressions for NoAvailableServers and LowAvailableServerCapacity to use sum for better accuracy. - Add new alert for missing server metrics. - Improve documentation for server condition status and example values. - Implement a timeout for metrics collection to prevent hanging scrapes. * Address PR #718 review comments
…tory function (#733) * Implement CheckBMCPendingComponentUpgrade for vendor-specific handling and integrate checks in BIOS and BMC upgrade processes * Remove duplicate linter exclusions for 'dupl' and 'lll' in bmc path * Fix linter exclusions for 'dupl' and 'lll' in bmc path * Refactor CheckBMCPendingComponentUpgrade to use ComponentType for type safety and improve clarity * Enhance CheckBMCPendingComponentUpgrade to validate component types and clarify behavior for local providers. Add better comment for AI BOT * add missing chart selector labels to registry service (#726) * Add Prometheus metrics and alerts for server monitoring (#718) * Add Prometheus metrics and alerts for server monitoring * Enhance Prometheus alerts and metrics for server monitoring - Update alert expressions for NoAvailableServers and LowAvailableServerCapacity to use sum for better accuracy. - Add new alert for missing server metrics. - Improve documentation for server condition status and example values. - Implement a timeout for metrics collection to prevent hanging scrapes. * Address PR #718 review comments * Add Redfish event subscription management for servers (#494) - Implement event and metrics subscription handling - Add subscription links to ServerStatus - Integrate with BMC interface for CreateEventSubscription/DeleteEventSubscription - Add proper error handling and logging - Include SPDX license headers - Fix lint issues and improve code quality This change enables automatic subscription to Redfish events and metrics reports for managed servers, improving observability and monitoring capabilities. * Bump docker/setup-buildx-action from 3 to 4 (#732) Bumps [docker/setup-buildx-action](https://github.com/docker/setup-buildx-action) from 3 to 4. - [Release notes](https://github.com/docker/setup-buildx-action/releases) - [Commits](docker/setup-buildx-action@v3...v4) --- updated-dependencies: - dependency-name: docker/setup-buildx-action dependency-version: '4' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump docker/setup-qemu-action from 3 to 4 (#731) Bumps [docker/setup-qemu-action](https://github.com/docker/setup-qemu-action) from 3 to 4. - [Release notes](https://github.com/docker/setup-qemu-action/releases) - [Commits](docker/setup-qemu-action@v3...v4) --- updated-dependencies: - dependency-name: docker/setup-qemu-action dependency-version: '4' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump docker/metadata-action from 5 to 6 (#730) Bumps [docker/metadata-action](https://github.com/docker/metadata-action) from 5 to 6. - [Release notes](https://github.com/docker/metadata-action/releases) - [Commits](docker/metadata-action@v5...v6) --- updated-dependencies: - dependency-name: docker/metadata-action dependency-version: '6' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump golang from 1.26.0 to 1.26.1 (#729) Bumps golang from 1.26.0 to 1.26.1. --- updated-dependencies: - dependency-name: golang dependency-version: 1.26.1 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump docker/build-push-action from 6 to 7 (#728) Bumps [docker/build-push-action](https://github.com/docker/build-push-action) from 6 to 7. - [Release notes](https://github.com/docker/build-push-action/releases) - [Commits](docker/build-push-action@v6...v7) --- updated-dependencies: - dependency-name: docker/build-push-action dependency-version: '7' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump dompurify from 3.2.6 to 3.3.3 (#734) Bumps [dompurify](https://github.com/cure53/DOMPurify) from 3.2.6 to 3.3.3. - [Release notes](https://github.com/cure53/DOMPurify/releases) - [Commits](cure53/DOMPurify@3.2.6...3.3.3) --- updated-dependencies: - dependency-name: dompurify dependency-version: 3.3.3 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump docker/login-action from 3 to 4 (#727) Bumps [docker/login-action](https://github.com/docker/login-action) from 3 to 4. - [Release notes](https://github.com/docker/login-action/releases) - [Commits](docker/login-action@v3...v4) --- updated-dependencies: - dependency-name: docker/login-action dependency-version: '4' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Tidy go.mod (#736) * Bump golang.org/x/crypto from 0.48.0 to 0.49.0 (#738) Bumps [golang.org/x/crypto](https://github.com/golang/crypto) from 0.48.0 to 0.49.0. - [Commits](golang/crypto@v0.48.0...v0.49.0) --- updated-dependencies: - dependency-name: golang.org/x/crypto dependency-version: 0.49.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Refactor CheckBMCPendingComponentUpgrade methods to simplify signature and clarify vendor-specific behavior for redfish kube and local. Adjust the comment in redfish.go to explain what the function CheckBMCPendingComponentUpgrade is doing --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: Alan Sergeant <alan.sergeant@sap.com> Co-authored-by: Stefan Hipfel <stefan.hipfel@sap.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Andreas Fritzler <andreas.fritzler@sap.com>
Proposed Changes
Add Prometheus metrics and alerts for server monitoring
Summary by CodeRabbit
Release Notes
New Features
Documentation
Tests
Chores