Skip to content

pkg/services/otelhealth: fix uptime_seconds metric type from Gauge to Counter#1847

Open
pkcll wants to merge 1 commit intomainfrom
fix/otelhealth-uptime-counter
Open

pkg/services/otelhealth: fix uptime_seconds metric type from Gauge to Counter#1847
pkcll wants to merge 1 commit intomainfrom
fix/otelhealth-uptime-counter

Conversation

@pkcll
Copy link
Contributor

@pkcll pkcll commented Feb 20, 2026

Summary

  • Change uptime_seconds OTel metric from Float64Gauge to Float64Counter and use Add() instead of Record()
  • The Prometheus implementation (promhealth) already correctly uses a Counter — this aligns the OTel implementation to match
  • As a Gauge, the metric was recording the fixed interval duration (15s) on every tick instead of accumulating total uptime

@pkcll pkcll deployed to integration February 20, 2026 02:42 — with GitHub Actions Active
@pkcll pkcll marked this pull request as ready for review February 20, 2026 02:42
@pkcll pkcll requested a review from a team as a code owner February 20, 2026 02:42
Copilot AI review requested due to automatic review settings February 20, 2026 02:42
@github-actions
Copy link

✅ API Diff Results - No breaking changes


📄 View full apidiff report

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Aligns the OpenTelemetry health-check uptime metric with the existing Prometheus implementation by reporting total accumulated uptime rather than repeatedly recording the fixed polling interval.

Changes:

  • Switch uptime_seconds from an OTel Float64Gauge to Float64Counter.
  • Update reporting from Record() to Add() so uptime accumulates over time.
  • Clarify the metric description to indicate it is measured in seconds.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

return services.HealthCheckerConfig{}, err
}
uptimeSeconds, err := meter.Float64Gauge("uptime_seconds", metric.WithDescription("Uptime of the service"))
uptimeSeconds, err := meter.Float64Counter("uptime_seconds", metric.WithDescription("Uptime of the service in seconds"))
Copy link

Copilot AI Feb 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uptime_seconds is now a Counter and the description says "Uptime of the service in seconds", but it doesn’t indicate the value is cumulative/monotonic (i.e., total uptime accumulated since start). Consider updating the description to explicitly reflect counter semantics to avoid confusion in dashboards/alerts.

Also, this metric represents seconds; most other OTel duration metrics in this repo set an explicit unit (e.g. metric.WithUnit("s") in pkg/settings/limits/time.go). Consider adding metric.WithUnit("s") here as well for consistency and better backend rendering.

Suggested change
uptimeSeconds, err := meter.Float64Counter("uptime_seconds", metric.WithDescription("Uptime of the service in seconds"))
uptimeSeconds, err := meter.Float64Counter(
"uptime_seconds",
metric.WithDescription("Total cumulative uptime of the service in seconds since start"),
metric.WithUnit("s"),
)

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants

Comments