feat(database_observability.mysql): add database_observability_wait_event_seconds_total counter #6106
Open
…vent_seconds_total counter Adds a Prometheus counter that pre-aggregates wait time per query digest and database schema. The counter is emitted unconditionally alongside each wait event log entry, regardless of which op version is active. Labels: server_id (curried at component level), digest, schema. Exposed at the component's /metrics endpoint via the existing registry.
Summary
Adds a `database_observability_wait_event_seconds_total` Prometheus counter to the MySQL component that pre-aggregates wait time per query digest and database schema.

This counter provides an alternative backend for wide-window TotalWaitEventsTime queries, bypassing the Loki 500-series fallback loop entirely. Benchmarks show −52 to −65% p50 at 1h30+ unfiltered windows, where the Loki fallback adds 2–7s of retry overhead.
Details
- Metric: `database_observability_wait_event_seconds_total` (counter)
- Labels: `server_id` (curried at component level), `digest`, `schema`
- Emitted: unconditionally on every wait event row, regardless of which Loki op version is active
- Exposed: at the component's `/metrics` endpoint via the existing registry

`server_id` is curried at the component level using `CounterVec.CurryWith`, so the collector only needs `digest` and `schema` label values.

Changes
- `waitEventCounter *prometheus.CounterVec` field on `Component` (lifecycle mirrors `exporterCollector`)
- `CurryWith` in `startCollectors`
- `WaitEventCounter *prometheus.CounterVec` passed to `QuerySamples` collector
- `TestQuerySamples_WaitEventCounter` unit test

Test plan
- `TestQuerySamples_WaitEventCounter` passes
- `/metrics` on a running instance
- `server_id`, `digest`, `schema` present on each series

🤖 Generated with Claude Code