scheduling_group: expose usage statistics #3396
Open
travisdowns wants to merge 1 commit into
This commit extends the public interface of scheduling_group to expose
usage statistics (e.g. runtime).
Downstream uses in Redpanda:
1. Per-scheduling-group Prometheus metric. A public counter
`scheduler_runtime_seconds_total` is exported with one series per
scheduling group, sampled from `get_stats().runtime` in the metric
callback. This gives operators visibility into how reactor time is
being divided between scheduling groups.
2. Fetch PID controller. A PID loop samples the fetch scheduling
group's `runtime` to compute its utilization fraction (delta
runtime / delta wallclock), combines it with overall reactor busy
time, and outputs a per-fetch debounce delay used to keep the
fetch SG within a configured share of the reactor.
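The two sampling patterns above can be sketched in plain C++ without Seastar. This is only an illustration: `to_seconds` and `utilization_sampler` are hypothetical names, and the counter type is assumed to be `std::chrono::steady_clock::duration`, which this page does not confirm.

```cpp
#include <chrono>

// Assumption: the runtime counter is a steady_clock duration.
using sched_duration = std::chrono::steady_clock::duration;

// Use 1: convert an accumulated runtime into fractional seconds, the unit
// implied by a counter named scheduler_runtime_seconds_total.
inline double to_seconds(sched_duration d) {
    return std::chrono::duration<double>(d).count();
}

// Use 2: utilization fraction = delta runtime / delta wallclock.
// Hypothetical helper; feed it the cumulative runtime read from the group.
class utilization_sampler {
public:
    // Returns the fraction of wall-clock time since the previous sample that
    // the group spent running. Returns 0 on the first call, or when no
    // wall-clock time has elapsed since the previous sample.
    double sample(sched_duration runtime_total,
                  std::chrono::steady_clock::time_point now) {
        double fraction = 0.0;
        if (_primed && now > _last_time) {
            fraction = to_seconds(runtime_total - _last_runtime)
                     / to_seconds(now - _last_time);
        }
        _last_runtime = runtime_total;
        _last_time = now;
        _primed = true;
        return fraction;
    }

private:
    sched_duration _last_runtime{};
    std::chrono::steady_clock::time_point _last_time{};
    bool _primed = false;
};
```

In a real metric callback or control loop, `runtime_total` would come from `get_stats().runtime` on the scheduling group of interest, and `now` from the steady clock.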
xemul reviewed on May 12, 2026
    friend unsigned internal::scheduling_group_index(scheduling_group sg) noexcept;
    friend scheduling_group internal::scheduling_group_from_index(unsigned index) noexcept;

    struct stats {
Contributor
Doxygen doc for public API is missing
Contributor
But the reactor already registers metrics that export runtime, waittime and starvetime.
Contributor
Author
It's true, but we export runtime again on a different metrics endpoint.
Extends the public interface of `scheduling_group` to expose its task queue
usage statistics (`runtime`, `waittime`, and `starvetime`) via a new
`get_stats()` accessor returning a `scheduling_group::stats` struct. The
underlying counters are already maintained on the per-shard task queue;
this just makes them readable from outside the reactor.
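For orientation, the shape of such an accessor might look like the stand-alone model below. It is a sketch, not the patch itself: `scheduling_group_model`, `account_run`, the field types, and the Doxygen wording (including the glosses on `waittime` and `starvetime`) are assumptions. Only the names `stats`, `get_stats()`, `runtime`, `waittime`, and `starvetime` come from the description above.

```cpp
#include <chrono>

namespace sketch {

// Assumption: counters are steady_clock durations.
using sched_duration = std::chrono::steady_clock::duration;

// Hypothetical stand-in for a scheduling group and its per-shard task queue.
class scheduling_group_model {
public:
    /// Snapshot of the accumulated usage counters of this group on the
    /// current shard. All values only grow over time.
    struct stats {
        /// Total time spent running tasks of this group.
        sched_duration runtime{};
        /// Total time the group's queue spent waiting (assumed meaning).
        sched_duration waittime{};
        /// Total time the group had work queued but was not running
        /// (assumed meaning).
        sched_duration starvetime{};
    };

    /// Returns a copy of the current counters; cheap and non-throwing.
    stats get_stats() const noexcept { return _stats; }

    // Model-only hook standing in for the scheduler's own accounting.
    void account_run(sched_duration d) noexcept { _stats.runtime += d; }

private:
    stats _stats;
};

} // namespace sketch
```

Returning the struct by value keeps callers from holding references into scheduler-owned state, which seems a reasonable choice for a read-only snapshot.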
This patch has been carried in the Redpanda fork since 2022. We have
two production uses motivating upstreaming:
1. Per-scheduling-group Prometheus metric. Redpanda exports a
   `scheduler_runtime_seconds_total` counter with one series per
   scheduling group, sampled from `get_stats().runtime` in the metric
   callback. This gives visibility into how reactor time is split
   between SGs, useful for capacity planning and noisy-neighbor diagnosis.
2. Fetch PID controller. A PID loop samples the Kafka fetch scheduling
   group's `runtime` to compute its utilization fraction (delta
   runtime / delta wallclock), combines it with overall reactor busy
   time, and outputs a per-fetch debounce delay that keeps the fetch SG
   within a configured share of the reactor.
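As a rough illustration of the control loop in the second use, a textbook discrete PID step that maps a measured utilization fraction to a clamped delay could look like this. It is not Redpanda's controller: `pid_gains`, `debounce_pid`, the gains, and the clamping are hypothetical, and the combination with overall reactor busy time is left out.

```cpp
#include <algorithm>

// Hypothetical gains; real values would need tuning.
struct pid_gains {
    double kp;
    double ki;
    double kd;
};

// Textbook PID: error = measured utilization - target share.
// Output is a debounce delay in milliseconds, clamped to [0, max_delay_ms].
class debounce_pid {
public:
    debounce_pid(pid_gains gains, double target_share, double max_delay_ms)
        : _gains(gains), _target(target_share), _max_delay_ms(max_delay_ms) {}

    // utilization: fraction of reactor time used by the group, in [0, 1].
    // dt_seconds: time since the previous update; non-positive values skip
    // the integral and derivative terms for this step.
    double update(double utilization, double dt_seconds) {
        const double error = utilization - _target;  // > 0 when over budget
        double derivative = 0.0;
        if (dt_seconds > 0.0) {
            _integral += error * dt_seconds;
            if (_has_prev) {
                derivative = (error - _prev_error) / dt_seconds;
            }
        }
        _prev_error = error;
        _has_prev = true;
        const double raw = _gains.kp * error
                         + _gains.ki * _integral
                         + _gains.kd * derivative;
        return std::max(0.0, std::min(raw, _max_delay_ms));
    }

private:
    pid_gains _gains;
    double _target;
    double _max_delay_ms;
    double _integral = 0.0;
    double _prev_error = 0.0;
    bool _has_prev = false;
};
```

With a proportional-only configuration, a group running above its target share gets a positive delay and a group below it gets none; a production controller would likely also need anti-windup on the integral term.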
Original: redpanda-data@a618fe8