scheduling_group: expose usage statistics#3396

Open
travisdowns wants to merge 1 commit into scylladb:master from travisdowns:td-upstream-sg-get-stats

@travisdowns

Contributor

Extends the public interface of scheduling_group to expose its task
queue usage statistics — runtime, waittime, and starvetime — via
a new get_stats() accessor returning a scheduling_group::stats
struct. The underlying counters are already maintained on the per-shard
task queue; this just makes them readable from outside the reactor.
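For illustration, the shape of the accessor could look roughly like the sketch below. It is a standalone stand-in and not the patch itself: the `sg_sketch` namespace, the `account_for_test` hook, the `runtime_seconds` helper, and the use of `std::chrono::steady_clock::duration` for the three fields are assumptions made so the example compiles without Seastar.

```cpp
#include <chrono>

// Hypothetical stand-in for seastar::scheduling_group, for illustration only.
namespace sg_sketch {

// Assumption: the counters are durations on a steady clock.
using sched_duration = std::chrono::steady_clock::duration;

class scheduling_group {
public:
    // Snapshot of the task queue usage counters for this group.
    struct stats {
        sched_duration runtime;     // time spent running tasks
        sched_duration waittime;    // time spent queued, waiting to run
        sched_duration starvetime;  // time spent starved by other groups
    };

    // Returns a copy of the current counters.
    stats get_stats() const noexcept { return _stats; }

    // Hypothetical hook that mimics the reactor's accounting; in the real
    // code the reactor updates the per-shard task queue itself.
    void account_for_test(sched_duration ran, sched_duration waited,
                          sched_duration starved) noexcept {
        _stats.runtime += ran;
        _stats.waittime += waited;
        _stats.starvetime += starved;
    }

private:
    stats _stats{};  // value-initialized, so all counters start at zero
};

// Converts the cumulative runtime to fractional seconds, the unit a
// "*_seconds_total" counter would report.
inline double runtime_seconds(const scheduling_group& sg) {
    return std::chrono::duration<double>(sg.get_stats().runtime).count();
}

} // namespace sg_sketch
```

A metrics callback could then return something like `runtime_seconds(sg)` on each scrape. Because the value is cumulative and only grows, it maps naturally onto a counter.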

This patch has been carried in the Redpanda fork since 2022. We have
two production uses motivating upstreaming:

  1. Per-scheduling-group Prometheus metric. Redpanda exports a
    scheduler_runtime_seconds_total counter with one series per
    scheduling group, sampled from get_stats().runtime in the metric
    callback. This shows how reactor time is split between scheduling
    groups, which is useful for capacity planning and noisy-neighbor
    diagnosis.

  2. Fetch PID controller. A PID loop samples the Kafka fetch
    scheduling group's runtime to compute its utilization fraction
    (delta runtime / delta wallclock), combines it with overall reactor
    busy time, and outputs a per-fetch debounce delay that keeps the
    fetch SG within a configured share of the reactor.
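The utilization calculation in item 2 can be sketched as follows. This is a simplified illustration, not Redpanda's controller: the `fetch_sketch` namespace, the `utilization` helper, the `pid_controller` struct, its gains, and the millisecond delay unit are all hypothetical, and the combination with overall reactor busy time is left out.

```cpp
#include <algorithm>
#include <chrono>

// Hypothetical names throughout; a sketch of the idea only.
namespace fetch_sketch {

using duration = std::chrono::steady_clock::duration;

// Utilization fraction: delta runtime / delta wallclock between two samples
// of the scheduling group's cumulative runtime.
inline double utilization(duration runtime_prev, duration runtime_now,
                          duration wall_delta) {
    if (wall_delta <= duration::zero()) {
        return 0.0;  // no elapsed time, so nothing to measure
    }
    return std::chrono::duration<double>(runtime_now - runtime_prev).count()
         / std::chrono::duration<double>(wall_delta).count();
}

// Textbook PID step that maps measured utilization to a debounce delay.
struct pid_controller {
    double kp;            // proportional gain (assumed value chosen by caller)
    double ki;            // integral gain
    double kd;            // derivative gain
    double setpoint;      // target share of the reactor, e.g. 0.2
    double max_delay_ms;  // upper bound on the delay
    double integral = 0.0;
    double prev_error = 0.0;

    // measured: current utilization fraction; dt_s: seconds since last update
    // (must be positive). Returns a delay in milliseconds within
    // [0, max_delay_ms].
    double update(double measured, double dt_s) {
        const double error = measured - setpoint;  // positive when over target
        integral += error * dt_s;
        const double derivative = (error - prev_error) / dt_s;
        prev_error = error;
        const double out = kp * error + ki * integral + kd * derivative;
        return std::max(0.0, std::min(out, max_delay_ms));
    }
};

} // namespace fetch_sketch
```

With a purely proportional setup such as `pid_controller{100.0, 0.0, 0.0, 0.2, 50.0}`, a measured utilization above the 0.2 setpoint yields a positive delay and one below it yields zero, so fetches are only slowed while the group exceeds its share.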

Original: redpanda-data@a618fe8

This commit extends the public interface of scheduling_group to expose
usage statistics (e.g. runtime).

Downstream uses in Redpanda:

  1. Per-scheduling-group Prometheus metric. A public counter
     `scheduler_runtime_seconds_total` is exported with one series per
     scheduling group, sampled from `get_stats().runtime` in the metric
     callback. This gives operators visibility into how reactor time is
     being divided between scheduling groups.

  2. Fetch PID controller. A PID loop samples the fetch scheduling
     group's `runtime` to compute its utilization fraction (delta
     runtime / delta wallclock), combines it with overall reactor busy
     time, and outputs a per-fetch debounce delay used to keep the
     fetch SG within a configured share of the reactor.
friend unsigned internal::scheduling_group_index(scheduling_group sg) noexcept;
friend scheduling_group internal::scheduling_group_from_index(unsigned index) noexcept;

struct stats {

Contributor

The Doxygen documentation for this public API is missing.

@xemul

xemul commented May 12, 2026

Contributor

> Per-scheduling-group Prometheus metric. A public counter
> scheduler_runtime_seconds_total is exported with one series per
> scheduling group, sampled from get_stats().runtime in the metric
> callback. This gives operators visibility into how reactor time is
> being divided between scheduling groups.

But reactor already registers metrics that export runtime, waittime and starvetime

@travisdowns

Contributor Author

> But reactor already registers metrics that export runtime, waittime and starvetime

It's true, but we export runtime again on a different metrics endpoint.
