Skip to content

Conversation

@xemul
Copy link
Contributor

@xemul xemul commented Jan 21, 2026

Currently there's a global limit on the number of scheduling groups (not supergroups), regardless of where those groups are -- in the root super group or not.

This maximum number has implicit dependency from Scylla service levels. When compiling, Scylla defines some maximum number of groups, and when running it creates a fixed amount of pre-defined groups. The remainder is thus implicit limit on the number of dynamically created SL groups.

There's an ongoing work to move all SL groups into their own supergroup (scylladb/scylladb#28235) and later to split streaming/maintenance group into sub-groups. The latter part will affect the number of pre-defined groups and will, thus, touch that implicit limit on the number of SL groups. Since this number is already the part of top-level user API, breaking it won't be good, so likely the compile-time limit on the number of groups will be changed.

By and large, all those manipulations are fragile and require extra care. This patch will help to pin the SL groups count contract by imposing an explicit limit on the supergroup with SL groups.

Currently there's a global limit on the number of scheduling groups (not
supergroups), regardless of where those groups are -- in the root super
group or not.

This maximum number has implicit dependency from Scylla service levels.
When compiling, Scylla defines some maximum number of groups, and when
running it creates a fixed amount of pre-defined groups. The remainder
is thus implicit limit on the number of dynamically created SL groups.

There's an ongoing work to move all SL groups into their own supergroup
(scylladb/scylladb#28235) and later to split streaming/maintenance group
into sub-groups. The latter part will affect the number of pre-defined
groups and will, thus, touch that implicit limit on the number of SL
groups. Since this number is already the part of top-level user API,
breaking it won't be good, so likely the compile-time limit on the
number of groups will be changed.

By and large, all those manipulations are fragile and require extra
care. This patch will help to pin the SL groups count contract by
imposing an explicit limit on the supergroup with SL groups.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request introduces per-supergroup limits on the number of child scheduling groups, replacing the previous global limit. This change enables better control over service level (SL) groups by allowing explicit limits per supergroup.

Changes:

  • Added _max_children member to task_queue_group to track per-supergroup limits, initialized to max_scheduling_groups()
  • Introduced set_maximum_subgroups() API function to configure limits for specific supergroups across all shards
  • Updated limit checks to use per-group _max_children instead of global max_scheduling_groups()

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
include/seastar/core/reactor.hh Added _max_children member and set_maximum_children() method to task_queue_group, added friend declaration for new API
include/seastar/core/scheduling.hh Declared new set_maximum_subgroups() function with documentation
src/core/reactor.cc Implemented set_maximum_children() and set_maximum_subgroups(), updated limit checks to use per-group limits
tests/unit/scheduling_group_nesting_test.cc Added comprehensive test coverage for the new maximum subgroups functionality

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

if (_parent != nullptr) {
if (_parent->_nr_children >= max_scheduling_groups()) {
if (_parent->_nr_children >= _parent->_max_children) {
on_fatal_internal_error(seastar_logger, "Attempted to create too many supergroups");
Copy link

Copilot AI Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message "Attempted to create too many supergroups" is misleading. The sched_entity constructor is called by both task_queue_group (supergroups) and task_queue (regular scheduling groups), so this error can occur when creating either type of child. The message should be more generic, such as "Attempted to create too many children in scheduling group" or "Scheduling group children limit exceeded".

Suggested change
on_fatal_internal_error(seastar_logger, "Attempted to create too many supergroups");
on_fatal_internal_error(seastar_logger, "Scheduling group children limit exceeded");

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, but it's a pre-existing problem. Will improve it with separate PR later

@avikivity
Copy link
Member

care. This patch will help to pin the SL groups count contract by

Maybe this maximum is creating more trouble than it is worth.

It helps avoid an indirection, but with nested scheduling groups, we re-added it.

@xemul
Copy link
Contributor Author

xemul commented Jan 22, 2026

By "more trouble" do you mean more code and API to maintain, or something else?

For the problem it's supposed to solve -- explicitly pin the number of Scylla SL-s -- it makes the implicit management from Scylla side be explicit on Seastar side.

@avikivity
Copy link
Member

By "more trouble" do you mean more code and API to maintain, or something else?

More maintenance, plus the unpleasantness of having to configure the maximum count.

For the problem it's supposed to solve -- explicitly pin the number of Scylla SL-s -- it makes the implicit management from Scylla side be explicit on Seastar side.

With nested scheduling, it's not clear how the limit on the total number of groups helps Seastar.

@xemul
Copy link
Contributor Author

xemul commented Jan 22, 2026

... plus the unpleasantness of having to configure the maximum count.

This is optional (forgot to mention it in the commit). If user doesn't apply the limit, a supergroup is implicitly limited with max-sched-groups limit

@xemul
Copy link
Contributor Author

xemul commented Jan 22, 2026

With nested scheduling, it's not clear how the limit on the total number of groups helps Seastar.

We can remove it eventually.

Right now, this "global" limit affects wakeup (all task-queues, regardless of their nesting, as collected in reactor::_task_queues array and reactor::add_task() dereferences a queue for a task from it), and network statistics (there's global bytes_sent and packets_sent for native and posix stacks)

Patching wakeup should be done with care, right now it's two pointer fetches (reactor, then queue), just converting array to vector will make it three

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants