-
Notifications
You must be signed in to change notification settings - Fork 1.7k
scheduling: Introduce per supergroup children count #3212
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Currently there's a global limit on the number of scheduling groups (not supergroups), regardless of where those groups are -- in the root super group or not. This maximum number has implicit dependency from Scylla service levels. When compiling, Scylla defines some maximum number of groups, and when running it creates a fixed amount of pre-defined groups. The remainder is thus implicit limit on the number of dynamically created SL groups. There's an ongoing work to move all SL groups into their own supergroup (scylladb/scylladb#28235) and later to split streaming/maintenance group into sub-groups. The latter part will affect the number of pre-defined groups and will, thus, touch that implicit limit on the number of SL groups. Since this number is already the part of top-level user API, breaking it won't be good, so likely the compile-time limit on the number of groups will be changed. By and large, all those manipulations are fragile and require extra care. This patch will help to pin the SL groups count contract by imposing an explicit limit on the supergroup with SL groups. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This pull request introduces per-supergroup limits on the number of child scheduling groups, replacing the previous global limit. This change enables better control over service level (SL) groups by allowing explicit limits per supergroup.
Changes:
- Added
_max_childrenmember totask_queue_groupto track per-supergroup limits, initialized tomax_scheduling_groups() - Introduced
set_maximum_subgroups()API function to configure limits for specific supergroups across all shards - Updated limit checks to use per-group
_max_childreninstead of globalmax_scheduling_groups()
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| include/seastar/core/reactor.hh | Added _max_children member and set_maximum_children() method to task_queue_group, added friend declaration for new API |
| include/seastar/core/scheduling.hh | Declared new set_maximum_subgroups() function with documentation |
| src/core/reactor.cc | Implemented set_maximum_children() and set_maximum_subgroups(), updated limit checks to use per-group limits |
| tests/unit/scheduling_group_nesting_test.cc | Added comprehensive test coverage for the new maximum subgroups functionality |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if (_parent != nullptr) { | ||
| if (_parent->_nr_children >= max_scheduling_groups()) { | ||
| if (_parent->_nr_children >= _parent->_max_children) { | ||
| on_fatal_internal_error(seastar_logger, "Attempted to create too many supergroups"); |
Copilot
AI
Jan 21, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error message "Attempted to create too many supergroups" is misleading. The sched_entity constructor is called by both task_queue_group (supergroups) and task_queue (regular scheduling groups), so this error can occur when creating either type of child. The message should be more generic, such as "Attempted to create too many children in scheduling group" or "Scheduling group children limit exceeded".
| on_fatal_internal_error(seastar_logger, "Attempted to create too many supergroups"); | |
| on_fatal_internal_error(seastar_logger, "Scheduling group children limit exceeded"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, but it's a pre-existing problem. Will improve it with separate PR later
Maybe this maximum is creating more trouble than it is worth. It helps avoid an indirection, but with nested scheduling groups, we re-added it. |
|
By "more trouble" do you mean more code and API to maintain, or something else? For the problem it's supposed to solve -- explicitly pin the number of Scylla SL-s -- it makes the implicit management from Scylla side be explicit on Seastar side. |
More maintenance, plus the unpleasantness of having to configure the maximum count.
With nested scheduling, it's not clear how the limit on the total number of groups helps Seastar. |
This is optional (forgot to mention it in the commit). If user doesn't apply the limit, a supergroup is implicitly limited with max-sched-groups limit |
We can remove it eventually. Right now, this "global" limit affects wakeup (all task-queues, regardless of their nesting, as collected in Patching wakeup should be done with care, right now it's two pointer fetches (reactor, then queue), just converting array to vector will make it three |
Currently there's a global limit on the number of scheduling groups (not supergroups), regardless of where those groups are -- in the root super group or not.
This maximum number has implicit dependency from Scylla service levels. When compiling, Scylla defines some maximum number of groups, and when running it creates a fixed amount of pre-defined groups. The remainder is thus implicit limit on the number of dynamically created SL groups.
There's an ongoing work to move all SL groups into their own supergroup (scylladb/scylladb#28235) and later to split streaming/maintenance group into sub-groups. The latter part will affect the number of pre-defined groups and will, thus, touch that implicit limit on the number of SL groups. Since this number is already the part of top-level user API, breaking it won't be good, so likely the compile-time limit on the number of groups will be changed.
By and large, all those manipulations are fragile and require extra care. This patch will help to pin the SL groups count contract by imposing an explicit limit on the supergroup with SL groups.