## Problem
We run Trino at scale with shared clusters serving multiple teams — analysts running ad-hoc queries, ETL pipelines, dashboards, etc. We use resource groups heavily to isolate these workloads from each other. Over time we've hit a few gaps in what resource groups can actually control today, and I wanted to describe them here before submitting a PR.
### 1. No way to limit total drivers within a resource group
This is the biggest one for us. Right now resource groups can limit concurrency (number of running queries) and memory, but there's nothing that limits the total number of drivers across all queries in a group.
In practice, this means a resource group with `hardConcurrencyLimit: 5` could have 5 queries each with 200+ drivers (over 1,000 drivers in total), completely saturating task slots cluster-wide. Meanwhile, other resource groups with lighter queries get starved because all the task slots are occupied.
We want to be able to say: "this resource group can have at most N total active drivers (running + queued + blocked) across all its queries". When the limit is reached, new queries queue instead of starting. This is a fundamentally different axis of control than query count or memory.
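As a sketch, using the `hardTotalDriverLimit` name proposed later in this issue (this property does not exist in Trino today), the group from the example above could be given a fixed driver budget:

```json
{
  "name": "adhoc",
  "hardConcurrencyLimit": 5,
  "maxQueued": 100,
  "hardTotalDriverLimit": 500
}
```

Five light queries can still run concurrently, but once the group's active driver count reaches 500 (e.g. after two or three of those 200-driver queries are admitted), further queries queue until drivers are released.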
### 2. Per-query limits should be configurable at the resource group level
Trino has `query.max-memory`, `query.max-cpu-time`, and `query.max-scan-physical-bytes` as session/system properties. These work fine as global defaults, but when you want different per-query limits for different workloads, you need a session property manager to map users/sources to different values. That's a separate plugin with its own config, and in practice it's awkward to keep synchronized with resource group definitions.
For example, we want:
- Ad-hoc group: no single query should use more than 10GB of memory or scan more than 500GB
- ETL group: queries can use up to 50GB of memory and scan up to 5TB
- Dashboard group: queries limited to 2GB of memory and 30 seconds of CPU time
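Today you'd wire this up through the file-based session property manager. A hedged sketch of what `etc/session-property-config.json` might look like for two of these groups (the session property names mirror the config properties above; the source regexes are illustrative):

```json
[
  {
    "source": ".*adhoc.*",
    "sessionProperties": {
      "query_max_memory": "10GB",
      "query_max_scan_physical_bytes": "500GB"
    }
  },
  {
    "source": ".*dashboard.*",
    "sessionProperties": {
      "query_max_memory": "2GB",
      "query_max_cpu_time": "30s"
    }
  }
]
```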
Those rules reference the same user/source patterns that your resource group selectors already match: duplicated config that has to stay in sync by hand. It would be much more natural to put these limits directly in the resource group definition:
```json
{
  "name": "adhoc",
  "softMemoryLimit": "80%",
  "hardConcurrencyLimit": 20,
  "maxQueued": 100,
  "perQueryMemoryLimit": "10GB",
  "perQueryScanLimit": "500GB",
  "perQueryCpuLimit": "5m"
}
```
The enforcement semantics are the same as for the existing properties: queries exceeding a limit get killed.
I know there's overlap with session properties here. The way I think about it: session properties are great for global defaults and user-level overrides. Resource group per-query limits are about workload-level policy. They complement each other. A global query.max-cpu-time of 1 hour is a safety net; a resource group perQueryCpuLimit of 5 minutes for the dashboard group is workload policy.
### 3. Planning concurrency control
This one is more niche. When a burst of queries hits a resource group, they all get admitted and enter PLANNING/STARTING state simultaneously. Planning is coordinator-bound (metadata fetches, optimization, etc.) and having too many queries planning at once can spike coordinator CPU and slow down planning for everyone.
A `hardPlanningConcurrencyLimit` lets you say "at most N queries in this group can be in PLANNING or STARTING state at any time". It's a soft knob (the state is checked periodically, so it's not precise), but it helps smooth out bursts.
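As with the driver limit, a sketch using the proposed (not yet existing) property: a dashboard group that admits up to 50 concurrent queries but only lets 5 of them plan at once:

```json
{
  "name": "dashboards",
  "hardConcurrencyLimit": 50,
  "maxQueued": 200,
  "hardPlanningConcurrencyLimit": 5
}
```

Queries beyond the fifth concurrent planner stay queued until earlier queries move past STARTING, which staggers the coordinator-bound work during a burst.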
## Proposed changes
- Add `perQueryMemoryLimit`, `perQueryCpuLimit`, and `perQueryScanLimit` to the resource group config: enforced in `stageResourceUsage()` during the periodic resource usage updates, on the same cadence as the existing group-level limits.
- Add `hardTotalDriverLimit`: checked in `canRunMore()` and aggregated hierarchically across subgroups.
- Add `hardPlanningConcurrencyLimit`: checked in `canRunMore()`; counts queries in PLANNING or STARTING state.
- All new limits default to effectively unlimited (`MAX_VALUE`).
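Putting it together, a sketch of a complete `etc/resource-groups.json` using the proposed properties (the file structure and selectors follow the existing file-based resource group manager; the new property names are the ones proposed above). The `hardTotalDriverLimit` on `global` illustrates the hierarchical aggregation: the sum of active drivers across all three subgroups counts against it:

```json
{
  "rootGroups": [
    {
      "name": "global",
      "hardConcurrencyLimit": 100,
      "maxQueued": 1000,
      "hardTotalDriverLimit": 2000,
      "subGroups": [
        {
          "name": "adhoc",
          "softMemoryLimit": "80%",
          "hardConcurrencyLimit": 20,
          "maxQueued": 100,
          "perQueryMemoryLimit": "10GB",
          "perQueryScanLimit": "500GB",
          "perQueryCpuLimit": "5m"
        },
        {
          "name": "etl",
          "softMemoryLimit": "80%",
          "hardConcurrencyLimit": 10,
          "maxQueued": 50,
          "perQueryMemoryLimit": "50GB",
          "perQueryScanLimit": "5TB"
        },
        {
          "name": "dashboards",
          "softMemoryLimit": "50%",
          "hardConcurrencyLimit": 50,
          "maxQueued": 200,
          "perQueryMemoryLimit": "2GB",
          "perQueryCpuLimit": "30s",
          "hardPlanningConcurrencyLimit": 5
        }
      ]
    }
  ],
  "selectors": [
    { "source": "etl-pipeline", "group": "global.etl" },
    { "source": "dashboard-service", "group": "global.dashboards" },
    { "group": "global.adhoc" }
  ]
}
```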
I have a working implementation with tests ready.
Would love to hear thoughts on whether this direction makes sense. I have opened a PR here.