
[RFC] resource group isolation #123

Open
mittalrishabh wants to merge 3 commits into tikv:master from mittalrishabh:resource-group-isolation

mittalrishabh (Member) commented Jan 23, 2026

Created an RFC for resource group isolation.

Signed-off-by: rishabh_mittal <rishabh.mittal@airbnb.com>
mittalrishabh changed the title from "resource isolation" to "resource group isolation" on Jan 23, 2026
mittalrishabh changed the title from "resource group isolation" to "[RFC] resource group isolation" on Jan 23, 2026
@@ -0,0 +1,273 @@
# Design Doc: Fair Scheduling Based on Historical RU Consumption
Member

It seems the RFC still only tunes request priority, which affects how the read pool schedules requests; I don't see a throttling mechanism.

Essentially, I think we should have an instance-level (read pool) self-protection throttling mechanism that prevents requests from any tenant or resource group from overloading the instance. Does tikv/tikv#19319 have a similar goal?

Member Author

Traffic is not throttled via rate limiting. Instead, throttling happens through slower scheduling, or through queue eviction when the queue is full. The goal is to protect sustained traffic by deprioritizing the traffic that is causing the overload.
tikv/tikv#19319, by contrast, penalizes tenants based on their rate limits.
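Eviction-based throttling of this kind can be sketched as a bounded priority queue. This is a minimal illustration under stated assumptions: `Task`, `BoundedQueue`, and the FIFO tie-break within a priority are hypothetical names and choices, not TiKV's read-pool API, and each request's priority is taken as already computed (for example from historical RU consumption).

```rust
use std::cmp::Ordering;
use std::collections::BTreeSet;

/// A queued read request (hypothetical). `priority` is assumed to be computed
/// elsewhere; higher means "schedule sooner".
#[derive(Debug)]
pub struct Task {
    pub priority: u64,
    pub seq: u64, // arrival order; unique within one queue
    pub group: String,
}

impl Ord for Task {
    // Orders by priority, then prefers earlier arrivals (lower `seq`),
    // so the "greatest" task is the next one to run.
    fn cmp(&self, other: &Self) -> Ordering {
        self.priority
            .cmp(&other.priority)
            .then_with(|| other.seq.cmp(&self.seq))
    }
}
impl PartialOrd for Task {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl PartialEq for Task {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}
impl Eq for Task {}

/// Hypothetical bounded queue: there is no rate limit; overload is handled
/// only by scheduling order and by eviction when the queue is full.
pub struct BoundedQueue {
    tasks: BTreeSet<Task>,
    capacity: usize,
    next_seq: u64,
}

impl BoundedQueue {
    pub fn new(capacity: usize) -> Self {
        Self { tasks: BTreeSet::new(), capacity, next_seq: 0 }
    }

    /// Enqueues a task. If the queue is full, whichever ranks lower (the
    /// incoming task or the lowest-ranked queued task) is dropped and returned.
    pub fn push(&mut self, group: &str, priority: u64) -> Option<Task> {
        let task = Task { priority, seq: self.next_seq, group: group.to_string() };
        self.next_seq += 1;
        if self.tasks.len() < self.capacity {
            self.tasks.insert(task);
            return None;
        }
        // The first element of the set is the lowest-ranked queued task.
        let evict_queued = self.tasks.first().map_or(false, |lowest| *lowest < task);
        if evict_queued {
            let evicted = self.tasks.pop_first();
            self.tasks.insert(task);
            evicted
        } else {
            Some(task) // the incoming task itself is rejected
        }
    }

    /// Removes and returns the highest-priority task (FIFO within a priority).
    pub fn pop(&mut self) -> Option<Task> {
        self.tasks.pop_last()
    }
}
```

Because nothing is rate-limited here, a low-priority group's request is only dropped when the queue is actually full; otherwise it is merely scheduled after higher-priority work.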

Member Author

Consider this scenario:
Steady state:

  • Tenant_1: consuming 10000 RU/s (sustained workload)
  • Tenant_2: consuming 20000 RU/s (sustained workload)
  • System: stable

Sudden spike:

  • Tenant_3: traffic suddenly increases to 5000 RU/s, overloading the system

Expected:

  • Throttle Tenant_3 (the new traffic causing overload)
  • Protect Tenant_1 and Tenant_2 (sustained traffic)
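The expected behavior in this scenario can be illustrated by scoring each resource group against its own RU history rather than against a fixed limit. This is a sketch under assumptions that are not in the RFC: history is an exponentially weighted moving average (EWMA) of RU/s samples, priority is `1000 / max(current / baseline, 1)`, and `RuHistory`, `priority`, `MAX_PRIORITY`, and `MIN_BASELINE_RU` are hypothetical names.

```rust
use std::collections::HashMap;

/// Upper bound of the priority range; higher means "schedule sooner".
pub const MAX_PRIORITY: f64 = 1000.0;
/// Floor for the baseline so groups with no history do not divide by zero.
const MIN_BASELINE_RU: f64 = 1.0;

/// Hypothetical per-resource-group RU history, kept as an EWMA of RU/s samples.
/// The EWMA and `alpha` are illustrative assumptions, not part of the RFC.
pub struct RuHistory {
    alpha: f64,
    baseline: HashMap<String, f64>,
}

impl RuHistory {
    pub fn new(alpha: f64) -> Self {
        Self { alpha, baseline: HashMap::new() }
    }

    /// Records one RU/s sample for `group` and returns its burst ratio: the
    /// sample divided by the baseline *before* this sample is folded in,
    /// so a spike cannot immediately justify itself.
    pub fn observe(&mut self, group: &str, ru_per_sec: f64) -> f64 {
        let prev = self.baseline.get(group).copied().unwrap_or(0.0);
        let ratio = ru_per_sec / prev.max(MIN_BASELINE_RU);
        // New groups start from 0 and build history gradually.
        let next = self.alpha * ru_per_sec + (1.0 - self.alpha) * prev;
        self.baseline.insert(group.to_string(), next);
        ratio
    }
}

/// Maps a burst ratio to a priority: groups at or below their historical
/// rate get the maximum; the further above history, the lower the priority.
pub fn priority(burst_ratio: f64) -> u64 {
    (MAX_PRIORITY / burst_ratio.max(1.0)).round() as u64
}
```

For example, with `alpha = 0.1`, after 100 samples of 10000 RU/s for Tenant_1 and 20000 RU/s for Tenant_2, both keep priority 1000 because their current rate matches their history, even though Tenant_2 consumes more RU than Tenant_3. Tenant_3, treated here as having no prior history, has a burst ratio of 5000 and drops to priority 0. If Tenant_3's traffic persists, its EWMA baseline rises and its priority recovers, so traffic that becomes sustained is eventually treated like steady state.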
