proposal: Multi-Cluster Queue Management #7485

Open
shellfish007 wants to merge 3 commits into karmada-io:master from shellfish007:multi-cluster-queue-management

@shellfish007 shellfish007 commented May 8, 2026

Summary

This PR proposes opt-in per-tenant queue sharding for Karmada's existing scheduler queue system, enabling multi-tenant isolation without introducing new heavyweight abstractions.

Karmada's scheduler currently maintains three internal queues (activeQ, backoffQ, unschedulableBindings) as global singletons. Under this proposal, a namespace that creates a TenantQueue gets its own isolated set of these queues; namespaces without one continue to share a global default queue, which keeps the change backward compatible.

API

TenantQueue is namespace-scoped, and each namespace may hold at most one, which must be named "queue". A validating webhook rejects objects with any other name.

apiVersion: scheduling.karmada.io/v1alpha1
kind: TenantQueue
metadata:
  name: queue
  namespace: team-a
spec:
  queueingStrategy: StrictFIFO  # or BestEffortFIFO (default)
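
The singleton-name rule and the strategy default described above could be checked with a helper along these lines. This is a minimal sketch, not the webhook from the PR: `tenantQueueName`, `validateTenantQueue`, and `effectiveStrategy` are hypothetical names, and only the `QueueingStrategy` type and its two constants are taken from the proposal.

```go
package main

import "fmt"

// QueueingStrategy mirrors the string type quoted from the proposal.
type QueueingStrategy string

const (
	BestEffortFIFO QueueingStrategy = "BestEffortFIFO"
	StrictFIFO     QueueingStrategy = "StrictFIFO"
)

// tenantQueueName is the singleton name required by the description
// (the constant name itself is an assumption).
const tenantQueueName = "queue"

// validateTenantQueue rejects any name other than the singleton and any
// strategy that is neither empty nor one of the two known values.
func validateTenantQueue(name string, strategy QueueingStrategy) error {
	if name != tenantQueueName {
		return fmt.Errorf("TenantQueue must be named %q, got %q", tenantQueueName, name)
	}
	switch strategy {
	case "", BestEffortFIFO, StrictFIFO:
		return nil
	default:
		return fmt.Errorf("unsupported queueingStrategy %q", strategy)
	}
}

// effectiveStrategy applies the documented default when the field is omitted.
func effectiveStrategy(s QueueingStrategy) QueueingStrategy {
	if s == "" {
		return BestEffortFIFO
	}
	return s
}

func main() {
	fmt.Println(validateTenantQueue("queue", StrictFIFO))
	fmt.Println(validateTenantQueue("team-a-queue", ""))
	fmt.Println(effectiveStrategy(""))
}
```

Treating an empty strategy as valid keeps the field optional, which matches the `omitempty` tag on the spec field quoted in the review.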

Scheduler Changes

The scheduler maintains a TenantSchedulingQueue wrapping multiple prioritySchedulingQueue instances, one per namespace:

TenantSchedulingQueue
  ├── "team-a"    → prioritySchedulingQueue{activeQ, backoffQ, unschedulableBindings} [StrictFIFO]
  ├── "team-b"    → prioritySchedulingQueue{...} [BestEffortFIFO]
  └── __default__ → prioritySchedulingQueue{...}
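
How a binding might be routed to one of these shards can be sketched as follows. This is an illustration under assumptions, not the PR's code: `tenantQueues`, `keyFor`, and `push` are hypothetical names, bindings are reduced to strings, and the rule that bindings without a namespace land in `__default__` follows the proposal text quoted later in the review.

```go
package main

import "fmt"

// defaultTenant is the key of the shared fallback queue.
const defaultTenant = "__default__"

// tenantQueues stands in for TenantSchedulingQueue: tenant key -> queued binding names.
type tenantQueues struct {
	queues map[string][]string
}

// newTenantQueues registers one shard per namespace that has a TenantQueue,
// plus the always-present default shard.
func newTenantQueues(tenants ...string) *tenantQueues {
	tq := &tenantQueues{queues: map[string][]string{defaultTenant: nil}}
	for _, t := range tenants {
		tq.queues[t] = nil
	}
	return tq
}

// keyFor returns the shard for a binding's namespace. Cluster-scoped bindings
// (empty namespace) and namespaces without a registered shard use the default.
func (tq *tenantQueues) keyFor(namespace string) string {
	if namespace == "" {
		return defaultTenant
	}
	if _, ok := tq.queues[namespace]; ok {
		return namespace
	}
	return defaultTenant
}

// push appends the binding to its shard and reports which shard was used.
func (tq *tenantQueues) push(namespace, binding string) string {
	key := tq.keyFor(namespace)
	tq.queues[key] = append(tq.queues[key], binding)
	return key
}

func main() {
	tq := newTenantQueues("team-a", "team-b")
	fmt.Println(tq.push("team-a", "rb-1"))
	fmt.Println(tq.push("team-c", "rb-2"))
	fmt.Println(tq.push("", "crb-1"))
}
```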

Pop() uses round-robin across tenant queues for fair scheduling. Bindings are ordered by priority descending, then enqueue timestamp ascending.
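
The round-robin and ordering rules can be illustrated with a simplified, non-blocking sketch. It assumes a fixed tenant visiting order and keeps each tenant's bindings in a sorted slice; `roundRobin`, `binding`, and `less` are hypothetical names, and the real `Pop()` would also need locking and blocking semantics that are left out here.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// binding carries only the fields the ordering rule needs.
type binding struct {
	name     string
	priority int32
	enqueued time.Time
}

// less orders by priority descending, then enqueue timestamp ascending.
func less(a, b binding) bool {
	if a.priority != b.priority {
		return a.priority > b.priority
	}
	return a.enqueued.Before(b.enqueued)
}

type roundRobin struct {
	order  []string             // tenant visiting order, in first-seen order
	queues map[string][]binding // each slice is kept sorted by less
	next   int                  // index of the tenant to visit first on the next pop
}

func newRoundRobin() *roundRobin {
	return &roundRobin{queues: map[string][]binding{}}
}

// push adds the binding to its tenant's queue and restores the ordering.
func (r *roundRobin) push(tenant string, b binding) {
	if _, ok := r.queues[tenant]; !ok {
		r.order = append(r.order, tenant)
	}
	q := append(r.queues[tenant], b)
	sort.SliceStable(q, func(i, j int) bool { return less(q[i], q[j]) })
	r.queues[tenant] = q
}

// pop visits tenants starting at r.next, skips empty ones, and returns the
// head of the first non-empty queue. The tenant after it goes first next time.
func (r *roundRobin) pop() (binding, bool) {
	for i := 0; i < len(r.order); i++ {
		idx := (r.next + i) % len(r.order)
		q := r.queues[r.order[idx]]
		if len(q) == 0 {
			continue
		}
		r.queues[r.order[idx]] = q[1:]
		r.next = (idx + 1) % len(r.order)
		return q[0], true
	}
	return binding{}, false
}

func main() {
	t0 := time.Now()
	r := newRoundRobin()
	r.push("team-a", binding{name: "a-low", priority: 1, enqueued: t0})
	r.push("team-a", binding{name: "a-high", priority: 5, enqueued: t0.Add(time.Second)})
	r.push("team-b", binding{name: "b-only", priority: 0, enqueued: t0})
	for {
		b, ok := r.pop()
		if !ok {
			break
		}
		fmt.Println(b.name)
	}
}
```

Because the cursor advances by one tenant per successful pop, a tenant with a deep backlog gets one turn per cycle no matter how many bindings it has queued.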

Key Points

  • Supports BestEffortFIFO (skip unschedulable head, try next) and StrictFIFO (head-of-line blocking per tenant)
  • Backwards compatible: feature gate TenantQueueManagement (alpha, disabled by default)
  • Singleton name "queue" enforced by validating webhook
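
The difference between the two strategies can be shown with a small selection function. This is a sketch under a simplifying assumption: the `schedulable` predicate stands in for "the scheduler could place this binding right now", whereas a real scheduler would only learn that by attempting the placement. `pickNext` is a hypothetical name.

```go
package main

import "fmt"

// QueueingStrategy mirrors the string type quoted from the proposal.
type QueueingStrategy string

const (
	BestEffortFIFO QueueingStrategy = "BestEffortFIFO"
	StrictFIFO     QueueingStrategy = "StrictFIFO"
)

// pickNext returns the index of the binding to attempt within one tenant's
// ordered queue, or -1 if the tenant has nothing it may run right now.
func pickNext(queue []string, strategy QueueingStrategy, schedulable func(string) bool) int {
	for i, b := range queue {
		if schedulable(b) {
			return i
		}
		if strategy == StrictFIFO {
			// Head-of-line blocking: nothing behind an unschedulable head may run.
			return -1
		}
		// BestEffortFIFO: skip this binding and try the next one.
	}
	return -1
}

func main() {
	queue := []string{"needs-8-gpus", "needs-1-cpu"}
	fits := func(b string) bool { return b != "needs-8-gpus" }

	fmt.Println(pickNext(queue, BestEffortFIFO, fits))
	fmt.Println(pickNext(queue, StrictFIFO, fits))
}
```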

Non-Goals

  • Changes to the backoffQ or unschedulableBindings data structures themselves
  • Per-tenant backoff and unschedulable timeout tuning
  • Weighted round-robin (planned for a future phase)

Related

Copilot AI review requested due to automatic review settings May 8, 2026 02:08
@karmada-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign chaunceyjiang for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a design proposal for implementing per-tenant queue sharding within the Karmada scheduler. By shifting from global queues to namespace-scoped queues, the system aims to improve multi-tenant isolation, prevent burst monopolization, and allow for configurable ordering strategies like StrictFIFO. The proposal outlines the API design, scheduling logic, and a phased implementation plan to ensure stability and backward compatibility.

Highlights

  • Proposal Introduction: Introduced a new design proposal for Multi-Cluster Queue Management in Karmada to enable per-tenant scheduling isolation.
  • TenantQueue API: Defined a new namespace-scoped TenantQueue API (scheduling.karmada.io/v1alpha1) to allow per-namespace queue configuration.
  • Queue Sharding: Proposed refactoring the scheduler to support per-tenant queue sharding, moving away from global singleton queues.
  • Scheduling Strategies: Introduced BestEffortFIFO and StrictFIFO ordering modes to provide flexibility for different workload requirements.

@karmada-bot
Contributor

Welcome @shellfish007! It looks like this is your first PR to karmada-io/karmada 🎉

@karmada-bot karmada-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label May 8, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces per-tenant queue sharding to Karmada's scheduler through a new namespace-scoped TenantQueue API, supporting both BestEffortFIFO and StrictFIFO strategies to improve multi-tenant isolation. Feedback includes requests to document exported types and constants per the repository style guide. Additionally, suggestions were made to improve isolation for default tenants through automatic sharding and to simplify resource management by using a singleton name for TenantQueue objects instead of relying on validation webhooks.

QueueingStrategy QueueingStrategy `json:"queueingStrategy,omitempty"`
}

type QueueingStrategy string

medium

According to the repository style guide (line 7) and standard Go best practices, all exported types should be documented. Please add a concise comment describing the purpose of QueueingStrategy.

Suggested change
- type QueueingStrategy string
+ // QueueingStrategy defines the strategy for ordering and blocking bindings in the active queue.
+ type QueueingStrategy string
References
  1. All exported functions, methods, structs, and interfaces must be documented with clear and concise comments describing their purpose and behavior. (link)

Comment on lines +93 to +95
BestEffortFIFO QueueingStrategy = "BestEffortFIFO"
StrictFIFO QueueingStrategy = "StrictFIFO"
)

medium

Exported constants should be documented to adhere to the repository style guide (line 7) and standard Go best practices.

Suggested change
- BestEffortFIFO QueueingStrategy = "BestEffortFIFO"
- StrictFIFO QueueingStrategy = "StrictFIFO"
- )
+ // BestEffortFIFO indicates that if the head binding fails, the next one is tried.
+ BestEffortFIFO QueueingStrategy = "BestEffortFIFO"
+ // StrictFIFO indicates that if the head binding fails, the entire queue is blocked.
+ StrictFIFO QueueingStrategy = "StrictFIFO"
+ )
References
  1. All exported functions, methods, structs, and interfaces must be documented with clear and concise comments describing their purpose and behavior. (link)

# Another namespace uses the default (BestEffortFIFO), no TenantQueue needed
```

Namespaces without a `TenantQueue` — as well as all `ClusterResourceBinding` objects (which have no namespace) — are routed to a built-in `__default__` queue that always uses `BestEffortFIFO`. The default queue participates in the same round-robin as named tenant queues, getting one scheduling turn per cycle.

medium

The proposal states that all namespaces without a TenantQueue are routed to a single __default__ queue. This design does not provide isolation between these 'default' tenants, which may conflict with the goal of 'per-tenant isolation'.

Consider automatically sharding by namespace name by default for all ResourceBinding objects. The TenantQueue resource would then serve as an optional configuration for these per-namespace shards, rather than a prerequisite for isolation.

### Phase 3: Stabilization (Beta)

1. Promote `TenantQueue` API to `v1beta1`.
2. Add validation webhooks (reject multiple `TenantQueue` objects per namespace).

medium

Instead of implementing a validation webhook to restrict the number of TenantQueue objects per namespace, it is more idiomatic in Kubernetes to enforce a singleton name (such as default) for the resource. This simplifies discovery and avoids the need for cross-object validation logic.

@shellfish007 shellfish007 force-pushed the multi-cluster-queue-management branch from c3677e5 to ca1ecca May 8, 2026 02:10
Contributor

Copilot AI left a comment

Pull request overview

Adds a new scheduling proposal describing “Multi-Cluster Queue Management” via per-namespace (tenant) sharded scheduler queues, aiming to improve multi-tenant isolation and fairness without introducing heavier queue abstractions.

Changes:

  • Introduces a new proposal document for per-tenant queue sharding in the scheduler (active/backoff/unschedulable queues).
  • Specifies a new namespaced TenantQueue API concept with BestEffortFIFO and StrictFIFO modes.
  • Describes a Kueue-inspired “heads” collection pattern for cross-tenant fairness.

Comment on lines +26 to +33
- **`unschedulableBindings`** — bindings that could not be scheduled and are awaiting a cluster state change

Today these three queues are global singletons. This proposal makes them **per-tenant**, and introduces a namespace-scoped `TenantQueue` API object that configures queue settings for a namespace. Since tenant = namespace = `FederatedResourceQuota` scope, no separate namespace selector is needed — one `TenantQueue` per namespace governs the queue behavior for all `ResourceBinding` objects in that namespace.

---

## Motivation

Comment on lines +62 to +65
```go
// +genclient
// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object
// +kubebuilder:resource:path=tenantqueues,scope=Namespaced,shortName=tq,categories={karmada-io}
Comment on lines +164 to +167

HOL blocking is tracked via a `blocked bool` flag on the tenant entry. The flag is cleared by an `onActiveQPush` callback on the inner queue, which fires whenever a binding is moved back to `activeQ` (backoff expiry, unschedulable flush, cluster state change).

---
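
The `blocked` flag and the `onActiveQPush` callback described in this hunk might fit together roughly as below. This is a sketch, not the PR's implementation: `innerQueue`, `tenantEntry`, `markHeadFailed`, and `eligible` are hypothetical names, bindings are plain strings, and the locking a concurrent scheduler would need is omitted.

```go
package main

import "fmt"

// innerQueue models only the activeQ side of a per-tenant queue.
type innerQueue struct {
	active        []string
	onActiveQPush func() // invoked whenever a binding enters active
}

// pushActive is the single entry point into active, so the callback fires for
// every path the proposal lists (backoff expiry, unschedulable flush, and so on).
func (q *innerQueue) pushActive(b string) {
	q.active = append(q.active, b)
	if q.onActiveQPush != nil {
		q.onActiveQPush()
	}
}

// tenantEntry is the per-tenant record that carries the head-of-line flag.
type tenantEntry struct {
	queue   *innerQueue
	blocked bool
}

// newTenantEntry wires the callback so any push to active clears the flag.
func newTenantEntry() *tenantEntry {
	e := &tenantEntry{queue: &innerQueue{}}
	e.queue.onActiveQPush = func() { e.blocked = false }
	return e
}

// markHeadFailed records that the StrictFIFO head could not be scheduled.
func (e *tenantEntry) markHeadFailed() { e.blocked = true }

// eligible reports whether the round-robin loop should visit this tenant.
func (e *tenantEntry) eligible() bool {
	return !e.blocked && len(e.queue.active) > 0
}

func main() {
	e := newTenantEntry()
	e.queue.pushActive("rb-1")
	fmt.Println(e.eligible())

	e.markHeadFailed()
	fmt.Println(e.eligible())

	e.queue.pushActive("rb-2") // simulates a requeue after a cluster state change
	fmt.Println(e.eligible())
}
```

Clearing the flag from a callback keeps the inner queue unaware of tenants; it only has to announce that something became active again.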
Comment on lines +175 to +182
| Throughput | Higher | Lower (head-of-line blocking) |
| Ordering guarantee | Best effort | Deterministic within tenant |
| Typical use case | Interactive / heterogeneous batch | Sequential pipelines, strict ordering |

---

## Design Notes

- Add doc comments to QueueingStrategy type and constants
- Clarify queue isolation is opt-in (namespaces without TenantQueue share default)
- Enforce singleton name 'queue' instead of validation webhook
- Fix "creation timestamp" to "enqueue timestamp" for ordering semantics
@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 41.93%. Comparing base (774db9b) to head (a87ce01).
⚠️ Report is 28 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7485      +/-   ##
==========================================
- Coverage   42.16%   41.93%   -0.24%     
==========================================
  Files         876      879       +3     
  Lines       64968    54328   -10640     
==========================================
- Hits        27395    22780    -4615     
+ Misses      35874    29826    -6048     
- Partials     1699     1722      +23     
| Flag | Coverage Δ |
| unittests | 41.93% <ø> (-0.24%) ⬇️ |

@seanlaii
Contributor

/assign

@mszacillo
Member

/assign
