@zhzhuang-zju
Contributor

What type of PR is this?
/kind documentation

What this PR does / why we need it:
Karmada currently supports declaring a set of candidate clusters through clusterAffinity, or multiple sets of candidate clusters through clusterAffinities (which combines multiple clusterAffinity terms in a specific order). However, in either approach, each clusterAffinity represents an independent, mutually exclusive cluster set within a single scheduling pass: the scheduler ultimately selects only one cluster group, defined by a single clusterAffinity, or a subset of that group.

This model has limitations in hybrid cloud scenarios, such as local data centers coexisting with public clouds. In practice, local clusters typically serve as the preferred resource pool, while public cloud clusters act as extensions or backup resources. The two are not fully independent, mutually exclusive sets; instead, public cloud clusters should be used automatically as a supplement, in priority order, when local resources are insufficient.

To address this, this proposal introduces cascading cluster affinity scheduling to describe priority relationships between cluster groups. This mechanism will enable Karmada to better support workload scheduling in hybrid cloud environments and make elastic deployment of online applications more practical.

Which issue(s) this PR fixes:

Part of #7014

Special notes for your reviewer:

Does this PR introduce a user-facing change?:


@karmada-bot karmada-bot added the kind/documentation Categorizes issue or PR as related to documentation. label Jan 5, 2026
@karmada-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign rainbowmango for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gemini-code-assist

Summary of Changes

Hello @zhzhuang-zju, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a comprehensive proposal for "cascading cluster affinity scheduling" in Karmada. The core objective is to enhance workload placement strategies in hybrid cloud environments by allowing users to define prioritized cluster groups. This mechanism ensures that workloads first attempt to utilize preferred, often more cost-effective, clusters and then automatically expand to supplementary clusters when primary resources are insufficient, thereby optimizing resource utilization and supporting elastic scaling. The proposal details API changes, necessary scheduler adjustments, and a test plan to integrate this new functionality.

Highlights

  • Introduction of Cascading Scheduling: Introduces a new "cascading cluster affinity scheduling" mechanism to Karmada, allowing for priority-based resource allocation across cluster groups.
  • Enhanced Hybrid Cloud Support: Addresses limitations in hybrid cloud scenarios by enabling automatic supplementary use of cluster resources, where primary clusters are preferred and secondary clusters act as extensions or backups.
  • Priority-Based Resource Allocation: Workloads will preferentially use primary clusters (e.g., local data centers) and automatically cascade to supplementary clusters (e.g., public clouds) when primary resources are insufficient.
  • API Extension Proposals: Presents three distinct API approaches to implement this feature within Karmada's PropagationPolicy and Placement APIs, ensuring flexibility in design.
  • Scheduler Logic Adjustment: Requires modifications to the karmada-scheduler to process multiple affinity terms in a cascading manner, adapting its current single-affinity-term logic.
  • Backward Compatibility: The proposal emphasizes backward compatibility, introducing entirely new APIs without altering the behavior of existing configurations.

@karmada-bot karmada-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jan 5, 2026
@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a design proposal for 'Cascading Cluster Affinity Scheduling' in Karmada. The proposal is well-structured and clearly outlines the motivation, goals, and different API design approaches to enable prioritized cluster selection, which is particularly useful for hybrid cloud environments.

My review focuses on the design choices presented in the proposal. I've provided feedback on the three API design approaches, recommending one for its clarity and maintainability. I also pointed out a need for clarification on how the Duplicated replica scheduling strategy would behave in a failover scenario to ensure the design is comprehensive.

Comment on lines +49 to +173
### API change

#### Approach 1:

Extend the `ClusterAffinity` API by adding a `Supplements` field that describes supplementary cluster groups. This field lets users define one or more alternative cluster groups for a single affinity group. When the primary cluster group has insufficient resources or is unavailable, the scheduler can automatically cascade to these supplementary cluster groups for workload deployment. Note: `Supplements` is an array, so it can define multiple tiers of supplementary cluster groups, with scheduling priority decreasing as the tier level increases.

```go
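// ClusterAffinity represents the filter to select clusters.
type ClusterAffinity struct {
	// Existing fields are omitted, as they are unchanged.

	// Supplements is the newly added field. It lists supplementary cluster
	// groups in descending priority; the scheduler cascades to them when the
	// primary group has insufficient resources or is unavailable.
	Supplements []ClusterAffinity `json:"supplements,omitempty"`
}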
```

The following configuration declares a clusterAffinity with one supplementary cluster group:

```yaml
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: nginx
spec:
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
      name: nginx
  placement:
    clusterAffinity:
      clusterNames:
        - cluster1
      supplements:
        - clusterNames:
            - cluster2
            - cluster3
```
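
As a rough illustration of the tier ordering described for Approach 1, the sketch below flattens a `ClusterAffinity` and its `Supplements` into an ordered list of cluster groups. The struct is a trimmed-down stand-in rather than the real Karmada type, `flattenTiers` is a hypothetical helper, and the depth-first handling of nested supplements is an assumption, since the proposal does not say how nesting should be interpreted.

```go
package main

import "fmt"

// ClusterAffinity is a trimmed-down stand-in for the Karmada type; only the
// fields needed for this illustration are kept.
type ClusterAffinity struct {
	ClusterNames []string
	Supplements  []ClusterAffinity
}

// flattenTiers is a hypothetical helper. It returns cluster groups in
// descending priority: the primary group first, then each supplement in list
// order. Nested supplements are visited depth-first, which is an assumption.
func flattenTiers(a ClusterAffinity) [][]string {
	tiers := [][]string{a.ClusterNames}
	for _, s := range a.Supplements {
		tiers = append(tiers, flattenTiers(s)...)
	}
	return tiers
}

func main() {
	affinity := ClusterAffinity{
		ClusterNames: []string{"cluster1"},
		Supplements: []ClusterAffinity{
			{ClusterNames: []string{"cluster2", "cluster3"}},
		},
	}
	for i, tier := range flattenTiers(affinity) {
		fmt.Printf("tier %d: %v\n", i, tier)
	}
	// Prints:
	// tier 0: [cluster1]
	// tier 1: [cluster2 cluster3]
}
```

Because `Supplements` reuses the `ClusterAffinity` type, each supplement can carry its own supplements, so any consumer has to pick a traversal order like the one assumed here.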

#### Approach 2:

Currently, the ClusterAffinity terms within ClusterAffinities are mutually exclusive, but they could also be related in a cascading, supplementary way. Add `AffinityStrategy.Mode` to the Placement API to describe the relationship between the terms in ClusterAffinities.

```go
type Placement struct {
	ClusterAffinities []ClusterAffinityTerm `json:"clusterAffinities,omitempty"`

	// AffinityStrategy defines how cluster affinities are evaluated.
	// +optional
	AffinityStrategy *AffinityStrategy `json:"affinityStrategy,omitempty"`
}

type AffinityStrategy struct {
	// Mode defines the scheduling mode.
	// +kubebuilder:validation:Enum=Exclusive;Cascade
	// +kubebuilder:default=Exclusive
	// +optional
	Mode string `json:"mode,omitempty"`
}
```

The following configuration declares a cascading expansion relationship between multiple clusterAffinity terms:

```yaml
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: nginx
spec:
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
      name: nginx
  placement:
    clusterAffinities:
      - affinityName: primary
        clusterNames:
          - cluster1
      - affinityName: backup
        clusterNames:
          - cluster2
          - cluster3
    affinityStrategy:
      mode: Cascade
```
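
Since `AffinityStrategy` is an optional pointer in Approach 2, code that reads the mode would likely need to handle the case where the whole block is omitted. The sketch below shows one way to resolve the effective mode. The types are simplified stand-ins, and `effectiveMode` and the two constants are hypothetical names that do not appear in the proposal.

```go
package main

import "fmt"

// Hypothetical constants mirroring the enum values in the proposal.
const (
	ModeExclusive = "Exclusive"
	ModeCascade   = "Cascade"
)

// AffinityStrategy and Placement are simplified stand-ins for the proposed types.
type AffinityStrategy struct {
	Mode string
}

type Placement struct {
	AffinityStrategy *AffinityStrategy
}

// effectiveMode is a hypothetical helper. It falls back to Exclusive when the
// strategy block is absent or its mode is empty, so existing policies keep
// today's mutually exclusive behavior.
func effectiveMode(p Placement) string {
	if p.AffinityStrategy == nil || p.AffinityStrategy.Mode == "" {
		return ModeExclusive
	}
	return p.AffinityStrategy.Mode
}

func main() {
	fmt.Println(effectiveMode(Placement{})) // Exclusive
	fmt.Println(effectiveMode(Placement{
		AffinityStrategy: &AffinityStrategy{Mode: ModeCascade},
	})) // Cascade
}
```

The nil check matters because a default declared on `Mode` is only applied when the enclosing `affinityStrategy` object is present in the policy; a policy that omits the block entirely still reaches the scheduler with a nil pointer.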

#### Approach 3:

Add a new API `PreferredClusterAffinities` at the same level as ClusterAffinities to declare cluster groups with priority relationships.

```go
// Placement represents the rule for selecting clusters.
type Placement struct {
	// PreferredClusterAffinities represents scheduling preferences across multiple
	// cluster groups, each indicated by a ClusterAffinityTerm, with priority-based
	// selection.
	//
	// Unlike ClusterAffinities, which are mutually exclusive (the scheduler selects
	// only one group), PreferredClusterAffinities allows the scheduler to use
	// multiple cluster groups based on priority and resource availability.
	// +optional
	PreferredClusterAffinities []ClusterAffinityTerm `json:"preferredClusterAffinities,omitempty"`
}
```

The following configuration declares a preferredClusterAffinities with two affinity terms:

```yaml
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: nginx
spec:
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
      name: nginx
  placement:
    preferredClusterAffinities:
      - affinityName: primary
        clusterNames:
          - cluster1
      - affinityName: backup
        clusterNames:
          - cluster2
          - cluster3
```
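
To make the intended cascading behavior concrete, here is a minimal sketch of how replicas might be spread over prioritized groups such as `primary` and `backup`. It is not the karmada-scheduler algorithm: the `group` type, its `Capacity` field (how many replicas a group can still accept), and the `cascade` function are all hypothetical, and real scheduling would also involve per-cluster filtering and replica scheduling strategies.

```go
package main

import "fmt"

// group is a simplified, hypothetical stand-in for one affinity term plus the
// number of replicas its clusters can still accept.
type group struct {
	Name     string
	Capacity int
}

// cascade fills groups in list order and moves to the next group only when
// the current one is full. It returns the per-group assignment and the number
// of replicas that could not be placed.
func cascade(groups []group, replicas int) (map[string]int, int) {
	assigned := make(map[string]int)
	for _, g := range groups {
		if replicas == 0 {
			break
		}
		n := g.Capacity
		if n > replicas {
			n = replicas
		}
		if n > 0 {
			assigned[g.Name] = n
			replicas -= n
		}
	}
	return assigned, replicas
}

func main() {
	groups := []group{
		{Name: "primary", Capacity: 3},
		{Name: "backup", Capacity: 10},
	}
	assigned, unplaced := cascade(groups, 5)
	fmt.Println(assigned, unplaced) // map[backup:2 primary:3] 0
}
```

With five replicas and room for three in `primary`, the remaining two spill over to `backup`, which is the "preferred pool first, supplementary pool second" behavior the proposal describes.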

medium

This proposal presents three distinct approaches for the API design. Among them, Approach 3 appears to be the most robust and clear for the following reasons:

  • Clarity and Explicitness: It introduces a new field preferredClusterAffinities, which makes the intent of priority-based cascading scheduling explicit. This avoids ambiguity for users.
  • Maintains Orthogonality: It keeps the existing clusterAffinity and clusterAffinities fields' semantics unchanged. Reusing or extending existing fields (as in Approach 1 and 2) could lead to confusion and backward-compatibility challenges. For instance, Approach 2 overloads clusterAffinities with a new mode, which can make policies harder to understand at a glance.
  • Simplicity: Approach 1 introduces a recursive Supplements field, which could lead to complex, deeply nested structures that are difficult to manage and reason about. Approach 3 provides a flat list of prioritized cluster groups, which is simpler.

Given these points, adopting Approach 3 would likely lead to a more maintainable and user-friendly API.

@codecov-commenter
Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 46.55%. Comparing base (b175217) to head (3ce2d88).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #7078   +/-   ##
=======================================
  Coverage   46.55%   46.55%           
=======================================
  Files         700      700           
  Lines       48091    48091           
=======================================
+ Hits        22389    22390    +1     
+ Misses      24020    24019    -1     
  Partials     1682     1682           

| Flag | Coverage Δ |
| --- | --- |
| unittests | 46.55% <ø> (+<0.01%) ⬆️ |
