
Conversation

@mszacillo
Contributor

What type of PR is this?
/kind bug

What this PR does / why we need it:
Fixes a bug that prevents cluster failover for multi-component workloads when MultiplePodTemplatesScheduling is enabled.

Which issue(s) this PR fixes:
Fixes #7065

Does this PR introduce a user-facing change?:

Scheduler now detects changes in replicas for multi-component workloads

@karmada-bot karmada-bot added the kind/bug Categorizes issue or PR as related to a bug. label Dec 31, 2025
@gemini-code-assist

Summary of Changes

Hello @mszacillo, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical bug that prevented the scheduler from accurately recognizing changes in replica counts for workloads that consist of multiple components. By introducing refined replica calculation methods and integrating them into the existing change detection logic, the system can now correctly identify when replica adjustments are needed, thereby ensuring robust cluster failover and consistent resource management for complex applications.

Highlights

  • Bug Fix for Multi-Component Workloads: This pull request resolves a bug where the scheduler failed to detect changes in replicas for workloads composed of multiple components, especially when MultiplePodTemplatesScheduling was enabled. This ensures correct cluster failover behavior.
  • Enhanced Replica Calculation Logic: New helper functions, GetTotalBindingReplicas and GetSumOfReplicasForComponents, have been introduced to calculate the total desired replicas by summing replicas across individual components when present. This improves the precision of replica change detection (a simplified sketch follows this list).
  • Updated Replica Change Detection: The IsBindingReplicasChanged function has been updated to leverage the new replica calculation logic. It now correctly compares the assigned replicas in target clusters against the total desired replicas, whether for single or multi-component workloads, for both Duplicated and Divided scheduling types.
  • Comprehensive Test Coverage: New test cases have been added to binding_test.go to validate the updated IsBindingReplicasChanged function's behavior with multi-component workloads under both Duplicated and Divided replica scheduling strategies, ensuring the fix works as expected.
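
For orientation, here is a minimal sketch of the two helpers described above. The types are simplified stand-ins for the Karmada workv1alpha2 API (only the fields relevant here are shown), and the exact signatures in the PR may differ:

// Component is a simplified stand-in for workv1alpha2.Component.
type Component struct {
	Name     string
	Replicas int32
}

// ResourceBindingSpec is a simplified stand-in for workv1alpha2.ResourceBindingSpec.
type ResourceBindingSpec struct {
	Replicas   int32
	Components []Component
}

// GetSumOfReplicasForComponents calculates the sum of replicas for a slice of components.
func GetSumOfReplicasForComponents(components []Component) int32 {
	replicasSum := int32(0)
	for _, component := range components {
		replicasSum += component.Replicas
	}
	return replicasSum
}

// GetTotalBindingReplicas returns the total desired replicas for a binding:
// the sum of component replicas when components are declared, otherwise spec.Replicas.
func GetTotalBindingReplicas(bindingSpec *ResourceBindingSpec) int32 {
	if len(bindingSpec.Components) > 0 {
		return GetSumOfReplicasForComponents(bindingSpec.Components)
	}
	return bindingSpec.Replicas
}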


@karmada-bot karmada-bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Dec 31, 2025

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request fixes a bug where the scheduler failed to detect replica changes for multi-component workloads. The fix introduces GetTotalBindingReplicas to correctly calculate total replicas by summing component replicas when available, and updates IsBindingReplicasChanged to use this new logic. The changes are logical and well-tested with new test cases for multi-component scenarios. My feedback is minor and focuses on improving the documentation of the new exported functions to better align with the project's style guide.
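
Roughly, the updated comparison could look like the sketch below. It reuses the simplified stand-in types and the GetTotalBindingReplicas helper from the earlier sketch, takes the recorded per-cluster assignments and the scheduling type as explicit parameters, and is purely illustrative; the real function works on the Karmada API types and the placement's replica scheduling strategy.

// TargetCluster mirrors the recorded per-cluster assignment (simplified stand-in).
type TargetCluster struct {
	Name     string
	Replicas int32
}

// isBindingReplicasChanged reports whether the recorded scheduling result no
// longer matches the total desired replicas of the binding.
func isBindingReplicasChanged(bindingSpec *ResourceBindingSpec, clusters []TargetCluster, schedulingType string) bool {
	desired := GetTotalBindingReplicas(bindingSpec) // helper from the sketch above
	switch schedulingType {
	case "Duplicated":
		// Every selected cluster should carry the full desired replica count.
		for _, cluster := range clusters {
			if cluster.Replicas != desired {
				return true
			}
		}
		return false
	case "Divided":
		// The replicas assigned across clusters should add up to the desired total.
		var assigned int32
		for _, cluster := range clusters {
			assigned += cluster.Replicas
		}
		return assigned != desired
	default:
		return false
	}
}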

return false
}

// GetTotalBindingReplicas will get the total replicas for a given resourcebinding


medium

The comment for this exported function could be more descriptive to better align with the repository style guide (lines 7 and 21), which encourages clear descriptions of purpose and behavior over simple restatements of the function name.

Suggested change
// GetTotalBindingReplicas will get the total replicas for a given resourcebinding
// GetTotalBindingReplicas returns the total replicas for a resource binding, which is the sum of component replicas or the value of spec.Replicas.
References
  1. All exported functions, methods, structs, and interfaces must be documented with clear and concise comments describing their purpose and behavior. The example provided shows that a good comment explains what the function does, including its behavior in different cases (e.g., what it returns if a key doesn't exist). (link)

return bindingSpec.Replicas
}

// GetSumofReplicasForComponents will get the sum of replicas for multi-component resources


medium

The comment for this exported function could be more descriptive as per the repository style guide (lines 7 and 21). Additionally, there is a typo in the function name within the comment (GetSumof... should be GetSumOf...).

Suggested change
// GetSumofReplicasForComponents will get the sum of replicas for multi-component resources
// GetSumOfReplicasForComponents calculates the sum of replicas for a slice of components.
References
  1. All exported functions, methods, structs, and interfaces must be documented with clear and concise comments describing their purpose and behavior. The current comment is tautological and could be improved to be more descriptive. (link)

@codecov-commenter

codecov-commenter commented Dec 31, 2025

⚠️ Please install the Codecov GitHub app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 46.57%. Comparing base (2dac564) to head (a40a850).
⚠️ Report is 7 commits behind head on master.
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7066      +/-   ##
==========================================
+ Coverage   46.55%   46.57%   +0.02%     
==========================================
  Files         700      700              
  Lines       48084    48099      +15     
==========================================
+ Hits        22384    22401      +17     
  Misses      24016    24016              
+ Partials     1684     1682       -2     
Flag Coverage Δ
unittests 46.57% <100.00%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.

Member

@RainbowMango RainbowMango left a comment


/assign
Will take a look at it next week once I get back from New Year's Day.

@zhzhuang-zju
Contributor

/assign

I recall encountering a similar issue in scaling scenarios before. I’ll look into it further to see if it’s a common pattern.

Member

@RainbowMango RainbowMango left a comment


I thought this was a simple issue, but it might involve how to record the scheduling results for multi-template workloads and how to detect changes and trigger re-scheduling.

I need a bit more time to look into it.
Thanks @mszacillo for bringing this up, this is indeed a bug.

Comment on lines +68 to +74
func GetSumOfReplicasForComponents(components []workv1alpha2.Component) int32 {
	replicasSum := int32(0)
	for _, component := range components {
		replicasSum += component.Replicas
	}
	return replicasSum
}
Member


A corner case would break this. Say a FlinkDeployment has two components:

  • jobManager, replicas 1
  • taskManager, replicas 3

If we swap the replicas between the two components, the total replica count doesn't change, so a check based only on the sum would not detect the update.
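
To make this concrete, here is a tiny illustration (the component type and sumReplicas helper are illustrative stand-ins, not code from this PR) showing that the two assignments are indistinguishable by their sums:

package main

import "fmt"

type component struct {
	name     string
	replicas int32
}

func sumReplicas(components []component) int32 {
	var sum int32
	for _, c := range components {
		sum += c.replicas
	}
	return sum
}

func main() {
	before := []component{{"jobManager", 1}, {"taskManager", 3}}
	after := []component{{"jobManager", 3}, {"taskManager", 1}}

	// Both slices sum to 4, so a check that only compares totals
	// cannot tell that the per-component replicas were swapped.
	fmt.Println(sumReplicas(before), sumReplicas(after)) // 4 4
}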

Contributor Author

@mszacillo mszacillo Jan 5, 2026


Ah, good catch. Perhaps we can consider extending the ClusterInfo that is currently in the ResourceBinding?

// TargetCluster represents the identifier of a member cluster.
type TargetCluster struct {
	// Name of target cluster.
	Name string `json:"name"`
	// Replicas in target cluster
	// +optional
	Replicas int32 `json:"replicas,omitempty"`
}

This could be extended to include components.
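
For illustration only, one hypothetical shape for such an extension (not part of this PR; the field names are made up) might record per-component assignments alongside the cluster-level replica count:

// TargetComponent records the replicas of a single component assigned to a
// cluster. (Hypothetical sketch, not an agreed API.)
type TargetComponent struct {
	// Name of the component.
	Name string `json:"name"`
	// Replicas of this component assigned to the target cluster.
	// +optional
	Replicas int32 `json:"replicas,omitempty"`
}

// TargetCluster represents the identifier of a member cluster.
type TargetCluster struct {
	// Name of target cluster.
	Name string `json:"name"`
	// Replicas in target cluster
	// +optional
	Replicas int32 `json:"replicas,omitempty"`
	// Components records per-component replicas assigned to this cluster,
	// letting the scheduler detect changes that leave the total unchanged.
	// +optional
	Components []TargetComponent `json:"components,omitempty"`
}

Recording the per-component breakdown would let the change detection compare component by component instead of only comparing the sum.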

Member


Yeah, that's exactly what I'm thinking too. In addition, I'm also thinking of deprecating .spec.replicaRequirements and .spec.replicas and replacing them with .spec.components, as part of the multi-component workload scheduling feature (#6998).

Given that they are widely used across the whole codebase, it's a little bit challenging to do that in a compatible and smooth way.

@mszacillo mszacillo force-pushed the component-scheduler-fix branch from 63d042c to a40a850 on January 5, 2026 01:25
@karmada-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from rainbowmango. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@zhzhuang-zju
Contributor

The previous behavior of multi-components scheduling was: among multiple candidate clusters, select the one with relatively sufficient resources for scheduling. Due to the absence of a rescheduling (re-entrant scheduling) path, the initial scheduling decision was never revised once made.

This fix aims to add a re-entrant scheduling path for multi-components scheduling, but the following points need further confirmation:

  1. It’s clear that failover should trigger the scheduling process—but what about scaling operations (scale-up or scale-down)? For example, if cluster member1 was initially selected, but the workload later scales up such that member1 can no longer satisfy the resource requirements, what should the expected behavior be?

  2. If scaling operations also trigger the scheduling process, I’ve identified an issue: when calculating MaxAvailableComponentSets each time, the previous scheduling result isn’t taken into account. For instance, during the first calculation, cluster member1 might support 1 component set, but during the second calculation, its available capacity drops to 0—because resources allocated by the prior scheduling decision are already occupying member1’s capacity.

WDYT? @mszacillo @RainbowMango

@mszacillo
Contributor Author

It’s clear that failover should trigger the scheduling process—but what about scaling operations (scale-up or scale-down)? For example, if cluster member1 was initially selected, but the workload later scales up such that member1 can no longer satisfy the resource requirements, what should the expected behavior be?

At least in our use-case, if someone is scaling up their workload, then Karmada should keep the application in the cluster it is currently scheduled to. This is the current behavior when you use an aggregated replicaDivisionPreference. If, for instance, I scale up the parallelism of a FlinkDeployment from {1 JM, 5 TMs} -> {1 JM, 6 TMs}, only the delta of 1 replica will be taken into account as part of dynamicScaleUp.

Ideally we should keep the behavior similar in the component case.

If scaling operations also trigger the scheduling process, I’ve identified an issue: when calculating MaxAvailableComponentSets each time, the previous scheduling result isn’t taken into account. For instance, during the first calculation, cluster member1 might support 1 component set, but during the second calculation, its available capacity drops to 0—because resources allocated by the prior scheduling decision are already occupying member1’s capacity.

Yeah this is the more complicated issue. :/

Thinking out loud, I wonder if we can update what the component estimator computes depending on mode:

If Fresh (or if it is Steady but being scheduled for the first time): Calculate maxSets
If Steady: Calculate feasibility of the incremental change:

  • Determine per-component delta (desired - currently scheduled to cluster)
  • Evaluate quota against just that delta resource vector

For currently scheduled replicas, we'd need to reference the ClusterInfo as mentioned above. That said this would require a good amount of changes to how we call MaxAvailableComponentSets, since we just package Components: spec.Components, directly into the ComponentSetEstimationRequest.
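
A rough sketch of the per-component delta idea (purely illustrative; these names and types are not part of the current estimator API):

// componentReplicas pairs a component's desired replicas with what is already
// scheduled on a given cluster (taken from the recorded scheduling result).
// The type and field names here are illustrative assumptions.
type componentReplicas struct {
	Name      string
	Desired   int32
	Scheduled int32
}

// incrementalDeltas returns, per component, how many additional replicas the
// cluster would need to absorb. Scale-downs (negative deltas) are clamped to
// zero since they free capacity rather than consume it.
func incrementalDeltas(components []componentReplicas) map[string]int32 {
	deltas := make(map[string]int32, len(components))
	for _, c := range components {
		delta := c.Desired - c.Scheduled
		if delta < 0 {
			delta = 0
		}
		deltas[c.Name] = delta
	}
	return deltas
}

The estimator could then evaluate quota against just this delta resource vector for the "Steady" path, rather than recomputing full component sets from scratch.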

@zhzhuang-zju
Contributor

if someone is scaling up their workload, then Karmada should keep the application in the cluster that it is currently scheduled to.

Yeah, I agree with this strategy, as it best ensures business continuity.

However, if scaling operations (scale-up or scale-down) also retrigger the scheduling process, the current implementation cannot guarantee this behavior. I think this is also worth noting.

