feat(controller-manager): make taint-manager binding-eviction concurrency configurable#7495

Open
RyanAtNetflix wants to merge 1 commit into
karmada-io:masterfrom
RyanAtNetflix:eviction-storm/03-concurrent-binding-eviction-syncs-flag

Conversation

@RyanAtNetflix

What type of PR is this?

/kind feature

What this PR does / why we need it:

The NoExecuteTaintManager runs two AsyncWorkers — bindingEvictionWorker and clusterBindingEvictionWorker — that handle the eviction queues for ResourceBinding and ClusterResourceBinding respectively. Both worker pools are sized by NoExecuteTaintManager.ConcurrentReconciles, which karmada-controller-manager has been hardcoding to 3 since the taint manager landed:

ConcurrentReconciles: 3,

There is no flag to override this. In a cluster-failover storm, where the taint manager is the producer that appends GracefulEvictionTasks for every binding on a failed cluster, that hardcoded ceiling caps end-to-end eviction throughput regardless of every other tuning knob — --resource-eviction-rate, --rate-limiter-qps, --kube-api-qps. A 5K-binding lab measured a sustained eviction rate of roughly 1.2 RB/s at the default; lifting the ceiling to 50 in the same lab raised sustained throughput to ~82 RB/s (peak 101).
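
To illustrate why a fixed pool size acts as a ceiling, here is a minimal sketch of a bounded worker pool draining a queue. It is a stand-in for an AsyncWorker-style pool, not Karmada's implementation: the function name `peakInFlight`, the 2 ms sleep standing in for an apiserver round trip, and the item counts are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// peakInFlight drains `items` queue entries with a fixed pool of `workers`
// goroutines and returns the highest number of entries that were being
// processed at the same time. Hypothetical helper, not Karmada code.
func peakInFlight(workers, items int) int64 {
	queue := make(chan int, items)
	for i := 0; i < items; i++ {
		queue <- i
	}
	close(queue)

	var inFlight, peak int64
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for range queue {
				n := atomic.AddInt64(&inFlight, 1)
				// Record the high-water mark with a compare-and-swap loop.
				for {
					p := atomic.LoadInt64(&peak)
					if n <= p || atomic.CompareAndSwapInt64(&peak, p, n) {
						break
					}
				}
				time.Sleep(2 * time.Millisecond) // simulated apiserver round trip
				atomic.AddInt64(&inFlight, -1)
			}
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&peak)
}

func main() {
	for _, workers := range []int{3, 50} {
		fmt.Printf("workers=%d peak in-flight=%d\n", workers, peakInFlight(workers, 200))
	}
}
```

However long the queue is, the number of items in flight never exceeds the pool size, so per-item latency multiplied by the pool size bounds throughput regardless of any rate limiter placed in front of the queue.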

This change introduces --concurrent-binding-eviction-syncs (matching the existing --concurrent-cluster-syncs / --concurrent-work-syncs naming convention) with a default of 5, the same default that several flags in the Concurrent* family use (for example, --concurrent-resource-template-syncs). Operators with large clusters and fast control-plane apiservers can raise it to lift the eviction-rate ceiling; operators with constrained apiservers can lower it. The default is a modest bump from 3 to 5 rather than the 50 that maxed out throughput in the lab, because the right sustained value depends on apiserver sizing: picking 50 unilaterally would shift load that operators might not be ready for. The flag's help text calls this out and points operators at the failover-storm trade-off.

Wiring:

  • cmd/controller-manager/app/options.Options gains ConcurrentBindingEvictionSyncs and the matching pflag.
  • Validation rejects values <= 0.
  • pkg/controllers/context.Options carries it through to the controller registry.
  • cmd/controller-manager/app/controllermanager.go's startClusterController reads ctx.Opts.ConcurrentBindingEvictionSyncs in place of the hardcoded 3.
  • docs/command-line-flags/karmada-controller-manager.md is regenerated via hack/update-command-line-flags.sh.
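
The wiring above can be sketched roughly as follows. This is a simplified, self-contained approximation rather than the PR's diff: it uses the standard library `flag` package instead of pflag so it has no external dependency, and `ContextOptions`, `TaintManager`, `newTaintManager`, and `build` are hypothetical stand-ins for pkg/controllers/context.Options, NoExecuteTaintManager, and the startClusterController path.

```go
package main

import (
	"flag"
	"fmt"
)

// Options mirrors the shape of the controller-manager options struct;
// only the new field is shown.
type Options struct {
	ConcurrentBindingEvictionSyncs int
}

// AddFlags registers the flag. The PR uses pflag; the standard library
// flag package is an assumption made to keep this sketch dependency-free.
func (o *Options) AddFlags(fs *flag.FlagSet) {
	fs.IntVar(&o.ConcurrentBindingEvictionSyncs, "concurrent-binding-eviction-syncs", 5,
		"The number of taint-manager binding-eviction reconciles that are allowed to run concurrently.")
}

// ContextOptions is a hypothetical stand-in for pkg/controllers/context.Options.
type ContextOptions struct {
	ConcurrentBindingEvictionSyncs int
}

// TaintManager is a hypothetical stand-in for NoExecuteTaintManager.
type TaintManager struct {
	ConcurrentReconciles int
}

// newTaintManager reads the configured value where the literal 3 used to be.
func newTaintManager(opts ContextOptions) TaintManager {
	return TaintManager{ConcurrentReconciles: opts.ConcurrentBindingEvictionSyncs}
}

// build parses the command line and carries the value through to the manager.
func build(args []string) (TaintManager, error) {
	o := &Options{}
	fs := flag.NewFlagSet("karmada-controller-manager", flag.ContinueOnError)
	o.AddFlags(fs)
	if err := fs.Parse(args); err != nil {
		return TaintManager{}, err
	}
	ctxOpts := ContextOptions{ConcurrentBindingEvictionSyncs: o.ConcurrentBindingEvictionSyncs}
	return newTaintManager(ctxOpts), nil
}

func main() {
	tm, err := build([]string{"--concurrent-binding-eviction-syncs=50"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("ConcurrentReconciles:", tm.ConcurrentReconciles)
}
```

Calling `build(nil)` in this sketch yields the default of 5, which is the value operators get when they do not set the flag.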

Which issue(s) this PR fixes:

Fixes #7483

Special notes for your reviewer:

This is the third of a 3-PR series addressing eviction throughput during a cluster-failover storm. The first two PRs (#TODO link to PR1, #TODO link to PR2) eliminate an optimistic-lock conflict storm in the read-modify-write paths between the taint manager and the graceful-eviction controllers. With the conflict storm gone, the hardcoded worker-pool ceiling becomes the next bottleneck — this PR removes that ceiling without changing the default behavior in any meaningful way.

Default value is the main thing I'd like input on. I picked 5 to match the rest of the Concurrent* family and to keep the default-config behavior close to the historical 3. Reviewers may have a stronger view based on apiserver-sizing assumptions for typical Karmada deployments. The 5K-binding lab maxed out at 50, but I am not comfortable making that the upstream default without broader validation across cluster shapes.

The flag name --concurrent-binding-eviction-syncs covers both ResourceBinding and ClusterResourceBinding eviction queues, since they share NoExecuteTaintManager.ConcurrentReconciles today. If reviewers prefer separate flags per queue, I can split it.

Tests:

  • TestValidateControllerManagerConfiguration extended to cover the new validation rule (rejecting <= 0).
  • go test ./cmd/controller-manager/... ./pkg/controllers/context/... is clean.
  • go build ./... is clean.
  • hack/update-command-line-flags.sh ran cleanly; the regenerated karmada-controller-manager.md is included in this commit so the verify-command-line-flags.sh check stays green.
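
For readers who want to see the validation rule in isolation, here is a small table-driven sketch. It is an assumption-laden simplification: the function `validateConcurrentBindingEvictionSyncs` and its error message are hypothetical, it returns a plain `error`, and the real validation code and TestValidateControllerManagerConfiguration in the repository may be structured differently.

```go
package main

import "fmt"

// validateConcurrentBindingEvictionSyncs rejects values <= 0.
// Hypothetical helper; name and error text are illustrative only.
func validateConcurrentBindingEvictionSyncs(v int) error {
	if v <= 0 {
		return fmt.Errorf("--concurrent-binding-eviction-syncs must be greater than 0, got %d", v)
	}
	return nil
}

func main() {
	// Table-driven cases covering both sides of the boundary.
	cases := []struct {
		name    string
		value   int
		wantErr bool
	}{
		{"negative", -1, true},
		{"zero", 0, true},
		{"minimum valid", 1, false},
		{"default", 5, false},
	}
	for _, tc := range cases {
		err := validateConcurrentBindingEvictionSyncs(tc.value)
		ok := (err != nil) == tc.wantErr
		fmt.Printf("%-14s value=%-3d pass=%v\n", tc.name, tc.value, ok)
	}
}
```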

Local benchmark on a 5K-binding kind lab failing one cluster, with all three PRs applied and the flag set to 50: sustained ~82 RB/s, peak 101 RB/s.

Does this PR introduce a user-facing change?:

`karmada-controller-manager`: Introduced `--concurrent-binding-eviction-syncs` (default 5) to control the size of the taint-manager's binding-eviction worker pool. Operators running large clusters can raise this to lift the eviction-throughput ceiling during a cluster-failover storm; operators with constrained apiservers can lower it. The previous behavior was a hardcoded ceiling of 3.

…ency configurable

Signed-off-by: Ryan Copley <ryan@ryancopley.com>
Copilot AI review requested due to automatic review settings May 8, 2026 16:14
@karmada-bot karmada-bot added the kind/feature Categorizes issue or PR as related to a new feature. label May 8, 2026
@karmada-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign seanlaii for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@karmada-bot karmada-bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label May 8, 2026
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a performance bottleneck in the Karmada controller-manager by making the taint-manager's binding-eviction concurrency configurable. By replacing a hardcoded limit with a tunable flag, operators can now adjust eviction throughput to better suit their specific cluster scale and API server capacity, particularly during high-load scenarios like cluster-failover storms.

Highlights

  • Configurable Eviction Concurrency: Introduced the --concurrent-binding-eviction-syncs flag to allow operators to tune the concurrency of the taint-manager's binding-eviction worker pool, replacing the previously hardcoded value of 3.
  • Performance Tuning: Enables better control over eviction throughput during cluster-failover events, helping to mitigate bottlenecks in large-scale environments.
  • Validation and Documentation: Added validation to ensure the new concurrency setting is positive and updated the controller-manager documentation to reflect the new flag.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a configurable concurrency limit for the taint-manager's binding-eviction reconciles, replacing a hardcoded value with the new ConcurrentBindingEvictionSyncs option and CLI flag. Feedback from the review suggests clarifying the documentation to accurately describe the limit as a global ceiling and to specify that the concurrency value applies to each eviction queue individually.

Comment on lines +126 to +130
// ConcurrentBindingEvictionSyncs is the number of taint-manager binding-eviction
// reconciles (covering both ResourceBinding and ClusterResourceBinding queues)
// that are allowed to run concurrently. Raising this lifts the per-cluster cap
// on how quickly the taint manager can append GracefulEvictionTasks during a
// cluster-failover storm; lowering it bounds load on karmada-apiserver.
medium

The comment mentions a "per-cluster cap", but the NoExecuteTaintManager workers are global to the controller-manager instance. While the bottleneck is most apparent when a single cluster fails, the concurrency limit applies to the total number of bindings being processed from all clusters. Clarifying this as a global ceiling would be more accurate.

Suggested change
- // ConcurrentBindingEvictionSyncs is the number of taint-manager binding-eviction
- // reconciles (covering both ResourceBinding and ClusterResourceBinding queues)
- // that are allowed to run concurrently. Raising this lifts the per-cluster cap
- // on how quickly the taint manager can append GracefulEvictionTasks during a
- // cluster-failover storm; lowering it bounds load on karmada-apiserver.
+ // ConcurrentBindingEvictionSyncs is the number of taint-manager binding-eviction
+ // reconciles (covering both ResourceBinding and ClusterResourceBinding queues)
+ // that are allowed to run concurrently. Raising this lifts the ceiling
+ // on how quickly the taint manager can append GracefulEvictionTasks during a
+ // cluster-failover storm; lowering it bounds load on karmada-apiserver.

flags.IntVar(&o.ConcurrentClusterPropagationPolicySyncs, "concurrent-cluster-propagation-policy-syncs", 1, "The number of ClusterPropagationPolicy that are allowed to sync concurrently.")
flags.IntVar(&o.ConcurrentResourceTemplateSyncs, "concurrent-resource-template-syncs", 5, "The number of resource templates that are allowed to sync concurrently.")
flags.IntVar(&o.ConcurrentDependentResourceSyncs, "concurrent-dependent-resource-syncs", 2, "The number of dependent resource that are allowed to sync concurrently.")
flags.IntVar(&o.ConcurrentBindingEvictionSyncs, "concurrent-binding-eviction-syncs", 5, "The number of taint-manager binding-eviction reconciles that are allowed to run concurrently across both the ResourceBinding and ClusterResourceBinding eviction queues. Raising this lifts the cap on how quickly Karmada can append GracefulEvictionTasks during a cluster-failover storm.")
medium

The phrase "across both ... queues" in the help text is slightly ambiguous. Since the implementation in NoExecuteTaintManager uses this value to size the worker pool for each queue independently (meaning the total concurrency is 2 * ConcurrentBindingEvictionSyncs), it would be clearer to use "for each of" instead of "across both".

Suggested change
- flags.IntVar(&o.ConcurrentBindingEvictionSyncs, "concurrent-binding-eviction-syncs", 5, "The number of taint-manager binding-eviction reconciles that are allowed to run concurrently across both the ResourceBinding and ClusterResourceBinding eviction queues. Raising this lifts the cap on how quickly Karmada can append GracefulEvictionTasks during a cluster-failover storm.")
+ flags.IntVar(&o.ConcurrentBindingEvictionSyncs, "concurrent-binding-eviction-syncs", 5, "The number of taint-manager binding-eviction reconciles that are allowed to run concurrently for each of the ResourceBinding and ClusterResourceBinding eviction queues. Raising this lifts the cap on how quickly Karmada can append GracefulEvictionTasks during a cluster-failover storm.")
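
The "for each of" reading can be made concrete with a short sketch. It assumes, as this review comment states, that each eviction queue gets its own pool sized by the same setting; the helper names `evictionWorkers` and `totalWorkers` are hypothetical and do not come from the Karmada codebase.

```go
package main

import "fmt"

// evictionWorkers returns the pool size per eviction queue, assuming both
// queues are sized independently from the same ConcurrentReconciles value.
func evictionWorkers(concurrentReconciles int) map[string]int {
	return map[string]int{
		"bindingEvictionWorker":        concurrentReconciles,
		"clusterBindingEvictionWorker": concurrentReconciles,
	}
}

// totalWorkers sums the workers across all pools.
func totalWorkers(pools map[string]int) int {
	total := 0
	for _, n := range pools {
		total += n
	}
	return total
}

func main() {
	for _, setting := range []int{3, 5, 50} {
		fmt.Printf("setting=%d total workers=%d\n", setting, totalWorkers(evictionWorkers(setting)))
	}
}
```

Under that assumption, the flag value is a per-queue limit and the combined worker count is twice the setting, which is the distinction the suggested help text is trying to surface.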

Comment on lines +77 to +79
// ConcurrentBindingEvictionSyncs is the number of taint-manager binding-eviction
// reconciles (covering both ResourceBinding and ClusterResourceBinding queues)
// that are allowed to run concurrently.
medium

For clarity and consistency with the flag's behavior, it's better to specify that the concurrency limit applies to each queue individually.

Suggested change
- // ConcurrentBindingEvictionSyncs is the number of taint-manager binding-eviction
- // reconciles (covering both ResourceBinding and ClusterResourceBinding queues)
- // that are allowed to run concurrently.
+ // ConcurrentBindingEvictionSyncs is the number of taint-manager binding-eviction
+ // reconciles (for each of the ResourceBinding and ClusterResourceBinding queues)
+ // that are allowed to run concurrently.

Contributor

Copilot AI left a comment

Pull request overview

Adds an operator-facing concurrency knob to the controller-manager so the taint-manager’s binding-eviction worker pool size is no longer hardcoded, improving eviction throughput during cluster-failover storms while allowing operators to tune control-plane load.

Changes:

  • Introduces --concurrent-binding-eviction-syncs (default 5) and validates it must be > 0.
  • Plumbs the new option through controller context and uses it to size NoExecuteTaintManager.ConcurrentReconciles (replacing the previous hardcoded 3).
  • Regenerates karmada-controller-manager command-line flags documentation.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file
| File | Description |
| --- | --- |
| cmd/controller-manager/app/options/options.go | Adds the new option field and CLI flag wiring for binding-eviction concurrency. |
| cmd/controller-manager/app/options/validation.go | Validates ConcurrentBindingEvictionSyncs is greater than zero. |
| cmd/controller-manager/app/options/validation_test.go | Extends validation tests to cover invalid concurrency values. |
| pkg/controllers/context/context.go | Extends controller context options to carry the new concurrency setting. |
| cmd/controller-manager/app/controllermanager.go | Uses the configured concurrency when constructing the taint-manager and passes it into controller context. |
| docs/command-line-flags/karmada-controller-manager.md | Documents the new flag (regenerated). |

Comment on lines +128 to +130
// that are allowed to run concurrently. Raising this lifts the per-cluster cap
// on how quickly the taint manager can append GracefulEvictionTasks during a
// cluster-failover storm; lowering it bounds load on karmada-apiserver.
flags.IntVar(&o.ConcurrentClusterPropagationPolicySyncs, "concurrent-cluster-propagation-policy-syncs", 1, "The number of ClusterPropagationPolicy that are allowed to sync concurrently.")
flags.IntVar(&o.ConcurrentResourceTemplateSyncs, "concurrent-resource-template-syncs", 5, "The number of resource templates that are allowed to sync concurrently.")
flags.IntVar(&o.ConcurrentDependentResourceSyncs, "concurrent-dependent-resource-syncs", 2, "The number of dependent resource that are allowed to sync concurrently.")
flags.IntVar(&o.ConcurrentBindingEvictionSyncs, "concurrent-binding-eviction-syncs", 5, "The number of taint-manager binding-eviction reconciles that are allowed to run concurrently across both the ResourceBinding and ClusterResourceBinding eviction queues. Raising this lifts the cap on how quickly Karmada can append GracefulEvictionTasks during a cluster-failover storm.")
--cluster-startup-grace-period duration Specifies the grace period of allowing a cluster to be unresponsive during startup before marking it unhealthy. (default 1m0s)
--cluster-status-update-frequency duration Specifies how often karmada-controller-manager posts cluster status to karmada-apiserver. (default 10s)
--cluster-success-threshold duration The duration of successes for the cluster to be considered healthy after recovery. (default 30s)
--concurrent-binding-eviction-syncs int The number of taint-manager binding-eviction reconciles that are allowed to run concurrently across both the ResourceBinding and ClusterResourceBinding eviction queues. Raising this lifts the cap on how quickly Karmada can append GracefulEvictionTasks during a cluster-failover storm. (default 5)