Asynchronous Inadmissible Workload Requeueing #9232

Open

gabesaba wants to merge 3 commits into kubernetes-sigs:main from gabesaba:async_inadmissible

Conversation

@gabesaba (Contributor) commented Feb 13, 2026

What type of PR is this?

/kind bug

What this PR does / why we need it:

We decouple requesting inadmissible workloads to be reprocessed,
from the processing of these requests where we move the workloads.

This has three main advantages:

  1. Allows batching requeues, avoiding the starvation noted here.
  2. Reduces lock contention.
  3. Avoids spamming requeues 20+ times per second.

We accomplish this by defining the inadmissibleWorkloadRequeuer,
which implements the requeueInadmissibleListener interface.

Any requests to requeue inadmissible workloads must go through this interface.
Then, the requeuer will process these requests in batches. The requeuer
deduplicates requests to the same ClusterQueue/Root Cohort, further reducing
duplicate reprocessing.

Which issue(s) this PR fixes:

Fixes #8095

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Scheduling: Fix the bug where inadmissible workloads would be re-queued too frequently at scale.
This resulted in excessive processing, lock contention, and starvation of workloads deeper in the queue.
The fix is to throttle the process with a batch period of 1s per CQ or Cohort.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Feb 13, 2026
@k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gabesaba

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Feb 13, 2026
@gabesaba (Author):

@sohankunkerkar, can you drive this review please?

netlify bot commented Feb 13, 2026:

Deploy Preview for kubernetes-sigs-kueue canceled.

Latest commit: 50c86d6
Latest deploy log: https://app.netlify.com/projects/kubernetes-sigs-kueue/deploys/699329c78099d600087ad65a

@mwielgus (Contributor) left a comment:

How often are the inadmissible workloads requeued after this change?

@sohankunkerkar (Member) left a comment:

Thanks for working on this @gabesaba. The approach of decoupling notification from processing via TypedController[requeueRequest] with AddAfter directly targets the QueueInadmissibleWorkloads churn identified in #8095.
As @mwielgus mentioned, it would really help if you could provide some numbers showing the performance improvement.

Note on testing the batching behavior: the integration tests use a 1ms batch period, which effectively makes them synchronous and doesn't exercise the actual batching/deduplication.

In client-go's delaying queue implementation, deduplication only happens while the item is in the waiting heap. With a 1ms window, items mature to the active queue almost instantly, meaning a stream of events spaced >1ms apart will result in multiple reconciles instead of one. A unit test with a controlled clock, or a longer batch period, is needed to exercise this scenario.

func (r *inadmissibleWorkloadRequeuer) setupWithManager(mgr ctrl.Manager) error {
	return builder.TypedControllerManagedBy[requeueRequest](mgr).
		Named("inadmissible_workload_requeue_controller").
		WatchesRawSource(source.TypedChannel(r.eventCh, &inadmissibleWorkloadRequeuer{})).
Member:

inadmissibleWorkloadRequeuer{} is a zero-value instance and its batchPeriod is 0. The Generic handler calls q.AddAfter(e.Object, r.batchPeriod), and client-go's AddAfter with duration <= 0 falls through to q.Add(item) immediately with no delay.
This means the 1s batch period defined here is never applied. Should this be r instead?

Author:

Good catch, it should definitely be r. I'll add an integration test too, to make sure we have this code under test.

// q.AddAfter will process this so fast that it is not necessary.
// LLM review suggested this to derisk deadlock (during startup?), but I don't
// see this risk.
eventCh: make(chan event.TypedGenericEvent[requeueRequest]),
Member:

I looked at controller-runtime's source.TypedChannel internals: syncLoop starts a goroutine that continuously reads from the user channel and writes to an internal 1024-buffered dst channel. All callers (NotifyRetryInadmissible, notifyRetryInadmissibleWithoutLock, AddOrUpdateCohort) are reconciler goroutines that only run after mgr.Start() has started the source, so the unbuffered channel looks safe with the current TypedChannel startup ordering.

Still, for future DRA/WAS/TAS growth, should we prefer a small bounded buffer here as the default resilience pattern for internal event fan-in? We would still keep the dedup semantics in AddAfter with the keyed requeueRequest.
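The buffered-versus-unbuffered question boils down to whether a notifier can complete a send before any consumer is running. A minimal stdlib sketch (`bufferedFanIn` is invented for illustration; the thread settled on a buffer of 128):

```go
package main

import "fmt"

// bufferedFanIn sends n events into a channel with the given buffer
// size before any receiver exists, then drains them. With buf >= n the
// sends never block; with an unbuffered channel (buf == 0) the first
// send would block forever here, because no receiver has started yet -
// which is the startup-ordering hazard a small bounded buffer removes.
func bufferedFanIn(buf, n int) int {
	events := make(chan string, buf)
	for i := 0; i < n; i++ {
		events <- fmt.Sprintf("requeue-%d", i)
	}
	close(events)
	received := 0
	for range events {
		received++
	}
	return received
}

func main() {
	fmt.Println(bufferedFanIn(128, 5)) // a burst of 5 is absorbed without blocking
}
```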

Author:

Thanks for the analysis, and this sgtm. How does 128 sound to you?

Member:

Yeah, that sounds good!

}

// inadmissibleWorkloadRequeuer is responsible for receiving notifications,
// and requeuering workloads as a result of these notifications.
Member:

s/requeuering/requeuing/g

reportMetrics(m, cqImpl.name)

if queued || addedWorkloads {
if addedWorkloads {
Member:

I assume this is the intentional tradeoff for batching?
This is fine if intentional, but it's worth explicitly documenting as a throughput-over-latency tradeoff.

Author:

Related to #9232 (comment). Broadcast will be a no-op until the workloads are moved. I added a broadcast call in requeueWorkloadsCQ and requeueWorkloadsCohort.

// or otherwise risk encountering an infinite loop if a Cohort
// cycle is introduced.
func requeueWorkloadsCohort(ctx context.Context, m *Manager, cohort *cohort) bool {
// RequeueCohort moves all inadmissibleWorkloads in
Member:

It looks like we can remove this information.

Author:

Updated the comment, combining information from both parts

batchPeriod time.Duration
}

func newInadmissibleWorkloadReconciler(qManager *Manager) *inadmissibleWorkloadRequeuer {
Member:

The constructor says "Reconciler", type says "Requeuer". Pick one. Since the type is inadmissibleWorkloadRequeuer, the constructor should be newInadmissibleWorkloadRequeuer.


// NewManagerForIntegrationTests is a factory for Integration Tests, setting the
// batch period to a much lower value (requeueBatchPeriodIntegrationTests).
func NewManagerForIntegrationTests(client client.Client, checker StatusChecker, options ...Option) *Manager {
Contributor:

Do we really need these functions? I would argue against introducing such functions for testing.

It is a cleaner pattern to parametrize the value by passing the Options. Here you could expose a param like BatchInterval.

For example this is how controller-runtime exposes configuration for manager's RetryPeriod and other options.

Author:

How about adding the options to NewManager, plus a factory that resides in the integration tests and sets them up with certain options (such as a smaller batching period)?

Contributor:

+1, yes generic NewManager in the production code, and NewManagerForIntegrationTests for the test packages sgtm

}

// NewManagerForUnitTestsWithRequeuer creates a new Manager for testing purposes, pre-configured with a testInadmissibleWorkloadRequeuer.
func NewManagerForUnitTestsWithRequeuer(client client.Client, checker StatusChecker, options ...Option) (*Manager, *testInadmissibleWorkloadRequeuer) {
Contributor:

For unit tests it is slightly more acceptable, but also I would argue against it. If we need to have a dedicated parametrization in unit tests we can use Options to the standard NewManager.

@mimowo (Contributor), Feb 16, 2026:

For example, I'm wondering if we could pass clock when constructing the manager (might be optional parameter, real clock by default), and just move the clock by one second in unit tests. This aligns more with the current approach to time-based testing.

Author:

For the new controller to work, it needs to be wired with ctrl.Manager, and these requeues will be processed in a different go-routine. I don't think this is the right choice for unit tests.

Contributor:

For the new controller to work, it needs to be wired with ctrl.Manager, and these requeues will be processed in a different go-routine. I don't think this is the right choice for unit tests.

Hm, let me double-check why you consider this not the right choice? Actually, for unit tests it is great to be able to control the passage of time. Otherwise we risk flakes, which is why we tend to use fake clocks for most of our unit testing.

I remember this is what we do in core k8s when using workqueues: just advance time, and the workqueue mechanics trigger a goroutine that runs the reconcile. So, I think what is needed is to call NewControllerManagedBy and customize the clock.

Let me know if there are reasons why this is not a valid approach. It is not a blocker for me; it just feels more natural, safer (to avoid flakes), and consistent with the rest of the codebase.

Author:

This still requires setting up a controller-runtime ctrl.Manager in the unit tests to call NewControllerManagedBy (or TypedControllerManagedBy). This seems to me deep in the territory of integration tests, especially since existing unit tests don't have a ctrl.Manager. Even if this were not required, we'd be relying on processing in goroutines and would need some synchronization to ensure the tests are not flaky.

I think that this lightweight test object makes more sense for unit tests, and covering the behavior of the async controller should be handled in integration tests (to be expanded as per #9232 (comment))

@mimowo (Contributor), Feb 17, 2026:

I can see this is a matter of taste. Indeed, spawning a goroutine is what we do in unit tests for the k8s controllers, for example here. However, there it is a bit more lightweight, as the basic workqueue is used rather than the entire controller-runtime manager.

I think that this lightweight test object makes more sense for unit tests, and covering the behavior of the async controller should be handled in integration tests (to be expanded as per #9232 (comment))

Yes, but the drawback is leaking test-only functions into prod code (at least in the initial implementation), maybe this is solvable by other means.

In any case, I'm ok with the pattern as is, it is not a blocker, just something I wanted to explore as we go into the territory of using controller-runtime reconcilers for internal code.

I think pausing to consider how to do this well is time well spent. I'm still a bit hesitant: even if you wire this all up now, there will be a learning curve for how to wire things up in the future. I think it might introduce a complexity barrier for new developers in the community. I basically thought that reusing the existing controller-runtime machinery would make the unit tests easier to write, but I'm not stubborn here.

Let me know if you have indeed considered the consequences and think this is the better decision; if so, I'm ok with that.

utiltestingapi.MakeWorkload("a", "moon").Queue("foo").Obj(),
)
manager := NewManager(kClient, nil)
manager := NewManagerForUnitTests(kClient, nil)
Contributor:

I don't like how the diff is bloated by this change. Rather than impacting all tests, can we introduce the new batch period behind a dedicated feature gate, and only add a set of dedicated tests?

Author:

I created a prep PR, to reduce the diff: #9224

In the feature flag case, are you suggesting maintaining both branches - before and after decoupling request requeueing from processing requeueing?

Contributor:

In the feature flag case, are you suggesting maintaining both branches - before and after decoupling request requeueing from processing requeueing?

Yeah, good question. I think the amount of changes looks scary for the PR to be safe without a FG. If we can bring it down to reviewable state by prep PRs, maybe we can drop the FG.

If we go with the FG approach, I imagine it goes to Beta directly (enabled by default), and we only have a small portion of tests exercising the FG disabled. Then we graduate the FG after 2 releases. Basically, we keep FG=false just as a bailout option in case something goes wrong.

continue
}
processedRoots.Insert(rootName)
}
Contributor:

Is it a drive-by cleanup, or something necessary for the PR?

Author:

Drive-by cleanup. I think I can revert this, but later on this simplification makes sense, as we collapse keys in the requeuer and adds are cheap.

Contributor:

Let's decouple this into a separate PR.

Author:

Ack. I will submit as a follow-up PR

for _, clusterQueue := range cohort.ChildCQs() {
queued = queueInadmissibleWorkloads(ctx, clusterQueue, m.client) || queued
if queueInadmissibleWorkloads(ctx, clusterQueue, m.client) {
reportMetrics(m, clusterQueue.name)
Contributor:

It doesn't seem like we called reportMetrics here before, so the change looks unrelated. Was it a bug before, is this some non-obvious drive-by cleanup, or is it really needed now?

Author:

Previously we were calling requeue, and then reporting metrics if anything was moved. Now that we decouple notifying requeue from the actual requeueing, we need to update metrics in the latter step.

Contributor:

sgtm, would you mind introducing reportMetrics in a prep PR?

Author:

Renamed as reportPendingWorkloads in that PR

Contributor:

Can be resolved now.

@mimowo (Contributor) commented Feb 16, 2026:

Let me propose rephrasing with a clear prefix and a clarification of what the batch period is. Feel free to adjust, but I wanted to emphasize the direction in communicating to the end user.
/release-note-edit

Scheduling: Fix the bug where inadmissible workloads would be re-queued too frequently at scale.
This resulted in excessive processing, lock contention, and starvation of workloads deeper in the queue.
The fix is to throttle the process with a batch period of 1s per CQ or Cohort.

@mimowo (Contributor) commented Feb 16, 2026:

LGTM overall, the main comments from me:

  1. It would be better to see if we can eliminate the custom functions for unit and integration tests. I imagine in unit tests we could use a custom clock to trigger the Reconcile.
  2. I'm wondering about a feature gate, but for 1s batching I'm leaning toward thinking it is not needed. For larger delays like 5s or 30s I would find it necessary, but we can consider that in the future.
  3. The PR is already large and we are going to cherry-pick, so offloading the diff into prep PRs, like the drive-by cleanups, is preferred.

@k8s-ci-robot (Contributor):

@gabesaba: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: pull-kueue-test-integration-baseline-main
Commit: 50c86d6
Required: true
Rerun command: /test pull-kueue-test-integration-baseline-main

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 16, 2026
@k8s-ci-robot (Contributor):

PR needs rebase.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

// we should inform clusterQueue controller to broadcast the event.
if cqNames := r.cache.AddOrUpdateResourceFlavor(log, e.Object.DeepCopy()); len(cqNames) > 0 {
qcache.QueueInadmissibleWorkloads(context.Background(), r.qManager, cqNames)
qcache.NotifyRetryInadmissible(r.qManager, cqNames)
Contributor:

Nice to see these uses of Background context eliminated 👍


Development

Successfully merging this pull request may close these issues.

Fix throughput issues at scale for BestEffortFIFO CQs with CPU and GPU flavors

5 participants