
Conversation

Contributor

@Fly-Style Fly-Style commented Dec 19, 2025

As a follow-up to PR #18819, this patch fixes the temporal behaviour so that scale-down happens in the same manner as scale-up, by injecting the new task count calculation logic at task rollover time.

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious to an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

Contributor

@kfaraz kfaraz left a comment

Thanks for the follow up, @Fly-Style !
Left some initial drive-by comments. Will do a more thorough review later today/tomorrow.

@SuppressWarnings("resource")
@Test
@Timeout(300)
void test_scaleDownDuringTaskRollover()
Contributor

Can this test not be moved to the CostBasedAutoScalerIntegrationTest itself?

Contributor Author

I decided to make it separate because we're testing abstract autoscaler actions during the task rollover, not specifically the cost-based one. We might create, say, a CPU-based autoscaler later with the requirement to scale down during the rollover.

Contributor

because we're testing abstract autoscaler actions during the task rollover, not specifically the cost-based one.

I am not sure if this is entirely true, since the cost-based auto-scaler is the only one that supports task count changes on rollover right now.

The new test should be in the existing CostBasedAutoScalerIntegrationTest since it uses the same cluster setup (AFAICT) and verifies an important aspect of the cost-based auto-scaler.

In the future, when we add more auto-scalers, we can add separate tests for them.


int optimalTaskCount = -1;
double optimalCost = Double.POSITIVE_INFINITY;
Tuple3<Double, Double, Double> optimalCost = new Tuple3<>(Double.POSITIVE_INFINITY,
Contributor

Better use a dedicated class than a Tuple or Pair.


* @return optimal task count for scale-up, or -1 if no scaling action needed
*/
public int computeOptimalTaskCount(CostMetrics metrics)
public int computeOptimalTaskCount(CostMetrics metrics, CostComputeMode costComputeMode)
Contributor

Instead of keeping a cost compute mode, please keep two separate methods that may have some internal common implementation. Methods may be named something like:
computeOptimalTaskCount and computeOptimalTaskCountOnRollover.


Comment on lines 604 to 609
log.info(
"Changed taskCount to [%s] for supervisor[%s] for dataSource[%s].",
desiredActiveTaskCount,
supervisorId,
dataSource
);
Contributor

Nit: @Fly-Style , could we please revert all the formatting changes from this PR, to keep the focus on the actual changes?

I think we can leave out the formatting changes for now.



@Fly-Style Fly-Style requested a review from kfaraz December 22, 2025 10:32
@Fly-Style Fly-Style marked this pull request as ready for review December 22, 2025 10:46
* Holds the result of a cost computation from {@link WeightedCostFunction#computeCost}.
* All costs are measured in seconds.
*/
public class CostResult
Contributor

Thanks for adding this.

autoscaler = spec.createAutoscaler(supervisor);

// Wire autoscaler back to supervisor for rollover-based scale-down
if (supervisor instanceof SeekableStreamSupervisor && autoscaler != null) {
Contributor

It feels weird to first create the auto-scaler and then inject it back into the supervisor.

How about we add a createTaskAutoScaler() method on Supervisor interface itself.
Internally, this method will simply call spec.createAutoscaler(this) and will initialize its own auto-scaler field if needed.
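A minimal sketch of this suggestion, using simplified stand-in types (`Spec`, `TaskAutoScaler`, and the `Supervisor` class here are hypothetical, not the real Druid interfaces):

```java
public class AutoScalerWiringSketch
{
    interface TaskAutoScaler
    {
        int computeTaskCountForRollover();
    }

    interface Spec
    {
        TaskAutoScaler createAutoscaler(Supervisor supervisor);
    }

    static class Supervisor
    {
        private TaskAutoScaler taskAutoScaler;

        // Suggested shape: the supervisor initializes its own auto-scaler
        // field on demand, so no external setter is required.
        TaskAutoScaler createTaskAutoScaler(Spec spec)
        {
            if (taskAutoScaler == null) {
                taskAutoScaler = spec.createAutoscaler(this);
            }
            return taskAutoScaler;
        }
    }

    public static void main(String[] args)
    {
        Supervisor supervisor = new Supervisor();
        // Toy spec whose auto-scaler always recommends 3 tasks.
        Spec spec = s -> () -> 3;
        TaskAutoScaler scaler = supervisor.createTaskAutoScaler(spec);
        System.out.println("rollover task count = " + scaler.computeTaskCountForRollover());
        System.out.println("same instance on second call = " + (scaler == supervisor.createTaskAutoScaler(spec)));
    }
}
```

The point of the design is that wiring stays internal to the supervisor: the manager never needs to hold a reference just to inject it back.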

final AtomicInteger numStoppedTasks = new AtomicInteger();
// Sort task groups by start time to prioritize early termination of earlier groups, then iterate for processing
activelyReadingTaskGroups.entrySet().stream().sorted(
Comparator.comparingLong(
Contributor

Please revert all the formatting changes in this file.

Contributor Author

Done.

Comment on lines 158 to 171
/**
* Tests that scale down happen during task rollover via checkTaskDuration().
*
* <p>Test flow:</p>
* <ol>
* <li>Start supervisor with 10 tasks and 50 partitions, minimal data (500 records)</li>
* <li>Wait for initial tasks to start running</li>
* <li>Wait for the first task rollover to complete (task duration is 8 seconds)</li>
* <li>Verify that after rollover, fewer tasks are running due to cost-based autoscaler (no ingestion at all)</li>
* </ol>
*
* <p>Scale down during rollover is triggered in {@code SeekableStreamSupervisor.checkTaskDuration()}
* when all task groups have rolled over and the autoscaler recommends a lower task count.</p>
*/
Contributor

I feel we should omit the javadoc. The test itself should be readable enough to follow through the details.

.withConsumerProperties(kafkaServer.consumerProperties())
.withTaskCount(taskCount)
.withTaskDuration(Seconds.THREE.toPeriod())
.withTaskDuration(Seconds.parseSeconds("PT7S").toPeriod())
Contributor

Suggested change
.withTaskDuration(Seconds.parseSeconds("PT7S").toPeriod())
.withTaskDuration(Period.seconds(7))

Comment on lines +269 to +271
final int postRolloverRunningTasks = cluster.callApi().getTaskCount("running", dataSource);

Assertions.assertTrue(
Contributor

Thanks for adding this verification.

Can we update the existing scale down test to verify that scaling was actually skipped in that case?

Contributor Author

Here it actually should not be skipped :)

// No-op.
}

private CostMetrics collectMetrics()
Contributor

Nit: Can we retain the original order of methods? It might help clean up the patch.

Contributor Author

To be honest, I prefer to keep it changed, because it follows the correct order.

* @return optimal task count for scale-up, or -1 if no scaling action needed
*/
public int computeOptimalTaskCount(CostMetrics metrics)
int computeOptimalTaskCount(CostMetrics metrics, CostComputeMode costComputeMode)
Contributor

Nit: Rather than using a ComputeMode or even a boolean flag, it would be cleaner to just add two new methods, both of which call the computeOptimalTaskCount method.

public int computeTaskCountOnRollover()
{
    // Perform both scale downs and scale ups
    return computeOptimalTaskCount(collectMetrics());
}

public int computeTaskCountForScaleAction()
{
    final CostMetrics metrics = collectMetrics();
    final int currentTaskCount = metrics.currentTaskCount();
    final int optimalTaskCount = computeOptimalTaskCount(metrics);

    // Perform only scale up actions
    return optimalTaskCount >= currentTaskCount ? optimalTaskCount : -1;
}

This would significantly simplify the diff and clarify the intent better.

@Fly-Style Fly-Style requested a review from kfaraz December 23, 2025 12:57
@Fly-Style Fly-Style force-pushed the cost-autoscaler-task-rollout branch from 01f5fee to 2719a17 Compare December 23, 2025 13:07
@Fly-Style Fly-Style force-pushed the cost-autoscaler-task-rollout branch from 2719a17 to 5936fd0 Compare December 23, 2025 13:08
This reverts commit 5936fd0.
This reverts commit 7788e38.
Signed-off-by: Sasha Syrotenko <[email protected]>
@Fly-Style Fly-Style force-pushed the cost-autoscaler-task-rollout branch from 6f3ce38 to 971f978 Compare December 23, 2025 13:17
@Fly-Style Fly-Style force-pushed the cost-autoscaler-task-rollout branch from 971f978 to fd654e2 Compare December 23, 2025 13:18
@Fly-Style Fly-Style force-pushed the cost-autoscaler-task-rollout branch from 065c5fa to 833cb88 Compare December 23, 2025 20:57

final AtomicInteger numStoppedTasks = new AtomicInteger();
// Sort task groups by start time to prioritize early termination of earlier groups, then iterate for processing
// Sort task groups by start time to prioritize early termination of earlier groups, then iterate for processing
Contributor

Duplicate comment.

}
}
});

Contributor

Please retain the newline for clean separation of code.

if (taskAutoScaler != null && activelyReadingTaskGroups.isEmpty()) {
int rolloverTaskCount = taskAutoScaler.computeTaskCountForRollover();
if (rolloverTaskCount > 0 && rolloverTaskCount < ioConfig.getTaskCount()) {
log.info("Cost-based autoscaler recommends scaling down to [%d] tasks during rollover", rolloverTaskCount);
Contributor

Suggested change
log.info("Cost-based autoscaler recommends scaling down to [%d] tasks during rollover", rolloverTaskCount);
log.info("Autoscaler recommends scaling down to [%d] tasks during rollover.", rolloverTaskCount);


if (taskAutoScaler != null && activelyReadingTaskGroups.isEmpty()) {
int rolloverTaskCount = taskAutoScaler.computeTaskCountForRollover();
if (rolloverTaskCount > 0 && rolloverTaskCount < ioConfig.getTaskCount()) {
Contributor

Should we also allow scale up on task rollover?

* Sets the autoscaler reference for rollover-based scale-down decisions.
* Called by {@link SupervisorManager} after supervisor creation.
*/
public void setTaskAutoScaler(@Nullable SupervisorTaskAutoScaler taskAutoScaler)
Contributor

Is this still needed?

metrics.getPollIdleRatio()
);


Contributor

Nit: extra newline?

Comment on lines +90 to +93
default SupervisorTaskAutoScaler createAutoscaler()
{
return null;
}
Contributor

It seems a little untidy but the default impl should do the same thing that the existing impl does, so that we do not break extensions that use auto-scalers.

Suggested change
default SupervisorTaskAutoScaler createAutoscaler()
{
return null;
}
default SupervisorTaskAutoScaler createAutoscaler(SupervisorSpec spec)
{
return spec.createAutoscaler(this);
}

final Map<String, Map<String, Object>> taskStats = supervisor.getStats();
final double movingAvgRate = extractMovingAverage(taskStats, DropwizardRowIngestionMeters.ONE_MINUTE_NAME);
final double pollIdleRatio = extractPollIdleRatio(taskStats);
return computeOptimalTaskCount(collectMetrics());
Contributor

I wonder if for this method we shouldn't just reuse the metrics collected in the last cycle.
Metrics collection may be slow since the supervisor might need to contact all the running tasks.
This would slow down the task rollover process, causing ingestion lag.
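One way to sketch that idea, reusing recently collected metrics instead of re-polling every task at rollover. `CostMetrics`, the `metricsForRollover` method, and the 60-second freshness window are all illustrative assumptions, not the actual Druid implementation:

```java
public class MetricsCacheSketch
{
    record CostMetrics(double movingAvgRate, double pollIdleRatio) { }

    private static final long MAX_AGE_MILLIS = 60_000;  // assumed freshness window

    private CostMetrics lastMetrics;
    private long lastCollectedAtMillis;

    // Stand-in for the slow path that contacts all running tasks.
    private CostMetrics collectMetrics()
    {
        return new CostMetrics(1000.0, 0.2);
    }

    // Reuse metrics from the last scaling cycle if they are still fresh,
    // so task rollover is not delayed by a full metrics collection.
    CostMetrics metricsForRollover(long nowMillis)
    {
        if (lastMetrics == null || nowMillis - lastCollectedAtMillis > MAX_AGE_MILLIS) {
            lastMetrics = collectMetrics();
            lastCollectedAtMillis = nowMillis;
        }
        return lastMetrics;
    }

    public static void main(String[] args)
    {
        MetricsCacheSketch sketch = new MetricsCacheSketch();
        CostMetrics first = sketch.metricsForRollover(0);
        System.out.println("reused within window = " + (first == sketch.metricsForRollover(30_000)));
        System.out.println("refreshed after window = " + (first != sketch.metricsForRollover(120_000)));
    }
}
```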

activelyReadingTaskGroups.remove(groupId);
}

if (taskAutoScaler != null && activelyReadingTaskGroups.isEmpty()) {
Contributor

Please move this entire new logic into a new private method and add a short javadoc to it.
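For illustration, a minimal sketch of what the extracted method could look like. Names follow the snippet above, but the surrounding class and the `maybeScaleDownOnRollover` name are hypothetical:

```java
public class RolloverScaleDownSketch
{
    interface TaskAutoScaler
    {
        int computeTaskCountForRollover();
    }

    private final TaskAutoScaler taskAutoScaler;
    int configuredTaskCount;  // stand-in for ioConfig.getTaskCount()

    RolloverScaleDownSketch(TaskAutoScaler taskAutoScaler, int configuredTaskCount)
    {
        this.taskAutoScaler = taskAutoScaler;
        this.configuredTaskCount = configuredTaskCount;
    }

    /**
     * Consults the auto-scaler once all actively reading task groups have
     * rolled over and applies a lower task count if one is recommended.
     * (Would be private in the real supervisor; package-private here so the
     * sketch is easy to exercise.)
     */
    void maybeScaleDownOnRollover(boolean activelyReadingGroupsEmpty)
    {
        if (taskAutoScaler != null && activelyReadingGroupsEmpty) {
            final int rolloverTaskCount = taskAutoScaler.computeTaskCountForRollover();
            if (rolloverTaskCount > 0 && rolloverTaskCount < configuredTaskCount) {
                System.out.println("Autoscaler recommends scaling down to [" + rolloverTaskCount + "] tasks during rollover.");
                configuredTaskCount = rolloverTaskCount;
            }
        }
    }

    public static void main(String[] args)
    {
        RolloverScaleDownSketch supervisor = new RolloverScaleDownSketch(() -> 4, 10);
        supervisor.maybeScaleDownOnRollover(true);
        System.out.println("taskCount = " + supervisor.configuredTaskCount);
    }
}
```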
