
feat: CBA idle re-modeling and separate scale-up / scale-down task-count boundaries #19378

Open

Fly-Style wants to merge 7 commits into apache:master from Fly-Style:cba-autoscaler-tweaking

Conversation

Contributor

@Fly-Style Fly-Style commented Apr 26, 2026

This PR updates the seekable-stream cost-based autoscaler to make task-count decisions more stable and easier to reason about.

The main behavioral change is replacing the previous linear idle cost with a U-shaped idle cost centered around an ideal idle ratio. This penalizes both under-provisioning, where tasks have too little idle headroom, and over-provisioning, where tasks spend too much time idle. Previously, clamping predictedIdleRatio at 0 made the U-shape's under-provisioning penalty saturate, so a smaller taskCount always won the cost race and the optimizer scaled down a busy, lag-free cluster. The goal is to keep ingestion tasks near a practical operating point instead of treating all additional idle time as uniformly bad.
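
To make the shape concrete, here is a minimal sketch of a U-shaped idle cost with asymmetric penalties. The constants and names are hypothetical, chosen for illustration; the actual WeightedCostFunction differs in detail.

// Hypothetical sketch of a U-shaped idle cost; not the PR's actual code.
final class UShapedIdleCostSketch
{
  static final double IDEAL_IDLE_RATIO = 0.25; // assumed practical operating point
  static final double UNDER_PENALTY = 4.0;     // too little idle headroom
  static final double OVER_PENALTY = 1.0;      // too much time spent idle

  static double idleCost(double predictedIdleRatio)
  {
    double delta = predictedIdleRatio - IDEAL_IDLE_RATIO;
    // Quadratic on both sides of the ideal ratio, with a steeper slope on
    // the under-provisioned side so a busy, lag-free cluster is not scaled down.
    return (delta < 0 ? UNDER_PENALTY : OVER_PENALTY) * delta * delta;
  }
}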

This also separates task-count boundary controls for scale-up and scale-down. Scale-up remains unbounded by default so the autoscaler can react aggressively to lag, while scale-down is bounded by default to avoid large drops in task count. Candidate task counts are still generated from valid partitions-per-task ratios, but the optimizer can now limit which candidates are evaluated depending on the configured scale direction boundary.
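
For illustration, a direction-aware filter over the generated candidates could look roughly like the sketch below. All names are hypothetical and the real optimizer code differs; the point is only that the boundary is applied per scale direction.

import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of direction-aware candidate filtering.
final class BoundedCandidatesSketch
{
  static List<Integer> applyBoundaries(
      List<Integer> sortedCandidates,  // valid task counts, ascending
      int currentTaskCount,
      boolean boundScaleUp,            // unbounded by default
      boolean boundScaleDown,          // bounded by default
      int maxStepsPerDirection
  )
  {
    List<Integer> result = new ArrayList<>();
    int down = 0;
    int up = 0;
    // Candidates below the current count, nearest first.
    for (int i = sortedCandidates.size() - 1; i >= 0; i--) {
      int c = sortedCandidates.get(i);
      if (c < currentTaskCount && (!boundScaleDown || ++down <= maxStepsPerDirection)) {
        result.add(c);
      }
    }
    // Candidates above the current count, nearest first.
    for (int c : sortedCandidates) {
      if (c > currentTaskCount && (!boundScaleUp || ++up <= maxStepsPerDirection)) {
        result.add(c);
      }
    }
    return result;
  }
}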

What Changed

  • Added U-shaped idle cost in WeightedCostFunction, with an ideal idle ratio and asymmetric penalties for under- and over-provisioning.
  • Updated default cost weights to favor a more balanced lag/idle tradeoff.
  • Split the single useTaskCountBoundaries setting into separate settings for scale-up and scale-down (useTaskCountBoundariesOnScaleUp and useTaskCountBoundariesOnScaleDown).
  • Added CostResult.INFINITE_COST so skipped candidates can still be represented safely in cost tables.
  • Updated supervisor docs for the cost-based autoscaler behavior and config options.
  • Expanded tests around U-shaped idle cost, config serialization/defaults, valid task-count generation, and bounded vs unbounded task-count jumps.
  • Removed the highLagThreshold and useTaskCountBoundaries parameters as obsolete, but kept them in the constructor for backward compatibility.
  • Increased the lag amplification multiplier to account for the new idle-cost formula, so that scale-up/scale-down decisions stay balanced near 0.5/0.5 weights. 0.4 was picked as a good amplification multiplier after a series of tests and calculations, and 0.4/0.6 was chosen as the default weights from a conservative standpoint; a Python script with the computations is available on request. See the sketch after this list.
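
As a rough sketch of how the pieces combine: only the 0.4 multiplier and the 0.4/0.6 default weights below come from this PR. The actual formula in WeightedCostFunction is more involved, and which weight receives which value is an assumption here.

// Hypothetical sketch of the weighted total cost; lower is better.
final class WeightedCostSketch
{
  static final double LAG_AMPLIFICATION_MULTIPLIER = 0.4; // raised from 0.05 in this PR
  static final double DEFAULT_LAG_WEIGHT = 0.4;           // assumed assignment
  static final double DEFAULT_IDLE_WEIGHT = 0.6;          // assumed assignment

  static double totalCost(double normalizedLag, double idleCost)
  {
    // Amplify lag so that high lag reliably dominates the idle term
    // near balanced weights and triggers a scale-up.
    double lagCost = LAG_AMPLIFICATION_MULTIPLIER * normalizedLag;
    return DEFAULT_LAG_WEIGHT * lagCost + DEFAULT_IDLE_WEIGHT * idleCost;
  }
}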

Details

Scale-up scenario, visualized: task boundaries are disabled, lag = 50k, and the current taskCount is 1. The plot shows p* (the partition count) and idle (the current poll-idle-avg-ratio metric).

[scaleup plot]

Scale-down scenario, visualized: task boundaries for scale-down are disabled, taskCount = partitionCount = 128, lag = 0.

[cost_based_scaledown_compare plot]

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.

@Fly-Style Fly-Style changed the title from "Cba autoscaler tweaking" to "feat: CBA idle re-modeling and separate scale-up / scale-down task-count boundaries" on Apr 26, 2026
@Fly-Style Fly-Style requested a review from jtuglu1 April 27, 2026 06:39
this.useTaskCountBoundaries = Configs.valueOrDefault(useTaskCountBoundaries, false);
this.highLagThreshold = Configs.valueOrDefault(highLagThreshold, -1);
this.minScaleUpDelay = Configs.valueOrDefault(minScaleUpDelay, Duration.millis(this.minTriggerScaleActionFrequencyMillis));
this.useTaskCountBoundariesOnScaleUp = Configs.valueOrDefault(useTaskCountBoundariesOnScaleUp, false);
Member
[P2] Legacy boundary setting is accepted but ignored

The constructor still accepts the legacy useTaskCountBoundaries property, but the new scale-up/down fields are initialized only from useTaskCountBoundariesOnScaleUp and useTaskCountBoundariesOnScaleDown. Existing supervisor specs with useTaskCountBoundaries: true will silently lose the scale-up boundary after upgrade, allowing unbounded jumps to any candidate task count. Map the legacy value to the new fields when the new fields are absent, or reject/document a breaking config change.
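
A minimal sketch of that mapping, reusing the Configs.valueOrDefault pattern from the snippet above; the fallback wiring is only a suggestion, not the PR's code:

// Sketch: fall back to the legacy flag when the new fields are absent,
// instead of defaulting straight to false.
boolean legacyBoundaries = Configs.valueOrDefault(useTaskCountBoundaries, false);
this.useTaskCountBoundariesOnScaleUp =
    Configs.valueOrDefault(useTaskCountBoundariesOnScaleUp, legacyBoundaries);
this.useTaskCountBoundariesOnScaleDown =
    Configs.valueOrDefault(useTaskCountBoundariesOnScaleDown, legacyBoundaries);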

Contributor Author

@Fly-Style Fly-Style Apr 27, 2026


This autoscaler was in experimental mode, but I will log a warning and document the breaking change.

Contributor Author

Done in 5b61b6d.

@Fly-Style Fly-Style self-assigned this Apr 27, 2026
@Fly-Style Fly-Style force-pushed the cba-autoscaler-tweaking branch from ec723ee to 5b61b6d on April 27, 2026 15:31
Contributor

@gianm gianm left a comment


Something seems off with the "Scaledown scenery, visualized" plot. It says the conditions are:

Task boundaries for scale-down are enabled, taskCount = partitionCount = 128, lag = 0.

It shows that when current idle ratio is < 0.15 or so, the optimal task count becomes ~40. Scaling down 3x when current idle ratio is low will likely lead to the new set of tasks being overloaded.

* Maximum number of candidate task counts to evaluate above or below the current task count
* when scale-up or scale-down boundaries are enabled.
* <p>
* The misspelling is preserved to avoid unnecessary churn in this package-private constant.
Contributor


I don't understand this comment. The constant is new in this patch. Please fix the spelling.


At every evaluation interval, Druid computes the score for each candidate task count and picks the one with the lowest total cost.

Note: Kinesis is not supported yet; support is in progress.
Contributor


Is it really in progress?

Contributor Author


I need to verify whether anybody has a Kinesis workload with CBA working. If you want, we can remove that part.

* during extensive testing as the most balanced multiplier for high-lag recovery.
*/
- static final double LAG_AMPLIFICATION_MULTIPLIER = 0.05;
+ static final double LAG_AMPLIFICATION_MULTIPLIER = 0.4;
Contributor


Why this change?

Contributor Author

@Fly-Style Fly-Style Apr 27, 2026


I will note it in the patch notes. Generally, the intention was to find a point where scale-up/scale-down decisions are 'normal' in terms of a normal distribution near 0.5/0.5 weights; 0.4 is a good amplification multiplier. 0.4/0.6 was picked as the default weights from a conservative standpoint.

.minTriggerScaleActionFrequencyMillis(1000)
.lagWeight(0.2)
.idleWeight(0.8)
.lagWeight(0.8)
Contributor


What will be the effect of the change to lag and idle weights?

Contributor Author


The test was passing without any problems under normal circumstances. The main idea of the change is to reduce the chance of the test failing to scale within the timeout due to CI CPU pressure.

@Fly-Style
Contributor Author

It shows that when current idle ratio is < 0.15 or so, the optimal task count becomes ~40. Scaling down 3x when current idle ratio is low will likely lead to the new set of tasks being overloaded.

Oh, this is a lag in my head, apologies. 🤦🏻

@Fly-Style Fly-Style requested review from FrankChen021 and gianm April 27, 2026 21:06
