fix: Fix diskNormalized cost balancer strategy #19303

Merged
jtuglu1 merged 3 commits into apache:master from jtuglu1:disk-utilization-based-balancer-strategy-fix
Apr 23, 2026
@jtuglu1 jtuglu1 commented Apr 13, 2026

Description

We've seen the cost segment balancing strategy behave incorrectly in production: it leaves the historicals in a tier in a bimodal distribution, with ~50% of them at ~100% disk utilization and ~50% at 60% utilization. This happens because the cost-based balancer has no visibility into disk utilization on each historical. The diskNormalized strategy is available, but it is marked as not "production-ready", I believe for the following reason:

The “diskNormalized” balancer works by extending from the “cost” balancer and overriding the computeCost function in the following way:

double cost = super.computeCost(proposalSegment, server, includeCurrentServer);
if (cost == Double.POSITIVE_INFINITY) {
  return cost;
}
int nSegments = 1;
if (server.getServer().getNumSegments() > 0) {
  nSegments = server.getServer().getNumSegments();
}
double normalizedCost = cost / nSegments;
double usageRatio = (double) server.getSizeUsed() / (double) server.getServer().getMaxSize();
return normalizedCost * usageRatio;

This logic is a bit strange. Consider the case where historicals are all homogeneous, so server.getServer().getMaxSize() can be treated as a constant. In that case, the returned value simplifies to:

normalizedCost * usageRatio
= normalizedCost * usedSize / maxSize
= normalizedCost * usedSize / CONST
= cost / numSegments * usedSize / CONST
= cost * (usedSize / numSegments) / CONST
= cost * avgSegmentSize / CONST

So, in practice, it’s just adjusting cost by a factor of avgSegmentSize. This could lead to imbalances that make themselves worse over time, especially if newly created segments tend to have smaller sizes than older ones (which is common with append-based ingestion methods).
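To make the simplification above concrete, here is a small standalone sketch that reproduces only the arithmetic of the old computeCost override with plain numbers. The class and method names (LegacyDiskNormalizedDemo, legacyCost) are hypothetical and not part of Druid, and the raw cost is assumed to be the same for both servers purely to isolate the effect of average segment size.

```java
// Illustrative only: mirrors the arithmetic of the old override, not the Druid classes.
final class LegacyDiskNormalizedDemo {

  // Hypothetical helper: (cost / numSegments) * (usedBytes / maxBytes).
  static double legacyCost(double cost, int numSegments, long usedBytes, long maxBytes) {
    // Same guard as the original snippet: treat an empty server as holding one segment.
    int n = numSegments > 0 ? numSegments : 1;
    double normalizedCost = cost / n;
    double usageRatio = (double) usedBytes / (double) maxBytes;
    return normalizedCost * usageRatio;
  }

  public static void main(String[] args) {
    long maxBytes = 1_000L; // homogeneous historicals: same capacity
    long usedBytes = 800L;  // both servers are 80% full
    double cost = 100.0;    // assumed identical raw cost, for illustration

    // Average segment size 1 versus average segment size 8.
    double manySmall = legacyCost(cost, 800, usedBytes, maxBytes);
    double fewLarge = legacyCost(cost, 100, usedBytes, maxBytes);

    System.out.println("many small segments: " + manySmall);
    System.out.println("few large segments:  " + fewLarge);
  }
}
```

Both servers have identical disk utilization, yet the one holding fewer, larger segments is scored 8x more expensive, which matches the ratio of the average segment sizes.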

This PR fixes the diskNormalized strategy so that it simply weights the CostBalancerStrategy cost by the disk utilization of the server. It defaults to a 5% lenience threshold (TODO: make this adjustable via config?) to avoid ping-ponging segments between 2 historicals due to small utilization differences.
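The description does not show how the merged code applies the threshold, so the following is only a rough sketch of one way such a weighting could look. It assumes the lenience is granted as a utilization discount to the server that already holds the segment; the class, method, constant, and the alreadyOnServer parameter are all hypothetical and are not the actual Druid implementation.

```java
// Hypothetical sketch, NOT the merged Druid code: weight the raw cost by disk utilization.
final class UtilizationWeightedCostSketch {

  // 5% lenience as mentioned in the description; the constant name is an assumption.
  static final double LENIENCE = 0.05;

  static double weightedCost(double cost, long usedBytes, long maxBytes, boolean alreadyOnServer) {
    if (cost == Double.POSITIVE_INFINITY) {
      return cost; // keep "cannot place here" results untouched, as the old override did
    }
    if (maxBytes <= 0) {
      return Double.POSITIVE_INFINITY; // assumption: a server with no capacity is never chosen
    }
    double usageRatio = (double) usedBytes / (double) maxBytes;
    if (alreadyOnServer) {
      // Assumed stickiness: the current holder looks up to 5 points less utilized,
      // so a marginally emptier server does not trigger a move.
      usageRatio = Math.max(0.0, usageRatio - LENIENCE);
    }
    return cost * usageRatio;
  }

  public static void main(String[] args) {
    double stay = weightedCost(100.0, 800L, 1_000L, true);        // current holder, 80% full
    double nearPeer = weightedCost(100.0, 780L, 1_000L, false);   // 78% full, within the lenience
    double emptierPeer = weightedCost(100.0, 700L, 1_000L, false); // 70% full, outside it

    System.out.println("segment stays vs near peer:  " + (stay < nearPeer));
    System.out.println("segment moves to emptier peer: " + (emptierPeer < stay));
  }
}
```

Under these assumptions, a peer that is only 2 points less utilized does not win the segment, while a peer that is 10 points less utilized does.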

Release note

Fix diskNormalized cost balancer strategy


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@jtuglu1 jtuglu1 changed the title Fix diskNormalized cost balancer strategy fix: Fix diskNormalized cost balancer strategy Apr 13, 2026
@jtuglu1 jtuglu1 requested review from clintropolis and maytasm April 13, 2026 20:55
@jtuglu1 jtuglu1 force-pushed the disk-utilization-based-balancer-strategy-fix branch from 169a2e1 to 63e5b04 Compare April 13, 2026 21:40
@jtuglu1 jtuglu1 marked this pull request as ready for review April 13, 2026 23:45
@maytasm maytasm left a comment

nit: diskUtilThresholdTolerance is a weird name. It doesn't indicate that a segment is already on the server. Maybe like a ...stickiness... or ...movingCost... is better?

Comment thread services/src/main/java/org/apache/druid/cli/CliCoordinator.java
@jtuglu1 jtuglu1 force-pushed the disk-utilization-based-balancer-strategy-fix branch 7 times, most recently from 34e2003 to 13c5d4d Compare April 23, 2026 08:07
@jtuglu1 jtuglu1 force-pushed the disk-utilization-based-balancer-strategy-fix branch from 13c5d4d to ddad803 Compare April 23, 2026 08:17
@jtuglu1 jtuglu1 merged commit 0ff3ec1 into apache:master Apr 23, 2026
28 checks passed
@github-actions github-actions Bot added this to the 38.0.0 milestone Apr 23, 2026