-
Notifications
You must be signed in to change notification settings - Fork 3.8k
fix: adjust diskNormalized strategy to scale cost exponentially with disk utilization #19422
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -26,40 +26,40 @@ | |
|
|
||
| /** | ||
| * A {@link BalancerStrategy} which normalizes the cost of placing a segment on a | ||
| * server as calculated by {@link CostBalancerStrategy} by multiplying it by the | ||
| * server's disk usage ratio. | ||
| * server as calculated by {@link CostBalancerStrategy} by dividing by the | ||
| * server's available disk headroom. | ||
| * <pre> | ||
| * normalizedCost = cost * usageRatio | ||
| * normalizedCost = cost / max(EPSILON, 1 - usageRatio) | ||
| * where usageRatio = diskUsed / totalDiskSpace | ||
| * </pre> | ||
| * This penalizes servers that are more full, driving disk utilization to equalize | ||
| * across the tier. When all servers have equal disk usage, the behavior is identical | ||
| * to {@link CostBalancerStrategy}. When historicals have different disk capacities, | ||
| * this naturally accounts for both fill level and total capacity. | ||
| * The denominator diverges as a server approaches full, so disk fullness has | ||
| * more weight over the placement decision when servers are nearly full, | ||
| * regardless of asymmetries in the locality cost. {@link #EPSILON} is a small | ||
| * numerical floor on the divisor to guard against division by zero (or by | ||
| * negative values during in-flight loads). | ||
| * <p> | ||
| * To prevent oscillation when servers have similar utilization, any server that | ||
| * To prevent oscillation when servers have similar headroom, any server that | ||
| * is already projected to hold the segment (the source on a move, or a currently | ||
| * serving node on a drop) receives a cost discount equal to | ||
| * {@link #DEFAULT_MOVE_COST_SAVINGS_THRESHOLD}. A move therefore fires only when | ||
| * {@link DiskNormalizedCostBalancerStrategyConfig.DEFAULT_MOVE_COST_SAVINGS_THRESHOLD}. A move therefore fires only when | ||
| * the destination saves at least this fraction of the source's cost. The default | ||
| * is configurable via | ||
| * {@code druid.coordinator.balancer.diskNormalized.moveCostSavingsThreshold}. | ||
| */ | ||
| public class DiskNormalizedCostBalancerStrategy extends CostBalancerStrategy | ||
| { | ||
| /** | ||
| * Default minimum fractional cost reduction required before a segment will | ||
| * be moved off a server that is already projected to hold it. A value of | ||
| * {@code 0.05} means the destination must be at least 5% cheaper than the | ||
| * source for the move to happen. | ||
| * Numerical floor on the headroom divisor to prevent division by zero or by | ||
| * negative values when {@code usageRatio >= 1.0} (possible for over-allocated | ||
| * servers or during in-flight loads). | ||
| */ | ||
| static final double DEFAULT_MOVE_COST_SAVINGS_THRESHOLD = 0.05; | ||
| static final double EPSILON = 1e-6; | ||
|
|
||
| private final double sourceCostMultiplier; | ||
|
|
||
| public DiskNormalizedCostBalancerStrategy(ListeningExecutorService exec) | ||
| { | ||
| this(exec, DEFAULT_MOVE_COST_SAVINGS_THRESHOLD); | ||
| this(exec, DiskNormalizedCostBalancerStrategyConfig.DEFAULT_MOVE_COST_SAVINGS_THRESHOLD); | ||
| } | ||
|
|
||
| public DiskNormalizedCostBalancerStrategy(ListeningExecutorService exec, double moveCostSavingsThreshold) | ||
|
|
@@ -85,17 +85,16 @@ protected double computePlacementCost( | |
| return cost; | ||
| } | ||
|
|
||
| // Guard against NaN propagation in the cost comparator if a server | ||
| // somehow reports a non-positive maxSize. Such a server cannot hold | ||
| // anything and will be rejected by canLoadSegment, so returning the | ||
| // raw cost is safe. | ||
| // A server with non-positive maxSize cannot hold anything and will be | ||
| // rejected by canLoadSegment; return the raw cost to avoid NaN propagation. | ||
| final long maxSize = server.getMaxSize(); | ||
| if (maxSize <= 0) { | ||
| return cost; | ||
| } | ||
|
|
||
| double usageRatio = (double) server.getSizeUsed() / maxSize; | ||
| double normalizedCost = cost * usageRatio; | ||
| final double usageRatio = (double) server.getSizeUsed() / maxSize; | ||
| final double headroom = Math.max(EPSILON, 1.0 - usageRatio); | ||
| double normalizedCost = cost / headroom; | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. [P1] Existing threshold test now fails Changing normalization to |
||
|
|
||
| if (server.isProjectedSegment(proposalSegment)) { | ||
| normalizedCost *= sourceCostMultiplier; | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[P2] Public docs still describe the old formula
The implementation and Javadoc now divide by available headroom, but
docs/design/coordinator.mdanddocs/configuration/index.mdstill saydiskNormalizedmultiplies cost bydiskUsed / maxSize. That leaves user-facing behavior documentation incorrect for this config option.