| 1 | +A113: pick_first: Weighted Random Shuffling |
| 2 | +---- |
| 3 | +* Author(s): Alex Polcyn (@apolcyn) |
| 4 | +* Approver: Mark Roth (@markdroth), Eric Anderson (@ejona86), Doug Fawley (@dfawley), Easwar Swaminathan (@easwars) |
| 5 | +* Status: Draft |
| 6 | +* Implemented in: <language, ...> |
| 7 | +* Last updated: Jan 26, 2026 |
| 8 | +* Discussion at: <google group thread> (filled after thread exists) |
| 9 | + |
| 10 | +## Abstract |
| 11 | + |
| 12 | +Support weighted random shuffling in the pick first LB policy. |
| 13 | + |
| 14 | +## Background |
| 15 | + |
| 16 | +The pick first LB policy currently supports random shuffling of the address list. A primary purpose of this |
| 17 | +feature is load balancing; however, the shuffle ignores any locality or endpoint weights that may be |
| 18 | +present. This can lead to skewed load distribution and hotspots when the load balancing control plane |
| 19 | +delivers varied weights and expects them to be followed. |
| 20 | + |
| 21 | + |
| 22 | +### Related Proposals: |
| 23 | +* [A62](https://github.com/grpc/proposal/blob/master/A62-pick-first.md): pick_first: sticky TRANSIENT_FAILURE and address order randomization |
| 24 | +* [A42](https://github.com/grpc/proposal/blob/master/A42-xds-ring-hash-lb-policy.md): xDS Ring Hash LB Policy |
| 25 | + |
| 26 | +## Proposal |
| 27 | + |
| 28 | +Modify the behavior of pick_first when the `shuffle_address_list` option is set so that it |
| 29 | +performs a weighted random sort *based on per-endpoint weights*: |
| 30 | +* Use the [Weighted Random Sampling](https://utopia.duth.gr/~pefraimi/research/data/2007EncOfAlg.pdf) algorithm |
| 31 | +proposed by Efraimidis and Spirakis. |
| 32 | +* Assign each endpoint a sort key of `u ^ (1 / weight)`, where `u` is a uniform random number in `(0, 1)` and `weight` |
| 33 | +is the weight of the endpoint (as present in a weight attribute), defaulting to 1 if no weight attribute is present. Order the endpoints by descending key. |
| 34 | + |
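As a rough illustration of the weighted sort described above, the following Python sketch gives each endpoint a key `u ^ (1 / weight)` and orders endpoints by descending key. The function name `weighted_shuffle`, the `(address, weight)` pair representation, and the use of `None` for a missing weight attribute are assumptions made for this example; they are not part of any gRPC API.

```python
import random


def weighted_shuffle(endpoints, rng=None):
    """Return addresses in weighted random order using Efraimidis-Spirakis keys.

    `endpoints` is a list of (address, weight) pairs. A weight of None stands
    in for a missing weight attribute and defaults to 1 (hypothetical encoding).
    """
    if rng is None:
        rng = random.Random()
    keyed = []
    for address, weight in endpoints:
        w = 1 if weight is None else weight
        # random() returns a value in [0, 1); redraw on 0.0 so that u is in (0, 1).
        u = rng.random()
        while u == 0.0:
            u = rng.random()
        keyed.append((u ** (1.0 / w), address))
    # Larger keys sort first; heavier endpoints tend to draw larger keys.
    keyed.sort(key=lambda pair: pair[0], reverse=True)
    return [address for _, address in keyed]
```

With two endpoints of weight 9 and 1, the heavier endpoint should land first in roughly 90% of shuffles, which is the probability its weight implies.
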
| 35 | +Note that in xDS there are both locality weights and endpoint weights. The load balancing control plane |
| 36 | +expects the client to *first* pick a locality and *second* pick an endpoint within it. The overall probability |
| 37 | +distribution expressed by the per-endpoint weights must reflect this. As such, we need to normalize locality weights within |
| 38 | +each priority and endpoint weights within each locality; the final weight provided to `pick_first` should be the |
| 39 | +product of the two normalized weights (i.e. a logical AND of the two selection events). |
| 40 | + |
| 41 | +The CDS LB policy currently calculates per-endpoint weight attributes. We can keep doing so, but |
| 42 | +we need to modify CDS LB to compute the final per-endpoint weight as the product of the *normalized* locality |
| 43 | +and endpoint weights, rather than the product of the raw weights. Note: as a side effect, this will fix per-endpoint |
| 44 | +weights in Ring Hash LB. |
| 45 | + |
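A small worked example may help show why the normalized product differs from the raw product. The sketch below uses exact fractions and a made-up topology (two equally weighted localities in one priority, one with a single endpoint and one with two); the function names and the data layout are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def normalized_endpoint_weights(localities):
    """Map endpoint name -> normalized locality weight * normalized endpoint weight.

    `localities` is a list of (locality_weight, {endpoint_name: endpoint_weight})
    pairs that all belong to a single priority (assumed layout).
    """
    locality_sum = sum(lw for lw, _ in localities)
    result = {}
    for lw, endpoints in localities:
        endpoint_sum = sum(endpoints.values())
        for name, ew in endpoints.items():
            result[name] = Fraction(lw, locality_sum) * Fraction(ew, endpoint_sum)
    return result


def raw_product_shares(localities):
    """Share of the total each endpoint gets when raw weights are multiplied."""
    raw = {name: lw * ew for lw, eps in localities for name, ew in eps.items()}
    total = sum(raw.values())
    return {name: Fraction(value, total) for name, value in raw.items()}


localities = [
    (50, {"a1": 1}),
    (50, {"b1": 100, "b2": 100}),
]
normalized = normalized_endpoint_weights(localities)  # a1: 1/2, b1: 1/4, b2: 1/4
raw = raw_product_shares(localities)                  # a1: 1/201, b1 and b2: 100/201
```

Under the normalized product, `a1` receives half of the load, matching its locality's 50% share. Under the raw product it would receive only 1/201, because the endpoint weights in the other locality happen to use a larger scale.
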
| 46 | +We can continue to represent weights as integers if we represent their normalized values in |
| 47 | +fixed point Q31 format (citation due for @ejona): |
| 48 | + |
| 49 | +1) Normalize a weight within a `weight_sum` as follows: `uint32_t normalized = ((uint64_t)weight << 31) / weight_sum`. |
| 50 | + |
| 51 | +2) Multiply two normalized weights as follows: `uint32_t weight = ((uint64_t)weight1 * weight2) >> 31`. |
| 52 | + |
| 53 | +3) Zero weights should be rounded up to 1. |
| 54 | + |
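The three steps above can be sketched in Python as follows. This is a minimal illustration, not an implementation from any gRPC codebase: the helper names are hypothetical, and it assumes that step 3 applies to the result of each operation (a normalized weight or product that truncates to zero becomes 1).

```python
Q31_ONE = 1 << 31          # the value 1.0 in Q31 fixed point
UINT32_MAX = (1 << 32) - 1


def q31_normalize(weight, weight_sum):
    """Step 1: weight / weight_sum as a Q31 integer.

    Assumes weight_sum fits in a uint32 and weight <= weight_sum.
    A result of zero is rounded up to 1 (assumed reading of step 3).
    """
    assert 0 < weight_sum <= UINT32_MAX and 0 <= weight <= weight_sum
    return max(1, (weight * Q31_ONE) // weight_sum)


def q31_multiply(weight1, weight2):
    """Step 2: product of two Q31 values, again rounding zero up to 1."""
    return max(1, (weight1 * weight2) >> 31)


# Locality weight 50 out of 100, endpoint weight 100 out of 200:
locality = q31_normalize(50, 100)          # 0.5 in Q31, i.e. 1 << 30
endpoint = q31_normalize(100, 200)         # 0.5 in Q31, i.e. 1 << 30
final = q31_multiply(locality, endpoint)   # 0.25 in Q31, i.e. 1 << 29
```

Because a normalized weight never exceeds `1 << 31`, the intermediate product of two such values stays below `1 << 63` and so fits in a 64-bit unsigned integer in languages with fixed-width types.
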
| 55 | +### Temporary environment variable protection |
| 56 | + |
| 57 | +The behavior changes in the CDS LB policy and the Pick First LB policy will be guarded by the `GRPC_EXPERIMENTAL_PF_WEIGHTED_SHUFFLING` environment variable. |
| 58 | + |
| 59 | +## Rationale |
| 60 | + |
| 61 | +* CDS LB policy changes are needed to generate correct weight distributions, not only for Pick First but |
| 62 | +  also for Ring Hash. |
| 63 | +* The fixed-point Q31 format has predictable bounds on precision and allows us to continue representing |
| 64 | +  weights as integers. Note that our math assumes the sum of weights within a grouping does not exceed the maximum uint32 value, |
| 65 | +  which is mandated by the xDS protocol. |
| 66 | + |
| 67 | +## Implementation |
| 68 | + |
| 69 | +TBD |
| 70 | + |