
Conversation

@kfaraz (Contributor) commented Dec 8, 2025

Description

Based on discussions with @jtuglu1 in #18764, there seem to be some race conditions in segment balancing between the duty run thread and the segment drop callback thread (where the drop itself is triggered by segment balancing).

Race condition 1

The duty takes a snapshot of the peon while a segment is in the middle of being transitioned from state MOVING_FROM to DROPPING.

  • The result is that the ServerHolder assumes this segment actually has state LOADED (when in fact it should be DROPPING).
  • This leads the StrategicSegmentAssigner to assume that the segment is over-replicated (loaded on both server A and server B) and to trigger a drop from either A (a no-op, since we were about to start a drop from A anyway) or B (the segment becomes briefly unavailable). A simplified sketch of this interleaving follows.
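
A deliberately simplified sketch of this interleaving, using hypothetical class and field names based on the ones mentioned in this PR (not the actual peon code): because the two-step transition is not atomic, a concurrent reader can observe the segment in neither set and conclude it is simply LOADED.

    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;

    class LoadQueuePeonSketch
    {
      // Segments queued to be dropped because they are being moved away (MOVING_FROM)
      private final Set<String> segmentsMarkedToDrop = ConcurrentHashMap.newKeySet();
      // Segments for which an actual DROP request has been issued (DROPPING)
      private final Set<String> segmentsToDrop = ConcurrentHashMap.newKeySet();

      // Callback thread: the MOVING_FROM -> DROPPING transition happens in two steps.
      void startDrop(String segmentId)
      {
        segmentsMarkedToDrop.remove(segmentId);
        // A duty-thread snapshot taken right here finds the segment in neither set,
        // so the ServerHolder treats it as plainly LOADED.
        segmentsToDrop.add(segmentId);
      }

      // Duty thread: reads both sets without holding any common lock, so it can
      // observe the inconsistent intermediate state above.
      boolean isDroppingOrMarkedToDrop(String segmentId)
      {
        return segmentsToDrop.contains(segmentId) || segmentsMarkedToDrop.contains(segmentId);
      }
    }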

Race condition 2

The duty takes a snapshot of the peon after the DROP operation has succeeded, but the drop is not yet reflected in the inventory (since the inventory was snapshotted earlier in the flow).

  • This would again cause the StrategicSegmentAssigner to consider this segment over-replicated (a rough interleaving is sketched after this list).
  • Something similar may happen with a load operation as well.
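
A rough interleaving for race condition 2, written in the same style as the trace later in this thread (the step labels here are illustrative, not captured logs):

    T0[Coordinator]: Snapshot of server inventory shows segment[S] loaded on Server[A] and Server[B]
    T1[Coordinator-callback]: Server[A] completed request[DROP] on segment[S] with status[SUCCESS]
    T2[Coordinator]: Snapshot of peon queues shows no pending request for segment[S]
    T3[Coordinator]: StrategicSegmentAssigner sees loaded=2 with nothing dropping, treats segment[S] as over-replicated and queues another drop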

Changes

  • Use a synchronized block to ensure that a segment is moved from the set segmentsMarkedToDrop to segmentsToDrop atomically (sketched after this list).
  • Use @GuardedBy annotations to document which lock guards these fields and keep access to them thread-safe.
  • Pass the list of segments already loaded on a server to LoadQueuePeon.getSegmentsInQueue() so that completed requests remain tracked inside the peon until the inventory confirms them.
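
A minimal sketch of the first two changes, reusing the hypothetical class from the race condition 1 sketch and assuming the JSR-305 @GuardedBy annotation; the actual peon code differs, but the idea is that the MOVING_FROM -> DROPPING transition becomes atomic with respect to any reader that takes the same lock.

    import java.util.HashSet;
    import java.util.Set;
    import javax.annotation.concurrent.GuardedBy;

    class LoadQueuePeonSketch
    {
      private final Object lock = new Object();

      @GuardedBy("lock")
      private final Set<String> segmentsMarkedToDrop = new HashSet<>();

      @GuardedBy("lock")
      private final Set<String> segmentsToDrop = new HashSet<>();

      // Callback thread: moves the segment between the two sets atomically, so a
      // reader holding the same lock can never find it in neither set.
      void startDrop(String segmentId)
      {
        synchronized (lock) {
          segmentsMarkedToDrop.remove(segmentId);
          segmentsToDrop.add(segmentId);
        }
      }

      // Duty thread: takes a consistent view of both sets under the same lock.
      Set<String> getSegmentsBeingDropped()
      {
        synchronized (lock) {
          final Set<String> dropping = new HashSet<>(segmentsToDrop);
          dropping.addAll(segmentsMarkedToDrop);
          return dropping;
        }
      }
    }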

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious to an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@jtuglu1 (Contributor) commented Dec 8, 2025

Hi @kfaraz, thanks for the patch. While I think this does potentially fix some other issues (and is just generally good to have), I don't think this fully fixes the aforementioned issue:

  • The server inventory view is updated to show loaded=2 (the load on B is now showing in the inventory).
  • We take a copy of the server inventory view in prepareServers(), which then increments loaded here.
  • Then the load-on-B and drop-on-A callbacks complete.
  • We take a copy of the queuedSegments of every HttpLoadQueuePeon in the prepareCluster() routine. Since the callbacks have already completed, this copy doesn't show any dropping or loading (it is effectively empty for that segment), but the older, stale copy of the server inventory view still shows loaded=2.

While this is still an improvement, I don't think it fixes the core issue: the server inventory and the HttpLoadQueuePeons can be snapshotted non-atomically, which lets the SegmentReplicaCount we build from them end up inconsistent. We need a way to ensure that the ServerHolder and the HttpLoadQueuePeon can be snapshotted atomically, without any callback interleaving in between.
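
One way to get this kind of atomicity, sketched with hypothetical names (this is not Druid's actual structure, just an illustration of the idea): the duty thread copies the inventory view and the peon queues inside a single critical section that the load/drop callbacks also have to enter, so no callback can land between the two copies.

    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    class ClusterStateSketch
    {
      static class Snapshot
      {
        final Map<String, Set<String>> loadedSegments;
        final Map<String, Set<String>> queuedSegments;

        Snapshot(Map<String, Set<String>> loadedSegments, Map<String, Set<String>> queuedSegments)
        {
          this.loadedSegments = loadedSegments;
          this.queuedSegments = queuedSegments;
        }
      }

      private final Object stateLock = new Object();

      // serverName -> segments the inventory view says are fully loaded there
      private final Map<String, Set<String>> loadedSegments = new HashMap<>();
      // serverName -> segments with an in-flight load/drop in that server's peon
      private final Map<String, Set<String>> queuedSegments = new HashMap<>();

      // Callback thread: a completed load updates both views while holding the lock.
      void onLoadCompleted(String server, String segmentId)
      {
        synchronized (stateLock) {
          queuedSegments.computeIfAbsent(server, s -> new HashSet<>()).remove(segmentId);
          loadedSegments.computeIfAbsent(server, s -> new HashSet<>()).add(segmentId);
        }
      }

      // Duty thread: both copies come from the same critical section, so replica
      // counts derived from them are mutually consistent.
      Snapshot takeSnapshot()
      {
        synchronized (stateLock) {
          return new Snapshot(deepCopy(loadedSegments), deepCopy(queuedSegments));
        }
      }

      private static Map<String, Set<String>> deepCopy(Map<String, Set<String>> source)
      {
        final Map<String, Set<String>> copy = new HashMap<>();
        source.forEach((server, segments) -> copy.put(server, new HashSet<>(segments)));
        return copy;
      }
    }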

@kfaraz (Contributor, Author) commented Dec 8, 2025

Thanks for the clarification, @jtuglu1 !
It makes sense to have the inventory view be consistent with the peons.
I also encountered another possible race condition related to segmentsMarkedToDrop.
In ServerHolder.initializeQueuedSegments(), we call peon.getSegmentsInQueue() and peon.getSegmentsMarkedToDrop() in quick succession, but the state of the peon may change between these two calls.

I have tried to address this race condition and the reconciliation with the server inventory snapshot in the latest commit.
Let me know if it makes sense and addresses your issue.
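
For the segmentsMarkedToDrop race specifically, one possible shape of a fix is to return both pieces of peon state from a single synchronized call instead of two back-to-back getters; the names and signatures below are illustrative, not the actual Druid API or the code in this PR.

    import java.util.HashSet;
    import java.util.Set;
    import javax.annotation.concurrent.GuardedBy;

    class PeonQueueSnapshotSketch
    {
      static class QueueSnapshot
      {
        final Set<String> segmentsInQueue;
        final Set<String> segmentsMarkedToDrop;

        QueueSnapshot(Set<String> segmentsInQueue, Set<String> segmentsMarkedToDrop)
        {
          this.segmentsInQueue = segmentsInQueue;
          this.segmentsMarkedToDrop = segmentsMarkedToDrop;
        }
      }

      private final Object lock = new Object();

      @GuardedBy("lock")
      private final Set<String> segmentsInQueue = new HashSet<>();

      @GuardedBy("lock")
      private final Set<String> segmentsMarkedToDrop = new HashSet<>();

      // Returns both views from one synchronized block, so no callback can change
      // the peon state between reading the queue and reading the marked-to-drop set.
      QueueSnapshot getQueueSnapshot()
      {
        synchronized (lock) {
          return new QueueSnapshot(new HashSet<>(segmentsInQueue), new HashSet<>(segmentsMarkedToDrop));
        }
      }
    }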

@jtuglu1 self-requested a review on December 8, 2025 at 16:41
@kfaraz (Contributor, Author) commented Dec 9, 2025

Currently squashing some more potential race conditions in the flow. Will update this PR soon.

@jtuglu1 (Contributor) commented Dec 9, 2025

Thanks. I'm doing a bit of simulation testing to see if I can squeeze some more out. I've added the other potential one I uncovered here:

T0[Coordinator]: Start DutyGroup[HistoricalManagementDuties]
T1[Coordinator]: Start PrepareBalancerAndLoadQueues::run()
T2[Coordinator]: Move initiated of segment[S] from Server[A] to Server[B] (no callbacks yet)
T3[Coordinator]: End PrepareBalancerAndLoadQueues::run()
T4[Coordinator]: End DutyGroup[HistoricalManagementDuties]
T5[Coordinator]: Start DutyGroup[HistoricalManagementDuties]
T6[Coordinator]: Start PrepareBalancerAndLoadQueues::run()
T7[Coordinator]: enter prepareCurrentServers()
T8[Coordinator]: exit prepareCurrentServers()
T9[Coordinator]: enter prepareCluster()
T10[Coordinator-callback]: Server[B] completed request[MOVE_TO] on segment[S] with status[SUCCESS]
T11[Coordinator-callback]: Dropping segment [S] from server[A]
T12[A]: Completely removing segment[S] in [30,000]ms.
T13[Coordinator-callback]: Server[A] completed request[DROP] on segment[S] with status[SUCCESS].
T14[Coordinator]: exit prepareCluster()
// at this point, I think the SegmentReplicaCount somehow ends up with loaded=0, which causes the extra load below
T15[Coordinator-callback]: Server[C] completed request[LOAD] on segment[S] with status[SUCCESS]
T16[Coordinator]: End DutyGroup[HistoricalManagementDuties]

This causes a second load of the segment.
