pool: Track decode blocks without parent lineage (from prefill) #347

Open
RishabhSaini wants to merge 3 commits into llm-d:main from RishabhSaini:pdDisaggSupport

RishabhSaini commented Feb 23, 2026

Previously, in PD Disagg mode, the decode pods' KV cache events were dropped because their parent blocks were generated on the prefill pods.
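To make the described behavior concrete, here is a minimal Go sketch of a toy block index. It is not the code from this PR: `Index`, `BlockKey`, `Add`, `Has`, and the `allowParentless` switch are hypothetical names introduced only for illustration. With the switch off, a block whose parent is unknown is dropped; with it on, the block is tracked anyway, which is roughly the idea the PR title describes.

```go
package main

import "fmt"

// BlockKey is a hypothetical stand-in for a KV block hash.
type BlockKey uint64

// Index is a toy block index: block key -> set of pods holding that block.
type Index struct {
	pods            map[BlockKey]map[string]bool
	allowParentless bool // assumption: if true, keep blocks whose parent is unknown
}

// NewIndex builds an empty toy index.
func NewIndex(allowParentless bool) *Index {
	return &Index{
		pods:            make(map[BlockKey]map[string]bool),
		allowParentless: allowParentless,
	}
}

// Add records that pod holds block key. A nil parent marks a root block.
// It reports whether the block was stored.
func (ix *Index) Add(pod string, parent *BlockKey, key BlockKey) bool {
	if parent != nil {
		if _, ok := ix.pods[*parent]; !ok && !ix.allowParentless {
			return false // parent unknown: the event is dropped
		}
	}
	if ix.pods[key] == nil {
		ix.pods[key] = make(map[string]bool)
	}
	ix.pods[key][pod] = true
	return true
}

// Has reports whether pod is recorded as holding block key.
func (ix *Index) Has(pod string, key BlockKey) bool {
	return ix.pods[key][pod]
}

func main() {
	parent := BlockKey(1) // produced on a prefill pod, never indexed here
	for _, allow := range []bool{false, true} {
		ix := NewIndex(allow)
		stored := ix.Add("decode-0", &parent, BlockKey(2))
		fmt.Printf("allowParentless=%v stored=%v\n", allow, stored)
	}
}
```

In this sketch a parentless block is simply treated like a root; whether the actual change does the same, or records the missing lineage differently, is not stated in the thread.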

yankay (Collaborator) commented Feb 25, 2026

I think this change is reasonable, but I have a follow-up question: after the key is properly added to the index, will the routing process perform a lookup and select the appropriate decoding node?

vMaroon (Member) commented Feb 26, 2026

Can you verify that KVEvents are not generated after transferring KVs from prefill to decode? AFAIK the mechanism should be working. If it doesn't, then it's a bug in vLLM.

RishabhSaini (Author) commented Feb 28, 2026

Looks like there is a race condition in PD Disagg mode that causes "Parent block key not found in index":

Prefill Pod -> ZMQ PUB -> EPP SUB (goroutine 1) -> Worker Queue 0 -> digestEvents
Decode Pod -> ZMQ PUB -> EPP SUB (goroutine 2) -> Worker Queue 1 -> digestEvents

  1. Prefill and decode publish events independently - no synchronization between pods
  2. EPP uses separate ZMQ subscribers per pod - each runs in its own goroutine
  3. Events from different pods go to different worker queues - no ordering guarantee across pods
  4. Decode's events can be processed before prefill's - causing parent block lookup to fail
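The four points above can be reproduced deterministically with a small Go simulation. This is a sketch under stated assumptions, not the EPP's code: `Event`, `drain`, and the two-queue layout are hypothetical, and the goroutine scheduling is replaced by an explicit drain order so the outcome is repeatable. The only thing that changes between the two runs is which worker queue is processed first.

```go
package main

import "fmt"

// Event is a hypothetical, simplified "block stored" event.
type Event struct {
	Pod    string
	Parent uint64 // 0 means the block has no parent
	Key    uint64
}

// drain processes whole queues in the given order, standing in for
// independent workers with no cross-queue ordering. It returns how many
// events referenced a parent that had not been indexed yet.
func drain(queues [][]Event, order []int) int {
	seen := map[uint64]bool{}
	missing := 0
	for _, q := range order {
		for _, ev := range queues[q] {
			if ev.Parent != 0 && !seen[ev.Parent] {
				missing++ // strict behavior: parent lookup fails, event dropped
				continue
			}
			seen[ev.Key] = true
		}
	}
	return missing
}

func main() {
	queues := [][]Event{
		{{Pod: "prefill-0", Key: 1}},           // worker queue 0
		{{Pod: "decode-0", Parent: 1, Key: 2}}, // worker queue 1
	}
	fmt.Println("prefill queue first, missing parents:", drain(queues, []int{0, 1}))
	fmt.Println("decode queue first, missing parents:", drain(queues, []int{1, 0}))
}
```

When the prefill queue is drained first, no parent lookup fails; when the decode queue is drained first, the decode event's parent is missing and the event is dropped. In the real system the order is decided by scheduling rather than by an explicit list, which is why the failure would be intermittent.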

RishabhSaini force-pushed the pdDisaggSupport branch 3 times, most recently from 97da2cb to e503b92, on February 28, 2026 07:24
3 participants