Skip to content

[Fix] TokenizerManager: avoid KeyError race in _wait_one_response#29529

Open
GodlyDonuts wants to merge 1 commit into
sgl-project:mainfrom
GodlyDonuts:fix/tokenizer-manager-rid-race
Open

[Fix] TokenizerManager: avoid KeyError race in _wait_one_response#29529
GodlyDonuts wants to merge 1 commit into
sgl-project:mainfrom
GodlyDonuts:fix/tokenizer-manager-rid-race

Conversation

@GodlyDonuts

@GodlyDonuts GodlyDonuts commented Jun 27, 2026

Copy link
Copy Markdown

Motivation

Fixes #29256. nightly-perf-2-gpu-vlm started failing with a KeyError that cuts off the response mid-stream (Response ended prematurely on the client):

File ".../managers/tokenizer_manager.py", line 1451, in _wait_one_response
    state = self.rid_to_state[obj.rid]
KeyError: '5e25b243e77743e6b3711ae7206ffdf3'

_wait_one_response looked up self.rid_to_state[obj.rid] on entry. For batched and parallel-sample requests the per-request generators are built in a loop and only advanced later under asyncio.gather. If one finishes first, _handle_batch_output removes its rid (del self.rid_to_state[rid]) before the waiter runs, so the lookup raised KeyError. The extra await points added in #29012 made the window wider, which is why the failure bisects to that PR.

Modifications

  • Pass the ReqState the caller already holds into _wait_one_response instead of looking it up again. Every caller fetches state right after registering the request; the warmup path now does the same. The notify path in _handle_batch_output sets state.event on that same object, so the output still arrives after the rid is removed.
  • Add a regression test that runs _wait_one_response with the rid already gone from rid_to_state and checks the output is still returned.

Checklist

  • Format code with pre-commit
  • Add unit tests

CI States

Latest PR Test (Base): ❌ Run #28300592522
Latest PR Test (Extra): ❌ Run #28300592456

_wait_one_response looked up self.rid_to_state[obj.rid] on entry. For batched
and parallel-sample requests the per-request generators are built in a loop and
only advanced later under asyncio.gather. If one of them finishes first,
_handle_batch_output removes its rid before the waiter runs, so the lookup
raised KeyError and cut off the response. The extra await points added in sgl-project#29012
made the window wider and broke nightly-perf-2-gpu-vlm.

Pass the state the caller already holds into _wait_one_response instead of
looking it up again. The notify path sets state.event on the same object, so
the output still arrives after the rid has been removed.

Add a test that runs _wait_one_response with the rid already gone and checks
the output is still returned.

Fixes sgl-project#29256

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request resolves a race condition in TokenizerManager where _wait_one_response could raise a KeyError if a request's rid was cleaned up from rid_to_state before the generator was first advanced. To fix this, _wait_one_response now accepts the ReqState object directly from the caller instead of performing a dictionary lookup. Corresponding unit tests have been added to verify this behavior. I have no additional feedback to provide.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] tokenizer_manager KeyError in _wait_one_response (rid_to_state) — regression from #29012, breaks nightly-perf-2-gpu-vlm

1 participant