Your current environment
The output of python collect_env.py
OS: Ubuntu 22.04.5 LTS (x86_64)
Python: 3.12.11
PyTorch: 2.11.0+cu130
CUDA available: True
GPU: NVIDIA GeForce RTX 4090
Your code version
vLLM Version: 0.20.0
Describe the bug
In async chunk mode, downstream requests can remain in RequestStatus.WAITING_FOR_CHUNK indefinitely if the expected upstream chunk is never produced or never becomes visible through the connector.
The relevant path is OmniChunkTransferAdapter.process_pending_chunks():
- downstream stage calls load_async(request);
- request status is changed to WAITING_FOR_CHUNK;
- request is temporarily removed from the scheduler queue; restore_queues() puts it back after scheduling;
- if _poll_single_request() never observes the expected chunk, the request keeps cycling in WAITING_FOR_CHUNK.
I could not find a timeout or cleanup path for this state. As a result, an upstream stage failure, a connector write failure, a dropped terminal chunk, or a custom payload extraction path that returns no payload can turn into a hanging streaming request instead of a visible request failure.
Expected behavior:
If a request waits too long for an async chunk, vllm-omni should eventually fail the request and clean it up instead of leaving it in WAITING_FOR_CHUNK forever.