[Bug]: Async chunk requests can stay in WAITING_FOR_CHUNK indefinitely if upstream chunk never arrives #3833

@Ronnie-Rui

Description

Your current environment

The output of python collect_env.py

OS: Ubuntu 22.04.5 LTS (x86_64)
Python: 3.12.11
PyTorch: 2.11.0+cu130
CUDA available: True
GPU: NVIDIA GeForce RTX 4090

Your code version

vLLM Version: 0.20.0

Describe the bug

In async chunk mode, downstream requests can remain in RequestStatus.WAITING_FOR_CHUNK indefinitely if the expected upstream chunk is never produced or never becomes visible through the connector.

The relevant path is OmniChunkTransferAdapter.process_pending_chunks():

  • The downstream stage calls load_async(request).
  • The request status is changed to WAITING_FOR_CHUNK.
  • The request is temporarily removed from the scheduler queue.
  • restore_queues() puts it back after scheduling.
  • If _poll_single_request() never observes the expected chunk, the request keeps cycling in WAITING_FOR_CHUNK.
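
To make the cycle concrete, here is a toy model of the steps above. It is not the vllm-omni code: `Status`, `ToyRequest`, `ToyAdapter`, and `run_steps` are hypothetical stand-ins that only borrow the method names from this report, and the real adapter and scheduler will differ in detail.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Status(Enum):
    # Hypothetical stand-in for RequestStatus; only the two states needed here.
    WAITING = auto()
    WAITING_FOR_CHUNK = auto()


@dataclass
class ToyRequest:
    request_id: str
    status: Status = Status.WAITING


@dataclass
class ToyAdapter:
    # Chunks the simulated connector has made visible, keyed by request id.
    visible_chunks: dict = field(default_factory=dict)
    pending: list = field(default_factory=list)

    def load_async(self, request):
        request.status = Status.WAITING_FOR_CHUNK
        self.pending.append(request)

    def poll_single_request(self, request):
        # True only once the expected chunk is visible.
        return request.request_id in self.visible_chunks

    def process_pending_chunks(self, queue):
        for request in list(self.pending):
            if self.poll_single_request(request):
                request.status = Status.WAITING
                self.pending.remove(request)
            elif request in queue:
                # Still waiting: hide it from the scheduler for this step.
                queue.remove(request)

    def restore_queues(self, queue):
        # After scheduling, every waiting request goes back into the queue.
        for request in self.pending:
            if request not in queue:
                queue.append(request)


def run_steps(adapter, queue, steps):
    for _ in range(steps):
        adapter.process_pending_chunks(queue)
        # Scheduling of the remaining queue would happen here.
        adapter.restore_queues(queue)


adapter = ToyAdapter()
req = ToyRequest("r1")
queue = [req]
adapter.load_async(req)
run_steps(adapter, queue, steps=1000)  # the chunk for "r1" never arrives
print(req.status)  # Status.WAITING_FOR_CHUNK
```

In this model nothing inside the loop can move the request out of WAITING_FOR_CHUNK except a chunk becoming visible, which mirrors the behavior reported here.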

I could not find a timeout or cleanup path for this state. As a result, an upstream stage failure, a connector write failure, a dropped terminal chunk, or a custom payload extraction path that returns no payload can each turn into a hanging streaming request instead of a visible request failure.

Expected behavior:

If a request waits too long for an async chunk, vllm-omni should eventually fail the request and clean it up instead of leaving it in WAITING_FOR_CHUNK forever.
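
One possible shape for such a timeout is a per-request deadline tracker, sketched below. This is only an illustration of the idea: `ChunkWaitTracker`, its methods, and the 30-second value are hypothetical and not part of vllm-omni. Where it would hook into the adapter, and how an expired request should be failed, is left open.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ChunkWaitTracker:
    """Hypothetical helper: tracks how long each request has waited for a chunk."""

    timeout_s: float
    # Injectable clock so the behavior can be checked without sleeping.
    clock: Callable[[], float] = time.monotonic
    _started: dict = field(default_factory=dict)

    def start(self, request_id: str) -> None:
        # Record the wait start once; repeated polls do not reset the timer.
        self._started.setdefault(request_id, self.clock())

    def chunk_arrived(self, request_id: str) -> None:
        # The wait is over; a later wait for the next chunk starts a new timer.
        self._started.pop(request_id, None)

    def pop_expired(self) -> list:
        # Return ids that have waited at least timeout_s and stop tracking them.
        now = self.clock()
        expired = [rid for rid, t0 in self._started.items()
                   if now - t0 >= self.timeout_s]
        for rid in expired:
            del self._started[rid]
        return expired


# Usage with a fake clock (all values are illustrative).
fake_now = [0.0]
tracker = ChunkWaitTracker(timeout_s=30.0, clock=lambda: fake_now[0])
tracker.start("r1")
tracker.start("r2")
fake_now[0] = 10.0
tracker.chunk_arrived("r2")   # r2 received its chunk in time
fake_now[0] = 31.0
print(tracker.pop_expired())  # ['r1']
```

The caller would then mark each returned request as failed and release its resources. A monotonic clock is used so that wall-clock adjustments cannot shorten or extend the wait.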

Labels: bug