
[Bug]: TPU Inference issue - Qwen3-VL-8B #1850

@Dineshkumar-Anandan-ZS0367

Description

Your current environment

TPU info: node_name=v6e-test | tpu_type=v6e-4 | worker_id=0 | num_chips=4 | num_cores_per_chip=1

Command to start vLLM server:
docker run -it --rm \
    --net=host \
    --privileged \
    -v /dev:/dev \
    -v $HOME/.cache/huggingface:/root/.cache/huggingface \
    vllm/vllm-tpu:nightly-20260303-1f005dc-4034c3d \
    vllm serve Qwen/Qwen3-VL-8B-Instruct \
    --tensor-parallel-size 4 \
    --dtype bfloat16 \
    --max-model-len 22528 \
    --max-num-seqs 16 \
    --host 0.0.0.0 \
    --port 7000 \
    --trust-remote-code \
    --enable-log-requests
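A request of the kind that hits the failing `/v1/chat/completions` endpoint can be sent like this. The payload below is hypothetical (the original request body is not included in this issue); the image URL is a placeholder, and the message shape follows the standard OpenAI-compatible multimodal format:

```shell
curl http://localhost:7000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-VL-8B-Instruct",
    "messages": [
      {"role": "user", "content": [
        {"type": "text", "text": "Describe this image."},
        {"type": "image_url", "image_url": {"url": "https://example.com/image.png"}}
      ]}
    ]
  }'
```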

🐛 Describe the bug

Issue: as soon as a chat completion request is processed, the engine core dies with a fatal KeyError (the request id is missing from the model runner output's req_id_to_index map in scheduler.update_from_output), and every subsequent request returns 500 Internal Server Error until the server shuts down:

(APIServer pid=1) INFO 03-04 01:49:51 [async_llm.py:421] Added request chatcmpl-b61864175ca7091d-a87942ae.
(EngineCore_DP0 pid=564) ERROR 03-04 01:49:51 [core.py:1102] EngineCore encountered a fatal error.
(EngineCore_DP0 pid=564) ERROR 03-04 01:49:51 [core.py:1102] Traceback (most recent call last):
(EngineCore_DP0 pid=564) ERROR 03-04 01:49:51 [core.py:1102] File "/workspace/vllm/vllm/v1/engine/core.py", line 1093, in run_engine_core
(EngineCore_DP0 pid=564) ERROR 03-04 01:49:51 [core.py:1102] engine_core.run_busy_loop()
(EngineCore_DP0 pid=564) ERROR 03-04 01:49:51 [core.py:1102] File "/workspace/vllm/vllm/v1/engine/core.py", line 1128, in run_busy_loop
(EngineCore_DP0 pid=564) ERROR 03-04 01:49:51 [core.py:1102] self._process_engine_step()
(EngineCore_DP0 pid=564) ERROR 03-04 01:49:51 [core.py:1102] File "/workspace/vllm/vllm/v1/engine/core.py", line 1165, in _process_engine_step
(EngineCore_DP0 pid=564) ERROR 03-04 01:49:51 [core.py:1102] outputs, model_executed = self.step_fn()
(EngineCore_DP0 pid=564) ERROR 03-04 01:49:51 [core.py:1102] ^^^^^^^^^^^^^^
(EngineCore_DP0 pid=564) ERROR 03-04 01:49:51 [core.py:1102] File "/workspace/vllm/vllm/v1/engine/core.py", line 507, in step_with_batch_queue
(EngineCore_DP0 pid=564) ERROR 03-04 01:49:51 [core.py:1102] engine_core_outputs = self.scheduler.update_from_output(
(EngineCore_DP0 pid=564) ERROR 03-04 01:49:51 [core.py:1102] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=564) ERROR 03-04 01:49:51 [core.py:1102] File "/workspace/vllm/vllm/v1/core/sched/scheduler.py", line 1316, in update_from_output
(EngineCore_DP0 pid=564) ERROR 03-04 01:49:51 [core.py:1102] req_index = model_runner_output.req_id_to_index[req_id]
(EngineCore_DP0 pid=564) ERROR 03-04 01:49:51 [core.py:1102] ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^
(EngineCore_DP0 pid=564) ERROR 03-04 01:49:51 [core.py:1102] KeyError: 'chatcmpl-b61864175ca7091d-a87942ae'
(EngineCore_DP0 pid=564) Process EngineCore_DP0:
(APIServer pid=1) ERROR 03-04 01:49:51 [async_llm.py:708] AsyncLLM output_handler failed.
(APIServer pid=1) ERROR 03-04 01:49:51 [async_llm.py:708] Traceback (most recent call last):
(APIServer pid=1) ERROR 03-04 01:49:51 [async_llm.py:708] File "/workspace/vllm/vllm/v1/engine/async_llm.py", line 664, in output_handler
(APIServer pid=1) ERROR 03-04 01:49:51 [async_llm.py:708] outputs = await engine_core.get_output_async()
(APIServer pid=1) ERROR 03-04 01:49:51 [async_llm.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) ERROR 03-04 01:49:51 [async_llm.py:708] File "/workspace/vllm/vllm/v1/engine/core_client.py", line 1004, in get_output_async
(APIServer pid=1) ERROR 03-04 01:49:51 [async_llm.py:708] raise self._format_exception(outputs) from None
(APIServer pid=1) ERROR 03-04 01:49:51 [async_llm.py:708] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
(EngineCore_DP0 pid=564) Traceback (most recent call last):
(APIServer pid=1) INFO 03-04 01:49:51 [async_llm.py:605] Request chatcmpl-b61864175ca7091d failed (engine dead).
(EngineCore_DP0 pid=564) File "/usr/local/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=564) self.run()
(EngineCore_DP0 pid=564) File "/usr/local/lib/python3.12/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=564) self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=564) File "/workspace/vllm/vllm/v1/engine/core.py", line 1104, in run_engine_core
(EngineCore_DP0 pid=564) raise e
(EngineCore_DP0 pid=564) File "/workspace/vllm/vllm/v1/engine/core.py", line 1093, in run_engine_core
(EngineCore_DP0 pid=564) engine_core.run_busy_loop()
(EngineCore_DP0 pid=564) File "/workspace/vllm/vllm/v1/engine/core.py", line 1128, in run_busy_loop
(EngineCore_DP0 pid=564) self._process_engine_step()
(EngineCore_DP0 pid=564) File "/workspace/vllm/vllm/v1/engine/core.py", line 1165, in _process_engine_step
(EngineCore_DP0 pid=564) outputs, model_executed = self.step_fn()
(EngineCore_DP0 pid=564) ^^^^^^^^^^^^^^
(EngineCore_DP0 pid=564) File "/workspace/vllm/vllm/v1/engine/core.py", line 507, in step_with_batch_queue
(EngineCore_DP0 pid=564) engine_core_outputs = self.scheduler.update_from_output(
(EngineCore_DP0 pid=564) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=564) File "/workspace/vllm/vllm/v1/core/sched/scheduler.py", line 1316, in update_from_output
(EngineCore_DP0 pid=564) req_index = model_runner_output.req_id_to_index[req_id]
(EngineCore_DP0 pid=564) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^
(EngineCore_DP0 pid=564) KeyError: 'chatcmpl-b61864175ca7091d-a87942ae'
(APIServer pid=1) INFO: 2409:40f4:40d6:c21e:acdc:1d47:3335:e212:0 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=1) INFO: 2409:40f4:40d6:c21e:acdc:1d47:3335:e212:0 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=1) INFO: 2409:40f4:40d6:c21e:acdc:1d47:3335:e212:0 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=1) INFO: Shutting down
(APIServer pid=1) INFO: Waiting for application shutdown.
(APIServer pid=1) INFO: Application shutdown complete.
(APIServer pid=1) INFO: Finished server process [1]
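The failure mode in the traceback above can be reduced to a plain dict lookup. This is a minimal stand-in sketch, not vLLM's actual classes: the scheduler still tracks the request id, but the model runner's output mapping no longer contains it, so the unguarded indexing raises the KeyError that kills the engine core:

```python
# Hypothetical stand-in for model_runner_output.req_id_to_index: the
# mapping the model runner returned no longer contains the request id
# the scheduler is still tracking.
req_id = "chatcmpl-b61864175ca7091d-a87942ae"
req_id_to_index: dict[str, int] = {}  # request id missing from runner output

try:
    # Mirrors scheduler.py line 1316:
    #   req_index = model_runner_output.req_id_to_index[req_id]
    req_index = req_id_to_index[req_id]
except KeyError as exc:
    # This is the uncaught exception that makes EngineCore fatal.
    print(f"KeyError: {exc}")
```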

