
[Bug]: TPU Inference issue - Qwen3-VL-8B #1850

@Dineshkumar-Anandan-ZS0367

Description

Your current environment

TPU info: node_name=v6e-test | tpu_type=v6e-4 | worker_id=0 | num_chips=4 | num_cores_per_chip=1

Command to start vLLM server:
docker run -it --rm \
    --net=host \
    --privileged \
    -v /dev:/dev \
    -v $HOME/.cache/huggingface:/root/.cache/huggingface \
    vllm/vllm-tpu:nightly-20260303-1f005dc-4034c3d \
    vllm serve Qwen/Qwen3-VL-8B-Instruct \
    --tensor-parallel-size 4 \
    --dtype bfloat16 \
    --max-model-len 22528 \
    --max-num-seqs 16 \
    --host 0.0.0.0 \
    --port 7000 \
    --trust-remote-code \
    --enable-log-requests
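A request of the kind that hits the failing `/v1/chat/completions` endpoint can be sent like this. The payload below is hypothetical (the original request body is not included in this issue); the image URL is a placeholder, and the message shape follows the standard OpenAI-compatible multimodal format:

```shell
curl http://localhost:7000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-VL-8B-Instruct",
    "messages": [
      {"role": "user", "content": [
        {"type": "text", "text": "Describe this image."},
        {"type": "image_url", "image_url": {"url": "https://example.com/image.png"}}
      ]}
    ]
  }'
```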

🐛 Describe the bug

Issue: as soon as a chat completion request is processed, the engine core dies with a fatal KeyError (the request id is missing from the model runner output's req_id_to_index map in scheduler.update_from_output), and every subsequent request returns 500 Internal Server Error until the server shuts down:

(APIServer pid=1) INFO 03-04 01:49:51 [async_llm.py:421] Added request chatcmpl-b61864175ca7091d-a87942ae.
(EngineCore_DP0 pid=564) ERROR 03-04 01:49:51 [core.py:1102] EngineCore encountered a fatal error.
(EngineCore_DP0 pid=564) ERROR 03-04 01:49:51 [core.py:1102] Traceback (most recent call last):
(EngineCore_DP0 pid=564) ERROR 03-04 01:49:51 [core.py:1102] File "/workspace/vllm/vllm/v1/engine/core.py", line 1093, in run_engine_core
(EngineCore_DP0 pid=564) ERROR 03-04 01:49:51 [core.py:1102] engine_core.run_busy_loop()
(EngineCore_DP0 pid=564) ERROR 03-04 01:49:51 [core.py:1102] File "/workspace/vllm/vllm/v1/engine/core.py", line 1128, in run_busy_loop
(EngineCore_DP0 pid=564) ERROR 03-04 01:49:51 [core.py:1102] self._process_engine_step()
(EngineCore_DP0 pid=564) ERROR 03-04 01:49:51 [core.py:1102] File "/workspace/vllm/vllm/v1/engine/core.py", line 1165, in _process_engine_step
(EngineCore_DP0 pid=564) ERROR 03-04 01:49:51 [core.py:1102] outputs, model_executed = self.step_fn()
(EngineCore_DP0 pid=564) ERROR 03-04 01:49:51 [core.py:1102] ^^^^^^^^^^^^^^
(EngineCore_DP0 pid=564) ERROR 03-04 01:49:51 [core.py:1102] File "/workspace/vllm/vllm/v1/engine/core.py", line 507, in step_with_batch_queue
(EngineCore_DP0 pid=564) ERROR 03-04 01:49:51 [core.py:1102] engine_core_outputs = self.scheduler.update_from_output(
(EngineCore_DP0 pid=564) ERROR 03-04 01:49:51 [core.py:1102] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=564) ERROR 03-04 01:49:51 [core.py:1102] File "/workspace/vllm/vllm/v1/core/sched/scheduler.py", line 1316, in update_from_output
(EngineCore_DP0 pid=564) ERROR 03-04 01:49:51 [core.py:1102] req_index = model_runner_output.req_id_to_index[req_id]
(EngineCore_DP0 pid=564) ERROR 03-04 01:49:51 [core.py:1102] ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^
(EngineCore_DP0 pid=564) ERROR 03-04 01:49:51 [core.py:1102] KeyError: 'chatcmpl-b61864175ca7091d-a87942ae'
(EngineCore_DP0 pid=564) Process EngineCore_DP0:
(APIServer pid=1) ERROR 03-04 01:49:51 [async_llm.py:708] AsyncLLM output_handler failed.
(APIServer pid=1) ERROR 03-04 01:49:51 [async_llm.py:708] Traceback (most recent call last):
(APIServer pid=1) ERROR 03-04 01:49:51 [async_llm.py:708] File "/workspace/vllm/vllm/v1/engine/async_llm.py", line 664, in output_handler
(APIServer pid=1) ERROR 03-04 01:49:51 [async_llm.py:708] outputs = await engine_core.get_output_async()
(APIServer pid=1) ERROR 03-04 01:49:51 [async_llm.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) ERROR 03-04 01:49:51 [async_llm.py:708] File "/workspace/vllm/vllm/v1/engine/core_client.py", line 1004, in get_output_async
(APIServer pid=1) ERROR 03-04 01:49:51 [async_llm.py:708] raise self._format_exception(outputs) from None
(APIServer pid=1) ERROR 03-04 01:49:51 [async_llm.py:708] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
(EngineCore_DP0 pid=564) Traceback (most recent call last):
(APIServer pid=1) INFO 03-04 01:49:51 [async_llm.py:605] Request chatcmpl-b61864175ca7091d failed (engine dead).
(EngineCore_DP0 pid=564) File "/usr/local/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=564) self.run()
(EngineCore_DP0 pid=564) File "/usr/local/lib/python3.12/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=564) self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=564) File "/workspace/vllm/vllm/v1/engine/core.py", line 1104, in run_engine_core
(EngineCore_DP0 pid=564) raise e
(EngineCore_DP0 pid=564) File "/workspace/vllm/vllm/v1/engine/core.py", line 1093, in run_engine_core
(EngineCore_DP0 pid=564) engine_core.run_busy_loop()
(EngineCore_DP0 pid=564) File "/workspace/vllm/vllm/v1/engine/core.py", line 1128, in run_busy_loop
(EngineCore_DP0 pid=564) self._process_engine_step()
(EngineCore_DP0 pid=564) File "/workspace/vllm/vllm/v1/engine/core.py", line 1165, in _process_engine_step
(EngineCore_DP0 pid=564) outputs, model_executed = self.step_fn()
(EngineCore_DP0 pid=564) ^^^^^^^^^^^^^^
(EngineCore_DP0 pid=564) File "/workspace/vllm/vllm/v1/engine/core.py", line 507, in step_with_batch_queue
(EngineCore_DP0 pid=564) engine_core_outputs = self.scheduler.update_from_output(
(EngineCore_DP0 pid=564) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=564) File "/workspace/vllm/vllm/v1/core/sched/scheduler.py", line 1316, in update_from_output
(EngineCore_DP0 pid=564) req_index = model_runner_output.req_id_to_index[req_id]
(EngineCore_DP0 pid=564) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^
(EngineCore_DP0 pid=564) KeyError: 'chatcmpl-b61864175ca7091d-a87942ae'
(APIServer pid=1) INFO: 2409:40f4:40d6:c21e:acdc:1d47:3335:e212:0 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=1) INFO: 2409:40f4:40d6:c21e:acdc:1d47:3335:e212:0 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=1) INFO: 2409:40f4:40d6:c21e:acdc:1d47:3335:e212:0 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=1) INFO: Shutting down
(APIServer pid=1) INFO: Waiting for application shutdown.
(APIServer pid=1) INFO: Application shutdown complete.
(APIServer pid=1) INFO: Finished server process [1]
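The failure mode in the traceback above can be reduced to a plain dict lookup. This is a minimal stand-in sketch, not vLLM's actual classes: the scheduler still tracks the request id, but the model runner's output mapping no longer contains it, so the unguarded indexing raises the KeyError that kills the engine core:

```python
# Hypothetical stand-in for model_runner_output.req_id_to_index: the
# mapping the model runner returned no longer contains the request id
# the scheduler is still tracking.
req_id = "chatcmpl-b61864175ca7091d-a87942ae"
req_id_to_index: dict[str, int] = {}  # request id missing from runner output

try:
    # Mirrors scheduler.py line 1316:
    #   req_index = model_runner_output.req_id_to_index[req_id]
    req_index = req_id_to_index[req_id]
except KeyError as exc:
    # This is the uncaught exception that makes EngineCore fatal.
    print(f"KeyError: {exc}")
```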

