[Bug]: EngineCore crashes testing v1/embeddings #30656

### Your current environment

<details>
<summary>The output of <code>python collect_env.py</code></summary>

```text
Collecting environment information...
==============================
        System Info
==============================
OS                           : Ubuntu 24.04.3 LTS (x86_64)
GCC version                  : (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
Clang version                : Could not collect
CMake version                : version 3.28.3
Libc version                 : glibc-2.39

==============================
       PyTorch Info
==============================
PyTorch version              : 2.9.1+cpu
Is debug build               : False
CUDA used to build PyTorch   : None
ROCM used to build PyTorch   : N/A

==============================
      Python Environment
==============================
Python version               : 3.12.3 (main, Nov  6 2025, 13:44:16) [GCC 13.3.0] (64-bit runtime)
Python platform              : Linux-6.14.0-1018-aws-x86_64-with-glibc2.39

==============================
       CUDA / GPU Info
==============================
Is CUDA available            : False
CUDA runtime version         : No CUDA
CUDA_MODULE_LOADING set to   : N/A
GPU models and configuration : No CUDA
Nvidia driver version        : No CUDA
cuDNN version                : No CUDA
HIP runtime version          : N/A
MIOpen runtime version       : N/A
Is XNNPACK available         : True

==============================
          CPU Info
==============================
Architecture:                            x86_64
CPU op-mode(s):                          32-bit, 64-bit
Address sizes:                           46 bits physical, 48 bits virtual
Byte Order:                              Little Endian
CPU(s):                                  4
On-line CPU(s) list:                     0-3
Vendor ID:                               GenuineIntel
Model name:                              Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
CPU family:                              6
Model:                                   85
Thread(s) per core:                      2
Core(s) per socket:                      2
Socket(s):                               1
Stepping:                                7
BogoMIPS:                                4999.99
Flags:                                   fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke
Hypervisor vendor:                       KVM
Virtualization type:                     full
L1d cache:                               64 KiB (2 instances)
L1i cache:                               64 KiB (2 instances)
L2 cache:                                2 MiB (2 instances)
L3 cache:                                35.8 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-3
Vulnerability Gather data sampling:      Unknown: Dependent on hypervisor status
Vulnerability Ghostwrite:                Not affected
Vulnerability Indirect target selection: Mitigation; Aligned branch/return thunks
Vulnerability Itlb multihit:             KVM: Mitigation: VMX unsupported
Vulnerability L1tf:                      Mitigation; PTE Inversion
Vulnerability Mds:                       Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Meltdown:                  Mitigation; PTI
Vulnerability Mmio stale data:           Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Vulnerable
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Vulnerable
Vulnerability Spectre v1:                Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; Retpolines; STIBP disabled; RSB filling; PBRSB-eIBRS Not affected; BHI Retpoline
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

==============================
Versions of relevant libraries
==============================
[pip3] numpy==2.2.6
[pip3] pyzmq==27.1.0
[pip3] torch==2.9.1+cpu
[pip3] torchaudio==2.9.1
[pip3] torchvision==0.24.1
[pip3] transformers==4.57.3
[pip3] triton==3.5.1
[conda] Could not collect

==============================
         vLLM Info
==============================
ROCM Version                 : Could not collect
vLLM Version                 : 0.13.0rc2.dev139+g9ccbf6b69 (git sha: 9ccbf6b69)
vLLM Build Flags:
  CUDA Archs: Not Set; ROCm: Disabled
GPU Topology:
  Could not collect

==============================
     Environment Variables
==============================
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1

```

</details>
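When comparing this environment with one where the same image works, it can help to know exactly which AVX-512 / AMX extensions this KVM guest exposes. The `Flags` line above lists `avx2`, `avx512f`, `avx512bw` and `avx512vl`, but not `avx512_vnni`, `avx512_bf16` or `amx_bf16`. Below is a minimal sketch that parses such a line (or `/proc/cpuinfo`). The set of extension names checked is an illustrative choice, not a statement of what vLLM's CPU backend requires.

```python
# Illustrative selection of ISA extensions to report; NOT a list of vLLM requirements.
EXTENSIONS_OF_INTEREST = (
    "avx2",
    "avx512f",
    "avx512bw",
    "avx512vl",
    "avx512_vnni",
    "avx512_bf16",
    "amx_bf16",
)


def parse_flags(flags_line: str) -> set[str]:
    """Turn an lscpu 'Flags:' line (or a /proc/cpuinfo 'flags' line) into a set of names."""
    # Drop an optional "Flags:" / "flags\t\t:" prefix, then split on whitespace.
    if ":" in flags_line:
        flags_line = flags_line.split(":", 1)[1]
    return set(flags_line.split())


def report_extensions(flags: set[str]) -> dict[str, bool]:
    """Map each extension of interest to whether it is present."""
    return {name: name in flags for name in EXTENSIONS_OF_INTEREST}


def read_local_flags(path: str = "/proc/cpuinfo") -> set[str]:
    """Read the flags of the first processor listed in /proc/cpuinfo (Linux only)."""
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                return parse_flags(line)
    return set()
```

For example, `report_extensions(read_local_flags())` run inside the container gives a dict that can be pasted next to the `collect_env.py` output.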




### 🐛 Describe the bug

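The EngineCore process dies as soon as a request is sent to `/v1/embeddings`, and the API server answers with a `400 Bad Request`: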

```bash
curl -X POST http://localhost:8000/v1/embeddings -H "Content-Type: application/json" -d '{
    "model": "Qwen/Qwen3-Embedding-0.6B",
    "input": "The quick brown fox jumps over the lazy dog."
}'
{"error":{"message":"EngineCore encountered an issue. See stack trace (above) for the root cause.","type":"BadRequestError","param":null,"code":400}}
```
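The same request can be scripted with only the Python standard library. Note that the response above is typed `BadRequestError` with code 400 even though the server log shows an `EngineDeadError`, so the sketch below also includes a small helper that tells an engine crash apart from a genuinely malformed request by looking for the message text shown above. It assumes the server started by the `docker run` command in this report is reachable on `localhost:8000`; `build_embedding_request`, `post_embedding` and `is_engine_failure` are hypothetical helper names.

```python
import json
import urllib.error
import urllib.request

# Assumption: the vLLM server from this report listens on localhost:8000.
BASE_URL = "http://localhost:8000"
# Substring of the error message returned when the engine process has died.
ENGINE_DEAD_MARKER = "EngineCore encountered an issue"


def build_embedding_request(model: str, text: str) -> dict:
    """Request body identical to the curl example."""
    return {"model": model, "input": text}


def post_embedding(payload: dict, base_url: str = BASE_URL, timeout: float = 60.0) -> tuple[int, str]:
    """POST to /v1/embeddings and return (HTTP status, raw response body)."""
    req = urllib.request.Request(
        f"{base_url}/v1/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError, but the JSON error body is still readable.
        return err.code, err.read().decode("utf-8")


def is_engine_failure(body: str) -> bool:
    """True if an error body reports an engine crash rather than a client-side problem."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return False
    error = data.get("error") if isinstance(data, dict) else None
    if not isinstance(error, dict):
        return False
    return ENGINE_DEAD_MARKER in str(error.get("message", ""))
```

Usage: `status, body = post_embedding(build_embedding_request("Qwen/Qwen3-Embedding-0.6B", "The quick brown fox jumps over the lazy dog."))`. Based on the curl output above, the affected server would be expected to return `status == 400` with `is_engine_failure(body)` true.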


The server was started with the v0.12.0 CPU release image:

```bash
docker run \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=$HF_TOKEN" \
  -p 8000:8000 \
  --ipc=host \
  public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:v0.12.0 \
  --model Qwen/Qwen3-Embedding-0.6B

# Image details (`docker images` output):
# public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo   v0.12.0   e0f05ebe3257   11 days ago   3.28GB
```
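On this 4-vCPU machine the engine spent about 103.5 seconds on profiling, KV-cache setup and warmup before the API server started listening. For scripted reproductions it is convenient to poll the `/health` route (listed in the log) before sending the embedding request. In this report the request arrived after startup had completed, so this is only a convenience and not a fix. A minimal sketch, assuming `/health` answers HTTP 200 once the server is ready; `wait_for_health` is a hypothetical helper:

```python
import http.client
import time
import urllib.request


def wait_for_health(
    url: str = "http://localhost:8000/health",
    timeout_s: float = 300.0,
    interval_s: float = 2.0,
) -> bool:
    """Poll url until it answers HTTP 200; return False if timeout_s elapses first."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval_s) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused/reset, timeouts and HTTP error statuses all mean "not ready yet".
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

A 300-second default leaves room for the roughly 100-second warmup seen here. Lower it for faster failure when the container is expected to be up already.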




Server log (the beginning is truncated):

```text
es_config': {'type': <DynamicShapesType.BACKED: 'backed'>}, 'local_cache_dir': None}
(EngineCore_DP0 pid=25) INFO 12-14 20:43:07 [cpu_worker.py:192] auto thread-binding list (id, physical core): [(2, 0), (3, 1)]
get_mempolicy: Operation not permitted
[W1214 20:43:07.729406999 utils.cpp:82] Warning: numa_migrate_pages failed. errno: 1 (function init_cpu_threads_env)
set_mempolicy: Operation not permitted
(EngineCore_DP0 pid=25) INFO 12-14 20:43:07 [cpu_worker.py:98] OMP threads binding of Process 25:
(EngineCore_DP0 pid=25) INFO 12-14 20:43:07 [cpu_worker.py:98]  OMP tid: 25, core 2
(EngineCore_DP0 pid=25) INFO 12-14 20:43:07 [cpu_worker.py:98]  OMP tid: 35, core 3
(EngineCore_DP0 pid=25) INFO 12-14 20:43:07 [cpu_worker.py:98]
(EngineCore_DP0 pid=25) INFO 12-14 20:43:07 [parallel_state.py:1200] world_size=1 rank=0 local_rank=0 distributed_init_method=tcp://172.17.0.2:57543 backend=gloo
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
(EngineCore_DP0 pid=25) INFO 12-14 20:43:07 [parallel_state.py:1408] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, PCP rank 0, TP rank 0, EP rank 0
(EngineCore_DP0 pid=25) INFO 12-14 20:43:07 [cpu_model_runner.py:55] Starting to load model Qwen/Qwen3-Embedding-0.6B...
(EngineCore_DP0 pid=25) INFO 12-14 20:43:08 [weight_utils.py:527] No model.safetensors.index.json found in remote.
Loading safetensors checkpoint shards:   0% Completed | 0/1 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:04<00:00,  4.34s/it]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:04<00:00,  4.34s/it]
(EngineCore_DP0 pid=25)
(EngineCore_DP0 pid=25) INFO 12-14 20:43:12 [default_loader.py:308] Loading weights took 4.39 seconds
(EngineCore_DP0 pid=25) INFO 12-14 20:43:13 [kv_cache_utils.py:1286] GPU KV cache size: 37,376 tokens
(EngineCore_DP0 pid=25) INFO 12-14 20:43:13 [kv_cache_utils.py:1291] Maximum concurrency for 32,768 tokens per request: 1.14x
(EngineCore_DP0 pid=25) INFO 12-14 20:43:15 [cpu_model_runner.py:65] Warming up model for the compilation...
(EngineCore_DP0 pid=25) INFO 12-14 20:44:56 [cpu_model_runner.py:75] Warming up done.
(EngineCore_DP0 pid=25) INFO 12-14 20:44:56 [core.py:254] init engine (profile, create kv cache, warmup model) took 103.52 seconds
(EngineCore_DP0 pid=25) WARNING 12-14 20:44:59 [vllm.py:608] Inductor compilation was disabled by user settings,Optimizations settings that are only active duringInductor compilation will be ignored.
(EngineCore_DP0 pid=25) WARNING 12-14 20:44:59 [cpu.py:152] Environment variable VLLM_CPU_KVCACHE_SPACE (GiB) for CPU backend is not set, using 4 by default.
(APIServer pid=1) INFO 12-14 20:44:59 [api_server.py:1520] Supported tasks: ['embed']
(APIServer pid=1) INFO 12-14 20:44:59 [api_server.py:1847] Starting vLLM API server 0 on http://0.0.0.0:8000
(APIServer pid=1) INFO 12-14 20:44:59 [launcher.py:38] Available routes are:
(APIServer pid=1) INFO 12-14 20:44:59 [launcher.py:46] Route: /openapi.json, Methods: GET, HEAD
(APIServer pid=1) INFO 12-14 20:44:59 [launcher.py:46] Route: /docs, Methods: GET, HEAD
(APIServer pid=1) INFO 12-14 20:44:59 [launcher.py:46] Route: /docs/oauth2-redirect, Methods: GET, HEAD
(APIServer pid=1) INFO 12-14 20:44:59 [launcher.py:46] Route: /redoc, Methods: GET, HEAD
(APIServer pid=1) INFO 12-14 20:44:59 [launcher.py:46] Route: /health, Methods: GET
(APIServer pid=1) INFO 12-14 20:44:59 [launcher.py:46] Route: /load, Methods: GET
(APIServer pid=1) INFO 12-14 20:44:59 [launcher.py:46] Route: /pause, Methods: POST
(APIServer pid=1) INFO 12-14 20:44:59 [launcher.py:46] Route: /resume, Methods: POST
(APIServer pid=1) INFO 12-14 20:44:59 [launcher.py:46] Route: /is_paused, Methods: GET
(APIServer pid=1) INFO 12-14 20:44:59 [launcher.py:46] Route: /tokenize, Methods: POST
(APIServer pid=1) INFO 12-14 20:44:59 [launcher.py:46] Route: /detokenize, Methods: POST
(APIServer pid=1) INFO 12-14 20:44:59 [launcher.py:46] Route: /v1/models, Methods: GET
(APIServer pid=1) INFO 12-14 20:44:59 [launcher.py:46] Route: /version, Methods: GET
(APIServer pid=1) INFO 12-14 20:44:59 [launcher.py:46] Route: /v1/responses, Methods: POST
(APIServer pid=1) INFO 12-14 20:44:59 [launcher.py:46] Route: /v1/responses/{response_id}, Methods: GET
(APIServer pid=1) INFO 12-14 20:44:59 [launcher.py:46] Route: /v1/responses/{response_id}/cancel, Methods: POST
(APIServer pid=1) INFO 12-14 20:44:59 [launcher.py:46] Route: /v1/messages, Methods: POST
(APIServer pid=1) INFO 12-14 20:44:59 [launcher.py:46] Route: /v1/chat/completions, Methods: POST
(APIServer pid=1) INFO 12-14 20:44:59 [launcher.py:46] Route: /v1/completions, Methods: POST
(APIServer pid=1) INFO 12-14 20:44:59 [launcher.py:46] Route: /v1/audio/transcriptions, Methods: POST
(APIServer pid=1) INFO 12-14 20:44:59 [launcher.py:46] Route: /v1/audio/translations, Methods: POST
(APIServer pid=1) INFO 12-14 20:44:59 [launcher.py:46] Route: /scale_elastic_ep, Methods: POST
(APIServer pid=1) INFO 12-14 20:44:59 [launcher.py:46] Route: /is_scaling_elastic_ep, Methods: POST
(APIServer pid=1) INFO 12-14 20:44:59 [launcher.py:46] Route: /inference/v1/generate, Methods: POST
(APIServer pid=1) INFO 12-14 20:44:59 [launcher.py:46] Route: /ping, Methods: GET
(APIServer pid=1) INFO 12-14 20:44:59 [launcher.py:46] Route: /ping, Methods: POST
(APIServer pid=1) INFO 12-14 20:44:59 [launcher.py:46] Route: /invocations, Methods: POST
(APIServer pid=1) INFO 12-14 20:44:59 [launcher.py:46] Route: /metrics, Methods: GET
(APIServer pid=1) INFO 12-14 20:44:59 [launcher.py:46] Route: /classify, Methods: POST
(APIServer pid=1) INFO 12-14 20:44:59 [launcher.py:46] Route: /v1/embeddings, Methods: POST
(APIServer pid=1) INFO 12-14 20:44:59 [launcher.py:46] Route: /score, Methods: POST
(APIServer pid=1) INFO 12-14 20:44:59 [launcher.py:46] Route: /v1/score, Methods: POST
(APIServer pid=1) INFO 12-14 20:44:59 [launcher.py:46] Route: /rerank, Methods: POST
(APIServer pid=1) INFO 12-14 20:44:59 [launcher.py:46] Route: /v1/rerank, Methods: POST
(APIServer pid=1) INFO 12-14 20:44:59 [launcher.py:46] Route: /v2/rerank, Methods: POST
(APIServer pid=1) INFO 12-14 20:44:59 [launcher.py:46] Route: /pooling, Methods: POST
(APIServer pid=1) INFO:     Started server process [1]
(APIServer pid=1) INFO:     Waiting for application startup.
(APIServer pid=1) INFO:     Application startup complete.
(APIServer pid=1) ERROR 12-14 20:45:35 [core_client.py:600] Engine core proc EngineCore_DP0 died unexpectedly, shutting down client.
(APIServer pid=1) ERROR 12-14 20:45:35 [async_llm.py:546] AsyncLLM output_handler failed.
(APIServer pid=1) ERROR 12-14 20:45:35 [async_llm.py:546] Traceback (most recent call last):
(APIServer pid=1) ERROR 12-14 20:45:35 [async_llm.py:546]   File "/opt/venv/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 498, in output_handler
(APIServer pid=1) ERROR 12-14 20:45:35 [async_llm.py:546]     outputs = await engine_core.get_output_async()
(APIServer pid=1) ERROR 12-14 20:45:35 [async_llm.py:546]               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) ERROR 12-14 20:45:35 [async_llm.py:546]   File "/opt/venv/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 885, in get_output_async
(APIServer pid=1) ERROR 12-14 20:45:35 [async_llm.py:546]     raise self._format_exception(outputs) from None
(APIServer pid=1) ERROR 12-14 20:45:35 [async_llm.py:546] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
(APIServer pid=1) INFO:     172.17.0.1:42640 - "POST /v1/embeddings HTTP/1.1" 400 Bad Request
(APIServer pid=1) INFO:     Shutting down
(APIServer pid=1) INFO:     Waiting for application shutdown.
(APIServer pid=1) INFO:     Application shutdown complete.
(APIServer pid=1) INFO:     Finished server process [1]
/opt/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d 
```
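In the log as posted, the last line from `EngineCore_DP0` is the `VLLM_CPU_KVCACHE_SPACE` warning at 20:44:59. The only traceback, at 20:45:35, comes from the `APIServer` process, so the EngineCore stack trace that the error message refers to does not appear. A small sketch for pulling that out of a longer log follows. The regex is inferred from the `(Proc pid=N) LEVEL MM-DD HH:MM:SS [file.py:line] message` lines above, not from a documented vLLM log format. Lines such as uvicorn's `INFO:     ...` are deliberately skipped.

```python
import re

# Pattern inferred from the vLLM log lines in this report (an assumption, not a spec).
LOG_RE = re.compile(
    r"^\((?P<proc>\w+) pid=(?P<pid>\d+)\)\s+"
    r"(?P<level>DEBUG|INFO|WARNING|ERROR|CRITICAL)\s+"
    r"(?P<ts>\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"\[(?P<src>[^\]]+)\]\s?(?P<msg>.*)$"
)


def last_line_per_process(lines):
    """Return {process name: fields of the last structured log line from that process}."""
    last = {}
    for line in lines:
        m = LOG_RE.match(line)
        if m:
            last[m["proc"]] = m.groupdict()
    return last


def error_lines(lines, proc=None):
    """Return the messages of ERROR lines, optionally restricted to one process name."""
    out = []
    for line in lines:
        m = LOG_RE.match(line)
        if m and m["level"] == "ERROR" and (proc is None or m["proc"] == proc):
            out.append(m["msg"])
    return out
```

Running `error_lines(log_lines, proc="EngineCore_DP0")` on the log above returns an empty list, which is one quick way to confirm that the engine's own traceback is missing.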

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.
