[Bug]: DeepSeek-v3.2-EXP Crash after first request. #4406

@zkryakgul

Your current environment

The output of `python collect_env.py`
Collecting environment information...
PyTorch version: 2.7.1+cpu
Is debug build: False

OS: Ubuntu 22.04.5 LTS (aarch64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04.2) 11.4.0
Clang version: Could not collect
CMake version: version 4.2.0
Libc version: glibc-2.35

Python version: 3.11.13 (main, Nov 20 2025, 16:02:27) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-5.15.0-91-generic-aarch64-with-glibc2.35

CPU:
Architecture:                       aarch64
CPU op-mode(s):                     64-bit
Byte Order:                         Little Endian
CPU(s):                             192
On-line CPU(s) list:                0-191
Vendor ID:                          HiSilicon
Model name:                         Kunpeng-920
Model:                              0
Thread(s) per core:                 1
Core(s) per cluster:                48
Socket(s):                          -
Cluster(s):                         4
Stepping:                           0x1
BogoMIPS:                           200.00
Flags:                              fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm ssbs
L1d cache:                          12 MiB (192 instances)
L1i cache:                          12 MiB (192 instances)
L2 cache:                           96 MiB (192 instances)
L3 cache:                           192 MiB (8 instances)
NUMA node(s):                       8
NUMA node0 CPU(s):                  0-23
NUMA node1 CPU(s):                  24-47
NUMA node2 CPU(s):                  48-71
NUMA node3 CPU(s):                  72-95
NUMA node4 CPU(s):                  96-119
NUMA node5 CPU(s):                  120-143
NUMA node6 CPU(s):                  144-167
NUMA node7 CPU(s):                  168-191
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit:        Not affected
Vulnerability L1tf:                 Not affected
Vulnerability Mds:                  Not affected
Vulnerability Meltdown:             Not affected
Vulnerability Mmio stale data:      Not affected
Vulnerability Retbleed:             Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:           Mitigation; __user pointer sanitization
Vulnerability Spectre v2:           Not affected
Vulnerability Srbds:                Not affected
Vulnerability Tsx async abort:      Not affected

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] pyzmq==27.1.0
[pip3] torch==2.7.1+cpu
[pip3] torch_npu==2.7.1
[pip3] torchvision==0.22.1
[pip3] transformers==4.57.1
[conda] Could not collect
vLLM Version: 0.11.0
vLLM Ascend Version: 0.11.0rc2

ENV Variables:
ATB_OPSRUNNER_KERNEL_CACHE_LOCAL_COUNT=1
ATB_STREAM_SYNC_EVERY_RUNNER_ENABLE=0
ATB_OPSRUNNER_SETUP_CACHE_ENABLE=1
ATB_WORKSPACE_MEM_ALLOC_GLOBAL=1
ATB_DEVICE_TILING_BUFFER_BLOCK_NUM=32
ATB_STREAM_SYNC_EVERY_KERNEL_ENABLE=0
ATB_OPSRUNNER_KERNEL_CACHE_GLOABL_COUNT=5
ATB_HOME_PATH=/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_1
ASCEND_TOOLKIT_HOME=/usr/local/Ascend/ascend-toolkit/latest
ASCEND_CUSTOM_OPP_PATH=/usr/local/Ascend/ascend-toolkit/latest/opp/vendors/customize:
ATB_COMPARE_TILING_EVERY_KERNEL=0
ASCEND_OPP_PATH=/usr/local/Ascend/ascend-toolkit/latest/opp
LD_LIBRARY_PATH=/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_1/lib:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_1/examples:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_1/tests/atbopstest:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64/plugin:/usr/local/Ascend/ascend-toolkit/latest/lib64:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/opskernel:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/nnengine:/usr/local/Ascend/ascend-toolkit/latest/opp/built-in/op_impl/ai_core/tbe/op_tiling/lib/linux/aarch64:/usr/local/Ascend/ascend-toolkit/latest/opp/vendors/customize/op_api/lib/:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/lib:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/examples:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/tests/atbopstest:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64/plugin:/usr/local/Ascend/ascend-toolkit/latest/lib64:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/opskernel:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/nnengine:/usr/local/Ascend/ascend-toolkit/latest/opp/built-in/op_impl/ai_core/tbe/op_tiling:/usr/local/Ascend/driver/lib64/common/:/usr/local/Ascend/driver/lib64/driver/:
ASCEND_AICPU_PATH=/usr/local/Ascend/ascend-toolkit/latest
ATB_STREAM_SYNC_EVERY_OPERATION_ENABLE=0
ASCEND_HOME_PATH=/usr/local/Ascend/ascend-toolkit/latest
ATB_MATMUL_SHUFFLE_K_ENABLE=1
ATB_WORKSPACE_MEM_ALLOC_ALG_TYPE=1
ATB_HOST_TILING_BUFFER_BLOCK_NUM=128
ATB_SHARE_MEMORY_NAME_SUFFIX=
TORCH_DEVICE_BACKEND_AUTOLOAD=1
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1


NPU:
+------------------------------------------------------------------------------------------------+
| npu-smi 25.2.0                   Version: 25.2.0                                               |
+---------------------------+---------------+----------------------------------------------------+
| NPU   Name                | Health        | Power(W)    Temp(C)           Hugepages-Usage(page)|
| Chip                      | Bus-Id        | AICore(%)   Memory-Usage(MB)  HBM-Usage(MB)        |
+===========================+===============+====================================================+
| 0     910B3               | OK            | 90.9        32                0    / 0             |
| 0                         | 0000:C1:00.0  | 0           0    / 0          3406 / 65536         |
+===========================+===============+====================================================+
| 1     910B3               | OK            | 88.7        33                0    / 0             |
| 0                         | 0000:C2:00.0  | 0           0    / 0          3406 / 65536         |
+===========================+===============+====================================================+
| 2     910B3               | OK            | 89.5        32                0    / 0             |
| 0                         | 0000:81:00.0  | 0           0    / 0          3406 / 65536         |
+===========================+===============+====================================================+
| 3     910B3               | OK            | 90.5        34                0    / 0             |
| 0                         | 0000:82:00.0  | 0           0    / 0          3405 / 65536         |
+===========================+===============+====================================================+
| 4     910B3               | OK            | 86.7        36                0    / 0             |
| 0                         | 0000:01:00.0  | 0           0    / 0          3405 / 65536         |
+===========================+===============+====================================================+
| 5     910B3               | OK            | 89.4        39                0    / 0             |
| 0                         | 0000:02:00.0  | 0           0    / 0          3406 / 65536         |
+===========================+===============+====================================================+
| 6     910B3               | OK            | 92.9        36                0    / 0             |
| 0                         | 0000:41:00.0  | 0           0    / 0          3406 / 65536         |
+===========================+===============+====================================================+
| 7     910B3               | OK            | 89.9        36                0    / 0             |
| 0                         | 0000:42:00.0  | 0           0    / 0          3406 / 65536         |
+===========================+===============+====================================================+
+---------------------------+---------------+----------------------------------------------------+
| NPU     Chip              | Process id    | Process name             | Process memory(MB)      |
+===========================+===============+====================================================+
| No running processes found in NPU 0                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 1                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 2                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 3                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 4                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 5                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 6                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 7                                                            |
+===========================+===============+====================================================+

CANN:
package_name=Ascend-cann-toolkit
version=8.3.RC2
innerversion=V100R001C23SPC002B210
compatible_version=[V100R001C15],[V100R001C18],[V100R001C19],[V100R001C20],[V100R001C21],[V100R001C23]
arch=aarch64
os=linux
path=/usr/local/Ascend/ascend-toolkit/8.3.RC2/aarch64-linux

🐛 Describe the bug

I'm trying to run DeepSeek-V3.2-Exp-W8A8 on 2 x A2 nodes. I obtained the model weights from ModelScope and started the model exactly as described in the official tutorial. The model starts successfully, and I can send the following request and get the expected answer:

curl http://192.168.0.215:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "deepseek_v3.2",
        "prompt": "The future of AI is",
        "max_tokens": 64,
        "temperature": 0
    }'
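
For anyone reproducing this from Python instead of curl, here is a minimal sketch of the same request with `max_tokens` left as a parameter. The endpoint, model name, prompt, and temperature are copied from the curl command above; the helper names `build_payload` and `send_completion` and the client-side `timeout_s` value are assumptions made for illustration, not part of vLLM.

```python
import json
import urllib.request

# Endpoint taken from the curl command above.
BASE_URL = "http://192.168.0.215:8000/v1/completions"


def build_payload(max_tokens: int) -> dict:
    """Return the completion request body from the report, with max_tokens adjustable."""
    return {
        "model": "deepseek_v3.2",
        "prompt": "The future of AI is",
        "max_tokens": max_tokens,
        "temperature": 0,
    }


def send_completion(max_tokens: int, timeout_s: float = 600.0) -> dict:
    """POST the request and return the decoded JSON response.

    timeout_s is a hypothetical client-side limit (an assumption),
    chosen only so the client does not give up before the server does.
    """
    body = json.dumps(build_payload(max_tokens)).encode("utf-8")
    req = urllib.request.Request(
        BASE_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.load(resp)
```

Calling `send_completion(64)` corresponds to the curl request above; passing a larger value only changes the `max_tokens` field and leaves the rest of the request identical.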

But when I increase max_tokens to 1024 or 2048, the deployment crashes. I'm sending only one request, and it crashes with the following error. I tried all of the image versions, including v0.11.0rc0, v0.11.0rc1, and v0.11.0rc2, but the result is the same. I know this is something of a duplicate of #3717, but the difference is that in my case not even a single request completes.

deepseek-v3_2-exp  | (APIServer pid=145) INFO 11-24 07:48:08 [loggers.py:127] Engine 000: Avg prompt throughput: 0.6 tokens/s, Avg generation throughput: 4.5 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.2%, Prefix cache hit rate: 0.0%
deepseek-v3_2-exp  | (APIServer pid=145) INFO 11-24 07:48:08 [loggers.py:127] Engine 001: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 10.1 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
deepseek-v3_2-exp  | (APIServer pid=145) INFO 11-24 07:48:18 [loggers.py:127] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 15.3 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.4%, Prefix cache hit rate: 0.0%
deepseek-v3_2-exp  | (APIServer pid=145) INFO 11-24 07:48:18 [loggers.py:127] Engine 001: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
deepseek-v3_2-exp  | (APIServer pid=145) INFO 11-24 07:48:28 [loggers.py:127] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 15.4 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.6%, Prefix cache hit rate: 0.0%
deepseek-v3_2-exp  | (APIServer pid=145) INFO 11-24 07:48:38 [loggers.py:127] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 15.2 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.9%, Prefix cache hit rate: 0.0%
deepseek-v3_2-exp  | (APIServer pid=145) INFO 11-24 07:48:48 [loggers.py:127] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 15.2 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 1.3%, Prefix cache hit rate: 0.0%
deepseek-v3_2-exp  | (APIServer pid=145) INFO 11-24 07:48:58 [loggers.py:127] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.9 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 1.3%, Prefix cache hit rate: 0.0%
deepseek-v3_2-exp  | (APIServer pid=145) INFO 11-24 07:49:08 [loggers.py:127] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 1.3%, Prefix cache hit rate: 0.0%
deepseek-v3_2-exp  | (EngineCore_DP0 pid=414) INFO 11-24 07:49:49 [shm_broadcast.py:466] No available shared memory broadcast block found in 60 seconds. This typically happens when some processes are hanging or doing some time-consuming work (e.g. compilation).
deepseek-v3_2-exp  | (EngineCore_DP0 pid=414) INFO 11-24 07:50:49 [shm_broadcast.py:466] No available shared memory broadcast block found in 60 seconds. This typically happens when some processes are hanging or doing some time-consuming work (e.g. compilation).
deepseek-v3_2-exp  | (EngineCore_DP0 pid=414) INFO 11-24 07:51:49 [shm_broadcast.py:466] No available shared memory broadcast block found in 60 seconds. This typically happens when some processes are hanging or doing some time-consuming work (e.g. compilation).
deepseek-v3_2-exp  | (EngineCore_DP0 pid=414) INFO 11-24 07:52:49 [shm_broadcast.py:466] No available shared memory broadcast block found in 60 seconds. This typically happens when some processes are hanging or doing some time-consuming work (e.g. compilation).
deepseek-v3_2-exp  | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [dump_input.py:69] Dumping input data for V1 LLM engine (v0.11.0) with config: model='/data/models/DeepSeek-V3.2-Exp-w8a8', speculative_config=None, tokenizer='/data/models/DeepSeek-V3.2-Exp-w8a8', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=17450, download_dir=None, load_format=auto, tensor_parallel_size=8, pipeline_parallel_size=1, data_parallel_size=2, disable_custom_all_reduce=True, quantization=ascend, enforce_eager=False, kv_cache_dtype=bfloat16, device_config=npu, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=1024, served_model_name=deepseek_v3.2, enable_prefix_caching=False, chunked_prefill_enabled=False, pooler_config=None, compilation_config={"level":0,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":["all"],"splitting_ops":null,"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":0,"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"use_inductor_graph_partition":false,"pass_config":{},"max_capture_size":32,"local_cache_dir":null}, 
deepseek-v3_2-exp  | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [dump_input.py:76] Dumping scheduler output for model execution: SchedulerOutput(scheduled_new_reqs=[], scheduled_cached_reqs=CachedRequestData(req_ids=['cmpl-af7ebc3bb6d14deead811607b6bee5d3-0'], resumed_from_preemption=[false], new_token_ids=[], new_block_ids=[null], num_computed_tokens=[670]), num_scheduled_tokens={cmpl-af7ebc3bb6d14deead811607b6bee5d3-0: 1}, total_num_scheduled_tokens=1, scheduled_spec_decode_tokens={}, scheduled_encoder_inputs={}, num_common_prefix_blocks=[0], finished_req_ids=[], free_encoder_mm_hashes=[], structured_output_request_ids={}, grammar_bitmask=null, kv_connector_metadata=null)
deepseek-v3_2-exp  | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [dump_input.py:79] Dumping scheduler stats: SchedulerStats(num_running_reqs=1, num_waiting_reqs=0, step_counter=0, current_wave=0, kv_cache_usage=0.012820512820512775, prefix_cache_stats=PrefixCacheStats(reset=False, requests=0, queries=0, hits=0), spec_decoding_stats=None, kv_connector_stats=None, num_corrupted_reqs=0)
deepseek-v3_2-exp  | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710] EngineCore encountered a fatal error.
deepseek-v3_2-exp  | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710] Traceback (most recent call last):
deepseek-v3_2-exp  | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710]   File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 264, in collective_rpc
deepseek-v3_2-exp  | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710]     result = get_response(w, dequeue_timeout,
deepseek-v3_2-exp  | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
deepseek-v3_2-exp  | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710]   File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 244, in get_response
deepseek-v3_2-exp  | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710]     status, result = w.worker_response_mq.dequeue(
deepseek-v3_2-exp  | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710]                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
deepseek-v3_2-exp  | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710]   File "/vllm-workspace/vllm/vllm/distributed/device_communicators/shm_broadcast.py", line 511, in dequeue
deepseek-v3_2-exp  | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710]     with self.acquire_read(timeout, cancel, indefinite) as buf:
deepseek-v3_2-exp  | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710]   File "/usr/local/python3.11.13/lib/python3.11/contextlib.py", line 137, in __enter__
deepseek-v3_2-exp  | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710]     return next(self.gen)
deepseek-v3_2-exp  | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710]            ^^^^^^^^^^^^^^
deepseek-v3_2-exp  | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710]   File "/vllm-workspace/vllm/vllm/distributed/device_communicators/shm_broadcast.py", line 460, in acquire_read
deepseek-v3_2-exp  | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710]     raise TimeoutError
deepseek-v3_2-exp  | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710] TimeoutError
deepseek-v3_2-exp  | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710] 
deepseek-v3_2-exp  | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710] The above exception was the direct cause of the following exception:
deepseek-v3_2-exp  | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710] 
deepseek-v3_2-exp  | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710] Traceback (most recent call last):
deepseek-v3_2-exp  | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710]   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 701, in run_engine_core
deepseek-v3_2-exp  | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710]     engine_core.run_busy_loop()
deepseek-v3_2-exp  | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710]   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 1045, in run_busy_loop
deepseek-v3_2-exp  | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710]     executed = self._process_engine_step()
deepseek-v3_2-exp  | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710]                ^^^^^^^^^^^^^^^^^^^^^^^^^^^
deepseek-v3_2-exp  | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710]   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 754, in _process_engine_step
deepseek-v3_2-exp  | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710]     outputs, model_executed = self.step_fn()
deepseek-v3_2-exp  | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710]                               ^^^^^^^^^^^^^^
deepseek-v3_2-exp  | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710]   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 284, in step
deepseek-v3_2-exp  | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710]     model_output = self.execute_model_with_error_logging(
deepseek-v3_2-exp  | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
deepseek-v3_2-exp  | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710]   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 270, in execute_model_with_error_logging
deepseek-v3_2-exp  | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710]     raise err
deepseek-v3_2-exp  | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710]   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 261, in execute_model_with_error_logging
deepseek-v3_2-exp  | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710]     return model_fn(scheduler_output)
deepseek-v3_2-exp  | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710]            ^^^^^^^^^^^^^^^^^^^^^^^^^^
deepseek-v3_2-exp  | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710]   File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 181, in execute_model
deepseek-v3_2-exp  | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710]     (output, ) = self.collective_rpc(
deepseek-v3_2-exp  | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710]                  ^^^^^^^^^^^^^^^^^^^^
deepseek-v3_2-exp  | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710]   File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 273, in collective_rpc
deepseek-v3_2-exp  | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710]     raise TimeoutError(f"RPC call to {method} timed out.") from e
deepseek-v3_2-exp  | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710] TimeoutError: RPC call to execute_model timed out.
deepseek-v3_2-exp  | (Worker_DP0_TP0_EP0 pid=687) INFO 11-24 07:53:49 [multiproc_executor.py:558] Parent process exited, terminating worker
deepseek-v3_2-exp  | (Worker_DP0_TP1_EP1 pid=817) INFO 11-24 07:53:49 [multiproc_executor.py:558] Parent process exited, terminating worker
deepseek-v3_2-exp  | (APIServer pid=145) ERROR 11-24 07:53:49 [async_llm.py:480] AsyncLLM output_handler failed.
deepseek-v3_2-exp  | (APIServer pid=145) ERROR 11-24 07:53:49 [async_llm.py:480] Traceback (most recent call last):
deepseek-v3_2-exp  | (APIServer pid=145) ERROR 11-24 07:53:49 [async_llm.py:480]   File "/vllm-workspace/vllm/vllm/v1/engine/async_llm.py", line 439, in output_handler
deepseek-v3_2-exp  | (APIServer pid=145) ERROR 11-24 07:53:49 [async_llm.py:480]     outputs = await engine_core.get_output_async()
deepseek-v3_2-exp  | (APIServer pid=145) ERROR 11-24 07:53:49 [async_llm.py:480]               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
deepseek-v3_2-exp  | (APIServer pid=145) ERROR 11-24 07:53:49 [async_llm.py:480]   File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 846, in get_output_async
deepseek-v3_2-exp  | (APIServer pid=145) ERROR 11-24 07:53:49 [async_llm.py:480]     raise self._format_exception(outputs) from None
deepseek-v3_2-exp  | (APIServer pid=145) ERROR 11-24 07:53:49 [async_llm.py:480] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
deepseek-v3_2-exp  | (Worker_DP0_TP2_EP2 pid=966) INFO 11-24 07:53:49 [multiproc_executor.py:558] Parent process exited, terminating worker
deepseek-v3_2-exp  | (APIServer pid=145) INFO:     192.168.0.215:60216 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
deepseek-v3_2-exp  | (Worker_DP0_TP3_EP3 pid=1117) INFO 11-24 07:53:49 [multiproc_executor.py:558] Parent process exited, terminating worker
deepseek-v3_2-exp  | (Worker_DP0_TP4_EP4 pid=1268) INFO 11-24 07:53:49 [multiproc_executor.py:558] Parent process exited, terminating worker
deepseek-v3_2-exp  | (Worker_DP0_TP5_EP5 pid=1419) INFO 11-24 07:53:49 [multiproc_executor.py:558] Parent process exited, terminating worker
deepseek-v3_2-exp  | (Worker_DP0_TP6_EP6 pid=1570) INFO 11-24 07:53:49 [multiproc_executor.py:558] Parent process exited, terminating worker
deepseek-v3_2-exp  | (Worker_DP0_TP7_EP7 pid=1721) INFO 11-24 07:53:49 [multiproc_executor.py:558] Parent process exited, terminating worker
deepseek-v3_2-exp  | (APIServer pid=145) INFO:     192.168.0.215:35496 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
deepseek-v3_2-exp  | (APIServer pid=145) INFO:     192.168.0.215:35500 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
deepseek-v3_2-exp  | (APIServer pid=145) INFO:     Shutting down
deepseek-v3_2-exp  | (APIServer pid=145) INFO:     Waiting for application shutdown.
deepseek-v3_2-exp  | (APIServer pid=145) INFO:     Application shutdown complete.
deepseek-v3_2-exp  | (APIServer pid=145) INFO:     Finished server process [145]
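
To put a number on the stall, here is a rough Python sketch that measures the time between the last log line showing nonzero generation throughput on Engine 000 and the `TimeoutError: RPC call to execute_model timed out.` line. The regular expressions, the function name `stall_to_timeout_seconds`, and the abbreviated `SAMPLE` lines are assumptions made for illustration. The log timestamps carry no year, so 2025 is also an assumption; it only affects date parsing, not the difference.

```python
import re
from datetime import datetime

# Abbreviated copies of four lines from the log above (trailing fields dropped).
SAMPLE = """\
(APIServer pid=145) INFO 11-24 07:48:48 [loggers.py:127] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 15.2 tokens/s, Running: 1 reqs
(APIServer pid=145) INFO 11-24 07:48:58 [loggers.py:127] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.9 tokens/s, Running: 1 reqs
(APIServer pid=145) INFO 11-24 07:49:08 [loggers.py:127] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs
(EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710] TimeoutError: RPC call to execute_model timed out.
"""

# Periodic stats line for Engine 000: timestamp, generation throughput, running requests.
THROUGHPUT = re.compile(
    r"(\d\d-\d\d \d\d:\d\d:\d\d) \[loggers\.py:\d+\] Engine 000: .*?"
    r"Avg generation throughput: ([\d.]+) tokens/s, Running: (\d+) reqs"
)
# Final exception line raised by the engine core.
TIMEOUT = re.compile(r"(\d\d-\d\d \d\d:\d\d:\d\d) .*TimeoutError: RPC call to")


def parse_ts(text, year):
    # The log omits the year, so one is supplied explicitly (assumption).
    return datetime.strptime(f"{year}-{text}", "%Y-%m-%d %H:%M:%S")


def stall_to_timeout_seconds(log_text, year=2025):
    """Seconds from the last nonzero-throughput stats line to the RPC timeout."""
    last_progress = None
    for line in log_text.splitlines():
        m = THROUGHPUT.search(line)
        if m and float(m.group(2)) > 0 and int(m.group(3)) > 0:
            last_progress = parse_ts(m.group(1), year)
        t = TIMEOUT.search(line)
        if t and last_progress is not None:
            return (parse_ts(t.group(1), year) - last_progress).total_seconds()
    return None  # no progress line followed by a timeout was found


print(stall_to_timeout_seconds(SAMPLE))  # 291.0
```

On these lines the gap is 291 seconds: generation stops around 07:48:58 and the engine waits just under five minutes, through the four 60-second `shm_broadcast` messages, before raising the timeout at 07:53:49.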
