Your current environment
The output of `python collect_env.py`
Collecting environment information...
PyTorch version: 2.7.1+cpu
Is debug build: False
OS: Ubuntu 22.04.5 LTS (aarch64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04.2) 11.4.0
Clang version: Could not collect
CMake version: version 4.2.0
Libc version: glibc-2.35
Python version: 3.11.13 (main, Nov 20 2025, 16:02:27) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-5.15.0-91-generic-aarch64-with-glibc2.35
CPU:
Architecture: aarch64
CPU op-mode(s): 64-bit
Byte Order: Little Endian
CPU(s): 192
On-line CPU(s) list: 0-191
Vendor ID: HiSilicon
Model name: Kunpeng-920
Model: 0
Thread(s) per core: 1
Core(s) per cluster: 48
Socket(s): -
Cluster(s): 4
Stepping: 0x1
BogoMIPS: 200.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm ssbs
L1d cache: 12 MiB (192 instances)
L1i cache: 12 MiB (192 instances)
L2 cache: 96 MiB (192 instances)
L3 cache: 192 MiB (8 instances)
NUMA node(s): 8
NUMA node0 CPU(s): 0-23
NUMA node1 CPU(s): 24-47
NUMA node2 CPU(s): 48-71
NUMA node3 CPU(s): 72-95
NUMA node4 CPU(s): 96-119
NUMA node5 CPU(s): 120-143
NUMA node6 CPU(s): 144-167
NUMA node7 CPU(s): 168-191
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] pyzmq==27.1.0
[pip3] torch==2.7.1+cpu
[pip3] torch_npu==2.7.1
[pip3] torchvision==0.22.1
[pip3] transformers==4.57.1
[conda] Could not collect
vLLM Version: 0.11.0
vLLM Ascend Version: 0.11.0rc2
ENV Variables:
ATB_OPSRUNNER_KERNEL_CACHE_LOCAL_COUNT=1
ATB_STREAM_SYNC_EVERY_RUNNER_ENABLE=0
ATB_OPSRUNNER_SETUP_CACHE_ENABLE=1
ATB_WORKSPACE_MEM_ALLOC_GLOBAL=1
ATB_DEVICE_TILING_BUFFER_BLOCK_NUM=32
ATB_STREAM_SYNC_EVERY_KERNEL_ENABLE=0
ATB_OPSRUNNER_KERNEL_CACHE_GLOABL_COUNT=5
ATB_HOME_PATH=/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_1
ASCEND_TOOLKIT_HOME=/usr/local/Ascend/ascend-toolkit/latest
ASCEND_CUSTOM_OPP_PATH=/usr/local/Ascend/ascend-toolkit/latest/opp/vendors/customize:
ATB_COMPARE_TILING_EVERY_KERNEL=0
ASCEND_OPP_PATH=/usr/local/Ascend/ascend-toolkit/latest/opp
LD_LIBRARY_PATH=/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_1/lib:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_1/examples:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_1/tests/atbopstest:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64/plugin:/usr/local/Ascend/ascend-toolkit/latest/lib64:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/opskernel:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/nnengine:/usr/local/Ascend/ascend-toolkit/latest/opp/built-in/op_impl/ai_core/tbe/op_tiling/lib/linux/aarch64:/usr/local/Ascend/ascend-toolkit/latest/opp/vendors/customize/op_api/lib/:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/lib:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/examples:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/tests/atbopstest:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64/plugin:/usr/local/Ascend/ascend-toolkit/latest/lib64:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/opskernel:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/nnengine:/usr/local/Ascend/ascend-toolkit/latest/opp/built-in/op_impl/ai_core/tbe/op_tiling:/usr/local/Ascend/driver/lib64/common/:/usr/local/Ascend/driver/lib64/driver/:
ASCEND_AICPU_PATH=/usr/local/Ascend/ascend-toolkit/latest
ATB_STREAM_SYNC_EVERY_OPERATION_ENABLE=0
ASCEND_HOME_PATH=/usr/local/Ascend/ascend-toolkit/latest
ATB_MATMUL_SHUFFLE_K_ENABLE=1
ATB_WORKSPACE_MEM_ALLOC_ALG_TYPE=1
ATB_HOST_TILING_BUFFER_BLOCK_NUM=128
ATB_SHARE_MEMORY_NAME_SUFFIX=
TORCH_DEVICE_BACKEND_AUTOLOAD=1
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1
NPU:
+------------------------------------------------------------------------------------------------+
| npu-smi 25.2.0 Version: 25.2.0 |
+---------------------------+---------------+----------------------------------------------------+
| NPU Name | Health | Power(W) Temp(C) Hugepages-Usage(page)|
| Chip | Bus-Id | AICore(%) Memory-Usage(MB) HBM-Usage(MB) |
+===========================+===============+====================================================+
| 0 910B3 | OK | 90.9 32 0 / 0 |
| 0 | 0000:C1:00.0 | 0 0 / 0 3406 / 65536 |
+===========================+===============+====================================================+
| 1 910B3 | OK | 88.7 33 0 / 0 |
| 0 | 0000:C2:00.0 | 0 0 / 0 3406 / 65536 |
+===========================+===============+====================================================+
| 2 910B3 | OK | 89.5 32 0 / 0 |
| 0 | 0000:81:00.0 | 0 0 / 0 3406 / 65536 |
+===========================+===============+====================================================+
| 3 910B3 | OK | 90.5 34 0 / 0 |
| 0 | 0000:82:00.0 | 0 0 / 0 3405 / 65536 |
+===========================+===============+====================================================+
| 4 910B3 | OK | 86.7 36 0 / 0 |
| 0 | 0000:01:00.0 | 0 0 / 0 3405 / 65536 |
+===========================+===============+====================================================+
| 5 910B3 | OK | 89.4 39 0 / 0 |
| 0 | 0000:02:00.0 | 0 0 / 0 3406 / 65536 |
+===========================+===============+====================================================+
| 6 910B3 | OK | 92.9 36 0 / 0 |
| 0 | 0000:41:00.0 | 0 0 / 0 3406 / 65536 |
+===========================+===============+====================================================+
| 7 910B3 | OK | 89.9 36 0 / 0 |
| 0 | 0000:42:00.0 | 0 0 / 0 3406 / 65536 |
+===========================+===============+====================================================+
+---------------------------+---------------+----------------------------------------------------+
| NPU Chip | Process id | Process name | Process memory(MB) |
+===========================+===============+====================================================+
| No running processes found in NPU 0 |
+===========================+===============+====================================================+
| No running processes found in NPU 1 |
+===========================+===============+====================================================+
| No running processes found in NPU 2 |
+===========================+===============+====================================================+
| No running processes found in NPU 3 |
+===========================+===============+====================================================+
| No running processes found in NPU 4 |
+===========================+===============+====================================================+
| No running processes found in NPU 5 |
+===========================+===============+====================================================+
| No running processes found in NPU 6 |
+===========================+===============+====================================================+
| No running processes found in NPU 7 |
+===========================+===============+====================================================+
CANN:
package_name=Ascend-cann-toolkit
version=8.3.RC2
innerversion=V100R001C23SPC002B210
compatible_version=[V100R001C15],[V100R001C18],[V100R001C19],[V100R001C20],[V100R001C21],[V100R001C23]
arch=aarch64
os=linux
path=/usr/local/Ascend/ascend-toolkit/8.3.RC2/aarch64-linux
🐛 Describe the bug
I'm trying to run DeepSeek-V3.2-EXP-W8A8 on 2 x A2 nodes. I obtained the model weights from ModelScope and started the model exactly as described in the official tutorial. The model starts successfully, and I can send the following request, which it answers as expected:
curl http://192.168.0.215:8000/v1/completions \
-H "Content-Type: application/json" \
-d '{
"model": "deepseek_v3.2",
"prompt": "The future of AI is",
"max_tokens": 64,
"temperature": 0
}'
But when I increase max_tokens to 1024 or 2048, the deployment crashes. I'm sending only a single request, but it crashes with the following error. I tried all of the image versions, including v0.11.0rc0, v0.11.0rc1, and v0.11.0rc2, but the result is the same. I know this is somewhat of a duplicate of #3717, but the difference between my problem and #3717 is that I can't complete even one request.
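For reference, here is a minimal Python sketch of the reproduction, equivalent to the curl call above; it assumes the same endpoint, served model name, and prompt (only max_tokens varies). The failing value is an assumption based on my runs (64 succeeds, 1024/2048 crashes the engine mid-generation):

```python
import json
import urllib.request

# Same endpoint as the curl example; adjust to your deployment.
API_URL = "http://192.168.0.215:8000/v1/completions"


def build_payload(max_tokens: int) -> dict:
    """Build the same request body as the curl example; only max_tokens varies."""
    return {
        "model": "deepseek_v3.2",
        "prompt": "The future of AI is",
        "max_tokens": max_tokens,
        "temperature": 0,
    }


def send(max_tokens: int) -> str:
    """POST a completion request and return the raw response body."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(max_tokens)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    print(send(64))    # completes as expected
    print(send(2048))  # engine stalls after ~60 s and the deployment crashes
```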
deepseek-v3_2-exp | (APIServer pid=145) INFO 11-24 07:48:08 [loggers.py:127] Engine 000: Avg prompt throughput: 0.6 tokens/s, Avg generation throughput: 4.5 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.2%, Prefix cache hit rate: 0.0%
deepseek-v3_2-exp | (APIServer pid=145) INFO 11-24 07:48:08 [loggers.py:127] Engine 001: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 10.1 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
deepseek-v3_2-exp | (APIServer pid=145) INFO 11-24 07:48:18 [loggers.py:127] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 15.3 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.4%, Prefix cache hit rate: 0.0%
deepseek-v3_2-exp | (APIServer pid=145) INFO 11-24 07:48:18 [loggers.py:127] Engine 001: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
deepseek-v3_2-exp | (APIServer pid=145) INFO 11-24 07:48:28 [loggers.py:127] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 15.4 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.6%, Prefix cache hit rate: 0.0%
deepseek-v3_2-exp | (APIServer pid=145) INFO 11-24 07:48:38 [loggers.py:127] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 15.2 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.9%, Prefix cache hit rate: 0.0%
deepseek-v3_2-exp | (APIServer pid=145) INFO 11-24 07:48:48 [loggers.py:127] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 15.2 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 1.3%, Prefix cache hit rate: 0.0%
deepseek-v3_2-exp | (APIServer pid=145) INFO 11-24 07:48:58 [loggers.py:127] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.9 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 1.3%, Prefix cache hit rate: 0.0%
deepseek-v3_2-exp | (APIServer pid=145) INFO 11-24 07:49:08 [loggers.py:127] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 1.3%, Prefix cache hit rate: 0.0%
deepseek-v3_2-exp | (EngineCore_DP0 pid=414) INFO 11-24 07:49:49 [shm_broadcast.py:466] No available shared memory broadcast block found in 60 seconds. This typically happens when some processes are hanging or doing some time-consuming work (e.g. compilation).
deepseek-v3_2-exp | (EngineCore_DP0 pid=414) INFO 11-24 07:50:49 [shm_broadcast.py:466] No available shared memory broadcast block found in 60 seconds. This typically happens when some processes are hanging or doing some time-consuming work (e.g. compilation).
deepseek-v3_2-exp | (EngineCore_DP0 pid=414) INFO 11-24 07:51:49 [shm_broadcast.py:466] No available shared memory broadcast block found in 60 seconds. This typically happens when some processes are hanging or doing some time-consuming work (e.g. compilation).
deepseek-v3_2-exp | (EngineCore_DP0 pid=414) INFO 11-24 07:52:49 [shm_broadcast.py:466] No available shared memory broadcast block found in 60 seconds. This typically happens when some processes are hanging or doing some time-consuming work (e.g. compilation).
deepseek-v3_2-exp | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [dump_input.py:69] Dumping input data for V1 LLM engine (v0.11.0) with config: model='/data/models/DeepSeek-V3.2-Exp-w8a8', speculative_config=None, tokenizer='/data/models/DeepSeek-V3.2-Exp-w8a8', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=17450, download_dir=None, load_format=auto, tensor_parallel_size=8, pipeline_parallel_size=1, data_parallel_size=2, disable_custom_all_reduce=True, quantization=ascend, enforce_eager=False, kv_cache_dtype=bfloat16, device_config=npu, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=1024, served_model_name=deepseek_v3.2, enable_prefix_caching=False, chunked_prefill_enabled=False, pooler_config=None, compilation_config={"level":0,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":["all"],"splitting_ops":null,"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":0,"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"use_inductor_graph_partition":false,"pass_config":{},"max_capture_size":32,"local_cache_dir":null},
deepseek-v3_2-exp | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [dump_input.py:76] Dumping scheduler output for model execution: SchedulerOutput(scheduled_new_reqs=[], scheduled_cached_reqs=CachedRequestData(req_ids=['cmpl-af7ebc3bb6d14deead811607b6bee5d3-0'], resumed_from_preemption=[false], new_token_ids=[], new_block_ids=[null], num_computed_tokens=[670]), num_scheduled_tokens={cmpl-af7ebc3bb6d14deead811607b6bee5d3-0: 1}, total_num_scheduled_tokens=1, scheduled_spec_decode_tokens={}, scheduled_encoder_inputs={}, num_common_prefix_blocks=[0], finished_req_ids=[], free_encoder_mm_hashes=[], structured_output_request_ids={}, grammar_bitmask=null, kv_connector_metadata=null)
deepseek-v3_2-exp | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [dump_input.py:79] Dumping scheduler stats: SchedulerStats(num_running_reqs=1, num_waiting_reqs=0, step_counter=0, current_wave=0, kv_cache_usage=0.012820512820512775, prefix_cache_stats=PrefixCacheStats(reset=False, requests=0, queries=0, hits=0), spec_decoding_stats=None, kv_connector_stats=None, num_corrupted_reqs=0)
deepseek-v3_2-exp | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710] EngineCore encountered a fatal error.
deepseek-v3_2-exp | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710] Traceback (most recent call last):
deepseek-v3_2-exp | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710] File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 264, in collective_rpc
deepseek-v3_2-exp | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710] result = get_response(w, dequeue_timeout,
deepseek-v3_2-exp | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
deepseek-v3_2-exp | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710] File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 244, in get_response
deepseek-v3_2-exp | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710] status, result = w.worker_response_mq.dequeue(
deepseek-v3_2-exp | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
deepseek-v3_2-exp | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710] File "/vllm-workspace/vllm/vllm/distributed/device_communicators/shm_broadcast.py", line 511, in dequeue
deepseek-v3_2-exp | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710] with self.acquire_read(timeout, cancel, indefinite) as buf:
deepseek-v3_2-exp | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710] File "/usr/local/python3.11.13/lib/python3.11/contextlib.py", line 137, in __enter__
deepseek-v3_2-exp | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710] return next(self.gen)
deepseek-v3_2-exp | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710] ^^^^^^^^^^^^^^
deepseek-v3_2-exp | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710] File "/vllm-workspace/vllm/vllm/distributed/device_communicators/shm_broadcast.py", line 460, in acquire_read
deepseek-v3_2-exp | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710] raise TimeoutError
deepseek-v3_2-exp | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710] TimeoutError
deepseek-v3_2-exp | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710]
deepseek-v3_2-exp | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710] The above exception was the direct cause of the following exception:
deepseek-v3_2-exp | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710]
deepseek-v3_2-exp | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710] Traceback (most recent call last):
deepseek-v3_2-exp | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710] File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 701, in run_engine_core
deepseek-v3_2-exp | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710] engine_core.run_busy_loop()
deepseek-v3_2-exp | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710] File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 1045, in run_busy_loop
deepseek-v3_2-exp | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710] executed = self._process_engine_step()
deepseek-v3_2-exp | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
deepseek-v3_2-exp | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710] File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 754, in _process_engine_step
deepseek-v3_2-exp | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710] outputs, model_executed = self.step_fn()
deepseek-v3_2-exp | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710] ^^^^^^^^^^^^^^
deepseek-v3_2-exp | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710] File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 284, in step
deepseek-v3_2-exp | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710] model_output = self.execute_model_with_error_logging(
deepseek-v3_2-exp | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
deepseek-v3_2-exp | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710] File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 270, in execute_model_with_error_logging
deepseek-v3_2-exp | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710] raise err
deepseek-v3_2-exp | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710] File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 261, in execute_model_with_error_logging
deepseek-v3_2-exp | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710] return model_fn(scheduler_output)
deepseek-v3_2-exp | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710] ^^^^^^^^^^^^^^^^^^^^^^^^^^
deepseek-v3_2-exp | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710] File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 181, in execute_model
deepseek-v3_2-exp | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710] (output, ) = self.collective_rpc(
deepseek-v3_2-exp | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710] ^^^^^^^^^^^^^^^^^^^^
deepseek-v3_2-exp | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710] File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 273, in collective_rpc
deepseek-v3_2-exp | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710] raise TimeoutError(f"RPC call to {method} timed out.") from e
deepseek-v3_2-exp | (EngineCore_DP0 pid=414) ERROR 11-24 07:53:49 [core.py:710] TimeoutError: RPC call to execute_model timed out.
deepseek-v3_2-exp | (Worker_DP0_TP0_EP0 pid=687) INFO 11-24 07:53:49 [multiproc_executor.py:558] Parent process exited, terminating worker
deepseek-v3_2-exp | (Worker_DP0_TP1_EP1 pid=817) INFO 11-24 07:53:49 [multiproc_executor.py:558] Parent process exited, terminating worker
deepseek-v3_2-exp | (APIServer pid=145) ERROR 11-24 07:53:49 [async_llm.py:480] AsyncLLM output_handler failed.
deepseek-v3_2-exp | (APIServer pid=145) ERROR 11-24 07:53:49 [async_llm.py:480] Traceback (most recent call last):
deepseek-v3_2-exp | (APIServer pid=145) ERROR 11-24 07:53:49 [async_llm.py:480] File "/vllm-workspace/vllm/vllm/v1/engine/async_llm.py", line 439, in output_handler
deepseek-v3_2-exp | (APIServer pid=145) ERROR 11-24 07:53:49 [async_llm.py:480] outputs = await engine_core.get_output_async()
deepseek-v3_2-exp | (APIServer pid=145) ERROR 11-24 07:53:49 [async_llm.py:480] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
deepseek-v3_2-exp | (APIServer pid=145) ERROR 11-24 07:53:49 [async_llm.py:480] File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 846, in get_output_async
deepseek-v3_2-exp | (APIServer pid=145) ERROR 11-24 07:53:49 [async_llm.py:480] raise self._format_exception(outputs) from None
deepseek-v3_2-exp | (APIServer pid=145) ERROR 11-24 07:53:49 [async_llm.py:480] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
deepseek-v3_2-exp | (Worker_DP0_TP2_EP2 pid=966) INFO 11-24 07:53:49 [multiproc_executor.py:558] Parent process exited, terminating worker
deepseek-v3_2-exp | (APIServer pid=145) INFO: 192.168.0.215:60216 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
deepseek-v3_2-exp | (Worker_DP0_TP3_EP3 pid=1117) INFO 11-24 07:53:49 [multiproc_executor.py:558] Parent process exited, terminating worker
deepseek-v3_2-exp | (Worker_DP0_TP4_EP4 pid=1268) INFO 11-24 07:53:49 [multiproc_executor.py:558] Parent process exited, terminating worker
deepseek-v3_2-exp | (Worker_DP0_TP5_EP5 pid=1419) INFO 11-24 07:53:49 [multiproc_executor.py:558] Parent process exited, terminating worker
deepseek-v3_2-exp | (Worker_DP0_TP6_EP6 pid=1570) INFO 11-24 07:53:49 [multiproc_executor.py:558] Parent process exited, terminating worker
deepseek-v3_2-exp | (Worker_DP0_TP7_EP7 pid=1721) INFO 11-24 07:53:49 [multiproc_executor.py:558] Parent process exited, terminating worker
deepseek-v3_2-exp | (APIServer pid=145) INFO: 192.168.0.215:35496 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
deepseek-v3_2-exp | (APIServer pid=145) INFO: 192.168.0.215:35500 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
deepseek-v3_2-exp | (APIServer pid=145) INFO: Shutting down
deepseek-v3_2-exp | (APIServer pid=145) INFO: Waiting for application shutdown.
deepseek-v3_2-exp | (APIServer pid=145) INFO: Application shutdown complete.
deepseek-v3_2-exp | (APIServer pid=145) INFO: Finished server process [145]