Description
Got a runtime error when trying to deploy a single-node Llama task with PD Disaggregation, following the doc (https://docs.sglang.ai/advanced_features/pd_disaggregation.html). A regular task without PD Disaggregation runs normally.
Hardware: Ascend 910A3
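For reproduction context, both instances were launched on one node per the doc's example. Below is a rough sketch of the equivalent launch, expressed as a Python launcher for self-containment; the model path, ports, and device id are placeholders, not my exact values:

```python
# Sketch of the launch, following the PD disaggregation doc's single-node example.
# Model path, ports, and device id are placeholders, not the exact values used.
import subprocess

model = "/path/to/Llama-3.1-8B-Instruct"  # placeholder

# Prefill instance (this is the process whose log is attached below).
prefill = subprocess.Popen([
    "python", "-m", "sglang.launch_server",
    "--model-path", model,
    "--disaggregation-mode", "prefill",
    "--port", "30000",
])

# Decode instance on a second device.
decode = subprocess.Popen([
    "python", "-m", "sglang.launch_server",
    "--model-path", model,
    "--disaggregation-mode", "decode",
    "--port", "30001",
    "--base-gpu-id", "1",
])
```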
Error log from the prefill process:
[2025-09-04 08:38:46] Load weight begin. avail mem=60.86 GB
Loading safetensors checkpoint shards: 0% Completed | 0/4 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 25% Completed | 1/4 [00:00<00:02, 1.27it/s]
Loading safetensors checkpoint shards: 50% Completed | 2/4 [00:02<00:02, 1.13s/it]
Loading safetensors checkpoint shards: 75% Completed | 3/4 [00:02<00:00, 1.04it/s]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:03<00:00, 1.04it/s]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:03<00:00, 1.03it/s]
[2025-09-04 08:38:50] Load weight end. type=LlamaForCausalLM, dtype=torch.bfloat16, avail mem=45.60 GB, mem usage=15.26 GB.
[2025-09-04 08:38:50] KV Cache is allocated. #tokens: 292258, K size: 17.84 GB, V size: 17.84 GB
[2025-09-04 08:38:50] Memory pool end. avail mem=8.92 GB
/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/utils.py:1039: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:203.)
tensor_data = torch.ByteTensor(
[2025-09-04 08:38:51] max_total_num_tokens=292258, chunked_prefill_size=8192, max_prefill_tokens=16384, max_running_requests=2048, context_len=131072, available_gpu_mem=8.90 GB
2025-09-04 08:38:53.009710 info 13949 [ADAPTER pytransfer.cpp:53] Get rpcPort is 16342
2025-09-04 08:38:53.009825 info 13949 [ADAPTER pytransfer.cpp:36] Begin to initialize trans, sessionId: 90.90.97.8:16342 role is: Prefill storeUrl tcp://127.0.0.1:5000 deviceId 0
2025-09-04 08:38:53.16736 info 13949 [HyBM devmm_svm_gva.cpp:268] gva alloc heap. (size=0x40000000 ptr=0x17ffc0000000)
2025-09-04 08:38:53.16975 info 13949 [HyBM hybm_entry.cpp:252] hybm init successfully, library version: 1.0.0, build time: Aug 5 2025 14:59:48, commit: ec516f84d5c117ff6780b9edc6db93d2f4df6c12
2025-09-04 08:38:53.19108 info 14377 [AccLinks acc_tcp_link_delay_cleanup.h:109] AccDelay cleanup thread thread started
2025-09-04 08:38:53.19944 info 14378 [AccLinks acc_tcp_worker.cpp:146] Worker [name AccWrk, index 0, cpu -1, thread-priority 0, poll-timeout-ms 500] progress thread started
2025-09-04 08:38:53.20726 info 14379 [AccLinks acc_tcp_worker.cpp:146] Worker [name AccWrk, index 1, cpu -1, thread-priority 0, poll-timeout-ms 500] progress thread started
2025-09-04 08:38:53.21324 info 13949 [AccLinks acc_tcp_server_default.cpp:475] Trying to connect to 127.0.0.1:5000
2025-09-04 08:38:53.22291 info 13962 [AccLinks acc_tcp_listener.cpp:171] Connected from 127.0.0.1:34516 successfully, ssl disable
2025-09-04 08:38:53.022355 info 13962 [SMEM smem_tcp_config_store_server.cpp:131] new link connected, linkId: 2, rank: 0
2025-09-04 08:38:53.22671 info 13949 [AccLinks acc_tcp_server_default.cpp:630] Connect to 127.0.0.1:5000 successfully, with ssl disable
2025-09-04 08:39:03.028469 info 13949 [SMEM smem_trans_entry.cpp:183] sender side skip register memory.
2025-09-04 08:39:03.028580 info 13949 [SMEM smem_trans_entry.cpp:183] sender side skip register memory.
[2025-09-04 08:39:03] INFO: Started server process [13601]
[2025-09-04 08:39:03] INFO: Waiting for application startup.
[2025-09-04 08:39:03] INFO: Application startup complete.
[2025-09-04 08:39:03] INFO: Uvicorn running on http://127.0.0.1:30000 (Press CTRL+C to quit)
[2025-09-04 08:39:04] INFO: 127.0.0.1:34112 - "GET /get_model_info HTTP/1.1" 200 OK
[2025-09-04 08:39:04] Start of pd disaggregation warmup ...
[2025-09-04 08:39:04] Prefill batch. #new-seq: 1, #new-token: 4, #cached-token: 0, token usage: 0.00, #unbootstrapped-req: 0, #queue-req: 0, #transferring-req: 0, input throughput (token/s): 0.00,
/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/ge_concrete_graph/fx2ge_converter.py:997: UserWarning: When enable frozen_parameter, Parameters will be considered frozen.Please make sure that the Parameters data address remain the same throughout the program runtime.
warnings.warn(f'When enable frozen_parameter, Parameters will be considered frozen.'
/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/model_executor/forward_batch_info.py:996: UserWarning: Cannot create tensor with interal format while allow_internel_format=False, tensor will be created with base format. (Triggered internally at build/CMakeFiles/torch_npu.dir/compiler_depend.ts:335.)
extend_start_loc = torch.zeros_like(extend_seq_lens)
[2025-09-04 08:39:06] TpModelWorkerClient hit an exception: Traceback (most recent call last):
File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/managers/tp_worker_overlap_thread.py", line 141, in forward_thread_func
self.forward_thread_func_()
File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/managers/tp_worker_overlap_thread.py", line 176, in forward_thread_func
self.worker.forward_batch_generation(
File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/managers/tp_worker.py", line 239, in forward_batch_generation
logits_output, can_run_cuda_graph = self.model_runner.forward(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/model_executor/model_runner.py", line 1750, in forward
output = self._forward_raw(
^^^^^^^^^^^^^^^^^^
File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/model_executor/model_runner.py", line 1795, in _forward_raw
ret = self.forward_extend(
^^^^^^^^^^^^^^^^^^^^
File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/model_executor/model_runner.py", line 1695, in forward_extend
return self.model.forward(
^^^^^^^^^^^^^^^^^^^
File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/models/llama.py", line 465, in forward
hidden_states = self.model(
^^^^^^^^^^^
File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/models/llama.py", line 342, in forward
hidden_states, residual = layer(
^^^^^^
File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/models/llama.py", line 266, in forward
hidden_states = self.self_attn(
^^^^^^^^^^^^^^^
File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/models/llama.py", line 197, in forward
attn_output = self.attn(q, k, v, forward_batch)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/layers/radix_attention.py", line 108, in forward
return forward_batch.attn_backend.forward(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/layers/attention/base_attn_backend.py", line 81, in forward
return self.forward_extend(
^^^^^^^^^^^^^^^^^^^^
File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/layers/attention/ascend_backend.py", line 174, in forward_extend
forward_batch.token_to_kv_pool.set_kv_buffer(
File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/mem_cache/memory_pool.py", line 630, in set_kv_buffer
torch_npu._npu_reshape_and_cache(
File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/op_plugin/atb/_atb_ops.py", line 93, in wrapper
return api_func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/op_plugin/atb/_atb_ops.py", line 101, in generated_function
return getattr(torch.ops.atb, api_name)(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/_ops.py", line 1123, in call
return self._op(*args, **(kwargs or {}))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: ReshapeAndCacheOperation setup failed!
[2025-09-04 08:39:06] Received sigquit from a child process. It usually means the child failed.
2025-09-04 08:39:07.40803 info 13960 [AccLinks acc_tcp_link_complex_default.h:360] Link 2 receive header failed, reset by peer, errno 0
2025-09-04 08:39:07.040930 info 13960 [SMEM smem_tcp_config_store_server.cpp:137] link broken, linkId: 2
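The crash itself is in torch_npu._npu_reshape_and_cache (the ATB ReshapeAndCache op), called from set_kv_buffer during the PD warmup batch. In case it helps triage, here is a minimal standalone sketch of that call; the shapes, dtype, block size, and keyword names are my assumptions from reading the call site in the traceback, not verified values:

```python
# Minimal standalone sketch of the failing ATB op (torch_npu._npu_reshape_and_cache).
# Shapes, block size, dtype, and keyword names below are assumptions for
# illustration, not the exact values SGLang's memory pool passes on this path.
import torch
import torch_npu

num_tokens, num_heads, head_dim = 4, 8, 128
num_blocks, block_size = 16, 128

key = torch.randn(num_tokens, num_heads, head_dim,
                  dtype=torch.bfloat16, device="npu")
value = torch.randn_like(key)
key_cache = torch.zeros(num_blocks, block_size, num_heads, head_dim,
                        dtype=torch.bfloat16, device="npu")
value_cache = torch.zeros_like(key_cache)
# One slot index per new token; int32 is my assumption about what the op expects.
slot_indices = torch.arange(num_tokens, dtype=torch.int32, device="npu")

# Same call site as sglang/srt/mem_cache/memory_pool.py:630 in the traceback.
torch_npu._npu_reshape_and_cache(
    key=key,
    value=value,
    key_cache=key_cache,
    value_cache=value_cache,
    slot_indices=slot_indices,
)
print("reshape_and_cache OK")
```

If this standalone call also fails on 910A3, the op/CANN environment would be suspect; if it passes, the PD warmup path is likely feeding the op an unexpected shape or tensor format (note the allow_internel_format warning earlier in the log).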