Your current environment
(vllm) [zhizhehui@localhost ~]$ python collect_env.py
Collecting environment information...
System Info
==============================
OS : Kylin Linux Advanced Server V10 (Halberd) (x86_64)
GCC version : (GCC) 7.3.0
Clang version : Could not collect
CMake version : version 3.16.5
Libc version : glibc-2.28
==============================
PyTorch Info
PyTorch version : 2.9.0+cu128
Is debug build : False
CUDA used to build PyTorch : 12.8
ROCM used to build PyTorch : N/A
==============================
Python Environment
Python version : 3.12.12 | packaged by Anaconda, Inc. | (main, Oct 21 2025, 20:16:04) [GCC 11.2.0] (64-bit runtime)
Python platform : Linux-4.19.90-89.11.v2401.ky10.x86_64-x86_64-with-glibc2.28
==============================
CUDA / GPU Info
Is CUDA available : True
CUDA runtime version : 12.8.61
CUDA_MODULE_LOADING set to :
GPU models and configuration :
GPU 0: NVIDIA RTX A6000
GPU 1: NVIDIA RTX A6000
GPU 2: NVIDIA RTX A6000
GPU 3: NVIDIA RTX A6000
Nvidia driver version : 580.105.08
cuDNN version : Probably one of the following:
/usr/local/cuda-12.8/targets/x86_64-linux/lib/libcudnn.so.8.9.7
/usr/local/cuda-12.8/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.9.7
/usr/local/cuda-12.8/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.9.7
/usr/local/cuda-12.8/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.9.7
/usr/local/cuda-12.8/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.9.7
/usr/local/cuda-12.8/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.9.7
/usr/local/cuda-12.8/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.9.7
HIP runtime version : N/A
MIOpen runtime version : N/A
Is XNNPACK available : True
==============================
CPU Info
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 48 bits physical, 48 bits virtual
CPU(s): 128
On-line CPU(s) list: 0-127
Thread(s) per core: 2
Core(s) per socket: 32
Socket(s): 2
NUMA node(s): 8
Vendor ID: HygonGenuine
CPU family: 24
Model: 0
Model name: Hygon C86 7285 32-core Processor
Stepping: 2
CPU MHz: 2000.105
BogoMIPS: 4000.21
Virtualization: AMD-V
L1d cache: 2 MiB
L1i cache: 4 MiB
L2 cache: 32 MiB
L3 cache: 128 MiB
NUMA node0 CPU(s): 0-7,64-71
NUMA node1 CPU(s): 8-15,72-79
NUMA node2 CPU(s): 16-23,80-87
NUMA node3 CPU(s): 24-31,88-95
NUMA node4 CPU(s): 32-39,96-103
NUMA node5 CPU(s): 40-47,104-111
NUMA node6 CPU(s): 48-55,112-119
NUMA node7 CPU(s): 56-63,120-127
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Mitigation; untrained return thunk; SMT vulnerable
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid amd_dcm aperfmperf pni monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb hw_pstate ssbd ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov succor smca
==============================
Versions of relevant libraries
[pip3] flashinfer-python==0.5.2
[pip3] numpy==2.2.6
[pip3] nvidia-cublas-cu12==12.8.4.1
[pip3] nvidia-cuda-cupti-cu12==12.8.90
[pip3] nvidia-cuda-nvrtc-cu12==12.8.93
[pip3] nvidia-cuda-runtime-cu12==12.8.90
[pip3] nvidia-cudnn-cu12==9.10.2.21
[pip3] nvidia-cudnn-frontend==1.16.0
[pip3] nvidia-cufft-cu12==11.3.3.83
[pip3] nvidia-cufile-cu12==1.13.1.3
[pip3] nvidia-curand-cu12==10.3.9.90
[pip3] nvidia-cusolver-cu12==11.7.3.90
[pip3] nvidia-cusparse-cu12==12.5.8.93
[pip3] nvidia-cusparselt-cu12==0.7.1
[pip3] nvidia-cutlass-dsl==4.3.2
[pip3] nvidia-ml-py==13.590.44
[pip3] nvidia-nccl-cu12==2.27.5
[pip3] nvidia-nvjitlink-cu12==12.8.93
[pip3] nvidia-nvshmem-cu12==3.3.20
[pip3] nvidia-nvtx-cu12==12.8.90
[pip3] pyzmq==27.1.0
[pip3] torch==2.9.0
[pip3] torchaudio==2.9.0
[pip3] torchvision==0.24.0
[pip3] transformers==4.57.3
[pip3] triton==3.5.0
[conda] flashinfer-python 0.5.2 pypi_0 pypi
[conda] numpy 2.2.6 pypi_0 pypi
[conda] nvidia-cublas-cu12 12.8.4.1 pypi_0 pypi
[conda] nvidia-cuda-cupti-cu12 12.8.90 pypi_0 pypi
[conda] nvidia-cuda-nvrtc-cu12 12.8.93 pypi_0 pypi
[conda] nvidia-cuda-runtime-cu12 12.8.90 pypi_0 pypi
[conda] nvidia-cudnn-cu12 9.10.2.21 pypi_0 pypi
[conda] nvidia-cudnn-frontend 1.16.0 pypi_0 pypi
[conda] nvidia-cufft-cu12 11.3.3.83 pypi_0 pypi
[conda] nvidia-cufile-cu12 1.13.1.3 pypi_0 pypi
[conda] nvidia-curand-cu12 10.3.9.90 pypi_0 pypi
[conda] nvidia-cusolver-cu12 11.7.3.90 pypi_0 pypi
[conda] nvidia-cusparse-cu12 12.5.8.93 pypi_0 pypi
[conda] nvidia-cusparselt-cu12 0.7.1 pypi_0 pypi
[conda] nvidia-cutlass-dsl 4.3.2 pypi_0 pypi
[conda] nvidia-ml-py 13.590.44 pypi_0 pypi
[conda] nvidia-nccl-cu12 2.27.5 pypi_0 pypi
[conda] nvidia-nvjitlink-cu12 12.8.93 pypi_0 pypi
[conda] nvidia-nvshmem-cu12 3.3.20 pypi_0 pypi
[conda] nvidia-nvtx-cu12 12.8.90 pypi_0 pypi
[conda] pyzmq 27.1.0 pypi_0 pypi
[conda] torch 2.9.0 pypi_0 pypi
[conda] torchaudio 2.9.0 pypi_0 pypi
[conda] torchvision 0.24.0 pypi_0 pypi
[conda] transformers 4.57.3 pypi_0 pypi
[conda] triton 3.5.0 pypi_0 pypi
==============================
vLLM Info
ROCM Version : Could not collect
vLLM Version : 0.11.2
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled
GPU Topology:
GPU0 GPU1 GPU2 GPU3 CPU Affinity NUMA Affinity GPU NUMA ID
GPU0 X PIX SYS SYS 40-47,104-111 5 N/A
GPU1 PIX X SYS SYS 40-47,104-111 5 N/A
GPU2 SYS SYS X PIX 48-55,112-119 6 N/A
GPU3 SYS SYS PIX X 48-55,112-119 6 N/A
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
==============================
Environment Variables
LD_LIBRARY_PATH=/data/home/zhizhehui/dev/app/minconda/lib:/usr/local/cuda-12.8/lib64:
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1
🐛 Describe the bug
python -m vllm.entrypoints.openai.api_server --model /data/home/zhizhehui/dev/models/Qwen3-VL-8B-Instruct --enable-lora --max-lora-rank 32 --lora-dtype float16 --dtype float16 --tensor-parallel-size 1 --gpu-memory-utilization 0.8 --port 8000 --max-model-len 8888 --host 0.0.0.0 --served-model-name qwen3-vl --lora-modules '["table_qc=/data/home/zhizhehui/dev/train/table_ocr/Reward_model_train/output/qwen_lora_output/checkpoint-57"]'
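Note: in the traceback below, the leading [" from the quoted list string ends up in the LoRA name and the trailing "] ends up in the adapter path, so the JSON-list string appears to be passed through verbatim. For comparison, a minimal sketch of the argument forms --lora-modules is usually documented to accept (plain name=path pairs, or one JSON object per module); the JSON-object form is an assumption here and was not verified on this setup:

# Sketch only: same model and adapter paths as the command above.
# Plain name=path form (no JSON list wrapper):
python -m vllm.entrypoints.openai.api_server \
  --model /data/home/zhizhehui/dev/models/Qwen3-VL-8B-Instruct \
  --enable-lora --max-lora-rank 32 --lora-dtype float16 --dtype float16 \
  --tensor-parallel-size 1 --gpu-memory-utilization 0.8 \
  --max-model-len 8888 --host 0.0.0.0 --port 8000 \
  --served-model-name qwen3-vl \
  --lora-modules table_qc=/data/home/zhizhehui/dev/train/table_ocr/Reward_model_train/output/qwen_lora_output/checkpoint-57
# JSON-object form (one JSON dict per module, not a JSON list string):
#   --lora-modules '{"name": "table_qc", "path": "/data/home/zhizhehui/dev/train/table_ocr/Reward_model_train/output/qwen_lora_output/checkpoint-57"}'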
(EngineCore_DP0 pid=1511501) WARNING 12-15 18:56:44 [utils.py:250] Using default LoRA kernel configs
Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 100%|████████████████████████████████████| 102/102 [00:15<00:00, 6.77it/s]
Capturing CUDA graphs (decode, FULL): 100%|█████████████████████████████████████████████████████████| 70/70 [00:10<00:00, 6.93it/s]
(EngineCore_DP0 pid=1511501) INFO 12-15 18:57:09 [gpu_model_runner.py:4244] Graph capturing finished in 26 secs, took 1.34 GiB
(EngineCore_DP0 pid=1511501) INFO 12-15 18:57:09 [core.py:250] init engine (profile, create kv cache, warmup model) took 69.12 seconds
(APIServer pid=1510937) INFO 12-15 18:57:10 [api_server.py:1725] Supported tasks: ['generate']
(EngineCore_DP0 pid=1511501) ERROR 12-15 18:57:10 [core.py:918] Invocation of add_lora method failed
(EngineCore_DP0 pid=1511501) ERROR 12-15 18:57:10 [core.py:918] Traceback (most recent call last):
(EngineCore_DP0 pid=1511501) ERROR 12-15 18:57:10 [core.py:918] File "/data/home/zhizhehui/dev/app/minconda/envs/vllm/lib/python3.12/site-packages/vllm/lora/worker_manager.py", line 102, in _load_adapter
(EngineCore_DP0 pid=1511501) ERROR 12-15 18:57:10 [core.py:918] peft_helper = PEFTHelper.from_local_dir(
(EngineCore_DP0 pid=1511501) ERROR 12-15 18:57:10 [core.py:918] ^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1511501) ERROR 12-15 18:57:10 [core.py:918] File "/data/home/zhizhehui/dev/app/minconda/envs/vllm/lib/python3.12/site-packages/vllm/lora/peft_helper.py", line 108, in from_local_dir
(EngineCore_DP0 pid=1511501) ERROR 12-15 18:57:10 [core.py:918] with open(lora_config_path) as f:
(EngineCore_DP0 pid=1511501) ERROR 12-15 18:57:10 [core.py:918] ^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1511501) ERROR 12-15 18:57:10 [core.py:918] FileNotFoundError: [Errno 2] No such file or directory: '/data/home/zhizhehui/dev/train/table_ocr/Reward_model_train/output/qwen_lora_output/checkpoint-57"]/adapter_config.json'
(EngineCore_DP0 pid=1511501) ERROR 12-15 18:57:10 [core.py:918]
(EngineCore_DP0 pid=1511501) ERROR 12-15 18:57:10 [core.py:918] The above exception was the direct cause of the following exception:
(EngineCore_DP0 pid=1511501) ERROR 12-15 18:57:10 [core.py:918]
(EngineCore_DP0 pid=1511501) ERROR 12-15 18:57:10 [core.py:918] Traceback (most recent call last):
(EngineCore_DP0 pid=1511501) ERROR 12-15 18:57:10 [core.py:918] File "/data/home/zhizhehui/dev/app/minconda/envs/vllm/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 915, in _handle_client_request
(EngineCore_DP0 pid=1511501) ERROR 12-15 18:57:10 [core.py:918] result = method(*self._convert_msgspec_args(method, args))
(EngineCore_DP0 pid=1511501) ERROR 12-15 18:57:10 [core.py:918] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1511501) ERROR 12-15 18:57:10 [core.py:918] File "/data/home/zhizhehui/dev/app/minconda/envs/vllm/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 498, in add_lora
(EngineCore_DP0 pid=1511501) ERROR 12-15 18:57:10 [core.py:918] return self.model_executor.add_lora(lora_request)
(EngineCore_DP0 pid=1511501) ERROR 12-15 18:57:10 [core.py:918] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1511501) ERROR 12-15 18:57:10 [core.py:918] File "/data/home/zhizhehui/dev/app/minconda/envs/vllm/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 277, in add_lora
(EngineCore_DP0 pid=1511501) ERROR 12-15 18:57:10 [core.py:918] return all(self.collective_rpc("add_lora", args=(lora_request,)))
(EngineCore_DP0 pid=1511501) ERROR 12-15 18:57:10 [core.py:918] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1511501) ERROR 12-15 18:57:10 [core.py:918] File "/data/home/zhizhehui/dev/app/minconda/envs/vllm/lib/python3.12/site-packages/vllm/v1/executor/uniproc_executor.py", line 75, in collective_rpc
(EngineCore_DP0 pid=1511501) ERROR 12-15 18:57:10 [core.py:918] result = run_method(self.driver_worker, method, args, kwargs)
(EngineCore_DP0 pid=1511501) ERROR 12-15 18:57:10 [core.py:918] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1511501) ERROR 12-15 18:57:10 [core.py:918] File "/data/home/zhizhehui/dev/app/minconda/envs/vllm/lib/python3.12/site-packages/vllm/v1/serial_utils.py", line 479, in run_method
(EngineCore_DP0 pid=1511501) ERROR 12-15 18:57:10 [core.py:918] return func(*args, **kwargs)
(EngineCore_DP0 pid=1511501) ERROR 12-15 18:57:10 [core.py:918] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1511501) ERROR 12-15 18:57:10 [core.py:918] File "/data/home/zhizhehui/dev/app/minconda/envs/vllm/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 612, in add_lora
(EngineCore_DP0 pid=1511501) ERROR 12-15 18:57:10 [core.py:918] return self.model_runner.add_lora(lora_request)
(EngineCore_DP0 pid=1511501) ERROR 12-15 18:57:10 [core.py:918] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1511501) ERROR 12-15 18:57:10 [core.py:918] File "/data/home/zhizhehui/dev/app/minconda/envs/vllm/lib/python3.12/site-packages/vllm/v1/worker/lora_model_runner_mixin.py", line 201, in add_lora
(EngineCore_DP0 pid=1511501) ERROR 12-15 18:57:10 [core.py:918] return self.lora_manager.add_adapter(lora_request)
(EngineCore_DP0 pid=1511501) ERROR 12-15 18:57:10 [core.py:918] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1511501) ERROR 12-15 18:57:10 [core.py:918] File "/data/home/zhizhehui/dev/app/minconda/envs/vllm/lib/python3.12/site-packages/vllm/lora/worker_manager.py", line 263, in add_adapter
(EngineCore_DP0 pid=1511501) ERROR 12-15 18:57:10 [core.py:918] lora = self._load_adapter(lora_request)
(EngineCore_DP0 pid=1511501) ERROR 12-15 18:57:10 [core.py:918] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1511501) ERROR 12-15 18:57:10 [core.py:918] File "/data/home/zhizhehui/dev/app/minconda/envs/vllm/lib/python3.12/site-packages/vllm/lora/worker_manager.py", line 138, in _load_adapter
(EngineCore_DP0 pid=1511501) ERROR 12-15 18:57:10 [core.py:918] raise ValueError(
(EngineCore_DP0 pid=1511501) ERROR 12-15 18:57:10 [core.py:918] ValueError: Loading lora ["table_qc failed: No adapter found for /data/home/zhizhehui/dev/train/table_ocr/Reward_model_train/output/qwen_lora_output/checkpoint-57"]
[rank0]:[W1215 18:57:10.424580246 ProcessGroupNCCL.cpp:1524] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
(APIServer pid=1510937) Traceback (most recent call last):
(APIServer pid=1510937) File "", line 198, in _run_module_as_main
(APIServer pid=1510937) File "", line 88, in _run_code
(APIServer pid=1510937) File "/data/home/zhizhehui/dev/app/minconda/envs/vllm/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 2096, in
(APIServer pid=1510937) uvloop.run(run_server(args))
(APIServer pid=1510937) File "/data/home/zhizhehui/dev/app/minconda/envs/vllm/lib/python3.12/site-packages/uvloop/init.py", line 96, in run
(APIServer pid=1510937) return __asyncio.run(
(APIServer pid=1510937) ^^^^^^^^^^^^^^
(APIServer pid=1510937) File "/data/home/zhizhehui/dev/app/minconda/envs/vllm/lib/python3.12/asyncio/runners.py", line 195, in run
(APIServer pid=1510937) return runner.run(main)
(APIServer pid=1510937) ^^^^^^^^^^^^^^^^
(APIServer pid=1510937) File "/data/home/zhizhehui/dev/app/minconda/envs/vllm/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=1510937) return self._loop.run_until_complete(task)
(APIServer pid=1510937) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1510937) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=1510937) File "/data/home/zhizhehui/dev/app/minconda/envs/vllm/lib/python3.12/site-packages/uvloop/init.py", line 48, in wrapper
(APIServer pid=1510937) return await main
(APIServer pid=1510937) ^^^^^^^^^^
(APIServer pid=1510937) File "/data/home/zhizhehui/dev/app/minconda/envs/vllm/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 2024, in run_server
(APIServer pid=1510937) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=1510937) File "/data/home/zhizhehui/dev/app/minconda/envs/vllm/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 2050, in run_server_worker
(APIServer pid=1510937) await init_app_state(engine_client, app.state, args)
(APIServer pid=1510937) File "/data/home/zhizhehui/dev/app/minconda/envs/vllm/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 1760, in init_app_state
(APIServer pid=1510937) await state.openai_serving_models.init_static_loras()
(APIServer pid=1510937) File "/data/home/zhizhehui/dev/app/minconda/envs/vllm/lib/python3.12/site-packages/vllm/entrypoints/openai/serving_models.py", line 90, in init_static_loras
(APIServer pid=1510937) raise ValueError(load_result.error.message)
(APIServer pid=1510937) ValueError: Call to add_lora method failed: Loading lora ["table_qc failed: No adapter found for /data/home/zhizhehui/dev/train/table_ocr/Reward_model_train/output/qwen_lora_output/checkpoint-57"]
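The FileNotFoundError above keeps the trailing "] from the --lora-modules argument in the adapter path. A quick shell check of both path variants (a sketch; paths copied from the traceback, results not shown here):

# Path exactly as it appears in the FileNotFoundError (note the trailing "]):
ls -l '/data/home/zhizhehui/dev/train/table_ocr/Reward_model_train/output/qwen_lora_output/checkpoint-57"]/adapter_config.json'
# Path without the stray quote/bracket characters:
ls -l '/data/home/zhizhehui/dev/train/table_ocr/Reward_model_train/output/qwen_lora_output/checkpoint-57/adapter_config.json'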
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.