Reference: running a local model with vLLM on dual V100 cards #173
Leechy-Litchi started this conversation in General
Replies: 0 comments
Launch parameters are below (swap and cache are set to 0 because I can't afford more RAM, but there is plenty of VRAM), vllm==0.10.1. Memory usage is a bit over 13 GB on each of the two cards (vLLM also supports CPU offload, which in principle could move some of this to system RAM). Speed is fairly slow: roughly 3 to 5 seconds per action.
NCCL_P2P_LEVEL=NV6 vllm serve ~/models/AutoGLM-Phone-9B/ --host 0.0.0.0 --port 11451 --max_model_len 25480 -tp 2 --gpu_memory_utilization 0.85 --swap-space 0 --enforce-eager --mm-processor-cache-gb 0 --no-enable-chunked-prefill --served-model-name autoglm-phone-9b --chat-template-content-format string
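Once the server is up, it exposes vLLM's OpenAI-compatible API on the host/port given above. A minimal stdlib-only client sketch, assuming the port (11451) and served model name (autoglm-phone-9b) from the launch command; `build_request` is a hypothetical helper, not part of vLLM:

```python
import json
import urllib.request

# Endpoint assumed from the launch command above; adjust host/port to taste.
BASE_URL = "http://127.0.0.1:11451/v1/chat/completions"

def build_request(prompt: str) -> dict:
    """Build a chat-completion payload for the served model."""
    return {
        "model": "autoglm-phone-9b",  # must match --served-model-name
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
        "temperature": 0.0,
    }

if __name__ == "__main__":
    payload = json.dumps(build_request("hello")).encode("utf-8")
    req = urllib.request.Request(
        BASE_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    # Requires the vllm serve process above to be running.
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```

Setting `temperature` to 0 keeps the agent's actions deterministic, which makes coordinate issues like the one below easier to reproduce.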
Test device (a niche model, so results may not generalize): AGM G1 Pro, running a near-stock Android system.
I ran into what looks like a localization-precision issue: while shopping on Taobao, the agent kept tapping the 百亿补贴 ("Billion Subsidy") banner. Judging from the output, it identified the correct bounding box but the tap position was off.