Reference: running a local model with vLLM on dual V100 cards #173
Leechy-Litchi started this conversation in General
Replies: 0 comments
Launch parameters are below (swap and cache are set to 0 because I can't afford more RAM, but there is plenty of VRAM), vllm==0.10.1. Memory usage is a bit over 13 GB on each of the two cards (vLLM also supports CPU offload, which in principle could move some of this to system RAM). Speed is fairly slow: roughly 3 to 5 seconds per action.
NCCL_P2P_LEVEL=NV6 vllm serve ~/models/AutoGLM-Phone-9B/ --host 0.0.0.0 --port 11451 --max_model_len 25480 -tp 2 --gpu_memory_utilization 0.85 --swap-space 0 --enforce-eager --mm-processor-cache-gb 0 --no-enable-chunked-prefill --served-model-name autoglm-phone-9b --chat-template-content-format string
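Once the server is up, it exposes vLLM's OpenAI-compatible API on the host/port given above. A minimal stdlib-only client sketch, assuming the port (11451) and served model name (autoglm-phone-9b) from the launch command; `build_request` is a hypothetical helper, not part of vLLM:

```python
import json
import urllib.request

# Endpoint assumed from the launch command above; adjust host/port to taste.
BASE_URL = "http://127.0.0.1:11451/v1/chat/completions"

def build_request(prompt: str) -> dict:
    """Build a chat-completion payload for the served model."""
    return {
        "model": "autoglm-phone-9b",  # must match --served-model-name
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
        "temperature": 0.0,
    }

if __name__ == "__main__":
    payload = json.dumps(build_request("hello")).encode("utf-8")
    req = urllib.request.Request(
        BASE_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    # Requires the vllm serve process above to be running.
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```

Setting `temperature` to 0 keeps the agent's actions deterministic, which makes coordinate issues like the one below easier to reproduce.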
Test device (a niche model, so results may not generalize): AGM G1 Pro, running a near-stock Android system.
I ran into what looks like a localization-precision issue: while shopping on Taobao, the agent kept tapping the 百亿补贴 ("Billion Subsidy") banner. Judging from the output, it identified the correct bounding box but the tap position was off.