How do I control the concurrency of MinerU with the local vLLM engine? How can this be done when calling it from a script? #4361
Unanswered · BaBaoZhooou asked this question in Q&A
Replies: 1 comment
Hi @BaBaoZhooou! I'm Dosu and I'm helping the MinerU team. You can control concurrency for MinerU with the local vLLM engine as follows:

```shell
CUDA_VISIBLE_DEVICES=0 mineru-openai-server --engine vllm --port 30000 --max-concurrency 20 --gpu-memory-utilization 0.5
```

or, when calling it from a script:

```python
from mineru import MinerUClient

client = MinerUClient(
    backend="vllm-engine",
    model_path="your_model_path",
    max_concurrency=20,
    gpu_memory_utilization=0.5,
)
```

Parameter notes and suggestions:

Common command example:

```shell
CUDA_VISIBLE_DEVICES=0 mineru-openai-server --engine vllm --max-concurrency 20 --gpu-memory-utilization 0.5
```

Multi-GPU concurrency:

```shell
CUDA_VISIBLE_DEVICES=0,1 mineru-openai-server --engine vllm --data-parallel-size 2 --max-concurrency 40 --gpu-memory-utilization 0.5
```

API-layer concurrency limit:

```shell
export MINERU_API_MAX_CONCURRENT_REQUESTS=10
mineru-api --host 0.0.0.0 --port 8000
```

Note: for finer-grained tuning of GPU memory versus concurrency, test repeatedly against your actual model and workload, and monitor GPU utilization and the system logs.
How do I control the concurrency of MinerU with the local vLLM engine? How can this be done when calling it from a script? And roughly how much GPU memory do concurrent calls use? 🙏 help