How to use benchmark_app to test Qwen2.5-7B-Instruct model inference performance? #29907
Light-Travlling asked this question in Q&A
Answered by ilya-lavrenov on Apr 3, 2025
Replies: 2 comments 2 replies
- Please use the dedicated benchmark for LLMs from the GenAI repo.
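
The dedicated LLM benchmark referred to here is presumably the tools/llm_bench script in the openvino.genai repository, which reports first-token latency, per-token latency, and throughput. For a quick sanity check without that script, a minimal sketch is possible with the openvino_genai Python package, which exposes per-request performance metrics directly; the model path, device, prompt, and token count below are placeholders, and the model is assumed to have been exported to OpenVINO IR beforehand.

```python
# Minimal sketch (not the dedicated benchmark): timing a single LLM request
# with openvino_genai and reading its built-in performance metrics.
# The model path, device, prompt, and token count are placeholder assumptions.
import openvino_genai as ov_genai

models_path = "Qwen2.5-7B-Instruct-int4-ov"   # assumes the model was already exported to OpenVINO IR
pipe = ov_genai.LLMPipeline(models_path, "CPU")

result = pipe.generate(["The Sun is yellow because"], max_new_tokens=128)
metrics = result.perf_metrics

# Latency/throughput getters return mean/std pairs aggregated over the request.
print(f"TTFT:       {metrics.get_ttft().mean:.2f} ms")
print(f"TPOT:       {metrics.get_tpot().mean:.2f} ms/token")
print(f"Throughput: {metrics.get_throughput().mean:.2f} tokens/s")
print(f"Generate:   {metrics.get_generate_duration().mean:.2f} ms")
```

Exporting the model beforehand would look roughly like `optimum-cli export openvino --model Qwen/Qwen2.5-7B-Instruct --weight-format int4 Qwen2.5-7B-Instruct-int4-ov` (exact flags depend on the installed optimum-intel version).
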
- For multi-request scenarios, you need to use continuous batching to maximize throughput: https://github.com/openvinotoolkit/openvino.genai/tree/master/tools/continuous_batching/benchmark
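
The linked benchmark is the recommended way to measure multi-request throughput. Purely as an illustration of the same idea, the sketch below pushes a batch of prompts through openvino_genai's ContinuousBatchingPipeline and reports requests per second; the model path, prompt set, scheduler settings, and device are assumptions, and the exact pipeline API may vary between openvino.genai releases.

```python
# Rough illustration of continuous batching for multi-request throughput
# (the dedicated benchmark linked above is the proper tool for real numbers).
# Model path, device, prompts, and scheduler settings are placeholder assumptions.
import time
import openvino_genai as ov_genai

models_path = "Qwen2.5-7B-Instruct-int4-ov"   # assumed OpenVINO IR export location

scheduler_config = ov_genai.SchedulerConfig()
scheduler_config.cache_size = 4               # KV-cache budget in GB (assumed value)

pipe = ov_genai.ContinuousBatchingPipeline(models_path, scheduler_config, "CPU")

prompts = ["What is OpenVINO?"] * 32          # simulated concurrent requests
config = ov_genai.GenerationConfig()
config.max_new_tokens = 128

start = time.perf_counter()
pipe.generate(prompts, [config] * len(prompts))
elapsed = time.perf_counter() - start

print(f"{len(prompts)} requests in {elapsed:.2f} s "
      f"({len(prompts) / elapsed:.2f} requests/s)")
```
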
Answer selected by Light-Travlling