From 6a243260b84f3b3aef12be71a187c4f34bcfb97c Mon Sep 17 00:00:00 2001
From: Matthew Kotila
Date: Wed, 4 Sep 2024 13:17:50 -0700
Subject: [PATCH] Update trtllm_guide.md

---
 Popular_Models_Guide/Llama2/trtllm_guide.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Popular_Models_Guide/Llama2/trtllm_guide.md b/Popular_Models_Guide/Llama2/trtllm_guide.md
index 6179c1cd..43f87700 100644
--- a/Popular_Models_Guide/Llama2/trtllm_guide.md
+++ b/Popular_Models_Guide/Llama2/trtllm_guide.md
@@ -345,6 +345,7 @@ You can read more about Gen-AI Perf [here](https://docs.nvidia.com/deeplearning/
 To use Gen-AI Perf, run the following command in the same Triton docker container:
 ```bash
 genai-perf \
+  profile \
   -m ensemble \
   --service-kind triton \
   --backend tensorrtllm \
@@ -380,4 +381,4 @@ Request throughput (per sec): 0.61
 
 ## References
 
-For more examples feel free to refer to [End to end workflow to run llama.](https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/docs/llama.md)
\ No newline at end of file
+For more examples feel free to refer to [End to end workflow to run llama.](https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/docs/llama.md)