diff --git a/Popular_Models_Guide/Llama2/trtllm_guide.md b/Popular_Models_Guide/Llama2/trtllm_guide.md
index 6179c1cd..43f87700 100644
--- a/Popular_Models_Guide/Llama2/trtllm_guide.md
+++ b/Popular_Models_Guide/Llama2/trtllm_guide.md
@@ -345,6 +345,7 @@ You can read more about Gen-AI Perf [here](https://docs.nvidia.com/deeplearning/
 To use Gen-AI Perf, run the following command in the same Triton docker container:
 ```bash
 genai-perf \
+    profile \
     -m ensemble \
     --service-kind triton \
     --backend tensorrtllm \
@@ -380,4 +381,4 @@ Request throughput (per sec): 0.61
 
 ## References
 
-For more examples feel free to refer to [End to end workflow to run llama.](https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/docs/llama.md)
\ No newline at end of file
+For more examples feel free to refer to [End to end workflow to run llama.](https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/docs/llama.md)