From 6a243260b84f3b3aef12be71a187c4f34bcfb97c Mon Sep 17 00:00:00 2001
From: Matthew Kotila
Date: Wed, 4 Sep 2024 13:17:50 -0700
Subject: [PATCH] Update trtllm_guide.md

---
 Popular_Models_Guide/Llama2/trtllm_guide.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Popular_Models_Guide/Llama2/trtllm_guide.md b/Popular_Models_Guide/Llama2/trtllm_guide.md
index 6179c1cd..43f87700 100644
--- a/Popular_Models_Guide/Llama2/trtllm_guide.md
+++ b/Popular_Models_Guide/Llama2/trtllm_guide.md
@@ -345,6 +345,7 @@ You can read more about Gen-AI Perf [here](https://docs.nvidia.com/deeplearning/
 To use Gen-AI Perf, run the following command in the same Triton docker container:
 ```bash
 genai-perf \
+  profile \
   -m ensemble \
   --service-kind triton \
   --backend tensorrtllm \
@@ -380,4 +381,4 @@ Request throughput (per sec): 0.61
 
 ## References
 
-For more examples feel free to refer to [End to end workflow to run llama.](https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/docs/llama.md)
\ No newline at end of file
+For more examples feel free to refer to [End to end workflow to run llama.](https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/docs/llama.md)