Commit ae4a292

whoisjnnshah1 and Neelay Shah authored
Update Deployment/Kubernetes/TensorRT-LLM_Autoscaling_and_Load_Balancing/README.md
Co-authored-by: Neelay Shah <[email protected]>
1 parent 70d533a commit ae4a292

File tree: 1 file changed (+1 −1 lines)
  • Deployment/Kubernetes/TensorRT-LLM_Autoscaling_and_Load_Balancing

Deployment/Kubernetes/TensorRT-LLM_Autoscaling_and_Load_Balancing/README.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -16,7 +16,7 @@
 
 # Autoscaling and Load Balancing Generative AI w/ Triton Server and TensorRT-LLM
 
-Setting up autoscaling and load balancing using Triton Inference Server, TensorRT-LLM or vLLM, and Kubernetes is not difficult,
+Setting up autoscaling and load balancing for large language models served by Triton Inference Server is not difficult,
 but it does require preparation.
 
 This guide aims to help you automate acquisition of models from Hugging Face, minimize time spent optimizing models for
```
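The rewritten sentence opens a guide about preparing a Kubernetes cluster to autoscale Triton Inference Server. As a minimal sketch of the kind of preparation involved, the manifest below defines a HorizontalPodAutoscaler for a hypothetical Deployment named `triton-trtllm`; the names, replica bounds, and the plain CPU metric are assumptions for illustration only, and the guide itself may scale on different metrics.

```yaml
# Hypothetical illustration: an HPA targeting a Deployment named "triton-trtllm".
# The guide's actual setup may scale on GPU or queue-depth metrics rather than CPU.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: triton-trtllm
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: triton-trtllm
  minReplicas: 1
  maxReplicas: 4
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

Applied with `kubectl apply -f <file>`, a manifest like this would let Kubernetes scale the Triton pods between one and four replicas as average CPU utilization crosses 70%.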
