Commit ae4a292

whoisjnnshah1 and Neelay Shah authored
Update Deployment/Kubernetes/TensorRT-LLM_Autoscaling_and_Load_Balancing/README.md
Co-authored-by: Neelay Shah <[email protected]>
1 parent 70d533a commit ae4a292

File tree: 1 file changed (+1 −1 lines)
  • Deployment/Kubernetes/TensorRT-LLM_Autoscaling_and_Load_Balancing

Deployment/Kubernetes/TensorRT-LLM_Autoscaling_and_Load_Balancing/README.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -16,7 +16,7 @@
 
 # Autoscaling and Load Balancing Generative AI w/ Triton Server and TensorRT-LLM
 
-Setting up autoscaling and load balancing using Triton Inference Server, TensorRT-LLM or vLLM, and Kubernetes is not difficult,
+Setting up autoscaling and load balancing for large language models served by Triton Inference Server is not difficult,
 but it does require preparation.
 
 This guide aims to help you automate acquisition of models from Hugging Face, minimize time spent optimizing models for
```
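The rewritten sentence opens a guide about preparing a Kubernetes cluster to autoscale Triton Inference Server. As a minimal sketch of the kind of preparation involved, the manifest below defines a HorizontalPodAutoscaler for a hypothetical Deployment named `triton-trtllm`; the names, replica bounds, and the plain CPU metric are assumptions for illustration only, and the guide itself may scale on different metrics.

```yaml
# Hypothetical illustration: an HPA targeting a Deployment named "triton-trtllm".
# The guide's actual setup may scale on GPU or queue-depth metrics rather than CPU.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: triton-trtllm
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: triton-trtllm
  minReplicas: 1
  maxReplicas: 4
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

Applied with `kubectl apply -f <file>`, a manifest like this would let Kubernetes scale the Triton pods between one and four replicas as average CPU utilization crosses 70%.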
