Add TRT-LLM Gen. AI Autoscaling & Load Balancing Guide #95

Merged (5 commits, Jun 12, 2024)

@whoisj (Contributor) commented May 28, 2024

This change adds a guide for deploying TensorRT-LLM Gen. AI models with autoscaling and load balancing.

Includes:

  • Guidance
  • Helm chart with values files for multiple example models
  • YAML files necessary for setting up a Kubernetes cluster
  • Build files for required container images
  • Grafana dashboard configuration JSON file

@whoisj added the documentation (Improvements or additions to documentation) and enhancement (New feature or request) labels May 28, 2024
@whoisj force-pushed the jwyman/trtllm-aslb branch from 7f29e69 to 2f32aa1 May 28, 2024 19:24
@whoisj force-pushed the jwyman/trtllm-aslb branch from 2f32aa1 to f8a1c7d May 28, 2024 19:49
@whoisj requested review from nnshah1 and nealvaidya May 28, 2024 20:04
@whoisj force-pushed the jwyman/trtllm-aslb branch 3 times, most recently from 32213c7 to 8623def May 28, 2024 20:49
@whoisj force-pushed the jwyman/trtllm-aslb branch from 8623def to dc5fdd7 May 28, 2024 20:52
@whoisj force-pushed the jwyman/trtllm-aslb branch 3 times, most recently from db38d43 to beddaf9 May 28, 2024 21:11
@whoisj force-pushed the jwyman/trtllm-aslb branch 9 times, most recently from 103087d to e34523d May 29, 2024 21:05
@whoisj force-pushed the jwyman/trtllm-aslb branch from 5d516f3 to 01c0842 June 10, 2024 17:47
nealvaidya previously approved these changes Jun 10, 2024
This change includes a number of improvements suggested by @nealvaidya.
nealvaidya previously approved these changes Jun 10, 2024
@whoisj requested review from mc-nv and removed the request for mc-nv June 10, 2024 21:08
nnshah1 previously approved these changes Jun 12, 2024

@nnshah1 (Contributor) left a comment

Some sentence-level suggestions. It could use some additional eyes to catch any other grammar or syntax errors, but overall it looks great! We can continue refining in future iterations.

@harryskim, it would be good to get your quick review.

@nnshah1 requested a review from harryskim June 12, 2024 08:09
@whoisj dismissed stale reviews from nnshah1 and nealvaidya via ae4a292 June 12, 2024 16:08
This change includes a number of improvements suggested by @nnshah1.

Co-authored-by: Neelay Shah
@whoisj force-pushed the jwyman/trtllm-aslb branch from 9a637d2 to dceba28 June 12, 2024 16:23

@harryskim (Contributor) left a comment

Approving the applied changes requested by Neelay.

@whoisj merged commit d459ddd into triton-inference-server:main Jun 12, 2024
3 checks passed
fdf3d186-88d5 pushed a commit to fdf3d186-88d5/triton-inference-server that referenced this pull request Mar 21, 2025
…nce-server#95)

* Add TRT-LLM Gen. AI Autoscaling & Load Balancing Guide

This change adds a guide for deploying TensorRT-LLM Gen. AI models with autoscaling and load balancing.

Includes:
- Guidance
- Helm chart with values files for multiple example models
- YAML files necessary for setting up a Kubernetes cluster
- Build files for required container images
- Grafana dashboard configuration JSON file

* Gen AI Tutorial: Remove HF secret name

This change removes the Hugging Face secret name used during testing from the provided Helm chart values files.

Because only the name of the secret (and not its contents) was present, this is not a data leak.

Additionally, this change makes all Hugging Face-related variables begin with "HUGGING_FACE" rather than "HF" or another prefix.
5 participants