Commit d459ddd

Add TRT-LLM Gen. AI Autoscaling & Load Balancing Guide (#95)
* Add TRT-LLM Gen. AI Autoscaling & Load Balancing Guide

  This change adds a guide for deploying autoscaling & load balancing of TensorRT-LLM Gen. AI models.

  Includes:
  - Guidance
  - Helm chart w/ multiple example model values files
  - YAML files necessary for setting up a Kubernetes cluster
  - Build files for required container images
  - Grafana dashboard configuration JSON file

* Gen AI Tutorial: Remove HF secret name

  This change removes the Hugging Face secret name used during testing from the provided Helm chart values files. Because only the name of the secret (and not its contents) was present, this is not a data leak.

  Additionally, this change makes all Hugging Face related variables begin w/ "HUGGING_FACE" and not "HF" or another value.
1 parent b3759c8 commit d459ddd

42 files changed

+4970
-1
lines changed

.pre-commit-config.yaml

Lines changed: 1 addition & 1 deletion
@@ -65,7 +65,7 @@ repos:
       - id: check-json
       - id: check-toml
       - id: check-yaml
-        exclude: ^deploy(\/[^\/]+)*\/templates\/.*$
+        exclude: ^Deployment/Kubernetes/TensorRT-LLM_Autoscaling_and_Load_Balancing/chart/templates/.+$
       - id: check-shebang-scripts-are-executable
       - id: end-of-file-fixer
         types_or: [c, c++, cuda, proto, textproto, java, python]
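
pre-commit treats `exclude` as a Python regular expression and applies it to each repository-relative path with `re.search`. The following is a minimal sketch of what the pattern change in the hunk above means in practice. The file paths are hypothetical, chosen only to exercise the two patterns, and the helper name `is_excluded` is an illustration, not part of pre-commit.

```python
import re

# Patterns copied from the removed and added `exclude` lines of the hunk above.
OLD_EXCLUDE = r"^deploy(\/[^\/]+)*\/templates\/.*$"
NEW_EXCLUDE = (
    r"^Deployment/Kubernetes/TensorRT-LLM_Autoscaling_and_Load_Balancing"
    r"/chart/templates/.+$"
)


def is_excluded(pattern: str, path: str) -> bool:
    """Return True if the exclude pattern matches the repository-relative path."""
    return re.search(pattern, path) is not None


# Hypothetical paths (assumptions for illustration only).
CHART_TEMPLATE = (
    "Deployment/Kubernetes/TensorRT-LLM_Autoscaling_and_Load_Balancing"
    "/chart/templates/deployment.yaml"
)
LEGACY_TEMPLATE = "deploy/k8s/templates/service.yaml"
VALUES_FILE = (
    "Deployment/Kubernetes/TensorRT-LLM_Autoscaling_and_Load_Balancing"
    "/chart/values.yaml"
)

for path in (CHART_TEMPLATE, LEGACY_TEMPLATE, VALUES_FILE):
    print(path, is_excluded(OLD_EXCLUDE, path), is_excluded(NEW_EXCLUDE, path))
```

Under these assumptions, only files below the chart's `templates/` directory are skipped by `check-yaml`, presumably because Helm templates contain `{{ ... }}` directives and are not plain YAML. The old pattern no longer applies to the new location because it is anchored at a lowercase `deploy` prefix.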

Deployment/Kubernetes/README.md

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+# Kubernetes Deployment of Triton Server Guides
+
+* [TensorRT-LLM Gen. AI Autoscaling & Load Balancing](./TensorRT-LLM_Autoscaling_and_Load_Balancing/README.md)
Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+.vscode/
+**/.vscode/

Deployment/Kubernetes/TensorRT-LLM_Autoscaling_and_Load_Balancing/README.md

Lines changed: 965 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+dev_values.yaml
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+apiVersion: v2
+appVersion: 0.1.0
+description: Triton + TensorRT-LLM autoscaling and load balancing example.
+icon: https://www.nvidia.com/content/dam/en-zz/Solutions/about-nvidia/logo-and-brand/[email protected]
+name: triton_trt-llm_aslb-example
+version: 0.1.0
Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# See values.yaml for reference values.
+
+gpu:
+- Tesla-T4
+- Tesla-V100-SXM2-16GB
+
+model:
+  name: gpt2
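
Each per-model file only overrides a few keys and defers to `values.yaml` for the rest. Helm layers a user-supplied values file over the chart defaults by merging maps key by key and replacing lists wholesale. The sketch below imitates that behaviour in simplified form. The `defaults` dictionary is hypothetical, since the chart's real `values.yaml` is not shown in this diff, and Helm's special handling of `null` overrides is deliberately left out.

```python
from copy import deepcopy


def merge_values(defaults: dict, overrides: dict) -> dict:
    """Overlay `overrides` on `defaults`: maps merge recursively, other values replace."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            # Scalars and lists are replaced as a whole, not combined.
            merged[key] = deepcopy(value)
    return merged


# Hypothetical chart defaults (assumption; the real values.yaml is not in this diff).
defaults = {
    "gpu": [],
    "model": {"name": None, "tensorrtLlm": {"parallelism": {"tensor": 1}}},
}

# Overrides taken from the gpt2 values file above.
overrides = {
    "gpu": ["Tesla-T4", "Tesla-V100-SXM2-16GB"],
    "model": {"name": "gpt2"},
}

merged = merge_values(defaults, overrides)
print(merged)
```

With these inputs the merged result keeps the assumed default parallelism while taking the GPU list and model name from the override file, which is why the example files can stay this short.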
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# See values.yaml for reference values.
+
+gpu:
+- Tesla-T4
+- Tesla-V100-SXM2-16GB
+
+model:
+  name: llama-2-7b-chat
+  tensorrtLlm:
+    parallelism:
+      tensor: 2
+
+autoscaling:
+  metric:
+    value: 1500m
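
The `autoscaling.metric.value` setting uses Kubernetes quantity notation, in which an `m` suffix denotes thousandths, so `1500m` is 1.5 and `3500m` is 3.5. Which metric this threshold applies to is defined in parts of the chart that are not visible in this diff. The sketch below converts such strings to decimal numbers. The function name is hypothetical, and it handles only plain numbers and the `m` suffix, not the full quantity grammar.

```python
from decimal import Decimal


def parse_milli_quantity(text: str) -> Decimal:
    """Parse a plain number or a milli-suffixed quantity such as "2" or "1500m"."""
    text = text.strip()
    if text.endswith("m"):
        # "m" means milli-units: divide the numeric part by 1000.
        return Decimal(text[:-1]) / Decimal(1000)
    return Decimal(text)


for raw in ("1500m", "3500m", "2"):
    print(raw, "->", parse_milli_quantity(raw))
```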
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# See values.yaml for reference values.
+
+gpu:
+- Tesla-T4
+- Tesla-V100-SXM2-16GB
+
+model:
+  name: llama-2-7b
+  tensorrtLlm:
+    parallelism:
+      tensor: 2
@@ -0,0 +1,28 @@
+# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# See values.yaml for reference values.
+
+gpu:
+- NVIDIA-A100-SXM4-40GB
+
+model:
+  name: llama-3-70b-instruct
+  tensorrtLlm:
+    parallelism:
+      tensor: 4
+
+autoscaling:
+  metric:
+    value: 3500m
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# See values.yaml for reference values.
+
+gpu:
+- Tesla-T4
+- Tesla-V100-SXM2-16GB
+
+model:
+  name: llama-3-8b-instruct
+  tensorrtLlm:
+    parallelism:
+      tensor: 2
+
+autoscaling:
+  metric:
+    value: 1500m
Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# See values.yaml for reference values.
+
+gpu:
+- NVIDIA-A10G
+- NVIDIA-A100-SXM4-40GB
+
+model:
+  name: llama-3-8b
Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# See values.yaml for reference values.
+
+gpu:
+- Tesla-T4
+- Tesla-V100-SXM2-16GB
+
+model:
+  name: opt125m
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+{{ $.Chart.Name }} ({{ $.Chart.Version }}) installation complete.
+
+Release Name: {{ $.Release.Name }}
+Namespace: {{ $.Release.Namespace }}
+Deployment Name: {{ $.Release.Name }}
+Service Name: {{ $.Release.Name }}
+
+Helpful commands:
+
+$ helm status --namespace={{ $.Release.Namespace }} {{ $.Release.Name }}
+$ helm get --namespace={{ $.Release.Namespace }} all {{ $.Release.Name }}
+$ kubectl get --namespace={{ $.Release.Namespace }} --selector='app={{ $.Release.Name }}' deployments,pods,hpa,services,podmonitors
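
Helm renders this notes template after installation, substituting the release name and namespace into the three commands. As a rough illustration of what a user would see, the sketch below performs the same substitution in Python. The function name and the release and namespace values (`llama-3-8b`, `default`) are hypothetical, and the code only formats strings; it does not call Helm or kubectl.

```python
def helpful_commands(release: str, namespace: str) -> list[str]:
    """Build the three commands from the notes template for one release."""
    return [
        f"helm status --namespace={namespace} {release}",
        f"helm get --namespace={namespace} all {release}",
        (
            f"kubectl get --namespace={namespace} "
            f"--selector='app={release}' "
            "deployments,pods,hpa,services,podmonitors"
        ),
    ]


# Hypothetical release name and namespace (assumptions for illustration).
for command in helpful_commands("llama-3-8b", "default"):
    print("$", command)
```

The template reuses the release name for the deployment, the service, and the `app` label selector, so a single name is enough to find every resource the chart creates.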
