Commit 8623def

Add TRT-LLM Gen. AI Autoscaling & Load Balancing Guide
This change adds a guide for deploying autoscaling & load balancing of TensorRT-LLM Gen. AI models. Includes:
- Guidance
- Helm chart w/ multiple example models value files
- YAML files necessary for setting up a Kubernetes cluster
- Build files for required container images
- Grafana dashboard configuration JSON file
1 parent b3759c8 commit 8623def

39 files changed, +4881 -0 lines changed

Deployment/Kubernetes/README.md

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+# Kubernetes Deployment of Triton Server Guides
+
+* [TensorRT-LLM Gen. AI Autoscaling & Load Balancing](./TensorRT-LLM_Autoscaling_and_Load_Balancing/README.md)

Deployment/Kubernetes/TensorRT-LLM_Autoscaling_and_Load_Balancing/README.md

Lines changed: 904 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+apiVersion: v2
+appVersion: 0.1.0
+description: Triton + TensorRT-LLM autoscaling and load balancing example.
+icon: https://www.nvidia.com/content/dam/en-zz/Solutions/about-nvidia/logo-and-brand/[email protected]
+name: triton_trt-llm_aslb-example
+version: 0.1.0
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+gpu:
+- Tesla-T4
+- Tesla-V100-SXM2-16GB
+
+model:
+  name: gpt2
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# See values.yaml for reference values.
+
+gpu:
+- Tesla-V100-SXM2-16GB
+
+model:
+  name: llama-2-7b-chat
+  pullSecret: hf-model-pull
+  tensorrtLlm:
+    parallelism:
+      tensor: 2
+
+autoscaling:
+  metric:
+    value: 1500m
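
Note on `value: 1500m` above: the `m` suffix is Kubernetes quantity notation for thousandths, so `1500m` reads as 1.5 (and the `3500m` used for the 70B model as 3.5). Which metric this target applies to is defined by the guide and the chart templates, which this diff view does not render. The sketch below only illustrates the notation; `parse_quantity` is a hypothetical helper, not part of the chart, and it assumes the value is either a plain number or a milli-suffixed one.

```python
from decimal import Decimal


def parse_quantity(text: str) -> Decimal:
    """Convert a plain or milli-suffixed quantity string to a Decimal.

    Hypothetical helper for illustration only; it ignores the other
    Kubernetes suffixes (k, M, Ki, Mi, ...).
    """
    text = text.strip()
    if text.endswith("m"):
        # "m" means thousandths: 1500m -> 1500 / 1000
        return Decimal(text[:-1]) / Decimal(1000)
    return Decimal(text)


if __name__ == "__main__":
    for raw in ("1500m", "3500m", "2"):
        print(raw, "->", parse_quantity(raw))
```

Using `Decimal` rather than `float` keeps the conversion exact, which matters if the parsed target is later compared against other thresholds.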
Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# See values.yaml for reference values.
+
+gpu:
+- NVIDIA-A10G
+- NVIDIA-A100-SXM4-40GB
+
+model:
+  name: llama-2-7b
+  pullSecret: hf-model-pull
@@ -0,0 +1,29 @@
+# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# See values.yaml for reference values.
+
+gpu:
+- NVIDIA-A100-SXM4-40GB
+
+model:
+  name: llama-3-70b-instruct
+  pullSecret: hf-model-pull
+  tensorrtLlm:
+    parallelism:
+      tensor: 8
+
+autoscaling:
+  metric:
+    value: 3500m
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# See values.yaml for reference values.
+
+gpu:
+- Tesla-V100-SXM2-16GB
+
+model:
+  name: llama-3-8b-instruct
+  pullSecret: hf-model-pull
+  tensorrtLlm:
+    parallelism:
+      tensor: 2
+
+autoscaling:
+  metric:
+    value: 1500m
Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# See values.yaml for reference values.
+
+gpu:
+- NVIDIA-A10G
+- NVIDIA-A100-SXM4-40GB
+
+model:
+  name: llama-3-8b
+  pullSecret: hf-model-pull
Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# See values.yaml for reference values.
+
+gpu:
+- Tesla-V100-SXM2-16GB
+- Tesla-T4
+
+model:
+  name: opt125m
+  pullSecret: hf-model-pull
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+{{ $.Chart.Name }} ({{ $.Chart.Version }}) installation complete.
+
+Release Name: {{ $.Release.Name }}
+Namespace: {{ $.Release.Namespace }}
+Deployment Name: {{ $.Release.Name }}
+Service Name: {{ $.Release.Name }}
+
+Helpful commands:
+
+  $ helm status --namespace={{ $.Release.Namespace }} {{ $.Release.Name }}
+  $ helm get --namespace={{ $.Release.Namespace }} all {{ $.Release.Name }}
+  $ kubectl get --namespace={{ $.Release.Namespace }} --selector='app={{ $.Release.Name }}' deployments,pods,hpa,services,podmonitors
