Conversation

@aasthavar commented Apr 30, 2025

What does this PR do?

Fixes #53!

Adds a new blueprint at blueprints/inference/trtllm-nvidia-triton-server-gpu for deploying and serving LLMs using TensorRT-LLM with NVIDIA Triton Inference Server, demonstrated with a 1B-parameter LLaMA model for ultra-low-latency inference.

Includes:

  • Scripts for building and pushing Triton + TensorRT-LLM images
  • GPU inference profiling
  • Troubleshooting and system checks
  • Autoscaling validation
  • Observability via Prometheus and Grafana

Motivation

No existing blueprint demonstrates optimized LLM inference with TensorRT-LLM + Triton. This fills that gap with a performant, scalable GPU-based solution.

More

  • Yes, I have tested the PR using my local account setup (Provide any test evidence report under Additional Notes)
  • Mandatory for new blueprints. Yes, I have added an example to support my blueprint PR
  • Mandatory for new blueprints. Yes, I have updated the website/docs or website/blog section for this feature
  • Yes, I ran pre-commit run -a with this PR. Link for installing pre-commit locally

For Moderators

  • E2E test successfully completed before merge?

Additional Notes

  • Repo maintainer requested a re-submission due to upstream changes requiring recent PR contributors to re-fork and re-apply their changes.
  • pre-commit results: pre-commit-results (screenshot)
  • deployment status: ai-on-eks-successful-deployment (screenshot)

@omrishiv (Contributor) left a comment

This is incredibly thorough, thank you so much for the contribution! I have a few small requests/questions. @vara-bonthu, would you also mind taking a look, as this infrastructure is the only one I haven't consolidated and you may be a little more familiar with it?

blueprints/inference/trtllm-nvidia-triton-server-gpu/triton_model_files/*
blueprints/inference/trtllm-nvidia-triton-server-gpu/benchmark-grpc/results.txt
blueprints/inference/trtllm-nvidia-triton-server-gpu/benchmark-http/results/*
blueprints/inference/trtllm-nvidia-triton-server-gpu/.ecr_repo_uri

As .ecr_repo_uri and .eks_region seem to be a common pattern, can we gitignore them with

**/.ecr_repo_uri
**/.eks_region

COPY start.sh /start.sh
RUN chmod +x /start.sh

ENTRYPOINT ["/bin/bash", "/start.sh"]
(no newline at end of file)

newline

echo -e "\nBuilding llama finetuning trn1 docker image" \
&& docker build . --no-cache -t $ECR_REPO_URI:latest \
&& docker push $ECR_REPO_URI:latest \
&& echo -e "\nImage successfully pushed to ECR"
(no newline at end of file)

newline

print(f"* {region}: {region_long_name}")
for instance_type in instance_types:
print(f" - {instance_type}")
print("\n") No newline at end of file

newline

@@ -0,0 +1,3 @@
#!/bin/bash

python3 /tensorrtllm_backend/scripts/launch_triton_server.py --world_size=1 --model_repo=/triton_model_files
(no newline at end of file)

newline

instanceStorePolicy: RAID0
nodePool:
  labels:
    - type: karpenter

can we align these to the current labels:

          - instanceType: g6e-gpu-karpenter
          - type: karpenter
          - accelerator: nvidia
          - gpuType: l40s
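
For reference, a minimal sketch of how that block could look once aligned (nesting assumed from the values snippet above, not taken from the PR):

  nodePool:
    labels:
      - instanceType: g6e-gpu-karpenter
      - type: karpenter
      - accelerator: nvidia
      - gpuType: l40s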

values: ["amd64"]
- key: "karpenter.sh/capacity-type"
operator: In
values: ["on-demand"]

please add spot
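
A minimal sketch of that requirement with spot included (using the standard karpenter.sh/capacity-type key; indentation assumed):

    - key: "karpenter.sh/capacity-type"
      operator: In
      values: ["spot", "on-demand"]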

# --------------------------------------------------------------------------------------------------------- #
# NOTE: this is a reminder to modify "aws s3 sync command" within provisioner "local-exec", before deploying
# --------------------------------------------------------------------------------------------------------- #
# module "triton_server_trtllm" {

why is all of this commented out?

bucket_name = module.s3_bucket[count.index].s3_bucket_id
}

provisioner "local-exec" {

is this possible to toggle based on which one you're using?
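
One possible shape for such a toggle, sketched with an illustrative variable and resource name (not from the PR; the sync path and bucket reference are placeholders):

  variable "enable_s3_model_sync" {
    description = "Run the aws s3 sync local-exec step; set to false when using the other model source"
    type        = bool
    default     = true
  }

  resource "null_resource" "s3_model_sync" {
    count = var.enable_s3_model_sync ? 1 : 0

    provisioner "local-exec" {
      # Reuse the existing aws s3 sync command here; only the count gate is new.
      command = "aws s3 sync ./triton_model_files s3://${module.s3_bucket[0].s3_bucket_id}/"
    }
  }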

@@ -0,0 +1,658 @@
---
title: NVIDIA Triton Server with TensorRT LLM
sidebar_position: 2

please remove the fixed positioning

Successfully merging this pull request may close these issues:

[Blueprint Request] Add support for TensorRT-LLM with NVIDIA Triton Server