Examples

We provide simple examples showing how to integrate PyTorch, TensorFlow 2, JAX, and plain Python models with the Triton Inference Server using PyTriton. The examples are available in the GitHub repository.
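As a minimal sketch of the integration pattern (not one of the repository's examples), the snippet below binds a plain Python function to Triton through PyTriton. The model name "AddSub" and the tensor names are illustrative choices; it assumes `nvidia-pytriton` is installed, so the PyTriton imports are done lazily inside `serve()`.

```python
import numpy as np


def add_sub(a, b):
    """Compute the elementwise sum and difference of two batched arrays."""
    return {"add": a + b, "sub": a - b}


def serve():
    """Bind `add_sub` to Triton via PyTriton and start serving.

    Requires `pip install nvidia-pytriton`; the imports are lazy so the
    numeric function above is usable without the package installed.
    """
    from pytriton.decorators import batch
    from pytriton.model_config import ModelConfig, Tensor
    from pytriton.triton import Triton

    @batch
    def infer_fn(a, b):
        # PyTriton passes inputs as numpy arrays keyed by tensor name.
        return add_sub(a, b)

    with Triton() as triton:
        triton.bind(
            model_name="AddSub",  # illustrative model name
            infer_func=infer_fn,
            inputs=[
                Tensor(name="a", dtype=np.float32, shape=(-1,)),
                Tensor(name="b", dtype=np.float32, shape=(-1,)),
            ],
            outputs=[
                Tensor(name="add", dtype=np.float32, shape=(-1,)),
                Tensor(name="sub", dtype=np.float32, shape=(-1,)),
            ],
            config=ModelConfig(max_batch_size=16),
        )
        triton.serve()  # blocks, exposing HTTP/gRPC inference endpoints
```

Calling `serve()` starts the server; the repository's examples follow the same bind-and-serve pattern for PyTorch, TensorFlow 2, and JAX callables.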

Sample Model Deployments

The list of example model deployments:

Profiling models

The Perf Analyzer can be used to profile models served through PyTriton. We have prepared an example of using Perf Analyzer to profile the BART PyTorch model. See the example code in the GitHub repository.
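A typical invocation might look like the sketch below, assuming the server runs on its default HTTP port. The model name and input shape are placeholders and must match your `triton.bind(...)` call; see the repository example for the exact command used with BART.

```shell
# Profile a model served by PyTriton (default HTTP endpoint on localhost:8000).
# "BART" and the input shape are illustrative placeholders.
# perf_analyzer ships with the Triton client SDK container.
perf_analyzer \
  -m BART \
  -u localhost:8000 \
  --shape input_ids:1,128 \
  --concurrency-range 1:4 \
  --measurement-interval 5000
```

Perf Analyzer reports throughput and latency percentiles for each concurrency level in the sweep.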

Kubernetes Deployment

The following examples include guides on deploying models on a Kubernetes cluster:
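At its core, such a deployment runs the PyTriton server in a container and exposes its ports. The fragment below is a minimal sketch only; the image name is a placeholder, and the repository's guides additionally cover health probes, services, and registry setup.

```yaml
# Minimal sketch of a Deployment for a containerized PyTriton server.
# The image name is a placeholder for an image built from one of the examples.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pytriton-model
spec:
  replicas: 1
  selector:
    matchLabels:
      app: pytriton-model
  template:
    metadata:
      labels:
        app: pytriton-model
    spec:
      containers:
        - name: pytriton
          image: my-registry/pytriton-model:latest  # placeholder
          ports:
            - containerPort: 8000  # HTTP inference endpoint
            - containerPort: 8001  # gRPC inference endpoint
            - containerPort: 8002  # metrics endpoint
          resources:
            limits:
              nvidia.com/gpu: 1  # request a GPU for GPU-backed models
```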