Commit bf430f8 (parent 0845f8e): Update README.md 2.45.0 / 24.04 (#7157)

* Update README.md 2.45.0 / 24.04
* Update README.md - remove banner
* Fix README.md appearance

1 file changed: README.md (+226, -5)

# Triton Inference Server

[![License](https://img.shields.io/badge/License-BSD3-lightgrey.svg)](https://opensource.org/licenses/BSD-3-Clause)

Triton Inference Server is an open source inference serving software that
streamlines AI inferencing. Triton enables teams to deploy any AI model from
multiple deep learning and machine learning frameworks, including TensorRT,
TensorFlow, PyTorch, ONNX, OpenVINO, Python, RAPIDS FIL, and more. Triton
Inference Server supports inference across cloud, data center, edge and embedded
devices on NVIDIA GPUs, x86 and ARM CPU, or AWS Inferentia. Triton Inference
Server delivers optimized performance for many query types, including real time,
batched, ensembles and audio/video streaming. Triton Inference Server is part of
[NVIDIA AI Enterprise](https://www.nvidia.com/en-us/data-center/products/ai-enterprise/),
a software platform that accelerates the data science pipeline and streamlines
the development and deployment of production AI.

Major features include:

- [Supports multiple deep learning
  frameworks](https://github.com/triton-inference-server/backend#where-can-i-find-all-the-backends-that-are-available-for-triton)
- [Supports multiple machine learning
  frameworks](https://github.com/triton-inference-server/fil_backend)
- [Concurrent model
  execution](docs/user_guide/architecture.md#concurrent-model-execution)
- [Dynamic batching](docs/user_guide/model_configuration.md#dynamic-batcher)
  (see the configuration sketch after this list)
- [Sequence batching](docs/user_guide/model_configuration.md#sequence-batcher) and
  [implicit state management](docs/user_guide/architecture.md#implicit-state-management)
  for stateful models
- Provides a [Backend API](https://github.com/triton-inference-server/backend) that
  allows adding custom backends and pre/post processing operations
- Supports writing custom backends in Python, a.k.a.
  [Python-based backends](https://github.com/triton-inference-server/backend/blob/r24.04/docs/python_based_backends.md#python-based-backends)
- Model pipelines using
  [Ensembling](docs/user_guide/architecture.md#ensemble-models) or [Business
  Logic Scripting
  (BLS)](https://github.com/triton-inference-server/python_backend#business-logic-scripting)
- [HTTP/REST and GRPC inference
  protocols](docs/customization_guide/inference_protocols.md) based on the community
  developed [KServe
  protocol](https://github.com/kserve/kserve/tree/master/docs/predict-api/v2)
- A [C API](docs/customization_guide/inference_protocols.md#in-process-triton-server-api) and
  [Java API](docs/customization_guide/inference_protocols.md#java-bindings-for-in-process-triton-server-api)
  allow Triton to link directly into your application for edge and other in-process use cases
- [Metrics](docs/user_guide/metrics.md) indicating GPU utilization, server
  throughput, server latency, and more
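
As a concrete illustration of how features such as dynamic batching and concurrent model execution are turned on, the sketch below writes a hypothetical model configuration; the model name, backend, and tensor shapes are placeholders, and the [model configuration documentation](docs/user_guide/model_configuration.md) is the authoritative reference for these fields.

```bash
# Hypothetical config.pbtxt enabling dynamic batching and two GPU instances
# of the same model (name, backend, and shapes are illustrative only).
mkdir -p model_repository/my_model
cat > model_repository/my_model/config.pbtxt <<'EOF'
name: "my_model"
backend: "onnxruntime"
max_batch_size: 8
input [
  {
    name: "INPUT0"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "OUTPUT0"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
# Batch individual requests together, waiting up to 100us to form a batch.
dynamic_batching {
  max_queue_delay_microseconds: 100
}
# Run two copies of the model concurrently on the GPU.
instance_group [
  {
    count: 2
    kind: KIND_GPU
  }
]
EOF
```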

**New to Triton Inference Server?** Make use of
[these tutorials](https://github.com/triton-inference-server/tutorials)
to begin your Triton journey!

Join the [Triton and TensorRT community](https://www.nvidia.com/en-us/deep-learning-ai/triton-tensorrt-newsletter/) and
stay current on the latest product updates, bug fixes, content, best practices,
and more. Need enterprise support? NVIDIA global support is available for Triton
Inference Server with the
[NVIDIA AI Enterprise software suite](https://www.nvidia.com/en-us/data-center/products/ai-enterprise/).

## Serve a Model in 3 Easy Steps

```bash
# Step 1: Create the example model repository
git clone -b r24.04 https://github.com/triton-inference-server/server.git
cd server/docs/examples
./fetch_models.sh

# Step 2: Launch Triton from the NGC Triton container
docker run --gpus=1 --rm --net=host -v ${PWD}/model_repository:/models nvcr.io/nvidia/tritonserver:24.04-py3 tritonserver --model-repository=/models

# Step 3: Send an inference request
# In a separate console, launch the image_client example from the NGC Triton SDK container
docker run -it --rm --net=host nvcr.io/nvidia/tritonserver:24.04-py3-sdk
/workspace/install/bin/image_client -m densenet_onnx -c 3 -s INCEPTION /workspace/images/mug.jpg

# Inference should return the following
Image '/workspace/images/mug.jpg':
    15.346230 (504) = COFFEE MUG
    13.224326 (968) = CUP
    10.422965 (505) = COFFEEPOT
```

Please read the [QuickStart](docs/getting_started/quickstart.md) guide for additional information
regarding this example. The quickstart guide also contains an example of how to launch Triton on [CPU-only systems](docs/getting_started/quickstart.md#run-on-cpu-only-system). New to Triton and wondering where to get started? Watch the [Getting Started video](https://youtu.be/NQDtfSi5QF4).
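
Once the server from Step 2 reports that it started successfully, you can sanity-check it over HTTP before sending inference requests; this assumes the default HTTP endpoint on port 8000 of the local host.

```bash
# Returns HTTP 200 once the server and its loaded models are ready to serve.
curl -v localhost:8000/v2/health/ready
```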

## Examples and Tutorials

Check out [NVIDIA LaunchPad](https://www.nvidia.com/en-us/data-center/products/ai-enterprise-suite/trial/)
for free access to a set of hands-on labs with Triton Inference Server hosted on
NVIDIA infrastructure.

Specific end-to-end examples for popular models, such as ResNet, BERT, and DLRM,
are located in the
[NVIDIA Deep Learning Examples](https://github.com/NVIDIA/DeepLearningExamples)
page on GitHub. The
[NVIDIA Developer Zone](https://developer.nvidia.com/nvidia-triton-inference-server)
contains additional documentation, presentations, and examples.

## Documentation

### Build and Deploy

The recommended way to build and use Triton Inference Server is with Docker
images.
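
For reference, the pre-built release containers used in the quickstart above can be pulled directly from NGC; the `24.04` tags correspond to the branch referenced throughout this README.

```bash
# Triton server container (server plus the bundled backends).
docker pull nvcr.io/nvidia/tritonserver:24.04-py3
# SDK container with the example clients, Performance Analyzer, and Model Analyzer.
docker pull nvcr.io/nvidia/tritonserver:24.04-py3-sdk
```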

- [Install Triton Inference Server with Docker containers](docs/customization_guide/build.md#building-with-docker) (*Recommended*)
- [Install Triton Inference Server without Docker containers](docs/customization_guide/build.md#building-without-docker)
- [Build a custom Triton Inference Server Docker container](docs/customization_guide/compose.md)
- [Build Triton Inference Server from source](docs/customization_guide/build.md#building-on-unsupported-platforms)
- [Build Triton Inference Server for Windows 10](docs/customization_guide/build.md#building-for-windows-10)
- Examples for deploying Triton Inference Server with Kubernetes and Helm on [GCP](deploy/gcp/README.md),
  [AWS](deploy/aws/README.md), and [NVIDIA FleetCommand](deploy/fleetcommand/README.md)
- [Secure Deployment Considerations](docs/customization_guide/deploy.md)

### Using Triton

#### Preparing Models for Triton Inference Server

The first step in using Triton to serve your models is to place one or
more models into a [model repository](docs/user_guide/model_repository.md). Depending on
the type of the model and on what Triton capabilities you want to enable for
the model, you may need to create a [model
configuration](docs/user_guide/model_configuration.md) for the model.
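
As a rough sketch of what that looks like on disk (the model name and file path are hypothetical; the [model repository documentation](docs/user_guide/model_repository.md) describes the full set of supported layouts), a repository holding a single ONNX model can be created like this:

```bash
# Triton expects <repository>/<model-name>/<version>/<model-file>.
mkdir -p model_repository/my_model/1
cp /path/to/model.onnx model_repository/my_model/1/model.onnx
# An optional config.pbtxt placed next to the version directory configures the
# model and enables features such as dynamic batching (see the sketch above).
```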

- [Add custom operations to Triton if needed by your model](docs/user_guide/custom_operations.md)
- Enable model pipelining with [Model Ensemble](docs/user_guide/architecture.md#ensemble-models)
  and [Business Logic Scripting (BLS)](https://github.com/triton-inference-server/python_backend#business-logic-scripting)
- Optimize your models by setting [scheduling and batching](docs/user_guide/architecture.md#models-and-schedulers)
  parameters and [model instances](docs/user_guide/model_configuration.md#instance-groups).
- Use the [Model Analyzer tool](https://github.com/triton-inference-server/model_analyzer)
  to help optimize your model configuration with profiling
- Learn how to [explicitly manage what models are available by loading and
  unloading models](docs/user_guide/model_management.md) (a short example follows this list)
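
For example, when the server is started in explicit model control mode, models are loaded and unloaded on demand through the model repository HTTP API. This sketch assumes the default HTTP port 8000 and the hypothetical `my_model` repository from above.

```bash
# Start Triton (e.g. inside the quickstart container) so that no model is
# served until it is explicitly loaded.
tritonserver --model-repository=/models --model-control-mode=explicit

# From another shell: load, check, and unload the model over HTTP.
curl -X POST localhost:8000/v2/repository/models/my_model/load
curl localhost:8000/v2/models/my_model/ready
curl -X POST localhost:8000/v2/repository/models/my_model/unload
```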

#### Configure and Use Triton Inference Server

- Read the [Quick Start Guide](docs/getting_started/quickstart.md) to run Triton Inference
  Server on both GPU and CPU
- Triton supports multiple execution engines, called
  [backends](https://github.com/triton-inference-server/backend#where-can-i-find-all-the-backends-that-are-available-for-triton), including
  [TensorRT](https://github.com/triton-inference-server/tensorrt_backend),
  [TensorFlow](https://github.com/triton-inference-server/tensorflow_backend),
  [PyTorch](https://github.com/triton-inference-server/pytorch_backend),
  [ONNX](https://github.com/triton-inference-server/onnxruntime_backend),
  [OpenVINO](https://github.com/triton-inference-server/openvino_backend),
  [Python](https://github.com/triton-inference-server/python_backend), and more
- Not all the above backends are supported on every platform supported by Triton.
  Look at the
  [Backend-Platform Support Matrix](https://github.com/triton-inference-server/backend/blob/r24.04/docs/backend_platform_support_matrix.md)
  to learn which backends are supported on your target platform.
- Learn how to [optimize performance](docs/user_guide/optimization.md) using the
  [Performance Analyzer](https://github.com/triton-inference-server/client/blob/r24.04/src/c++/perf_analyzer/README.md)
  and
  [Model Analyzer](https://github.com/triton-inference-server/model_analyzer)
  (see the example after this list)
- Learn how to [manage loading and unloading models](docs/user_guide/model_management.md) in
  Triton
- Send requests directly to Triton with the [HTTP/REST JSON-based
  or gRPC protocols](docs/customization_guide/inference_protocols.md#httprest-and-grpc-protocols)
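
As a small sketch of these pieces in practice (assuming the `densenet_onnx` model from the quickstart is loaded and the default HTTP port 8000), server and model metadata can be queried over the HTTP/REST protocol, and the Performance Analyzer in the SDK container can generate load against the running server:

```bash
# Server metadata and model metadata over the KServe-style HTTP/REST protocol.
curl localhost:8000/v2
curl localhost:8000/v2/models/densenet_onnx

# Inside the SDK container (nvcr.io/nvidia/tritonserver:24.04-py3-sdk):
# measure throughput and latency while sweeping client concurrency from 1 to 4.
perf_analyzer -m densenet_onnx --concurrency-range 1:4
```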

#### Client Support and Examples

A Triton *client* application sends inference and other requests to Triton. The
[Python and C++ client libraries](https://github.com/triton-inference-server/client)
provide APIs to simplify this communication.
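
For illustration, the underlying HTTP/REST protocol can also be exercised directly without the client libraries; the model name, tensor name, shape, and data below are hypothetical and must match your model's configuration.

```bash
# Hypothetical JSON inference request against a model "my_model" that accepts
# a 1x4 FP32 tensor named "INPUT0" (default HTTP port 8000 assumed).
curl -X POST localhost:8000/v2/models/my_model/infer \
  -H "Content-Type: application/json" \
  -d '{
        "inputs": [
          {
            "name": "INPUT0",
            "shape": [1, 4],
            "datatype": "FP32",
            "data": [1.0, 2.0, 3.0, 4.0]
          }
        ]
      }'
```

The client libraries and examples listed below wrap this protocol, and its gRPC equivalent, behind typed APIs.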

- Review client examples for [C++](https://github.com/triton-inference-server/client/blob/r24.04/src/c%2B%2B/examples),
  [Python](https://github.com/triton-inference-server/client/blob/r24.04/src/python/examples),
  and [Java](https://github.com/triton-inference-server/client/blob/r24.04/src/java/src/main/java/triton/client/examples)
- Configure [HTTP](https://github.com/triton-inference-server/client#http-options)
  and [gRPC](https://github.com/triton-inference-server/client#grpc-options)
  client options
- Send input data (e.g. a jpeg image) directly to Triton in the [body of an HTTP
  request without any additional metadata](https://github.com/triton-inference-server/server/blob/r24.04/docs/protocol/extension_binary_data.md#raw-binary-request)

### Extend Triton

[Triton Inference Server's architecture](docs/user_guide/architecture.md) is specifically
designed for modularity and flexibility.

- [Customize Triton Inference Server container](docs/customization_guide/compose.md) for your use case
- [Create custom backends](https://github.com/triton-inference-server/backend)
  in either [C/C++](https://github.com/triton-inference-server/backend/blob/r24.04/README.md#triton-backend-api)
  or [Python](https://github.com/triton-inference-server/python_backend) (a minimal sketch follows this list)
- Create [decoupled backends and models](docs/user_guide/decoupled_models.md) that can send
  multiple responses for a request or not send any responses for a request
- Use a [Triton repository agent](docs/customization_guide/repository_agents.md) to add functionality
  that operates when a model is loaded and unloaded, such as authentication,
  decryption, or conversion
- Deploy Triton on [Jetson and JetPack](docs/user_guide/jetson.md)
- [Use Triton on AWS
  Inferentia](https://github.com/triton-inference-server/python_backend/tree/main/inferentia)
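
As a minimal sketch of a Python-based backend model (the model name and tensor names are hypothetical; the [python_backend](https://github.com/triton-inference-server/python_backend) repository documents the full interface), a Python model is a `model.py` defining a `TritonPythonModel` class:

```bash
# Hypothetical Python model that echoes INPUT0 back as OUTPUT0.
# The model also needs a config.pbtxt declaring `backend: "python"` and its tensors.
mkdir -p model_repository/my_python_model/1
cat > model_repository/my_python_model/1/model.py <<'EOF'
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        # Triton may pass several requests at once; return one response per request.
        responses = []
        for request in requests:
            in0 = pb_utils.get_input_tensor_by_name(request, "INPUT0")
            out0 = pb_utils.Tensor("OUTPUT0", in0.as_numpy())
            responses.append(pb_utils.InferenceResponse(output_tensors=[out0]))
        return responses
EOF
```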

### Additional Documentation

- [FAQ](docs/user_guide/faq.md)
- [User Guide](docs/README.md#user-guide)
- [Customization Guide](docs/README.md#customization-guide)
- [Release Notes](https://docs.nvidia.com/deeplearning/triton-inference-server/release-notes/index.html)
- [GPU, Driver, and CUDA Support
  Matrix](https://docs.nvidia.com/deeplearning/dgx/support-matrix/index.html)

## Contributing

Contributions to Triton Inference Server are more than welcome. To
contribute, please review the [contribution
guidelines](CONTRIBUTING.md). If you have a backend, client,
example, or similar contribution that does not modify the core of
Triton, then you should file a PR in the [contrib
repo](https://github.com/triton-inference-server/contrib).

## Reporting problems, asking questions

We appreciate any feedback, questions, or bug reports regarding this project.
When posting [issues in GitHub](https://github.com/triton-inference-server/server/issues),
follow the process outlined in the [Stack Overflow document](https://stackoverflow.com/help/mcve).
Ensure posted examples are:

- minimal – use as little code as possible that still produces the
  same problem
- complete – provide all parts needed to reproduce the problem. Check
  if you can strip external dependencies and still show the problem. The
  less time we spend on reproducing problems, the more time we have to
  fix them
- verifiable – test the code you're about to provide to make sure it
  reproduces the problem. Remove all other problems that are not
  related to your request/question.

For issues, please use the provided bug report and feature request templates.

For questions, we recommend posting in our community
[GitHub Discussions](https://github.com/triton-inference-server/server/discussions).

## For more information

Please refer to the [NVIDIA Developer Triton page](https://developer.nvidia.com/nvidia-triton-inference-server)
for more information.
