
Commit be57909

Update README for 22.05 release

1 parent a3e6ab5 commit be57909

File tree

3 files changed: +418 -3 lines

README.md

+201 -3
# Triton Inference Server

[![License](https://img.shields.io/badge/License-BSD3-lightgrey.svg)](https://opensource.org/licenses/BSD-3-Clause)

---

Triton Inference Server is an open source inference serving software that
streamlines AI inferencing. Triton enables teams to deploy any AI model from
multiple deep learning and machine learning frameworks, including TensorRT,
TensorFlow, PyTorch, ONNX, OpenVINO, Python, RAPIDS FIL, and more. Triton
supports inference across cloud, data center, edge, and embedded devices on
NVIDIA GPUs, x86 and ARM CPU, or AWS Inferentia. Triton delivers optimized
performance for many query types, including real time, batched, ensembles,
and audio/video streaming.

Major features include:

- [Supports multiple deep learning
  frameworks](https://github.com/triton-inference-server/backend#where-can-i-find-all-the-backends-that-are-available-for-triton)
- [Supports multiple machine learning
  frameworks](https://github.com/triton-inference-server/fil_backend)
- [Concurrent model execution](docs/architecture.md#concurrent-model-execution)
- [Dynamic batching](docs/model_configuration.md#dynamic-batcher)
- [Sequence batching](docs/model_configuration.md#sequence-batcher) and
  [implicit state management](docs/architecture.md#implicit-state-management)
  for stateful models
- Provides a [Backend API](https://github.com/triton-inference-server/backend) that
  allows adding custom backends and pre/post-processing operations
- Model pipelines using
  [Ensembling](docs/architecture.md#ensemble-models) or [Business Logic
  Scripting (BLS)](https://github.com/triton-inference-server/python_backend#business-logic-scripting)
- [HTTP/REST and GRPC inference
  protocols](docs/inference_protocols.md) based on the community-developed
  [KServe protocol](https://github.com/kserve/kserve/tree/master/docs/predict-api/v2)
- A [C API](docs/inference_protocols.md#in-process-triton-server-api) and
  [Java API](docs/inference_protocols.md#java-bindings-for-in-process-triton-server-api)
  allow Triton to link directly into your application for edge and other
  in-process use cases
- [Metrics](docs/metrics.md) indicating GPU utilization, server
  throughput, server latency, and more (a small scraping sketch follows this list)

## Serve a Model in 3 Easy Steps

```bash
# Step 1: Create the example model repository
git clone -b r22.05 https://github.com/triton-inference-server/server.git
cd server/docs/examples
./fetch_models.sh

# Step 2: Launch triton from the NGC Triton container
docker run --gpus=1 --rm --net=host -v /full/path/to/docs/examples/model_repository:/models nvcr.io/nvidia/tritonserver:22.05-py3 tritonserver --model-repository=/models

# Step 3: In a separate console, launch the image_client example from the NGC Triton SDK container
docker run -it --rm --net=host nvcr.io/nvidia/tritonserver:22.05-py3-sdk
/workspace/install/bin/image_client -m densenet_onnx -c 3 -s INCEPTION /workspace/images/mug.jpg

# Inference should return the following
Image '/workspace/images/mug.jpg':
    15.346230 (504) = COFFEE MUG
    13.224326 (968) = CUP
    10.422965 (505) = COFFEEPOT
```

Please read the [QuickStart](docs/quickstart.md) guide for additional information
regarding this example. The QuickStart guide also contains an example of how to
launch Triton on [CPU-only systems](docs/quickstart.md#run-on-cpu-only-system).
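
If you prefer to call Triton programmatically rather than through `image_client`, the hedged sketch below uses the Python `tritonclient` package (installable with `pip install tritonclient[http]`). The model name and tensor names (`my_model`, `INPUT0`, `OUTPUT0`) are placeholders, not part of the quickstart; adapt them to a model in your repository.

```python
# A minimal client sketch, assuming a hypothetical model "my_model" with one
# FP32 input "INPUT0" of shape [1, 16] and one output "OUTPUT0".
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Describe the request: names, shapes, and datatypes must match the model config.
input0 = httpclient.InferInput("INPUT0", [1, 16], "FP32")
input0.set_data_from_numpy(np.random.rand(1, 16).astype(np.float32))
output0 = httpclient.InferRequestedOutput("OUTPUT0")

result = client.infer(model_name="my_model", inputs=[input0], outputs=[output0])
print(result.as_numpy("OUTPUT0"))
```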

## Examples and Tutorials

Specific end-to-end examples for popular models, such as ResNet, BERT, and DLRM,
are located on the
[NVIDIA Deep Learning Examples](https://github.com/NVIDIA/DeepLearningExamples)
page on GitHub. The
[NVIDIA Developer Zone](https://developer.nvidia.com/nvidia-triton-inference-server)
contains additional documentation, presentations, and examples.

## Documentation

### Build and Deploy

The recommended way to build and use Triton Inference Server is with Docker
images.

- [Install Triton Inference Server with Docker containers](docs/build.md#building-triton-with-docker) (*Recommended*)
- [Install Triton Inference Server without Docker containers](docs/build.md#building-triton-without-docker)
- [Build a custom Triton Inference Server Docker container](docs/compose.md)
- [Build Triton Inference Server from source](docs/build.md#building-on-unsupported-platforms)
- [Build Triton Inference Server for Windows 10](docs/build.md#building-for-windows-10)
- Examples for deploying Triton Inference Server with Kubernetes and Helm on [GCP](deploy/gcp/README.md),
  [AWS](deploy/aws/README.md), and [NVIDIA FleetCommand](deploy/fleetcommand/README.md)

### Using Triton

#### Preparing Models for Triton Inference Server

The first step in using Triton to serve your models is to place one or
more models into a [model repository](docs/model_repository.md). Depending on
the type of the model and on what Triton capabilities you want to enable for
the model, you may need to create a [model
configuration](docs/model_configuration.md) for the model.
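
To make the repository layout concrete, here is a hedged sketch that creates a minimal repository for a hypothetical ONNX model; the model name, tensor names, and dimensions are placeholders and must match your actual model. The config.pbtxt fields shown (`backend`, `max_batch_size`, `input`, `output`) are standard settings described in the model configuration documentation linked above.

```python
# Hedged sketch: lay out a minimal model repository for a hypothetical ONNX
# model named "my_model". Triton expects <repo>/<model>/<version>/model.onnx
# plus an optional config.pbtxt next to the version directories.
from pathlib import Path

repo = Path("model_repository")
model_dir = repo / "my_model" / "1"
model_dir.mkdir(parents=True, exist_ok=True)
# Copy or export your ONNX file to model_dir / "model.onnx" here.

config = """\
name: "my_model"
backend: "onnxruntime"
max_batch_size: 8
input [
  {
    name: "INPUT0"
    data_type: TYPE_FP32
    dims: [ 16 ]
  }
]
output [
  {
    name: "OUTPUT0"
    data_type: TYPE_FP32
    dims: [ 4 ]
  }
]
"""
(repo / "my_model" / "config.pbtxt").write_text(config)
```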

- [Add custom operations to Triton if needed by your model](docs/custom_operations.md)
- Enable model pipelining with [Model Ensemble](docs/architecture.md#ensemble-models)
  and [Business Logic Scripting (BLS)](https://github.com/triton-inference-server/python_backend#business-logic-scripting)
- Optimize your models by setting [scheduling and batching](docs/architecture.md#models-and-schedulers)
  parameters and [model instances](docs/model_configuration.md#instance-groups)
- Use the [Model Analyzer tool](https://github.com/triton-inference-server/model_analyzer)
  to help optimize your model configuration with profiling
- Learn how to [explicitly manage what models are available by loading and
  unloading models](docs/model_management.md) (a client-side sketch follows this list)

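The sketch below shows client-driven loading and unloading, assuming the server was started with `--model-control-mode=explicit` and that a placeholder model named `my_model` exists in the repository. `load_model`, `is_model_ready`, and `unload_model` are methods of the Python client's `InferenceServerClient`.

```python
# Hedged sketch: explicitly load and unload a model from a client.
# Assumes: tritonserver --model-repository=/models --model-control-mode=explicit
# and a placeholder model name "my_model" present in the repository.
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

client.load_model("my_model")              # ask Triton to load the model
print(client.is_model_ready("my_model"))   # True once loading has finished
client.unload_model("my_model")            # release it when no longer needed
```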

#### Configure and Use Triton Inference Server

- Read the [Quick Start Guide](docs/quickstart.md) to run Triton Inference
  Server on both GPU and CPU
- Triton supports multiple execution engines, called
  [backends](https://github.com/triton-inference-server/backend#where-can-i-find-all-the-backends-that-are-available-for-triton), including
  [TensorRT](https://github.com/triton-inference-server/tensorrt_backend),
  [TensorFlow](https://github.com/triton-inference-server/tensorflow_backend),
  [PyTorch](https://github.com/triton-inference-server/pytorch_backend),
  [ONNX](https://github.com/triton-inference-server/onnxruntime_backend),
  [OpenVINO](https://github.com/triton-inference-server/openvino_backend),
  [Python](https://github.com/triton-inference-server/python_backend), and more
- Learn how to [optimize performance](docs/optimization.md) using the
  [Performance Analyzer](docs/perf_analyzer.md) and
  [Model Analyzer](https://github.com/triton-inference-server/model_analyzer)
- Learn how to [manage loading and unloading models](docs/model_management.md) in
  Triton
- Send requests directly to Triton with the [HTTP/REST JSON-based
  or gRPC protocols](docs/inference_protocols.md#httprest-and-grpc-protocols); a raw
  JSON example follows this list

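For reference, the hedged sketch below sends a KServe-v2-style JSON inference request using nothing but the Python standard library; the model and tensor names are placeholders and must match your model configuration. In practice the client libraries handle this encoding for you.

```python
# Hedged sketch: POST a JSON inference request straight to Triton's HTTP endpoint.
# Placeholder model/tensor names; shapes and datatypes must match your config.
import json
import urllib.request

payload = {
    "inputs": [
        {"name": "INPUT0", "shape": [1, 16], "datatype": "FP32",
         "data": [0.0] * 16}
    ],
    "outputs": [{"name": "OUTPUT0"}],
}

req = urllib.request.Request(
    "http://localhost:8000/v2/models/my_model/infer",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read()))
```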

#### Client Support and Examples

A Triton *client* application sends inference and other requests to Triton. The
[Python and C++ client libraries](https://github.com/triton-inference-server/client)
provide APIs to simplify this communication.
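
As an illustration of the client API over gRPC (the HTTP flavor is shown earlier in this README), the following hedged sketch again uses placeholder model and tensor names; install the client with `pip install tritonclient[grpc]`.

```python
# Hedged sketch: an inference call using the gRPC client library.
# Placeholder model/tensor names; the default gRPC port is 8001.
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")

input0 = grpcclient.InferInput("INPUT0", [1, 16], "FP32")
input0.set_data_from_numpy(np.ones((1, 16), dtype=np.float32))

result = client.infer(
    model_name="my_model",
    inputs=[input0],
    outputs=[grpcclient.InferRequestedOutput("OUTPUT0")],
)
print(result.as_numpy("OUTPUT0"))
```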

- Review client examples for [C++](https://github.com/triton-inference-server/client/blob/main/src/c%2B%2B/examples),
  [Python](https://github.com/triton-inference-server/client/blob/main/src/python/examples),
  and [Java](https://github.com/triton-inference-server/client/blob/main/src/java/src/main/java/triton/client/examples)
- Configure [HTTP](https://github.com/triton-inference-server/client#http-options)
  and [gRPC](https://github.com/triton-inference-server/client#grpc-options)
  client options
- Send input data (e.g. a jpeg image) directly to Triton in the [body of an HTTP
  request without any additional metadata](https://github.com/triton-inference-server/server/blob/main/docs/protocol/extension_binary_data.md#raw-binary-request)

### Extend Triton

[Triton Inference Server's architecture](docs/architecture.md) is specifically
designed for modularity and flexibility.

- [Customize the Triton Inference Server container](docs/compose.md) for your use case
- [Create custom backends](https://github.com/triton-inference-server/backend)
  in either [C/C++](https://github.com/triton-inference-server/backend/blob/main/README.md#triton-backend-api)
  or [Python](https://github.com/triton-inference-server/python_backend) (a minimal
  Python backend sketch follows this list)
- Create [decoupled backends and models](docs/decoupled_models.md) that can send
  multiple responses for a request or not send any responses for a request
- Use a [Triton repository agent](docs/repository_agents.md) to add functionality
  that operates when a model is loaded and unloaded, such as authentication,
  decryption, or conversion
- Deploy Triton on [Jetson and JetPack](docs/jetson.md)
- [Use Triton on AWS
  Inferentia](https://github.com/triton-inference-server/python_backend/tree/main/inferentia)

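For a sense of the Python backend interface, here is a hedged, minimal model skeleton; the tensor names are placeholders that must match the model's config.pbtxt, and the file would live at `<repo>/<model>/1/model.py`. See the python_backend README for the authoritative API.

```python
# Hedged sketch of a Python backend model (model.py). Placeholder tensor names.
# triton_python_backend_utils is provided by Triton inside the Python backend.
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def initialize(self, args):
        # args contains the model config, model name, instance kind, etc.
        pass

    def execute(self, requests):
        responses = []
        for request in requests:
            in0 = pb_utils.get_input_tensor_by_name(request, "INPUT0")
            # Trivial computation standing in for real pre/post processing.
            out0 = pb_utils.Tensor("OUTPUT0", in0.as_numpy() * 2.0)
            responses.append(pb_utils.InferenceResponse(output_tensors=[out0]))
        return responses

    def finalize(self):
        pass
```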

### Additional Documentation

- [FAQ](docs/faq.md)
- [User Guide](docs#user-guide)
- [Developer Guide](docs#developer-guide)
- [Release Notes](https://docs.nvidia.com/deeplearning/triton-inference-server/release-notes/index.html)
- [GPU, Driver, and CUDA Support
  Matrix](https://docs.nvidia.com/deeplearning/dgx/support-matrix/index.html)

## Contributing

Contributions to Triton Inference Server are more than welcome. To
contribute, please review the [contribution
guidelines](CONTRIBUTING.md). If you have a backend, client,
example, or similar contribution that does not modify the core of
Triton, you should file a PR in the [contrib
repo](https://github.com/triton-inference-server/contrib).

## Reporting problems, asking questions

We appreciate any feedback, questions, or bug reports regarding this project.
When posting [issues in GitHub](https://github.com/triton-inference-server/server/issues),
follow the process outlined in the [Stack Overflow document](https://stackoverflow.com/help/mcve).
Ensure posted examples are:

- minimal – use as little code as possible that still produces the
  same problem
- complete – provide all parts needed to reproduce the problem. Check
  whether you can strip external dependencies and still show the problem. The
  less time we spend reproducing problems, the more time we have to
  fix them
- verifiable – test the code you're about to provide to make sure it
  reproduces the problem. Remove all other problems that are not
  related to your request/question

## For more information

Please refer to the [NVIDIA Developer Triton page](https://developer.nvidia.com/nvidia-triton-inference-server)
for more information.

RELEASE.md

+102

<!--
# Copyright 2018-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#  * Redistributions of source code must retain the above copyright
#    notice, this list of conditions and the following disclaimer.
#  * Redistributions in binary form must reproduce the above copyright
#    notice, this list of conditions and the following disclaimer in the
#    documentation and/or other materials provided with the distribution.
#  * Neither the name of NVIDIA CORPORATION nor the names of its
#    contributors may be used to endorse or promote products derived
#    from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-->

# Release Notes for 2.22.0

## New Features and Improvements

* Triton In-Process API is available in
  [Java](https://github.com/triton-inference-server/server/blob/r22.05/docs/inference_protocols.md#java-bindings-for-in-process-triton-server-api).

* Python backend supports the
  [decoupled API](https://github.com/triton-inference-server/python_backend/tree/r22.05#decoupled-mode-beta)
  as a BETA release (a hedged sketch follows this list).

* You may load models with
  [file content](https://github.com/triton-inference-server/server/blob/r22.05/docs/protocol/extension_model_repository.md#load)
  provided during the Triton Server API invocation.

* Triton supports the
  [BF16 data type](https://github.com/triton-inference-server/server/blob/r22.05/docs/model_configuration.md#datatypes).

* PyTorch backend supports
  [1-dimensional String I/O](https://github.com/triton-inference-server/pytorch_backend/tree/r22.05#important-note).

* Explicit model control mode supports
  [loading all models at startup](https://github.com/triton-inference-server/server/blob/r22.05/docs/model_management.md#model-control-mode-explicit).

* You may specify
  [customized GRPC channel settings](https://github.com/triton-inference-server/client/blob/r22.05/src/python/library/tritonclient/grpc/__init__.py#L193-L200)
  in the GRPC client library.

* Triton In-Process API supports
  [dynamic model repository registration](https://github.com/triton-inference-server/core/blob/r22.05/include/triton/core/tritonserver.h#L1903-L1923).

* [Improved build pipeline](https://github.com/triton-inference-server/server/blob/r22.05/docs/build.md)
  in `build.py` and generated build scripts used for pipeline examination.

* ONNX Runtime backend updated to ONNX Runtime version 1.11.1 in both the Ubuntu and
  Windows versions of Triton.

* Refer to the 22.05 column of the
  [Frameworks Support Matrix](https://docs.nvidia.com/deeplearning/frameworks/support-matrix/index.html)
  for container image versions on which the 22.05 inference server container is
  based.

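To illustrate the decoupled mode mentioned above, here is a hedged sketch of a Python backend model that streams several responses per request; tensor names are placeholders, and the model configuration must enable the decoupled transaction policy. Refer to the linked python_backend documentation for the authoritative, version-specific API.

```python
# Hedged sketch of a decoupled Python backend model (model.py), placeholder names.
# The model configuration must enable: model_transaction_policy { decoupled: true }
import numpy as np
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        for request in requests:
            sender = request.get_response_sender()
            in0 = pb_utils.get_input_tensor_by_name(request, "INPUT0").as_numpy()

            # Stream one response per row of the input, then signal completion.
            for row in in0:
                out = pb_utils.Tensor("OUTPUT0", np.expand_dims(row, 0))
                sender.send(pb_utils.InferenceResponse(output_tensors=[out]))
            sender.send(flags=pb_utils.TRITONSERVER_RESPONSE_COMPLETE_FINAL)

        # Decoupled models return None from execute(); responses go via the sender.
        return None
```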

## Known Issues

* Triton PIP wheels for ARM SBSA are not available from PyPI, so pip will
  install an incorrect Jetson version of Triton for ARM SBSA.

  The correct wheel file can be pulled directly from the ARM SBSA SDK image and
  manually installed.

* Traced models in PyTorch seem to create overflows when int8 tensor values are
  transformed to int32 on the GPU.

  Refer to [pytorch/pytorch#66930](https://github.com/pytorch/pytorch/issues/66930)
  for more information.

* Triton cannot retrieve GPU metrics with MIG-enabled GPU devices (A100 and A30).

* Triton metrics might not work if the host machine is running a separate DCGM
  agent on bare metal or in a container.

* Running a PyTorch TorchScript model using the PyTorch backend where multiple
  instances of a model are configured can lead to a slowdown in model execution
  due to the following PyTorch issue:
  [pytorch/pytorch#27902](https://github.com/pytorch/pytorch/issues/27902).

* Starting in 22.02, the Triton container, which uses the 22.05 PyTorch
  container, will report an error during model loading in the PyTorch backend
  when using scripted models that were exported in the legacy format (using our
  19.09 or previous PyTorch NGC containers corresponding to PyTorch 1.2.0 or
  previous releases).

  To load the model successfully in Triton, you need to export the model again
  with a recent version of PyTorch.
