[License](https://opensource.org/licenses/BSD-3-Clause)

Triton Inference Server is an open source inference serving software that
streamlines AI inferencing. Triton enables teams to deploy any AI model from
multiple deep learning and machine learning frameworks, including TensorRT,
TensorFlow, PyTorch, ONNX, OpenVINO, Python, RAPIDS FIL, and more. Triton
supports inference across cloud, data center, edge, and embedded devices on
NVIDIA GPUs, x86 and ARM CPUs, or AWS Inferentia. Triton delivers optimized
performance for many query types, including real-time, batched, ensemble, and
audio/video streaming.

Major features include:

- [Supports multiple deep learning
  frameworks](https://github.com/triton-inference-server/backend#where-can-i-find-all-the-backends-that-are-available-for-triton)
- [Supports multiple machine learning
  frameworks](https://github.com/triton-inference-server/fil_backend)
- [Concurrent model
  execution](docs/user_guide/architecture.md#concurrent-model-execution)
- [Dynamic batching](docs/user_guide/model_configuration.md#dynamic-batcher)
- [Sequence batching](docs/user_guide/model_configuration.md#sequence-batcher)
  and [implicit state
  management](docs/user_guide/architecture.md#implicit-state-management)
  for stateful models
- Provides a [Backend API](https://github.com/triton-inference-server/backend)
  that allows adding custom backends and pre/post processing operations
- Model pipelines using
  [Ensembling](docs/user_guide/architecture.md#ensemble-models) or [Business
  Logic Scripting
  (BLS)](https://github.com/triton-inference-server/python_backend#business-logic-scripting)
- [HTTP/REST and GRPC inference
  protocols](docs/customization_guide/inference_protocols.md) based on the
  community-developed [KServe
  protocol](https://github.com/kserve/kserve/tree/master/docs/predict-api/v2)
  (see the curl sketch after this list)
- A [C
  API](docs/customization_guide/inference_protocols.md#in-process-triton-server-api)
  and [Java
  API](docs/customization_guide/inference_protocols.md#java-bindings-for-in-process-triton-server-api)
  allow Triton to link directly into your application for edge and other
  in-process use cases
- [Metrics](docs/user_guide/metrics.md) indicating GPU utilization, server
  throughput, server latency, and more

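As a quick, non-authoritative illustration of the HTTP/REST protocol and
metrics features above: assuming a Triton instance is already running on
localhost with its default ports (8000 for HTTP, 8002 for metrics) and has a
model named `densenet_onnx` loaded (the model used in the quickstart below),
the standard KServe-style endpoints can be exercised with curl.

```bash
# Liveness and readiness checks (KServe-based HTTP/REST protocol)
curl -s localhost:8000/v2/health/live
curl -s localhost:8000/v2/health/ready

# Server metadata and per-model metadata
curl -s localhost:8000/v2
curl -s localhost:8000/v2/models/densenet_onnx

# Prometheus-format metrics (GPU utilization, request counts, latencies, ...)
curl -s localhost:8002/metrics
```
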
**New to Triton Inference Server?** Make use of
[these tutorials](https://github.com/triton-inference-server/tutorials)
to begin your Triton journey!

Join the [Triton and TensorRT
community](https://www.nvidia.com/en-us/deep-learning-ai/triton-tensorrt-newsletter/)
and stay current on the latest product updates, bug fixes, content, best
practices, and more. Need enterprise support? NVIDIA global support is
available for Triton Inference Server with the
[NVIDIA AI Enterprise software
suite](https://www.nvidia.com/en-us/data-center/products/ai-enterprise/).

## Serve a Model in 3 Easy Steps

```bash
# Step 1: Create the example model repository
git clone -b r23.06 https://github.com/triton-inference-server/server.git
cd server/docs/examples
./fetch_models.sh

# Step 2: Launch triton from the NGC Triton container
docker run --gpus=1 --rm --net=host -v ${PWD}/model_repository:/models \
  nvcr.io/nvidia/tritonserver:23.06-py3 \
  tritonserver --model-repository=/models

# Step 3: Sending an Inference Request
# In a separate console, launch the image_client example from the NGC Triton
# SDK container
docker run -it --rm --net=host nvcr.io/nvidia/tritonserver:23.06-py3-sdk \
  /workspace/install/bin/image_client -m densenet_onnx -c 3 -s INCEPTION \
  /workspace/images/mug.jpg

# Inference should return the following
Image '/workspace/images/mug.jpg':
    15.346230 (504) = COFFEE MUG
    13.224326 (968) = CUP
    10.422965 (505) = COFFEEPOT
```
Please read the [QuickStart](docs/getting_started/quickstart.md) guide for
additional information regarding this example. The quickstart guide also
contains an example of how to launch Triton on [CPU-only
systems](docs/getting_started/quickstart.md#run-on-cpu-only-system). New to
Triton and wondering where to get started? Watch the [Getting Started
video](https://youtu.be/NQDtfSi5QF4).

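If no GPUs are available, the same container can run in CPU-only mode. The
sketch below is only shorthand for the CPU-only instructions in the
quickstart: it drops the `--gpus` flag and assumes every model in the example
repository has a CPU-capable backend (the ONNX Runtime `densenet_onnx` model
does).

```bash
# Sketch: CPU-only launch of the same example repository (no --gpus flag).
# Assumes every model in the repository can run on a CPU backend.
docker run --rm --net=host -v ${PWD}/model_repository:/models \
  nvcr.io/nvidia/tritonserver:23.06-py3 \
  tritonserver --model-repository=/models
```
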
## Examples and Tutorials

Check out [NVIDIA
LaunchPad](https://www.nvidia.com/en-us/data-center/products/ai-enterprise-suite/trial/)
for free access to a set of hands-on labs with Triton Inference Server hosted
on NVIDIA infrastructure.

Specific end-to-end examples for popular models, such as ResNet, BERT, and
DLRM, are located in the
[NVIDIA Deep Learning Examples](https://github.com/NVIDIA/DeepLearningExamples)
page on GitHub. The
[NVIDIA Developer
Zone](https://developer.nvidia.com/nvidia-triton-inference-server)
contains additional documentation, presentations, and examples.

## Documentation

### Build and Deploy

The recommended way to build and use Triton Inference Server is with Docker
images.

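In most cases that means starting from the prebuilt containers published on
NGC rather than building anything yourself. A minimal sketch, assuming the
23.06 release used by the examples in this README:

```bash
# Pull the prebuilt Triton server image from NGC (tag assumed to match the
# r23.06 examples elsewhere in this README)
docker pull nvcr.io/nvidia/tritonserver:23.06-py3

# The companion SDK image bundles the client libraries, examples, and
# perf_analyzer
docker pull nvcr.io/nvidia/tritonserver:23.06-py3-sdk
```
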
- [Install Triton Inference Server with Docker
  containers](docs/customization_guide/build.md#building-with-docker)
  (*Recommended*)
- [Install Triton Inference Server without Docker
  containers](docs/customization_guide/build.md#building-without-docker)
- [Build a custom Triton Inference Server Docker
  container](docs/customization_guide/compose.md)
- [Build Triton Inference Server from
  source](docs/customization_guide/build.md#building-on-unsupported-platforms)
- [Build Triton Inference Server for Windows
  10](docs/customization_guide/build.md#building-for-windows-10)
- Examples for deploying Triton Inference Server with Kubernetes and Helm on
  [GCP](deploy/gcp/README.md),
  [AWS](deploy/aws/README.md), and
  [NVIDIA FleetCommand](deploy/fleetcommand/README.md)

### Using Triton

#### Preparing Models for Triton Inference Server

The first step in using Triton to serve your models is to place one or more
models into a [model repository](docs/user_guide/model_repository.md).
Depending on the type of the model and on what Triton capabilities you want to
enable for the model, you may need to create a [model
configuration](docs/user_guide/model_configuration.md) for the model.

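As a rough sketch of what this looks like on disk, assuming a single ONNX
model: the model name `my_model`, the source file `my_model.onnx`, and the
configuration values below are hypothetical, and the `config.pbtxt` shows only
a small subset of the available fields (see the model configuration
documentation for the full set).

```bash
# Hypothetical repository with one ONNX model named "my_model"; numbered
# subdirectories hold the model versions.
mkdir -p model_repository/my_model/1
cp my_model.onnx model_repository/my_model/1/model.onnx

# Optional config.pbtxt. For several backends (e.g. ONNX, TensorFlow
# SavedModel, TensorRT) Triton can often derive a minimal configuration
# automatically, so an explicit file is mainly needed to enable capabilities
# such as dynamic batching.
cat > model_repository/my_model/config.pbtxt <<'EOF'
name: "my_model"
backend: "onnxruntime"
max_batch_size: 8
dynamic_batching {
  max_queue_delay_microseconds: 100
}
EOF
```
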
- [Add custom operations to Triton if needed by your
  model](docs/user_guide/custom_operations.md)
- Enable model pipelining with [Model
  Ensemble](docs/user_guide/architecture.md#ensemble-models) and [Business
  Logic Scripting
  (BLS)](https://github.com/triton-inference-server/python_backend#business-logic-scripting)
- Optimize your models by setting [scheduling and
  batching](docs/user_guide/architecture.md#models-and-schedulers) parameters
  and [model
  instances](docs/user_guide/model_configuration.md#instance-groups)
- Use the [Model Analyzer
  tool](https://github.com/triton-inference-server/model_analyzer) to help
  optimize your model configuration with profiling
- Learn how to [explicitly manage what models are available by loading and
  unloading models](docs/user_guide/model_management.md) (a curl sketch
  follows this list)

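To make the last bullet concrete, here is a hedged sketch that uses the HTTP
model repository endpoints. It assumes the server was started with
`--model-control-mode=explicit` (otherwise load/unload requests are rejected)
and reuses the quickstart's `densenet_onnx` model name.

```bash
# List the models Triton knows about and their current state
curl -s -X POST localhost:8000/v2/repository/index

# Explicitly load and unload a model (requires --model-control-mode=explicit)
curl -s -X POST localhost:8000/v2/repository/models/densenet_onnx/load
curl -s -X POST localhost:8000/v2/repository/models/densenet_onnx/unload
```
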
#### Configure and Use Triton Inference Server

- Read the [Quick Start Guide](docs/getting_started/quickstart.md) to run
  Triton Inference Server on both GPU and CPU
- Triton supports multiple execution engines, called
  [backends](https://github.com/triton-inference-server/backend#where-can-i-find-all-the-backends-that-are-available-for-triton),
  including
  [TensorRT](https://github.com/triton-inference-server/tensorrt_backend),
  [TensorFlow](https://github.com/triton-inference-server/tensorflow_backend),
  [PyTorch](https://github.com/triton-inference-server/pytorch_backend),
  [ONNX](https://github.com/triton-inference-server/onnxruntime_backend),
  [OpenVINO](https://github.com/triton-inference-server/openvino_backend),
  [Python](https://github.com/triton-inference-server/python_backend), and more
- Not all the above backends are supported on every platform supported by
  Triton. Look at the
  [Backend-Platform Support
  Matrix](https://github.com/triton-inference-server/backend/blob/r23.06/docs/backend_platform_support_matrix.md)
  to learn which backends are supported on your target platform.
- Learn how to [optimize performance](docs/user_guide/optimization.md) using
  the [Performance
  Analyzer](https://github.com/triton-inference-server/client/blob/r23.06/src/c++/perf_analyzer/README.md)
  and [Model Analyzer](https://github.com/triton-inference-server/model_analyzer)
- Learn how to [manage loading and unloading
  models](docs/user_guide/model_management.md) in Triton
- Send requests directly to Triton with the [HTTP/REST JSON-based or gRPC
  protocols](docs/customization_guide/inference_protocols.md#httprest-and-grpc-protocols)
  (illustrated in the sketch after this list)

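As a hedged illustration of the last bullet, an inference request can be
composed as plain JSON over HTTP/REST. The model name `simple_fp32`, its input
tensor `INPUT0`, the datatype, and the shape used here are hypothetical;
substitute whatever your model's metadata endpoint reports.

```bash
# Hypothetical: send a JSON inference request to a model named "simple_fp32"
# with a single FP32 input tensor "INPUT0" of shape [1, 4].
curl -s -X POST localhost:8000/v2/models/simple_fp32/infer \
  -H 'Content-Type: application/json' \
  -d '{
        "inputs": [
          {
            "name": "INPUT0",
            "shape": [1, 4],
            "datatype": "FP32",
            "data": [1.0, 2.0, 3.0, 4.0]
          }
        ]
      }'
```
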
#### Client Support and Examples

A Triton *client* application sends inference and other requests to Triton. The
[Python and C++ client
libraries](https://github.com/triton-inference-server/client) provide APIs to
simplify this communication.

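If you would rather not use the SDK container, the Python client library can
also be installed from PyPI. A small sketch; the extras shown are the ones the
`tritonclient` package documents.

```bash
# Install the Python client with both HTTP and gRPC support
pip install 'tritonclient[all]'

# Or install only the protocol you need
pip install 'tritonclient[http]'
pip install 'tritonclient[grpc]'
```
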
- Review client examples for
  [C++](https://github.com/triton-inference-server/client/blob/r23.06/src/c%2B%2B/examples),
  [Python](https://github.com/triton-inference-server/client/blob/r23.06/src/python/examples),
  and
  [Java](https://github.com/triton-inference-server/client/blob/r23.06/src/java/src/main/java/triton/client/examples)
- Configure
  [HTTP](https://github.com/triton-inference-server/client#http-options) and
  [gRPC](https://github.com/triton-inference-server/client#grpc-options)
  client options
- Send input data (e.g. a jpeg image) directly to Triton in the [body of an
  HTTP request without any additional
  metadata](https://github.com/triton-inference-server/server/blob/r23.06/docs/protocol/extension_binary_data.md#raw-binary-request)

### Extend Triton

[Triton Inference Server's architecture](docs/user_guide/architecture.md) is
specifically designed for modularity and flexibility.

- [Customize Triton Inference Server
  container](docs/customization_guide/compose.md) for your use case
- [Create custom backends](https://github.com/triton-inference-server/backend)
  in either
  [C/C++](https://github.com/triton-inference-server/backend/blob/r23.06/README.md#triton-backend-api)
  or [Python](https://github.com/triton-inference-server/python_backend)
- Create [decoupled backends and models](docs/user_guide/decoupled_models.md)
  that can send multiple responses for a request or not send any responses for
  a request
- Use a [Triton repository agent](docs/customization_guide/repository_agents.md)
  to add functionality that operates when a model is loaded and unloaded, such
  as authentication, decryption, or conversion
- Deploy Triton on [Jetson and JetPack](docs/user_guide/jetson.md)
- [Use Triton on AWS
  Inferentia](https://github.com/triton-inference-server/python_backend/tree/r23.06/inferentia)

### Additional Documentation

- [FAQ](docs/user_guide/faq.md)
- [User Guide](docs/README.md#user-guide)
- [Customization Guide](docs/README.md#customization-guide)
- [Release
  Notes](https://docs.nvidia.com/deeplearning/triton-inference-server/release-notes/index.html)
- [GPU, Driver, and CUDA Support
  Matrix](https://docs.nvidia.com/deeplearning/dgx/support-matrix/index.html)

## Contributing

Contributions to Triton Inference Server are more than welcome. To contribute
please review the [contribution guidelines](CONTRIBUTING.md). If you have a
backend, client, example or similar contribution that is not modifying the
core of Triton, then you should file a PR in the [contrib
repo](https://github.com/triton-inference-server/contrib).

## Reporting problems, asking questions

We appreciate any feedback, questions or bug reporting regarding this project.
When posting [issues in
GitHub](https://github.com/triton-inference-server/server/issues), follow the
process outlined in the [Stack Overflow
document](https://stackoverflow.com/help/mcve).
Ensure posted examples are:
- minimal – use as little code as possible that still produces the
  same problem
- complete – provide all parts needed to reproduce the problem. Check
  if you can strip external dependencies and still show the problem. The
  less time we spend on reproducing problems, the more time we have to
  fix them
- verifiable – test the code you're about to provide to make sure it
  reproduces the problem. Remove all other problems that are not
  related to your request/question.

For issues, please use the provided bug report and feature request templates.

For questions, we recommend posting in our community
[GitHub Discussions](https://github.com/triton-inference-server/server/discussions).

## For more information

Please refer to the [NVIDIA Developer Triton
page](https://developer.nvidia.com/nvidia-triton-inference-server)
for more information.