NVIDIA TensorRT Inference Server
================================

.. overview-begin-marker-do-not-remove

The NVIDIA TensorRT Inference Server (TRTIS) provides a cloud
inferencing solution optimized for NVIDIA GPUs. The server provides an
inference service via an HTTP or GRPC endpoint, allowing remote
clients to request inferencing for any model being managed by the
server.

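For illustration, here is a minimal client sketch in Python. It is a
sketch only: it assumes the server's HTTP endpoint is reachable on the
default port (8000) and that the ``/api/health/live`` and
``/api/status`` routes match the HTTP/GRPC API documentation linked
below; check that documentation for the exact routes and request
formats in your release::

  # Hypothetical liveness/status check against a running TRTIS instance.
  # The host, port, and routes are assumptions; adjust them to match
  # your deployment and the HTTP/GRPC API documentation.
  import requests

  BASE_URL = "http://localhost:8000"

  # Liveness: an HTTP 200 response means the server process is up.
  live = requests.get(BASE_URL + "/api/health/live")
  print("server live:", live.status_code == 200)

  # Server status, including the models currently being served.
  status = requests.get(BASE_URL + "/api/status")
  print(status.text)
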
What's New in 0.9.0 Beta
------------------------

* TRTIS now `monitors the model repository
  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_repository.html#modifying-the-model-repository>`_
  for any change and dynamically reloads the model when necessary,
  without requiring a server restart. It is now possible to add and
  remove model versions, add and remove entire models, modify the
  model configuration, and modify the model labels while the server
  is running (a sketch of the repository layout follows this list).
* Added a `model priority
  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_configuration.html#optimization-policy>`_
  parameter to the model configuration. Currently the model priority
  controls the CPU thread priority when executing the model and, for
  TensorRT models, also controls the CUDA stream priority (see the
  configuration sketch after the Features list below).
* Fixed a bug in the GRPC API: the model version parameter changed
  from string to int. This is a non-backwards-compatible change.
* Added a --strict-model-config=false option to allow some `model
  configuration properties to be derived automatically
  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_configuration.html#generated-model-configuration>`_. For
  some model types, this removes the need to specify the config.pbtxt
  file.
* Improved performance by using an asynchronous GRPC frontend.

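As a sketch of the model repository layout these items refer to (the
model name, version numbers, and file names here are illustrative;
see the model repository documentation linked above for the exact
conventions expected for each framework)::

  <model-repository-path>/
    example_model/            # one directory per model
      config.pbtxt            # model configuration
      labels.txt              # optional classification labels
      1/                      # one subdirectory per model version
        model.plan            # the model itself, e.g. a TensorRT plan
      2/
        model.plan

With --strict-model-config=false, the config.pbtxt file can be omitted
for model types whose configuration can be derived automatically.
Adding or removing a numbered version directory, or an entire model
directory, is detected while the server is running, as described in
the first item above.
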
Features
--------

TRTIS provides the following features:

* `Multiple framework support
  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_repository.html#framework-model-definition>`_. The
  server can manage any number and mix of models (limited by system
  disk and memory resources). Supports TensorRT, TensorFlow GraphDef,
  TensorFlow SavedModel, and Caffe2 NetDef model formats. Also
  supports TensorFlow-TensorRT integrated models.
* Multi-GPU support. TRTIS can distribute inferencing across all
  system GPUs.
* `Concurrent model execution support
  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_configuration.html?highlight=batching#instance-groups>`_. Multiple
  models (or multiple instances of the same model) can run
  simultaneously on the same GPU (a configuration sketch covering
  instance groups and dynamic batching follows this list).
* Batching support. For models that support batching, the server can
  accept requests for a batch of inputs and respond with the
  corresponding batch of outputs. TRTIS also supports `dynamic
  batching
  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_configuration.html?highlight=batching#dynamic-batching>`_
  where individual inference requests are dynamically combined
  together to improve inference throughput. Dynamic batching is
  transparent to the client requesting inference.
* `Model repositories
  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_repository.html#>`_
  may reside on a locally accessible file system (e.g. NFS) or in
  Google Cloud Storage.
* Readiness and liveness `health endpoints
  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/http_grpc_api.html#health>`_
  suitable for any orchestration or deployment framework, such as
  Kubernetes.
* `Metrics
  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/metrics.html>`_
  indicating GPU utilization, server throughput, and server latency.

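To make the configuration-related items above concrete, here is a
hedged config.pbtxt sketch. The field names and values are
illustrative assumptions based on the model configuration
documentation linked above; consult that documentation for the exact
schema supported by this release::

  # config.pbtxt (illustrative values only)
  name: "example_model"
  platform: "tensorrt_plan"
  max_batch_size: 8
  input [
    {
      name: "input"
      data_type: TYPE_FP32
      dims: [ 3, 224, 224 ]
    }
  ]
  output [
    {
      name: "output"
      data_type: TYPE_FP32
      dims: [ 1000 ]
    }
  ]
  # Concurrent execution: run two instances of this model on each
  # available GPU.
  instance_group [
    {
      count: 2
      kind: KIND_GPU
    }
  ]
  # Dynamic batching: combine individual requests into larger batches,
  # transparently to the client.
  dynamic_batching {
    preferred_batch_size: [ 4, 8 ]
    max_queue_delay_microseconds: 100
  }
  # Model priority introduced in 0.9.0; the field and enum names here
  # are assumptions, so check the optimization-policy documentation.
  optimization {
    priority: PRIORITY_MAX
  }
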
.. overview-end-marker-do-not-remove

The current release of the TensorRT Inference Server is 0.9.0 beta and
corresponds to the 18.12 release of the tensorrtserver container on
`NVIDIA GPU Cloud (NGC) <https://ngc.nvidia.com>`_. The branch for
this release is `r18.12
<https://github.com/NVIDIA/tensorrt-inference-server/tree/r18.12>`_.

Documentation
-------------

The User Guide, Developer Guide, and API Reference `documentation
<https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/index.html>`_
provide guidance on installing, building, and running the latest
release of the TensorRT Inference Server.

You can also view the documentation for the `master branch
<https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-master-branch-guide/docs/index.html>`_
and for `earlier releases
<https://docs.nvidia.com/deeplearning/sdk/inference-server-archived/index.html>`_.

The `Release Notes
<https://docs.nvidia.com/deeplearning/sdk/inference-release-notes/index.html>`_
and `Support Matrix
<https://docs.nvidia.com/deeplearning/dgx/support-matrix/index.html>`_
indicate the required versions of the NVIDIA Driver and CUDA, and also
describe which GPUs are supported by TRTIS.

Contributing
------------

Contributions to the TensorRT Inference Server are more than
welcome. To contribute, make a pull request and follow the guidelines
outlined in the `Contributing <CONTRIBUTING.md>`_ document.

Reporting problems, asking questions
------------------------------------

We appreciate any feedback, questions, or bug reports regarding this
project. When you need help with code, follow the process outlined in
the Stack Overflow guide on creating a minimal, complete, and
verifiable example (https://stackoverflow.com/help/mcve). Ensure that
posted examples are:

* minimal – use as little code as possible that still produces the
  same problem

* complete – provide all parts needed to reproduce the problem. Check
  if you can strip external dependencies and still show the problem.
  The less time we spend reproducing problems, the more time we have
  to fix them.

* verifiable – test the code you're about to provide to make sure it
  reproduces the problem. Remove all other problems that are not
  related to your request/question.

.. |License| image:: https://img.shields.io/badge/License-BSD3-lightgrey.svg
   :target: https://opensource.org/licenses/BSD-3-Clause