Commit 76cea2f

David Goodwin committed
Update versions and documentation for release
1 parent 06970c8 commit 76cea2f

3 files changed: +123 −9 lines changed

Dockerfile

+4 −4

@@ -97,8 +97,8 @@ RUN bash -c 'if [ "$BUILD_CLIENTS_ONLY" != "1" ]; then \
 ############################################################################
 FROM ${TENSORFLOW_IMAGE} AS trtserver_build

-ARG TRTIS_VERSION=0.9.0dev
-ARG TRTIS_CONTAINER_VERSION=18.12dev
+ARG TRTIS_VERSION=0.9.0
+ARG TRTIS_CONTAINER_VERSION=18.12
 ARG PYVER=3.5
 ARG BUILD_CLIENTS_ONLY=0
@@ -233,8 +233,8 @@ ENTRYPOINT ["/opt/tensorrtserver/nvidia_entrypoint.sh"]
 ############################################################################
 FROM ${BASE_IMAGE}

-ARG TRTIS_VERSION=0.9.0dev
-ARG TRTIS_CONTAINER_VERSION=18.12dev
+ARG TRTIS_VERSION=0.9.0
+ARG TRTIS_CONTAINER_VERSION=18.12
 ARG PYVER=3.5

 ENV TENSORRT_SERVER_VERSION ${TRTIS_VERSION}
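The ARG defaults updated above can be overridden at build time with --build-arg. A hedged sketch of a custom build, where the image tags are made up and the clients-only variant follows the BUILD_CLIENTS_ONLY guard visible in the first hunk header::

    # Hypothetical release build; tag names are made up.
    docker build -t tensorrtserver:0.9.0 \
      --build-arg TRTIS_VERSION=0.9.0 \
      --build-arg TRTIS_CONTAINER_VERSION=18.12 .

    # Clients-only variant, per the BUILD_CLIENTS_ONLY guard shown in
    # the first hunk header above.
    docker build -t tensorrtserver:clients --build-arg BUILD_CLIENTS_ONLY=1 .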

README.rst

+118 −4
@@ -30,15 +30,129 @@
 NVIDIA TensorRT Inference Server
 ================================

+.. overview-begin-marker-do-not-remove

-**NOTE: You are currently on the r18.12 branch which tracks
-stabilization towards the next release. This branch is not usable
-during stabilization.**
+The NVIDIA TensorRT Inference Server (TRTIS) provides a cloud
+inferencing solution optimized for NVIDIA GPUs. The server provides an
+inference service via an HTTP or GRPC endpoint, allowing remote
+clients to request inferencing for any model being managed by the
+server.

-.. overview-begin-marker-do-not-remove
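The overview added above describes the HTTP endpoint. A minimal smoke test against a locally running server, assuming the default HTTP port 8000 and the /api/health and /api/status routes (assumptions to verify against the 18.12 documentation)::

    # Liveness and readiness probes; expect HTTP 200 when up/ready.
    curl -s -o /dev/null -w "live: %{http_code}\n"  http://localhost:8000/api/health/live
    curl -s -o /dev/null -w "ready: %{http_code}\n" http://localhost:8000/api/health/ready

    # Server and model status.
    curl -s http://localhost:8000/api/status | head -n 20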
+What's New in 0.9.0 Beta
+------------------------
+
+* TRTIS now `monitors the model repository
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_repository.html#modifying-the-model-repository>`_
+  for any change and dynamically reloads a model when necessary,
+  without requiring a server restart. It is now possible to add and
+  remove model versions, add or remove entire models, modify the model
+  configuration, and modify the model labels while the server is
+  running.
+* Added a `model priority
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_configuration.html#optimization-policy>`_
+  parameter to the model configuration. Currently the model priority
+  controls the CPU thread priority when executing the model and, for
+  TensorRT models, also controls the CUDA stream priority.
+* Fixed a bug in the GRPC API: the model version parameter changed
+  from string to int. This is a non-backwards-compatible change.
+* Added the --strict-model-config=false option to allow some `model
+  configuration properties to be derived automatically
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_configuration.html#generated-model-configuration>`_. For
+  some model types, this removes the need to specify a config.pbtxt
+  file.
+* Improved performance through an asynchronous GRPC frontend.
+
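The first and fourth bullets above are easiest to see as a workflow. A hedged sketch, assuming a hypothetical repository at /models, a made-up model named mymodel, and the --model-store flag name (verify the flag for this release)::

    # Start the server against the repository; with
    # --strict-model-config=false some model types need no config.pbtxt.
    trtserver --model-store=/models --strict-model-config=false &

    # Later, add a new version of a model while the server runs: drop a
    # complete version directory into the repository. The server detects
    # the change and reloads without a restart.
    mkdir -p /models/mymodel/3
    cp /tmp/retrained/model.plan /models/mymodel/3/model.plan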
+Features
+--------
+
+TRTIS provides the following features:

+* `Multiple framework support
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_repository.html#framework-model-definition>`_. The
+  server can manage any number and mix of models (limited by system
+  disk and memory resources). Supports TensorRT, TensorFlow GraphDef,
+  TensorFlow SavedModel, and Caffe2 NetDef model formats. Also supports
+  TensorFlow-TensorRT integrated models.
+* Multi-GPU support. TRTIS can distribute inferencing across all
+  system GPUs.
+* `Concurrent model execution support
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_configuration.html?highlight=batching#instance-groups>`_. Multiple
+  models (or multiple instances of the same model) can run
+  simultaneously on the same GPU.
+* Batching support. For models that support batching, the server can
+  accept requests for a batch of inputs and respond with the
+  corresponding batch of outputs. TRTIS also supports `dynamic
+  batching
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_configuration.html?highlight=batching#dynamic-batching>`_,
+  where individual inference requests are dynamically combined to
+  improve inference throughput. Dynamic batching is transparent to
+  the client requesting inference.
+* `Model repositories
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_repository.html#>`_
+  may reside on a locally accessible file system (e.g. NFS) or in
+  Google Cloud Storage.
+* Readiness and liveness `health endpoints
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/http_grpc_api.html#health>`_
+  suitable for any orchestration or deployment framework, such as
+  Kubernetes.
+* `Metrics
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/metrics.html>`_
+  indicating GPU utilization, server throughput, and server latency.

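A small sketch of reading the metrics named in the final bullet, assuming the Prometheus-format endpoint on port 8002 and nv_-prefixed metric names (both assumptions to verify against the metrics documentation linked above)::

    # GPU utilization and request-count metrics, if exposed under
    # these assumed names.
    curl -s http://localhost:8002/metrics | grep -E '^nv_(gpu_utilization|inference_request)'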

41102
.. overview-end-marker-do-not-remove
42103
104+
The current release of the TensorRT Inference Server is 0.9.0 beta and
105+
corresponds to the 18.12 release of the tensorrtserver container on
106+
`NVIDIA GPU Cloud (NGC) <https://ngc.nvidia.com>`_. The branch for
107+
this release is `r18.12
108+
<https://github.com/NVIDIA/tensorrt-inference-server/tree/r18.12>`_.
109+
110+
Documentation
111+
-------------
112+
113+
The User Guide, Developer Guide, and API Reference `documentation
114+
<https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/index.html>`_
115+
provide guidance on installing, building and running the latest
116+
release of the TensorRT Inference Server.
117+
118+
You can also view the documentation for the `master branch
119+
<https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-master-branch-guide/docs/index.html>`_
120+
and for `earlier releases
121+
<https://docs.nvidia.com/deeplearning/sdk/inference-server-archived/index.html>`_.
122+
123+
The `Release Notes
124+
<https://docs.nvidia.com/deeplearning/sdk/inference-release-notes/index.html>`_
125+
and `Support Matrix
126+
<https://docs.nvidia.com/deeplearning/dgx/support-matrix/index.html>`_
127+
indicate the required versions of the NVIDIA Driver and CUDA, and also
128+
describe which GPUs are supported by TRTIS.
129+
130+
Contributing
131+
------------
132+
133+
Contributions to TensorRT Inference Server are more than welcome. To
134+
contribute make a pull request and follow the guidelines outlined in
135+
the `Contributing <CONTRIBUTING.md>`_ document.
136+
137+
Reporting problems, asking questions
138+
------------------------------------
139+
140+
We appreciate any feedback, questions or bug reporting regarding this
141+
project. When help with code is needed, follow the process outlined in
142+
the Stack Overflow (https://stackoverflow.com/help/mcve)
143+
document. Ensure posted examples are:
144+
145+
* minimal – use as little code as possible that still produces the
146+
same problem
147+
148+
* complete – provide all parts needed to reproduce the problem. Check
149+
if you can strip external dependency and still show the problem. The
150+
less time we spend on reproducing problems the more time we have to
151+
fix it
152+
153+
* verifiable – test the code you're about to provide to make sure it
154+
reproduces the problem. Remove all other problems that are not
155+
related to your request/question.
156+
43157
.. |License| image:: https://img.shields.io/badge/License-BSD3-lightgrey.svg
44158
:target: https://opensource.org/licenses/BSD-3-Clause

VERSION

+1 −1

@@ -1 +1 @@
-0.9.0dev
+0.9.0
