
Commit 873e177

Update README and versions for 20.02 release
1 parent 54c409c

3 files changed: +235 -9 lines changed

Dockerfile (+4 -4)

@@ -151,8 +151,8 @@ FROM ${TENSORFLOW_IMAGE} AS trtserver_tf
############################################################################
FROM ${BASE_IMAGE} AS trtserver_build

-ARG TRTIS_VERSION=1.11.0dev
-ARG TRTIS_CONTAINER_VERSION=20.02dev
+ARG TRTIS_VERSION=1.11.0
+ARG TRTIS_CONTAINER_VERSION=20.02

# libgoogle-glog0v5 is needed by caffe2 libraries.
# libcurl4-openSSL-dev is needed for GCS

@@ -334,8 +334,8 @@ ENTRYPOINT ["/opt/tensorrtserver/nvidia_entrypoint.sh"]
############################################################################
FROM ${BASE_IMAGE}

-ARG TRTIS_VERSION=1.11.0dev
-ARG TRTIS_CONTAINER_VERSION=20.02dev
+ARG TRTIS_VERSION=1.11.0
+ARG TRTIS_CONTAINER_VERSION=20.02

ENV TENSORRT_SERVER_VERSION ${TRTIS_VERSION}
ENV NVIDIA_TENSORRT_SERVER_VERSION ${TRTIS_CONTAINER_VERSION}

README.rst (+230 -4)

@@ -30,13 +30,239 @@
NVIDIA TensorRT Inference Server
================================

-**NOTE: You are currently on the 20.02 branch which tracks
-stabilization towards the next release. This branch is not usable
-during stabilization.**

.. overview-begin-marker-do-not-remove

The NVIDIA TensorRT Inference Server provides a cloud inferencing
solution optimized for NVIDIA GPUs. The server provides an inference
service via an HTTP or GRPC endpoint, allowing remote clients to
request inferencing for any model being managed by the server.
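
For illustration only, here is a rough sketch of such a remote
inference request using the Python client library from this release
series (`tensorrtserver.api`). The server address, the model name
`simple`, and its `INPUT0`/`INPUT1`/`OUTPUT0`/`OUTPUT1` tensors are
assumptions made for the example; check the client examples under
`src/clients/python` for the exact API::

    import numpy as np
    from tensorrtserver.api import InferContext, ProtocolType

    # Assumed server location (default HTTP port 8000) and a hypothetical
    # "simple" model that takes two int32 vectors and returns two outputs.
    protocol = ProtocolType.from_str("http")
    ctx = InferContext("localhost:8000", protocol, "simple", -1)  # -1 selects the latest version

    input0 = np.arange(16, dtype=np.int32)
    input1 = np.ones(16, dtype=np.int32)

    # Inputs map each tensor name to a list with one array per batch entry;
    # RAW asks for each output tensor back as a numpy array.
    result = ctx.run(
        {"INPUT0": [input0], "INPUT1": [input1]},
        {"OUTPUT0": InferContext.ResultFormat.RAW,
         "OUTPUT1": InferContext.ResultFormat.RAW},
        batch_size=1)

    print(result["OUTPUT0"][0])
    print(result["OUTPUT1"][0])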

What's New in 1.11.0
--------------------

* The TensorRT backend is improved to have significantly better
  performance. Improvements include reduced thread contention, pinned
  memory for faster CPU<->GPU transfers, and increased overlap of
  compute and memory copies on the GPU.

* Memory usage of TensorRT models is reduced in many cases by sharing
  weights across multiple model instances.

* Boolean data-type and shape tensors are now supported for TensorRT
  models.

* A new model configuration option allows the dynamic batcher to create
  “ragged” batches for custom backend models. A ragged batch is a batch
  where one or more of the input/output tensors have different shapes
  in different batch entries.

* Local S3 storage endpoints are now supported for model repositories.
  A local S3 endpoint is specified as `s3://host:port/path/to/repository`.

* The Helm chart showing an example Kubernetes deployment is updated to
  include Prometheus and Grafana support so that inference server
  metrics can be collected and visualized.

* The inference server container no longer sets `LD_LIBRARY_PATH`;
  instead, the server uses `RUNPATH` to locate its shared libraries.

* Python 2 has reached end-of-life, so all Python 2 support has been
  removed. Python 3 is still supported.

Features
--------

* `Multiple framework support
  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_repository.html#framework-model-definition>`_. The
  server can manage any number and mix of models (limited by system
  disk and memory resources). Supports TensorRT, TensorFlow GraphDef,
  TensorFlow SavedModel, ONNX, PyTorch, and Caffe2 NetDef model
  formats. Also supports TensorFlow-TensorRT integrated
  models. Variable-size input and output tensors are allowed if
  supported by the framework. See `Capabilities
  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/capabilities.html#capabilities>`_
  for detailed support information for each framework.

* `Concurrent model execution support
  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_configuration.html#instance-groups>`_. Multiple
  models (or multiple instances of the same model) can run
  simultaneously on the same GPU.

* Batching support. For models that support batching, the server can
  accept requests for a batch of inputs and respond with the
  corresponding batch of outputs. The inference server also supports
  multiple `scheduling and batching
  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_configuration.html#scheduling-and-batching>`_
  algorithms that combine individual inference requests together to
  improve inference throughput. These scheduling and batching
  decisions are transparent to the client requesting inference.

* `Custom backend support
  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_repository.html#custom-backends>`_. The inference server
  allows individual models to be implemented with custom backends
  instead of by a deep-learning framework. With a custom backend a
  model can implement any logic desired, while still benefiting from
  the GPU support, concurrent execution, dynamic batching and other
  features provided by the server.

* `Ensemble support
  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/models_and_schedulers.html#ensemble-models>`_. An
  ensemble represents a pipeline of one or more models and the
  connection of input and output tensors between those models. A
  single inference request to an ensemble triggers execution of the
  entire pipeline.

* Multi-GPU support. The server can distribute inferencing across all
  system GPUs.

* The inference server provides `multiple modes for model management
  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_management.html>`_. These
  model management modes allow both implicit and explicit loading and
  unloading of models without requiring a server restart.

* `Model repositories
  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_repository.html#>`_
  may reside on a locally accessible file system (e.g. NFS), in Google
  Cloud Storage, or in Amazon S3.

* Readiness and liveness `health endpoints
  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/http_grpc_api.html#health>`_
  suitable for any orchestration or deployment framework, such as
  Kubernetes (see the sketch following this feature list).

* `Metrics
  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/metrics.html>`_
  indicating GPU utilization, server throughput, and server latency
  (also shown in the sketch following this feature list).

* `C library interface
  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/library_api.html>`_
  allows the full functionality of the inference server to be included
  directly in an application.

.. overview-end-marker-do-not-remove
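
The readiness/liveness and metrics endpoints called out in the feature
list above can be exercised with plain HTTP. The sketch below assumes
the default ports (8000 for the HTTP endpoint, 8002 for metrics) and
the version-1 API paths `/api/health/live`, `/api/health/ready`, and
`/api/status`; verify both against the HTTP/GRPC API and metrics
documentation linked above::

    # Minimal sketch: liveness/readiness probes, server/model status, and a
    # Prometheus metrics scrape. Ports and URL paths are assumed defaults.
    import requests

    HTTP_URL = "http://localhost:8000"     # HTTP inference endpoint (assumed)
    METRICS_URL = "http://localhost:8002"  # Prometheus metrics endpoint (assumed)

    # Liveness and readiness, suitable for Kubernetes probes.
    live = requests.get(HTTP_URL + "/api/health/live")
    ready = requests.get(HTTP_URL + "/api/health/ready")
    print("live:", live.status_code == 200, "ready:", ready.status_code == 200)

    # Server status describes the models the server is currently managing.
    print(requests.get(HTTP_URL + "/api/status").text)

    # Metrics in Prometheus text format (GPU utilization, throughput, latency).
    print(requests.get(METRICS_URL + "/metrics").text)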

The current release of the TensorRT Inference Server is 1.11.0 and
corresponds to the 20.02 release of the tensorrtserver container on
`NVIDIA GPU Cloud (NGC) <https://ngc.nvidia.com>`_. The branch for
this release is `r20.02
<https://github.com/NVIDIA/tensorrt-inference-server/tree/r20.02>`_.

Backwards Compatibility
-----------------------

The latest version continues to maintain backwards compatibility with
the 1.0.0 release for the following interfaces. If you have model
configuration files, custom backends, or clients that use the
inference server HTTP or GRPC APIs (either directly or through the
client libraries) from releases prior to 1.0.0, you should edit and
rebuild those as necessary to match the version 1.0.0 APIs.

The following interfaces will maintain backwards compatibility for all
future 1.x.y releases (see below for exceptions):

* Model configuration as defined in `model_config.proto
  <https://github.com/NVIDIA/tensorrt-inference-server/blob/master/src/core/model_config.proto>`_.

* The inference server HTTP and GRPC APIs as defined in `api.proto
  <https://github.com/NVIDIA/tensorrt-inference-server/blob/master/src/core/api.proto>`_
  and `grpc_service.proto
  <https://github.com/NVIDIA/tensorrt-inference-server/blob/master/src/core/grpc_service.proto>`_,
  except as noted below.

* The V1 custom backend interface as defined in `custom.h
  <https://github.com/NVIDIA/tensorrt-inference-server/blob/master/src/backends/custom/custom.h>`_.

As new features are introduced they may temporarily have beta status,
meaning they are subject to change in non-backwards-compatible
ways. When they exit beta they will conform to the
backwards-compatibility guarantees described above. Currently the
following features are in beta:

* The inference server library API as defined in `trtserver.h
  <https://github.com/NVIDIA/tensorrt-inference-server/blob/master/src/core/trtserver.h>`_
  is currently in beta and may undergo non-backwards-compatible
  changes.

* The inference server HTTP and GRPC APIs related to system and CUDA
  shared memory are currently in beta and may undergo
  non-backwards-compatible changes.

* The V2 custom backend interface as defined in `custom.h
  <https://github.com/NVIDIA/tensorrt-inference-server/blob/master/src/backends/custom/custom.h>`_
  is currently in beta and may undergo non-backwards-compatible
  changes.

* The C++ and Python client libraries are not strictly included in the
  inference server compatibility guarantees and so should be
  considered as beta status.

Documentation
-------------

The User Guide, Developer Guide, and API Reference `documentation for
the current release
<https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/index.html>`_
provide guidance on installing, building, and running the TensorRT
Inference Server.

You can also view the `documentation for the master branch
<https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-master-branch-guide/docs/index.html>`_
and for `earlier releases
<https://docs.nvidia.com/deeplearning/sdk/inference-server-archived/index.html>`_.

An `FAQ
<https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/faq.html>`_
provides answers for frequently asked questions.

READMEs for deployment examples can be found in subdirectories of
deploy/, for example, `deploy/single_server/README.rst
<https://github.com/NVIDIA/tensorrt-inference-server/tree/master/deploy/single_server/README.rst>`_.

The `Release Notes
<https://docs.nvidia.com/deeplearning/sdk/inference-release-notes/index.html>`_
and `Support Matrix
<https://docs.nvidia.com/deeplearning/dgx/support-matrix/index.html>`_
indicate the required versions of the NVIDIA Driver and CUDA, and also
describe which GPUs are supported by the inference server.

Other Documentation
^^^^^^^^^^^^^^^^^^^

* `Maximizing Utilization for Data Center Inference with TensorRT
  Inference Server
  <https://on-demand-gtc.gputechconf.com/gtcnew/sessionview.php?sessionName=s9438-maximizing+utilization+for+data+center+inference+with+tensorrt+inference+server>`_.

* `NVIDIA TensorRT Inference Server Boosts Deep Learning Inference
  <https://devblogs.nvidia.com/nvidia-serves-deep-learning-inference/>`_.

* `GPU-Accelerated Inference for Kubernetes with the NVIDIA TensorRT
  Inference Server and Kubeflow
  <https://www.kubeflow.org/blog/nvidia_tensorrt/>`_.

Contributing
------------

Contributions to TensorRT Inference Server are more than welcome. To
contribute, make a pull request and follow the guidelines outlined in
the `Contributing <CONTRIBUTING.md>`_ document.

Reporting problems, asking questions
------------------------------------

We appreciate any feedback, questions or bug reports regarding this
project. When help with code is needed, follow the process outlined in
the Stack Overflow document on providing a minimal, complete, and
verifiable example (https://stackoverflow.com/help/mcve). Ensure
posted examples are:

* minimal – use as little code as possible that still produces the
  same problem

* complete – provide all parts needed to reproduce the problem. Check
  whether you can strip external dependencies and still show the
  problem. The less time we spend reproducing problems, the more time
  we have to fix them.

* verifiable – test the code you're about to provide to make sure it
  reproduces the problem. Remove all other problems that are not
  related to your request/question.

.. |License| image:: https://img.shields.io/badge/License-BSD3-lightgrey.svg
   :target: https://opensource.org/licenses/BSD-3-Clause

VERSION (+1 -1)

@@ -1 +1 @@
-1.11.0dev
+1.11.0
