
Commit bdfe25a

Update README and versions for 19.09 release (#685)
1 parent b63adb5 commit bdfe25a

3 files changed: +221 -9 lines changed

Dockerfile

+4-4
@@ -165,8 +165,8 @@ RUN python3 /workspace/onnxruntime/tools/ci_build/build.py --build_dir /workspac
 ############################################################################
 FROM ${BASE_IMAGE} AS trtserver_build
 
-ARG TRTIS_VERSION=1.6.0dev
-ARG TRTIS_CONTAINER_VERSION=19.09dev
+ARG TRTIS_VERSION=1.6.0
+ARG TRTIS_CONTAINER_VERSION=19.09
 
 # libgoogle-glog0v5 is needed by caffe2 libraries.
 RUN apt-get update && \
@@ -299,8 +299,8 @@ ENTRYPOINT ["/opt/tensorrtserver/nvidia_entrypoint.sh"]
 ############################################################################
 FROM ${BASE_IMAGE}
 
-ARG TRTIS_VERSION=1.6.0dev
-ARG TRTIS_CONTAINER_VERSION=19.09dev
+ARG TRTIS_VERSION=1.6.0
+ARG TRTIS_CONTAINER_VERSION=19.09
 
 ENV TENSORRT_SERVER_VERSION ${TRTIS_VERSION}
 ENV NVIDIA_TENSORRT_SERVER_VERSION ${TRTIS_CONTAINER_VERSION}

README.rst

+216-4
@@ -30,13 +30,225 @@
 NVIDIA TensorRT Inference Server
 ================================
 
-**NOTE: You are currently on the r19.09 branch which tracks
-stabilization towards the next release. This branch is not usable
-during stabilization.**
-
 .. overview-begin-marker-do-not-remove
 
+The NVIDIA TensorRT Inference Server provides a cloud inferencing
+solution optimized for NVIDIA GPUs. The server provides an inference
+service via an HTTP or gRPC endpoint, allowing remote clients to
+request inferencing for any model being managed by the server.
+
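As a quick, hedged illustration of that HTTP endpoint, the sketch below polls the server for its status report, which lists the models being managed. It assumes the default HTTP port (8000) and the /api/status route of the version-1 HTTP API; verify both against the HTTP/GRPC API documentation for this release::

  # Minimal sketch: query the inference server status over HTTP.
  # Assumes the default HTTP port (8000) and the v1 /api/status route.
  import requests

  SERVER_URL = "http://localhost:8000"  # adjust for your deployment

  def server_status() -> str:
      """Return the server status report, which describes the models being managed."""
      resp = requests.get(f"{SERVER_URL}/api/status", timeout=5.0)
      resp.raise_for_status()
      return resp.text  # text-format status report

  if __name__ == "__main__":
      print(server_status())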
+What's New In 1.6.0
+-------------------
+
+* Added TensorRT 6 support, which includes support for TensorRT dynamic
+  shapes.
+
+* Shared memory support is added as an alpha feature in this release. This
+  support allows input and output tensors to be communicated via shared
+  memory instead of over the network. Currently only system (CPU) shared
+  memory is supported.
+
+* Amazon S3 is now supported as a remote file system for model repositories.
+  Use the s3:// prefix on model repository paths to reference S3 locations
+  (a sketch of such a repository layout follows this list).
+
+* The inference server library API is available as a beta in this release.
+  The library API allows you to link against libtrtserver.so so that you can
+  include all the inference server functionality directly in your application.
+
+* GRPC endpoint performance improvement. The inference server’s GRPC endpoint
+  now uses significantly less memory while delivering higher performance.
+
+* The ensemble scheduler is now more flexible in allowing batching and
+  non-batching models to be composed together in an ensemble.
+
+* The ensemble scheduler will now keep tensors in GPU memory between models
+  when possible. Doing so significantly increases performance of some ensembles
+  by avoiding copies to and from system memory.
+
+* The performance client, perf_client, now supports models with variable-sized
+  input tensors.
+
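To make the new S3 support concrete, the following is a minimal, hypothetical sketch of publishing a model repository to S3 with boto3. The bucket name, local file paths, and model name are placeholders, and the <model-name>/config.pbtxt plus <model-name>/<version>/ layout follows the model repository documentation; the resulting s3:// path is what the server is pointed at::

  # Hypothetical sketch: upload a minimal model repository layout to S3 so the
  # server can load it through an s3:// model repository path.
  import boto3

  BUCKET = "my-model-bucket"      # hypothetical bucket name
  PREFIX = "model_repository"     # repository root inside the bucket

  # Standard layout: <model-name>/config.pbtxt and <model-name>/<version>/<model file>.
  files = {
      "resnet50_plan/config.pbtxt": "models/resnet50_plan/config.pbtxt",
      "resnet50_plan/1/model.plan": "models/resnet50_plan/1/model.plan",
  }

  s3 = boto3.client("s3")
  for key, local_path in files.items():
      with open(local_path, "rb") as f:
          s3.put_object(Bucket=BUCKET, Key=f"{PREFIX}/{key}", Body=f.read())

  # The model repository path handed to the server would then be:
  print(f"s3://{BUCKET}/{PREFIX}")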
+Features
+--------
+
+* `Multiple framework support
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_repository.html#framework-model-definition>`_. The
+  server can manage any number and mix of models (limited by system
+  disk and memory resources). Supports TensorRT, TensorFlow GraphDef,
+  TensorFlow SavedModel, ONNX and Caffe2 NetDef model formats. Also
+  supports TensorFlow-TensorRT integrated models. Variable-size input
+  and output tensors are allowed if supported by the framework. See
+  `Capabilities
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/capabilities.html#capabilities>`_
+  for detailed support information for each framework.
+
+* `Concurrent model execution support
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_configuration.html#instance-groups>`_. Multiple
+  models (or multiple instances of the same model) can run
+  simultaneously on the same GPU.
+
+* Batching support. For models that support batching, the server can
+  accept requests for a batch of inputs and respond with the
+  corresponding batch of outputs. The inference server also supports
+  multiple `scheduling and batching
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_configuration.html#scheduling-and-batching>`_
+  algorithms that combine individual inference requests together to
+  improve inference throughput. These scheduling and batching
+  decisions are transparent to the client requesting inference.
+
+* `Custom backend support
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_repository.html#custom-backends>`_. The inference server
+  allows individual models to be implemented with custom backends
+  instead of by a deep-learning framework. With a custom backend a
+  model can implement any logic desired, while still benefiting from
+  the GPU support, concurrent execution, dynamic batching and other
+  features provided by the server.
+
+* `Ensemble support
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/models_and_schedulers.html#ensemble-models>`_. An
+  ensemble represents a pipeline of one or more models and the
+  connection of input and output tensors between those models. A
+  single inference request to an ensemble will trigger the execution
+  of the entire pipeline.
+
+* Multi-GPU support. The server can distribute inferencing across all
+  system GPUs.
+
+* The inference server `monitors the model repository
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_repository.html#modifying-the-model-repository>`_
+  for any change and dynamically reloads the model(s) when necessary,
+  without requiring a server restart. Models and model versions can be
+  added and removed, and model configurations can be modified while
+  the server is running.
+
+* `Model repositories
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_repository.html#>`_
+  may reside on a locally accessible file system (e.g. NFS), in Google
+  Cloud Storage or in Amazon S3.
+
+* Readiness and liveness `health endpoints
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/http_grpc_api.html#health>`_
+  suitable for any orchestration or deployment framework, such as
+  Kubernetes; a client-side sketch of these checks follows this list.
+
+* `Metrics
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/metrics.html>`_
+  indicating GPU utilization, server throughput, and server latency.
+
+* `C library interface
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/library_api.html>`_
+  allows the full functionality of the inference server to be included
+  directly in an application.
+
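Because the readiness/liveness endpoints and metrics above are what deployments typically wire into their orchestration, here is a hedged client-side sketch of those checks. It assumes the default ports (8000 for the HTTP API, 8002 for metrics) and the /api/health/live, /api/health/ready, and /metrics routes described in the HTTP/GRPC API and metrics documentation::

  # Sketch of the health and metrics checks an orchestrator might perform.
  # Assumes default ports: 8000 (HTTP API) and 8002 (Prometheus metrics).
  import requests

  HTTP_URL = "http://localhost:8000"
  METRICS_URL = "http://localhost:8002"

  def is_live() -> bool:
      """Liveness: the server process is up and responding."""
      return requests.get(f"{HTTP_URL}/api/health/live", timeout=2.0).status_code == 200

  def is_ready() -> bool:
      """Readiness: the server is ready to serve inference requests."""
      return requests.get(f"{HTTP_URL}/api/health/ready", timeout=2.0).status_code == 200

  def metrics() -> str:
      """Prometheus-format metrics (GPU utilization, throughput, latency)."""
      resp = requests.get(f"{METRICS_URL}/metrics", timeout=2.0)
      resp.raise_for_status()
      return resp.text

  if __name__ == "__main__":
      print("live:", is_live(), "ready:", is_ready())
      print(metrics()[:400])  # show the first few metric lines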
 .. overview-end-marker-do-not-remove
 
+The current release of the TensorRT Inference Server is 1.6.0 and
+corresponds to the 19.09 release of the tensorrtserver container on
+`NVIDIA GPU Cloud (NGC) <https://ngc.nvidia.com>`_. The branch for
+this release is `r19.09
+<https://github.com/NVIDIA/tensorrt-inference-server/tree/r19.09>`_.
+
+Backwards Compatibility
+-----------------------
+
+Continuing in version 1.6.0, the following interfaces maintain
+backwards compatibility with the 1.0.0 release. If you have model
+configuration files, custom backends, or clients that use the
+inference server HTTP or GRPC APIs (either directly or through the
+client libraries) from releases prior to 1.0.0 (19.03), you should edit
+and rebuild those as necessary to match the version 1.0.0 APIs.
+
+These interfaces will maintain backwards compatibility for all future
+1.x.y releases (see below for exceptions):
+
+* Model configuration as defined in `model_config.proto
+  <https://github.com/NVIDIA/tensorrt-inference-server/blob/master/src/core/model_config.proto>`_.
+
+* The inference server HTTP and GRPC APIs as defined in `api.proto
+  <https://github.com/NVIDIA/tensorrt-inference-server/blob/master/src/core/api.proto>`_
+  and `grpc_service.proto
+  <https://github.com/NVIDIA/tensorrt-inference-server/blob/master/src/core/grpc_service.proto>`_.
+
+* The custom backend interface as defined in `custom.h
+  <https://github.com/NVIDIA/tensorrt-inference-server/blob/master/src/backends/custom/custom.h>`_.
+
+As new features are introduced, they may temporarily have beta status
+where they are subject to change in non-backwards-compatible
+ways. When they exit beta they will conform to the
+backwards-compatibility guarantees described above. Currently the
+following features are in beta:
+
+* In the model configuration defined in `model_config.proto
+  <https://github.com/NVIDIA/tensorrt-inference-server/blob/master/src/core/model_config.proto>`_
+  the sections related to model ensembling are currently in beta. In
+  particular, the ModelEnsembling message will potentially undergo
+  non-backwards-compatible changes.
+
+
+Documentation
+-------------
+
+The User Guide, Developer Guide, and API Reference `documentation
+<https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/index.html>`_
+provide guidance on installing, building and running the latest
+release of the TensorRT Inference Server.
+
+You can also view the documentation for the `master branch
+<https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-master-branch-guide/docs/index.html>`_
+and for `earlier releases
+<https://docs.nvidia.com/deeplearning/sdk/inference-server-archived/index.html>`_.
+
+READMEs for deployment examples can be found in subdirectories of
+deploy/, for example, `deploy/single_server/README.rst
+<https://github.com/NVIDIA/tensorrt-inference-server/tree/master/deploy/single_server/README.rst>`_.
+
+The `Release Notes
+<https://docs.nvidia.com/deeplearning/sdk/inference-release-notes/index.html>`_
+and `Support Matrix
+<https://docs.nvidia.com/deeplearning/dgx/support-matrix/index.html>`_
+indicate the required versions of the NVIDIA Driver and CUDA, and also
+describe which GPUs are supported by the inference server.
+
+Other Documentation
+^^^^^^^^^^^^^^^^^^^
+
+* `Maximizing Utilization for Data Center Inference with TensorRT
+  Inference Server
+  <https://on-demand-gtc.gputechconf.com/gtcnew/sessionview.php?sessionName=s9438-maximizing+utilization+for+data+center+inference+with+tensorrt+inference+server>`_.
+
+* `NVIDIA TensorRT Inference Server Boosts Deep Learning Inference
+  <https://devblogs.nvidia.com/nvidia-serves-deep-learning-inference/>`_.
+
+* `GPU-Accelerated Inference for Kubernetes with the NVIDIA TensorRT
+  Inference Server and Kubeflow
+  <https://www.kubeflow.org/blog/nvidia_tensorrt/>`_.
+
+Contributing
+------------
+
+Contributions to TensorRT Inference Server are more than welcome. To
+contribute, make a pull request and follow the guidelines outlined in
+the `Contributing <CONTRIBUTING.md>`_ document.
+
+Reporting problems, asking questions
+------------------------------------
+
+We appreciate any feedback, questions or bug reporting regarding this
+project. When help with code is needed, follow the process outlined in
+the Stack Overflow document (https://stackoverflow.com/help/mcve).
+Ensure posted examples are:
+
+* minimal – use as little code as possible that still produces the
+  same problem
+
+* complete – provide all parts needed to reproduce the problem. Check
+  if you can strip external dependencies and still show the problem. The
+  less time we spend on reproducing problems, the more time we have to
+  fix them.
+
+* verifiable – test the code you're about to provide to make sure it
+  reproduces the problem. Remove all other problems that are not
+  related to your request/question.
+
 .. |License| image:: https://img.shields.io/badge/License-BSD3-lightgrey.svg
    :target: https://opensource.org/licenses/BSD-3-Clause

VERSION

+1-1
@@ -1 +1 @@
-1.6.0dev
+1.6.0
