
Commit b0ffafc

Update README and versions for 19.11 release
1 parent 3da4aee commit b0ffafc

3 files changed: +223 -9 lines changed

Dockerfile (+4 -4)
@@ -190,8 +190,8 @@ RUN python3 /workspace/onnxruntime/tools/ci_build/build.py --build_dir /workspac
 ############################################################################
 FROM ${BASE_IMAGE} AS trtserver_build

-ARG TRTIS_VERSION=1.8.0dev
-ARG TRTIS_CONTAINER_VERSION=19.11dev
+ARG TRTIS_VERSION=1.8.0
+ARG TRTIS_CONTAINER_VERSION=19.11

 # libgoogle-glog0v5 is needed by caffe2 libraries.
 # libcurl4-openSSL-dev is needed for GCS
@@ -346,8 +346,8 @@ ENTRYPOINT ["/opt/tensorrtserver/nvidia_entrypoint.sh"]
 ############################################################################
 FROM ${BASE_IMAGE}

-ARG TRTIS_VERSION=1.8.0dev
-ARG TRTIS_CONTAINER_VERSION=19.11dev
+ARG TRTIS_VERSION=1.8.0
+ARG TRTIS_CONTAINER_VERSION=19.11

 ENV TENSORRT_SERVER_VERSION ${TRTIS_VERSION}
 ENV NVIDIA_TENSORRT_SERVER_VERSION ${TRTIS_CONTAINER_VERSION}

README.rst (+218 -4)
@@ -30,13 +30,227 @@
 NVIDIA TensorRT Inference Server
 ================================

-**NOTE: You are currently on the r19.11 branch which tracks
-stabilization towards the next release. This branch is not usable
-during stabilization.**
-
 .. overview-begin-marker-do-not-remove

+The NVIDIA TensorRT Inference Server provides a cloud inferencing
+solution optimized for NVIDIA GPUs. It exposes an inference service
+via an HTTP or GRPC endpoint, allowing remote clients to request
+inferencing for any model being managed by the server.
+
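As a quick illustration of the HTTP endpoint mentioned above (an editorial aside, not part of this commit), the sketch below probes the server's liveness and readiness routes from Python. It assumes a server on the default HTTP port 8000 and the v1 paths ``/api/health/live`` and ``/api/health/ready`` described in the HTTP/GRPC API documentation; adjust both for your deployment::

    import requests

    base_url = "http://localhost:8000"  # default HTTP port; adjust as needed

    # A 200 response indicates live/ready; any other status (or a
    # connection error) indicates the server is not yet serving.
    live = requests.get(base_url + "/api/health/live")
    ready = requests.get(base_url + "/api/health/ready")

    print("live:", live.status_code == 200)
    print("ready:", ready.status_code == 200)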
+What's New In 1.8.0
+-------------------
+
+* Shared-memory support is expanded to include CUDA shared memory.
+
+* Improved the efficiency of pinned memory used for ensemble models.
+
+* The perf_client application has been improved with easier-to-use
+  command-line arguments (while maintaining compatibility with existing
+  arguments).
+
+* Support for string tensors added to perf_client.
+
+* Documentation contains a new “Optimization” section discussing some common
+  optimization strategies and how to use perf_client to explore these
+  strategies.
+
+
+Features
+--------
+
+* `Multiple framework support
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_repository.html#framework-model-definition>`_. The
+  server can manage any number and mix of models (limited by system
+  disk and memory resources). Supports TensorRT, TensorFlow GraphDef,
+  TensorFlow SavedModel, ONNX, PyTorch, and Caffe2 NetDef model
+  formats. Also supports TensorFlow-TensorRT integrated
+  models. Variable-size input and output tensors are allowed if
+  supported by the framework. See `Capabilities
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/capabilities.html#capabilities>`_
+  for detailed support information for each framework.
+
+* `Concurrent model execution support
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_configuration.html#instance-groups>`_. Multiple
+  models (or multiple instances of the same model) can run
+  simultaneously on the same GPU.
+
+* Batching support. For models that support batching, the server can
+  accept requests for a batch of inputs and respond with the
+  corresponding batch of outputs. The inference server also supports
+  multiple `scheduling and batching
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_configuration.html#scheduling-and-batching>`_
+  algorithms that combine individual inference requests together to
+  improve inference throughput. These scheduling and batching
+  decisions are transparent to the client requesting inference.
+
+* `Custom backend support
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_repository.html#custom-backends>`_. The inference server
+  allows individual models to be implemented with custom backends
+  instead of by a deep-learning framework. With a custom backend, a
+  model can implement any logic desired, while still benefiting from
+  the GPU support, concurrent execution, dynamic batching and other
+  features provided by the server.
+
+* `Ensemble support
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/models_and_schedulers.html#ensemble-models>`_. An
+  ensemble represents a pipeline of one or more models and the
+  connection of input and output tensors between those models. A
+  single inference request to an ensemble will trigger the execution
+  of the entire pipeline.
+
+* Multi-GPU support. The server can distribute inferencing across all
+  system GPUs.
+
+* The inference server provides `multiple modes for model management
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_management.html>`_. These
+  model management modes allow for both implicit and explicit loading
+  and unloading of models without requiring a server restart.
+
+* `Model repositories
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_repository.html#>`_
+  may reside on a locally accessible file system (e.g. NFS), in Google
+  Cloud Storage, or in Amazon S3.
+
+* Readiness and liveness `health endpoints
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/http_grpc_api.html#health>`_
+  suitable for any orchestration or deployment framework, such as
+  Kubernetes.
+
+* `Metrics
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/metrics.html>`_
+  indicating GPU utilization, server throughput, and server latency.
+
+* `C library interface
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/library_api.html>`_
+  allows the full functionality of the inference server to be included
+  directly in an application.
+
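The model-management, repository, and readiness features listed above can be observed from outside the server; the sketch below (an editorial aside, not part of this commit) is one rough way to do that. It assumes the v1 status endpoint ``/api/status`` with a ``format=json`` query parameter as described in the HTTP/GRPC API documentation, and the ``model_status``/``version_status`` field names from the status protobuf; treat all of these as assumptions to verify against those docs::

    import requests

    base_url = "http://localhost:8000"  # default HTTP port; adjust as needed

    # Request the server status as JSON (assumed format=json parameter).
    resp = requests.get(base_url + "/api/status", params={"format": "json"})
    resp.raise_for_status()
    status = resp.json()

    # model_status maps model name -> per-model state (assumed field names).
    for name, model in status.get("model_status", {}).items():
        versions = model.get("version_status", {})
        print(name, "versions:", sorted(versions.keys()))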
 .. overview-end-marker-do-not-remove

+The current release of the TensorRT Inference Server is 1.8.0 and
+corresponds to the 19.11 release of the tensorrtserver container on
+`NVIDIA GPU Cloud (NGC) <https://ngc.nvidia.com>`_. The branch for
+this release is `r19.11
+<https://github.com/NVIDIA/tensorrt-inference-server/tree/r19.11>`_.
+
+Backwards Compatibility
+-----------------------
+
+Continuing in the latest version, the following interfaces maintain
+backwards compatibility with the 1.0.0 release. If you have model
+configuration files, custom backends, or clients that use the
+inference server HTTP or GRPC APIs (either directly or through the
+client libraries) from releases prior to 1.0.0, you should edit
+and rebuild those as necessary to match the version 1.0.0 APIs.
+
+The following interfaces will maintain backwards compatibility for all
+future 1.x.y releases (see below for exceptions):
+
+* Model configuration as defined in `model_config.proto
+  <https://github.com/NVIDIA/tensorrt-inference-server/blob/master/src/core/model_config.proto>`_.
+
+* The inference server HTTP and GRPC APIs as defined in `api.proto
+  <https://github.com/NVIDIA/tensorrt-inference-server/blob/master/src/core/api.proto>`_
+  and `grpc_service.proto
+  <https://github.com/NVIDIA/tensorrt-inference-server/blob/master/src/core/grpc_service.proto>`_,
+  except as noted below.
+
+* The V1 custom backend interface as defined in `custom.h
+  <https://github.com/NVIDIA/tensorrt-inference-server/blob/master/src/backends/custom/custom.h>`_.
+
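As a rough illustration of the model configuration mentioned in the first item above (an editorial aside, not part of this commit), the sketch below writes a minimal ``config.pbtxt`` of the kind ``model_config.proto`` defines. The model name, platform, tensor names, and shapes are invented for illustration; consult ``model_config.proto`` and the model-configuration documentation for the authoritative fields::

    import os

    # Hypothetical minimal configuration for a TensorRT plan model; every
    # value below (names, dims, max_batch_size) is made up for illustration.
    config_lines = [
        'name: "example_plan"',
        'platform: "tensorrt_plan"',
        'max_batch_size: 8',
        'input [ { name: "INPUT0", data_type: TYPE_FP32, dims: [ 3, 224, 224 ] } ]',
        'output [ { name: "OUTPUT0", data_type: TYPE_FP32, dims: [ 1000 ] } ]',
    ]

    # The file sits beside the model's version directories, e.g.
    # model_repository/example_plan/config.pbtxt.
    os.makedirs("model_repository/example_plan", exist_ok=True)
    with open("model_repository/example_plan/config.pbtxt", "w") as f:
        f.write("\n".join(config_lines) + "\n")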
+As new features are introduced they may temporarily have beta status
+where they are subject to change in non-backwards-compatible
+ways. When they exit beta they will conform to the
+backwards-compatibility guarantees described above. Currently the
+following features are in beta:
+
+* The inference server library API as defined in `trtserver.h
+  <https://github.com/NVIDIA/tensorrt-inference-server/blob/master/src/core/trtserver.h>`_
+  is currently in beta and may undergo non-backwards-compatible
+  changes.
+
+* The inference server HTTP and GRPC APIs related to system and CUDA
+  shared memory are currently in beta and may undergo
+  non-backwards-compatible changes.
+
+* The V2 custom backend interface as defined in `custom.h
+  <https://github.com/NVIDIA/tensorrt-inference-server/blob/master/src/backends/custom/custom.h>`_
+  is currently in beta and may undergo non-backwards-compatible
+  changes.
+
+* The C++ and Python client libraries are not strictly included in the
+  inference server compatibility guarantees and so should be
+  considered to have beta status.
+
+Documentation
+-------------
+
+The User Guide, Developer Guide, and API Reference `documentation for
+the current release
+<https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/index.html>`_
+provide guidance on installing, building, and running the TensorRT
+Inference Server.
+
+You can also view the `documentation for the master branch
+<https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-master-branch-guide/docs/index.html>`_
+and for `earlier releases
+<https://docs.nvidia.com/deeplearning/sdk/inference-server-archived/index.html>`_.
+
+An `FAQ
+<https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/faq.html>`_
+provides answers for frequently asked questions.
+
+READMEs for deployment examples can be found in subdirectories of
+deploy/, for example, `deploy/single_server/README.rst
+<https://github.com/NVIDIA/tensorrt-inference-server/tree/master/deploy/single_server/README.rst>`_.
+
+The `Release Notes
+<https://docs.nvidia.com/deeplearning/sdk/inference-release-notes/index.html>`_
+and `Support Matrix
+<https://docs.nvidia.com/deeplearning/dgx/support-matrix/index.html>`_
+indicate the required versions of the NVIDIA Driver and CUDA, and also
+describe which GPUs are supported by the inference server.
+
+Other Documentation
+^^^^^^^^^^^^^^^^^^^
+
+* `Maximizing Utilization for Data Center Inference with TensorRT
+  Inference Server
+  <https://on-demand-gtc.gputechconf.com/gtcnew/sessionview.php?sessionName=s9438-maximizing+utilization+for+data+center+inference+with+tensorrt+inference+server>`_.
+
+* `NVIDIA TensorRT Inference Server Boosts Deep Learning Inference
+  <https://devblogs.nvidia.com/nvidia-serves-deep-learning-inference/>`_.
+
+* `GPU-Accelerated Inference for Kubernetes with the NVIDIA TensorRT
+  Inference Server and Kubeflow
+  <https://www.kubeflow.org/blog/nvidia_tensorrt/>`_.
+
+Contributing
+------------
+
+Contributions to TensorRT Inference Server are more than welcome. To
+contribute, make a pull request and follow the guidelines outlined in
+the `Contributing <CONTRIBUTING.md>`_ document.
+
+Reporting problems, asking questions
+------------------------------------
+
+We appreciate any feedback, questions or bug reports regarding this
+project. When help with code is needed, follow the process outlined in
+the Stack Overflow (https://stackoverflow.com/help/mcve)
+document. Ensure posted examples are:
+
+* minimal – use as little code as possible that still produces the
+  same problem
+
+* complete – provide all parts needed to reproduce the problem. Check
+  whether you can strip external dependencies and still show the
+  problem. The less time we spend on reproducing problems, the more
+  time we have to fix them.
+
+* verifiable – test the code you're about to provide to make sure it
+  reproduces the problem. Remove all other problems that are not
+  related to your request/question.
+
 .. |License| image:: https://img.shields.io/badge/License-BSD3-lightgrey.svg
    :target: https://opensource.org/licenses/BSD-3-Clause

VERSION (+1 -1)
@@ -1 +1 @@
-1.8.0dev
+1.8.0
