
Commit a1f3860

Update README and versions for 19.12 release
1 parent 6358a84 commit a1f3860

3 files changed, +239 -9 lines changed

Dockerfile

+4 -4

@@ -192,8 +192,8 @@ RUN python3 /workspace/onnxruntime/tools/ci_build/build.py --build_dir /workspac
 ############################################################################
 FROM ${BASE_IMAGE} AS trtserver_build
 
-ARG TRTIS_VERSION=1.9.0dev
-ARG TRTIS_CONTAINER_VERSION=19.12dev
+ARG TRTIS_VERSION=1.9.0
+ARG TRTIS_CONTAINER_VERSION=19.12
 
 # libgoogle-glog0v5 is needed by caffe2 libraries.
 # libcurl4-openSSL-dev is needed for GCS
@@ -348,8 +348,8 @@ ENTRYPOINT ["/opt/tensorrtserver/nvidia_entrypoint.sh"]
 ############################################################################
 FROM ${BASE_IMAGE}
 
-ARG TRTIS_VERSION=1.9.0dev
-ARG TRTIS_CONTAINER_VERSION=19.12dev
+ARG TRTIS_VERSION=1.9.0
+ARG TRTIS_CONTAINER_VERSION=19.12
 
 ENV TENSORRT_SERVER_VERSION ${TRTIS_VERSION}
 ENV NVIDIA_TENSORRT_SERVER_VERSION ${TRTIS_CONTAINER_VERSION}

README.rst

+234 -4

@@ -30,13 +30,243 @@
 NVIDIA TensorRT Inference Server
 ================================
 
-**NOTE: You are currently on the r19.12 branch which tracks
-stabilization towards the next release. This branch is not usable
-during stabilization.**
-
 .. overview-begin-marker-do-not-remove
 
+The NVIDIA TensorRT Inference Server provides a cloud inferencing
+solution optimized for NVIDIA GPUs. The server provides an inference
+service via an HTTP or GRPC endpoint, allowing remote clients to
+request inferencing for any model being managed by the server.
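For illustration only (not part of the committed README): a minimal sketch of
issuing a remote inference request through the HTTP endpoint, assuming the 1.x
``tensorrtserver`` Python client package. The class and method names
(``InferContext``, ``ProtocolType``, ``ResultFormat``) and the model and tensor
names used here are assumptions and may differ from the actual client API::

    # Sketch only: client API names/signatures are assumptions, not taken
    # from this commit. The model "my_model" and its tensors INPUT0/OUTPUT0
    # are hypothetical.
    import numpy as np
    from tensorrtserver.api import InferContext, ProtocolType

    # HTTP endpoint on the default port 8000; the latest available model
    # version is used when no version is given (assumed default).
    ctx = InferContext("localhost:8000", ProtocolType.from_str("http"),
                       "my_model")

    input0 = np.ones(16, dtype=np.float32)
    result = ctx.run({"INPUT0": (input0,)},
                     {"OUTPUT0": InferContext.ResultFormat.RAW},
                     batch_size=1)
    print(result["OUTPUT0"][0])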
+
+What's New in 1.9.0
+-------------------
+
+* The model configuration now includes a model warmup option. This option
+  provides the ability to tune and optimize the model before inference requests
+  are received, avoiding initial inference delays. This option is especially
+  useful for frameworks like TensorFlow that perform network optimization in
+  response to the initial inference requests. Models can be warmed-up with one
+  or more synthetic or realistic workloads before they become ready in the
+  server.
+
+* An enhanced sequence batcher now has multiple scheduling strategies. A new
+  Oldest strategy integrates with the dynamic batcher to enable improved
+  inference performance for models that don’t require all inference requests
+  in a sequence to be routed to the same batch slot.
+
+* The perf_client now has an option to generate requests using a realistic
+  Poisson distribution or a user-provided distribution.
+
+* A new repository API (available in the shared library API, HTTP, and GRPC)
+  returns an index of all models available in the model repositories visible
+  to the server. This index can be used to see what models are available for
+  loading onto the server.
+
+* The server status returned by the server status API now includes the
+  timestamp of the last inference request received for each model (see the
+  sketch after this list).
+
+* Inference server tracing capabilities are now documented in the `Optimization
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/optimization.html>`_
+  section of the User Guide. Tracing support is enhanced to provide trace for
+  ensembles and the contained models.
+
+* A community-contributed Dockerfile is now available to build the TensorRT
+  Inference Server clients on CentOS.
+
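Illustration (not part of the committed README): a minimal sketch of polling
the server status mentioned above over HTTP. The ``/api/status`` path and the
default port 8000 follow the v1 HTTP API as best recalled, and ``my_model`` is
a placeholder::

    # Sketch only: the status path, default port, and text-protobuf
    # response format are assumptions about the v1 HTTP API.
    import requests

    # Status for every model in the repository ...
    all_status = requests.get("http://localhost:8000/api/status")
    # ... or for a single (placeholder) model.
    one_status = requests.get("http://localhost:8000/api/status/my_model")

    print(one_status.status_code)  # 200 on success
    print(one_status.text)         # per-model status, including last-request info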
+Features
+--------
+
+* `Multiple framework support
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_repository.html#framework-model-definition>`_. The
+  server can manage any number and mix of models (limited by system
+  disk and memory resources). Supports TensorRT, TensorFlow GraphDef,
+  TensorFlow SavedModel, ONNX, PyTorch, and Caffe2 NetDef model
+  formats. Also supports TensorFlow-TensorRT integrated
+  models. Variable-size input and output tensors are allowed if
+  supported by the framework. See `Capabilities
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/capabilities.html#capabilities>`_
+  for detailed support information for each framework.
+
+* `Concurrent model execution support
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_configuration.html#instance-groups>`_. Multiple
+  models (or multiple instances of the same model) can run
+  simultaneously on the same GPU.
+
+* Batching support. For models that support batching, the server can
+  accept requests for a batch of inputs and respond with the
+  corresponding batch of outputs. The inference server also supports
+  multiple `scheduling and batching
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_configuration.html#scheduling-and-batching>`_
+  algorithms that combine individual inference requests together to
+  improve inference throughput. These scheduling and batching
+  decisions are transparent to the client requesting inference.
+
+* `Custom backend support
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_repository.html#custom-backends>`_. The inference server
+  allows individual models to be implemented with custom backends
+  instead of by a deep-learning framework. With a custom backend a
+  model can implement any logic desired, while still benefiting from
+  the GPU support, concurrent execution, dynamic batching, and other
+  features provided by the server.
+
+* `Ensemble support
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/models_and_schedulers.html#ensemble-models>`_. An
+  ensemble represents a pipeline of one or more models and the
+  connection of input and output tensors between those models. A
+  single inference request to an ensemble will trigger the execution
+  of the entire pipeline.
+
+* Multi-GPU support. The server can distribute inferencing across all
+  system GPUs.
+
+* The inference server provides `multiple modes for model management
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_management.html>`_. These
+  model management modes allow for both implicit and explicit loading
+  and unloading of models without requiring a server restart.
+
+* `Model repositories
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_repository.html#>`_
+  may reside on a locally accessible file system (e.g. NFS), in Google
+  Cloud Storage, or in Amazon S3.
+
+* Readiness and liveness `health endpoints
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/http_grpc_api.html#health>`_
+  suitable for any orchestration or deployment framework, such as
+  Kubernetes (see the sketch after this list).
+
+* `Metrics
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/metrics.html>`_
+  indicating GPU utilization, server throughput, and server latency.
+
+* `C library interface
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/library_api.html>`_
+  allows the full functionality of the inference server to be included
+  directly in an application.
+
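Illustration (not part of the committed README): a minimal sketch of probing
the readiness/liveness endpoints and scraping the metrics referenced in the
list above, assuming the default ports (HTTP on 8000, Prometheus metrics on
8002) and the v1 health paths as best recalled::

    # Sketch only: ports, paths, and the nv_ metric-name prefix are
    # assumptions about the default configuration of this release.
    import requests

    live = requests.get("http://localhost:8000/api/health/live")
    ready = requests.get("http://localhost:8000/api/health/ready")
    print("live:", live.status_code == 200)
    print("ready:", ready.status_code == 200)

    # Prometheus-format metrics (GPU utilization, throughput, latency).
    metrics = requests.get("http://localhost:8002/metrics")
    for line in metrics.text.splitlines():
        if line.startswith("nv_"):
            print(line)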
 .. overview-end-marker-do-not-remove
 
+The current release of the TensorRT Inference Server is 1.9.0 and
+corresponds to the 19.12 release of the tensorrtserver container on
+`NVIDIA GPU Cloud (NGC) <https://ngc.nvidia.com>`_. The branch for
+this release is `r19.12
+<https://github.com/NVIDIA/tensorrt-inference-server/tree/r19.12>`_.
+
+Backwards Compatibility
+-----------------------
+
+Continuing in the latest version, the following interfaces maintain
+backwards compatibility with the 1.0.0 release. If you have model
+configuration files, custom backends, or clients that use the
+inference server HTTP or GRPC APIs (either directly or through the
+client libraries) from releases prior to 1.0.0, you should edit
+and rebuild those as necessary to match the version 1.0.0 APIs.
+
+The following interfaces will maintain backwards compatibility for all
+future 1.x.y releases (see below for exceptions):
+
+* Model configuration as defined in `model_config.proto
+  <https://github.com/NVIDIA/tensorrt-inference-server/blob/master/src/core/model_config.proto>`_.
+
+* The inference server HTTP and GRPC APIs as defined in `api.proto
+  <https://github.com/NVIDIA/tensorrt-inference-server/blob/master/src/core/api.proto>`_
+  and `grpc_service.proto
+  <https://github.com/NVIDIA/tensorrt-inference-server/blob/master/src/core/grpc_service.proto>`_,
+  except as noted below.
+
+* The V1 custom backend interface as defined in `custom.h
+  <https://github.com/NVIDIA/tensorrt-inference-server/blob/master/src/backends/custom/custom.h>`_.
+
+As new features are introduced they may temporarily have beta status
+where they are subject to change in non-backwards-compatible
+ways. When they exit beta they will conform to the
+backwards-compatibility guarantees described above. Currently the
+following features are in beta:
+
+* The inference server library API as defined in `trtserver.h
+  <https://github.com/NVIDIA/tensorrt-inference-server/blob/master/src/core/trtserver.h>`_
+  is currently in beta and may undergo non-backwards-compatible
+  changes.
+
+* The inference server HTTP and GRPC APIs related to system and CUDA
+  shared memory are currently in beta and may undergo
+  non-backwards-compatible changes.
+
+* The V2 custom backend interface as defined in `custom.h
+  <https://github.com/NVIDIA/tensorrt-inference-server/blob/master/src/backends/custom/custom.h>`_
+  is currently in beta and may undergo non-backwards-compatible
+  changes.
+
+* The C++ and Python client libraries are not strictly included in the
+  inference server compatibility guarantees and so should be
+  considered as beta status.
+
+Documentation
+-------------
+
+The User Guide, Developer Guide, and API Reference `documentation for
+the current release
+<https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/index.html>`_
+provide guidance on installing, building, and running the TensorRT
+Inference Server.
+
+You can also view the `documentation for the master branch
+<https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-master-branch-guide/docs/index.html>`_
+and for `earlier releases
+<https://docs.nvidia.com/deeplearning/sdk/inference-server-archived/index.html>`_.
+
+An `FAQ
+<https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/faq.html>`_
+provides answers for frequently asked questions.
+
+READMEs for deployment examples can be found in subdirectories of
+deploy/, for example, `deploy/single_server/README.rst
+<https://github.com/NVIDIA/tensorrt-inference-server/tree/master/deploy/single_server/README.rst>`_.
+
+The `Release Notes
+<https://docs.nvidia.com/deeplearning/sdk/inference-release-notes/index.html>`_
+and `Support Matrix
+<https://docs.nvidia.com/deeplearning/dgx/support-matrix/index.html>`_
+indicate the required versions of the NVIDIA Driver and CUDA, and also
+describe which GPUs are supported by the inference server.
+
+Other Documentation
+^^^^^^^^^^^^^^^^^^^
+
+* `Maximizing Utilization for Data Center Inference with TensorRT
+  Inference Server
+  <https://on-demand-gtc.gputechconf.com/gtcnew/sessionview.php?sessionName=s9438-maximizing+utilization+for+data+center+inference+with+tensorrt+inference+server>`_.
+
+* `NVIDIA TensorRT Inference Server Boosts Deep Learning Inference
+  <https://devblogs.nvidia.com/nvidia-serves-deep-learning-inference/>`_.
+
+* `GPU-Accelerated Inference for Kubernetes with the NVIDIA TensorRT
+  Inference Server and Kubeflow
+  <https://www.kubeflow.org/blog/nvidia_tensorrt/>`_.
+
+Contributing
+------------
+
+Contributions to TensorRT Inference Server are more than welcome. To
+contribute, make a pull request and follow the guidelines outlined in
+the `Contributing <CONTRIBUTING.md>`_ document.
+
+Reporting problems, asking questions
+------------------------------------
+
+We appreciate any feedback, questions, or bug reports regarding this
+project. When help with code is needed, follow the process outlined in
+the Stack Overflow (https://stackoverflow.com/help/mcve)
+document. Ensure posted examples are:
+
+* minimal – use as little code as possible that still produces the
+  same problem
+
+* complete – provide all parts needed to reproduce the problem. Check
+  if you can strip external dependencies and still show the problem. The
+  less time we spend on reproducing problems, the more time we have to
+  fix them
+
+* verifiable – test the code you're about to provide to make sure it
+  reproduces the problem. Remove all other problems that are not
+  related to your request/question.
+
 .. |License| image:: https://img.shields.io/badge/License-BSD3-lightgrey.svg
    :target: https://opensource.org/licenses/BSD-3-Clause

VERSION

+1 -1

@@ -1 +1 @@
-1.9.0dev
+1.9.0
