
Commit 582287e
Author: David Goodwin
Message: Update README and versions for 19.07 release
Parent: b30036f

File tree: 3 files changed, +217 -8 lines

Dockerfile

+4 -4
@@ -163,8 +163,8 @@ RUN python3 /workspace/onnxruntime/tools/ci_build/build.py --build_dir /workspac
 ############################################################################
 FROM ${BASE_IMAGE} AS trtserver_build
 
-ARG TRTIS_VERSION=1.4.0dev
-ARG TRTIS_CONTAINER_VERSION=19.07dev
+ARG TRTIS_VERSION=1.4.0
+ARG TRTIS_CONTAINER_VERSION=19.07
 
 # libgoogle-glog0v5 is needed by caffe2 libraries.
 RUN apt-get update && \
@@ -301,8 +301,8 @@ ENTRYPOINT ["/opt/tensorrtserver/nvidia_entrypoint.sh"]
 ############################################################################
 FROM ${BASE_IMAGE}
 
-ARG TRTIS_VERSION=1.4.0dev
-ARG TRTIS_CONTAINER_VERSION=19.07dev
+ARG TRTIS_VERSION=1.4.0
+ARG TRTIS_CONTAINER_VERSION=19.07
 
 ENV TENSORRT_SERVER_VERSION ${TRTIS_VERSION}
 ENV NVIDIA_TENSORRT_SERVER_VERSION ${TRTIS_CONTAINER_VERSION}
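
The second hunk shows that these build arguments feed the TENSORRT_SERVER_VERSION and NVIDIA_TENSORRT_SERVER_VERSION environment variables in the final image. A minimal sanity check, assuming Python is available inside the resulting 19.07 container (the variable names come from the Dockerfile itself)::

    # Read back the version information baked into the image.
    # Run inside the 19.07 tensorrtserver container; prints None if the
    # variables are not set (e.g. when run outside the container).
    import os

    print("server version:   ", os.environ.get("TENSORRT_SERVER_VERSION"))
    print("container version:", os.environ.get("NVIDIA_TENSORRT_SERVER_VERSION"))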

README.rst

+212 -3
@@ -30,13 +30,222 @@
 NVIDIA TensorRT Inference Server
 ================================
 
-**NOTE: You are currently on the r19.07 branch which tracks
-stabilization towards the next release. This branch is not usable
-during stabilization.**
+**NOTICE: The r19.07 branch has converted to using CMake
+to build the server, clients and other artifacts. Read the new
+documentation carefully to understand the new** `build process
+<https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/build.html>`_.
 
 .. overview-begin-marker-do-not-remove
 
+The NVIDIA TensorRT Inference Server provides a cloud inferencing
+solution optimized for NVIDIA GPUs. The server provides an inference
+service via an HTTP or GRPC endpoint, allowing remote clients to
+request inferencing for any model being managed by the server.
+
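
The HTTP endpoint described in the paragraph above can be exercised with any HTTP client. The following is a minimal sketch, not part of this commit, assuming the server listens on localhost:8000 and exposes liveness and readiness health endpoints under /api/health/ as documented for this release; adjust host, port, and paths to your deployment::

    # Minimal liveness/readiness probe for a running inference server.
    # Assumptions (not taken from this commit): HTTP endpoint on
    # localhost:8000 and health routes at /api/health/live and
    # /api/health/ready.
    import requests

    SERVER = "http://localhost:8000"

    for probe in ("live", "ready"):
        resp = requests.get(f"{SERVER}/api/health/{probe}", timeout=5)
        print(f"{probe}: HTTP {resp.status_code}")
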
+What's New In 1.4.0
+-------------------
+
+* Added libtorch as a new backend. PyTorch models manually decorated
+  or automatically traced to produce TorchScript can now be run
+  directly by the inference server.
+
+* Build system converted from bazel to CMake. The new CMake-based
+  build system is more transparent, portable and modular.
+
+* To simplify the creation of custom backends, a Custom Backend SDK
+  and improved documentation are now available.
+
+* Improved AsyncRun API in C++ and Python client libraries.
+
+* perf_client can now use user-supplied input data (previously
+  perf_client could only use random or zero input data).
+
+* perf_client now reports latency at multiple confidence percentiles
+  (p50, p90, p95, p99) as well as a user-supplied percentile that is
+  also used to stabilize latency results (see the sketch after this
+  list).
+
+* Improvements to automatic model configuration creation
+  (-\\-strict-model-config=false).
+
+* C++ and Python client libraries now allow additional HTTP headers to
+  be specified when using the HTTP protocol.
+
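
To make the percentile reporting concrete, here is a small illustrative sketch (plain Python, not perf_client code and not part of this commit) showing how p50/p90/p95/p99 values can be derived from a set of measured request latencies::

    # Compute latency percentiles of the kind perf_client reports.
    # The sample latencies below are made up for illustration.
    def latency_percentiles(latencies_ms, points=(50, 90, 95, 99)):
        """Return {percentile: latency_ms} using linear interpolation."""
        ordered = sorted(latencies_ms)
        n = len(ordered)
        result = {}
        for p in points:
            k = (n - 1) * (p / 100.0)     # fractional rank
            lo, hi = int(k), min(int(k) + 1, n - 1)
            frac = k - lo
            result[p] = ordered[lo] * (1 - frac) + ordered[hi] * frac
        return result

    measured = [4.1, 3.9, 4.4, 5.0, 12.3, 4.2, 4.0, 4.3, 6.1, 4.5]
    for p, value in latency_percentiles(measured).items():
        print(f"p{p}: {value:.2f} ms")
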
+Features
+--------
+
+* `Multiple framework support
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_repository.html#framework-model-definition>`_. The
+  server can manage any number and mix of models (limited by system
+  disk and memory resources). Supports TensorRT, TensorFlow GraphDef,
+  TensorFlow SavedModel, ONNX, PyTorch, and Caffe2 NetDef model
+  formats. Also supports TensorFlow-TensorRT integrated
+  models. Variable-size input and output tensors are allowed if
+  supported by the framework. See `Capabilities
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/capabilities.html#capabilities>`_
+  for detailed support information for each framework.
+
+* `Concurrent model execution support
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_configuration.html#instance-groups>`_. Multiple
+  models (or multiple instances of the same model) can run
+  simultaneously on the same GPU.
+
+* Batching support. For models that support batching, the server can
+  accept requests for a batch of inputs and respond with the
+  corresponding batch of outputs. The inference server also supports
+  multiple `scheduling and batching
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_configuration.html#scheduling-and-batching>`_
+  algorithms that combine individual inference requests together to
+  improve inference throughput. These scheduling and batching
+  decisions are transparent to the client requesting inference.
+
+* `Custom backend support
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_repository.html#custom-backends>`_. The inference server
+  allows individual models to be implemented with custom backends
+  instead of by a deep-learning framework. With a custom backend a
+  model can implement any logic desired, while still benefiting from
+  the GPU support, concurrent execution, dynamic batching and other
+  features provided by the server.
+
+* `Ensemble support
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/models_and_schedulers.html#ensemble-models>`_. An
+  ensemble represents a pipeline of one or more models and the
+  connection of input and output tensors between those models. A
+  single inference request to an ensemble will trigger the execution
+  of the entire pipeline.
+
+* Multi-GPU support. The server can distribute inferencing across all
+  system GPUs.
+
+* The inference server `monitors the model repository
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_repository.html#modifying-the-model-repository>`_
+  for any change and dynamically reloads the model(s) when necessary,
+  without requiring a server restart. Models and model versions can be
+  added and removed, and model configurations can be modified while
+  the server is running.
+
+* `Model repositories
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_repository.html#>`_
+  may reside on a locally accessible file system (e.g. NFS) or in
+  Google Cloud Storage. A minimal local repository layout is sketched
+  after the overview section below.
+
+* Readiness and liveness `health endpoints
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/http_grpc_api.html#health>`_
+  suitable for any orchestration or deployment framework, such as
+  Kubernetes.
+
+* `Metrics
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/metrics.html>`_
+  indicating GPU utilization, server throughput, and server latency.
+
 .. overview-end-marker-do-not-remove
 
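
To make the model repository bullets above concrete, the following sketch (not part of this commit) lays out a minimal local repository containing one hypothetical TensorRT model. The directory structure follows the model repository documentation; the model name, tensor names, and dimensions are illustrative placeholders::

    # Create <repository>/<model-name>/config.pbtxt plus an (empty)
    # version directory. The config fields follow model_config.proto;
    # everything named "example..." is a placeholder.
    import os
    import textwrap

    REPO = "model_repository"        # assumed local repository path
    MODEL = "example_plan_model"     # hypothetical TensorRT model

    config = textwrap.dedent("""\
        name: "example_plan_model"
        platform: "tensorrt_plan"
        max_batch_size: 8
        input [
          { name: "input0" data_type: TYPE_FP32 dims: [ 3, 224, 224 ] }
        ]
        output [
          { name: "output0" data_type: TYPE_FP32 dims: [ 1000 ] }
        ]
        """)

    os.makedirs(os.path.join(REPO, MODEL, "1"), exist_ok=True)
    with open(os.path.join(REPO, MODEL, "config.pbtxt"), "w") as f:
        f.write(config)
    # The version directory ("1") would hold the serialized model,
    # e.g. model.plan for a TensorRT engine.
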
+The current release of the TensorRT Inference Server is 1.4.0 and
+corresponds to the 19.07 release of the tensorrtserver container on
+`NVIDIA GPU Cloud (NGC) <https://ngc.nvidia.com>`_. The branch for
+this release is `r19.07
+<https://github.com/NVIDIA/tensorrt-inference-server/tree/r19.07>`_.
+
+Backwards Compatibility
+-----------------------
+
+Continuing in version 1.4.0, the following interfaces maintain
+backwards compatibility with the 1.0.0 release. If you have model
+configuration files, custom backends, or clients that use the
+inference server HTTP or GRPC APIs (either directly or through the
+client libraries) from releases prior to 1.0.0 (19.03), you should
+edit and rebuild those as necessary to match the version 1.0.0 APIs.
+
+These interfaces will maintain backwards compatibility for all future
+1.x.y releases (see below for exceptions):
+
+* Model configuration as defined in `model_config.proto
+  <https://github.com/NVIDIA/tensorrt-inference-server/blob/master/src/core/model_config.proto>`_.
+
+* The inference server HTTP and GRPC APIs as defined in `api.proto
+  <https://github.com/NVIDIA/tensorrt-inference-server/blob/master/src/core/api.proto>`_
+  and `grpc_service.proto
+  <https://github.com/NVIDIA/tensorrt-inference-server/blob/master/src/core/grpc_service.proto>`_.
+
+* The custom backend interface as defined in `custom.h
+  <https://github.com/NVIDIA/tensorrt-inference-server/blob/master/src/backends/custom/custom.h>`_.
+
+As new features are introduced they may temporarily have beta status
+where they are subject to change in non-backwards-compatible
+ways. When they exit beta they will conform to the
+backwards-compatibility guarantees described above. Currently the
+following features are in beta:
+
+* In the model configuration defined in `model_config.proto
+  <https://github.com/NVIDIA/tensorrt-inference-server/blob/master/src/core/model_config.proto>`_
+  the sections related to model ensembling are currently in beta. In
+  particular, the ModelEnsembling message will potentially undergo
+  non-backwards-compatible changes.
+
+
+Documentation
+-------------
+
+The User Guide, Developer Guide, and API Reference `documentation
+<https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/index.html>`_
+provide guidance on installing, building and running the latest
+release of the TensorRT Inference Server.
+
+You can also view the documentation for the `master branch
+<https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-master-branch-guide/docs/index.html>`_
+and for `earlier releases
+<https://docs.nvidia.com/deeplearning/sdk/inference-server-archived/index.html>`_.
+
+READMEs for deployment examples can be found in subdirectories of
+deploy/, for example, `deploy/single_server/README.rst
+<https://github.com/NVIDIA/tensorrt-inference-server/tree/master/deploy/single_server/README.rst>`_.
+
+The `Release Notes
+<https://docs.nvidia.com/deeplearning/sdk/inference-release-notes/index.html>`_
+and `Support Matrix
+<https://docs.nvidia.com/deeplearning/dgx/support-matrix/index.html>`_
+indicate the required versions of the NVIDIA Driver and CUDA, and also
+describe which GPUs are supported by the inference server.
+
+Other Documentation
+^^^^^^^^^^^^^^^^^^^
+
+* `Maximizing Utilization for Data Center Inference with TensorRT
+  Inference Server
+  <https://on-demand-gtc.gputechconf.com/gtcnew/sessionview.php?sessionName=s9438-maximizing+utilization+for+data+center+inference+with+tensorrt+inference+server>`_.
+
+* `NVIDIA TensorRT Inference Server Boosts Deep Learning Inference
+  <https://devblogs.nvidia.com/nvidia-serves-deep-learning-inference/>`_.
+
+* `GPU-Accelerated Inference for Kubernetes with the NVIDIA TensorRT
+  Inference Server and Kubeflow
+  <https://www.kubeflow.org/blog/nvidia_tensorrt/>`_.
+
+Contributing
+------------
+
+Contributions to TensorRT Inference Server are more than welcome. To
+contribute, make a pull request and follow the guidelines outlined in
+the `Contributing <CONTRIBUTING.md>`_ document.
+
+Reporting problems, asking questions
+------------------------------------
+
+We appreciate any feedback, questions or bug reporting regarding this
+project. When help with code is needed, follow the process outlined in
+the Stack Overflow (https://stackoverflow.com/help/mcve)
+document. Ensure posted examples are:
+
+* minimal – use as little code as possible that still produces the
+  same problem
+
+* complete – provide all parts needed to reproduce the problem. Check
+  whether you can strip external dependencies and still show the
+  problem. The less time we spend reproducing problems, the more time
+  we have to fix them.
+
+* verifiable – test the code you're about to provide to make sure it
+  reproduces the problem. Remove all other problems that are not
+  related to your request/question.
+
 .. |License| image:: https://img.shields.io/badge/License-BSD3-lightgrey.svg
    :target: https://opensource.org/licenses/BSD-3-Clause

VERSION

+1 -1
@@ -1 +1 @@
-1.4.0dev
+1.4.0
