
Commit db56b7b

David Goodwin committed
Update for 1.1.0 / 19.04 release
1 parent bbd8ec8 commit db56b7b

File tree

3 files changed: +188 -9 lines changed


Dockerfile

+4-4
@@ -79,8 +79,8 @@ RUN cd pytorch && \
 ############################################################################
 FROM ${TENSORFLOW_IMAGE} AS trtserver_build
 
-ARG TRTIS_VERSION=1.1.0dev
-ARG TRTIS_CONTAINER_VERSION=19.04dev
+ARG TRTIS_VERSION=1.1.0
+ARG TRTIS_CONTAINER_VERSION=19.04
 ARG PYVER=3.5
 
 # The TFServing release branch must match the TF release used by
@@ -200,8 +200,8 @@ ENTRYPOINT ["/opt/tensorrtserver/nvidia_entrypoint.sh"]
 ############################################################################
 FROM ${BASE_IMAGE}
 
-ARG TRTIS_VERSION=1.1.0dev
-ARG TRTIS_CONTAINER_VERSION=19.04dev
+ARG TRTIS_VERSION=1.1.0
+ARG TRTIS_CONTAINER_VERSION=19.04
 ARG PYVER=3.5
 
 ENV TENSORRT_SERVER_VERSION ${TRTIS_VERSION}

README.rst

+183-4
@@ -30,13 +30,192 @@
 NVIDIA TensorRT Inference Server
 ================================
 
-**NOTE: You are currently on the r19.04 branch which tracks
-stabilization towards the next release. This branch is not usable
-during stabilization.**
-
 .. overview-begin-marker-do-not-remove
 
+The NVIDIA TensorRT Inference Server provides a cloud inferencing
+solution optimized for NVIDIA GPUs. The server provides an inference
+service via an HTTP or gRPC endpoint, allowing remote clients to
+request inferencing for any model being managed by the server.
+
+What's New In 1.1.0
+-------------------
+
+* Client libraries and examples now build with a separate Makefile (a
+  Dockerfile is also included for convenience).
+
+* Input or output tensors with variable-size dimensions (indicated
+  by -1 in the model configuration) can now represent tensors where
+  the variable dimension has value 0 (zero).
+
+* Zero-sized input and output tensors are now supported for batching
+  models. This enables the inference server to support models that
+  require inputs and outputs that have shape [ batch-size ].
+
+* TensorFlow custom operations (C++) can now be built into the
+  inference server. An example and documentation are included in this
+  release.
+
+Features
+--------
+
+* `Multiple framework support
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_repository.html#framework-model-definition>`_. The
+  server can manage any number and mix of models (limited by system
+  disk and memory resources). Supports TensorRT, TensorFlow GraphDef,
+  TensorFlow SavedModel and Caffe2 NetDef model formats. Also supports
+  TensorFlow-TensorRT integrated models. Variable-size input and
+  output tensors are allowed if supported by the framework. See
+  `Capabilities
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/capabilities.html#capabilities>`_
+  for detailed support information for each framework.
+
+* `Concurrent model execution support
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_configuration.html#instance-groups>`_. Multiple
+  models (or multiple instances of the same model) can run
+  simultaneously on the same GPU.
+
+* Batching support. For models that support batching, the server can
+  accept requests for a batch of inputs and respond with the
+  corresponding batch of outputs. The inference server also supports
+  multiple `scheduling and batching
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_configuration.html#scheduling-and-batching>`_
+  algorithms that combine individual inference requests together to
+  improve inference throughput. These scheduling and batching
+  decisions are transparent to the client requesting inference.
+
+* `Custom backend support
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_repository.html#custom-backends>`_. The inference server
+  allows individual models to be implemented with custom backends
+  instead of by a deep-learning framework. With a custom backend a
+  model can implement any logic desired, while still benefiting from
+  the GPU support, concurrent execution, dynamic batching and other
+  features provided by the server.
+
+* Multi-GPU support. The server can distribute inferencing across all
+  system GPUs.
+
+* The inference server `monitors the model repository
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_repository.html#modifying-the-model-repository>`_
+  for any change and dynamically reloads the model(s) when necessary,
+  without requiring a server restart. Models and model versions can be
+  added and removed, and model configurations can be modified while
+  the server is running.
+
+* `Model repositories
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_repository.html#>`_
+  may reside on a locally accessible file system (e.g. NFS) or in
+  Google Cloud Storage.
+
+* Readiness and liveness `health endpoints
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/http_grpc_api.html#health>`_
+  suitable for any orchestration or deployment framework, such as
+  Kubernetes.
+
+* `Metrics
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/metrics.html>`_
+  indicating GPU utilization, server throughput, and server latency.
+
 .. overview-end-marker-do-not-remove
 
+The current release of the TensorRT Inference Server is 1.1.0 and
+corresponds to the 19.04 release of the tensorrtserver container on
+`NVIDIA GPU Cloud (NGC) <https://ngc.nvidia.com>`_. The branch for
+this release is `r19.04
+<https://github.com/NVIDIA/tensorrt-inference-server/tree/r19.04>`_.
+
+Backwards Compatibility
+-----------------------
+
+Continuing in version 1.1.0, the following interfaces maintain
+backwards compatibility with the 1.0.0 release. If you have model
+configuration files, custom backends, or clients that use the
+inference server HTTP or GRPC APIs (either directly or through the
+client libraries) from releases prior to 1.0.0 (19.03), you should edit
+and rebuild those as necessary to match the version 1.0.0 APIs.
+
+These interfaces will maintain backwards compatibility for all future
+1.x.y releases (see below for exceptions):
+
+* Model configuration as defined in `model_config.proto
+  <https://github.com/NVIDIA/tensorrt-inference-server/blob/master/src/core/model_config.proto>`_.
+
+* The inference server HTTP and GRPC APIs as defined in `api.proto
+  <https://github.com/NVIDIA/tensorrt-inference-server/blob/master/src/core/api.proto>`_
+  and `grpc_service.proto
+  <https://github.com/NVIDIA/tensorrt-inference-server/blob/master/src/core/grpc_service.proto>`_.
+
+* The custom backend interface as defined in `custom.h
+  <https://github.com/NVIDIA/tensorrt-inference-server/blob/master/src/servables/custom/custom.h>`_.
+
+As new features are introduced they may temporarily have beta status
+where they are subject to change in non-backwards-compatible
+ways. When they exit beta they will conform to the
+backwards-compatibility guarantees described above. Currently the
+following features are in beta:
+
+* In the model configuration defined in `model_config.proto
+  <https://github.com/NVIDIA/tensorrt-inference-server/blob/master/src/core/model_config.proto>`_
+  the sections related to model ensembling are currently in beta. In
+  particular, the ModelEnsembling message will potentially undergo
+  non-backwards-compatible changes.
+
+
+Documentation
+-------------
+
+The User Guide, Developer Guide, and API Reference `documentation
+<https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/index.html>`_
+provide guidance on installing, building and running the latest
+release of the TensorRT Inference Server.
+
+You can also view the documentation for the `master branch
+<https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-master-branch-guide/docs/index.html>`_
+and for `earlier releases
+<https://docs.nvidia.com/deeplearning/sdk/inference-server-archived/index.html>`_.
+
+The `Release Notes
+<https://docs.nvidia.com/deeplearning/sdk/inference-release-notes/index.html>`_
+and `Support Matrix
+<https://docs.nvidia.com/deeplearning/dgx/support-matrix/index.html>`_
+indicate the required versions of the NVIDIA Driver and CUDA, and also
+describe which GPUs are supported by the inference server.
+
+Blog Posts
+^^^^^^^^^^
+
+* `NVIDIA TensorRT Inference Server Boosts Deep Learning Inference
+  <https://devblogs.nvidia.com/nvidia-serves-deep-learning-inference/>`_.
+
+* `GPU-Accelerated Inference for Kubernetes with the NVIDIA TensorRT
+  Inference Server and Kubeflow
+  <https://www.kubeflow.org/blog/nvidia_tensorrt/>`_.
+
+Contributing
+------------
+
+Contributions to TensorRT Inference Server are more than welcome. To
+contribute, make a pull request and follow the guidelines outlined in
+the `Contributing <CONTRIBUTING.md>`_ document.
+
+Reporting problems, asking questions
+------------------------------------
+
+We appreciate any feedback, questions or bug reports regarding this
+project. When help with code is needed, follow the process outlined in
+the Stack Overflow (https://stackoverflow.com/help/mcve)
+document. Ensure posted examples are:
+
+* minimal – use as little code as possible that still produces the
+  same problem
+
+* complete – provide all parts needed to reproduce the problem. Check
+  whether you can strip external dependencies and still show the
+  problem. The less time we spend reproducing problems, the more time
+  we have to fix them.
+
+* verifiable – test the code you're about to provide to make sure it
+  reproduces the problem. Remove all other problems that are not
+  related to your request/question.
+
 .. |License| image:: https://img.shields.io/badge/License-BSD3-lightgrey.svg
    :target: https://opensource.org/licenses/BSD-3-Clause
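The README text added above describes an HTTP (or gRPC) endpoint for inference plus health and metrics endpoints. As a rough, illustrative sketch only (it is not part of this commit), the Python snippet below polls those HTTP endpoints with the requests library. The host name is hypothetical, and the paths and ports (8000 for the HTTP API, 8002 for Prometheus metrics) are assumptions based on this release's documented defaults:

    # Illustrative only; endpoint paths and ports are assumptions based on
    # the 19.04 release defaults (HTTP API on 8000, metrics on 8002).
    import requests

    HOST = "localhost"  # hypothetical host running the tensorrtserver container

    # Liveness and readiness, as used by orchestration frameworks such as Kubernetes.
    live = requests.get(f"http://{HOST}:8000/api/health/live")
    ready = requests.get(f"http://{HOST}:8000/api/health/ready")
    print("live:", live.status_code, "ready:", ready.status_code)

    # Server and model status.
    status = requests.get(f"http://{HOST}:8000/api/status")
    print(status.text[:500])

    # Prometheus metrics, covering GPU utilization, throughput, and latency.
    metrics = requests.get(f"http://{HOST}:8002/metrics")
    print(metrics.text[:500])

A 200 response from the readiness endpoint is what a Kubernetes readiness probe would key on; the metrics response is plain Prometheus text exposition.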

VERSION

+1-1
@@ -1 +1 @@
-1.1.0dev
+1.1.0
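The README changes above also reference the model repository layout and the model configuration schema in model_config.proto, including variable-size dimensions declared with -1, dynamic batching, and instance groups for concurrent execution. The sketch below is illustrative only and not taken from this commit: it writes one hypothetical model-repository entry to disk. The model name, tensor names, paths, and chosen values are assumptions; the field names follow model_config.proto:

    # Illustrative sketch; model name, paths, and values are hypothetical.
    from pathlib import Path

    MODEL_REPO = Path("/tmp/model_repository")  # directory later given to the server as its model store
    MODEL_NAME = "example_graphdef"             # hypothetical TensorFlow GraphDef model

    CONFIG_PBTXT = """\
    name: "example_graphdef"
    platform: "tensorflow_graphdef"
    max_batch_size: 8
    input [
      {
        name: "INPUT0"
        data_type: TYPE_FP32
        dims: [ -1 ]   # variable-size dimension; as of 1.1.0 it may also be 0
      }
    ]
    output [
      {
        name: "OUTPUT0"
        data_type: TYPE_FP32
        dims: [ -1 ]
      }
    ]
    instance_group [ { count: 2, kind: KIND_GPU } ]   # two instances on one GPU
    dynamic_batching {
      preferred_batch_size: [ 4, 8 ]
      max_queue_delay_microseconds: 100
    }
    """

    # Layout: <repo>/<model-name>/config.pbtxt plus a numeric version
    # subdirectory holding the framework's model definition file.
    model_dir = MODEL_REPO / MODEL_NAME
    (model_dir / "1").mkdir(parents=True, exist_ok=True)
    (model_dir / "config.pbtxt").write_text(CONFIG_PBTXT)

With the server started against this directory as its model store, the README's model-repository monitoring means later additions or configuration edits would be picked up without a restart.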
