
Commit cfde288

Author: David Goodwin
Update for 0.11.0/19.02 release
1 parent 67c875a

5 files changed: +189 -21 lines changed

Dockerfile (+4 -4)

@@ -97,8 +97,8 @@ RUN bash -c 'if [ "$BUILD_CLIENTS_ONLY" != "1" ]; then \
 ############################################################################
 FROM ${TENSORFLOW_IMAGE} AS trtserver_build

-ARG TRTIS_VERSION=0.11.0dev
-ARG TRTIS_CONTAINER_VERSION=19.02dev
+ARG TRTIS_VERSION=0.11.0
+ARG TRTIS_CONTAINER_VERSION=19.02
 ARG PYVER=3.5
 ARG BUILD_CLIENTS_ONLY=0

@@ -247,8 +247,8 @@ ENTRYPOINT ["/opt/tensorrtserver/nvidia_entrypoint.sh"]
 ############################################################################
 FROM ${BASE_IMAGE}

-ARG TRTIS_VERSION=0.11.0dev
-ARG TRTIS_CONTAINER_VERSION=19.02dev
+ARG TRTIS_VERSION=0.11.0
+ARG TRTIS_CONTAINER_VERSION=19.02
 ARG PYVER=3.5

 ENV TENSORRT_SERVER_VERSION ${TRTIS_VERSION}

README.rst (+170 -4)

@@ -30,13 +30,179 @@
 NVIDIA TensorRT Inference Server
 ================================

-**NOTE: You are currently on the r19.02 branch which tracks
-stabilization towards the next release. This branch is not usable
-during stabilization.**
-
 .. overview-begin-marker-do-not-remove

+The NVIDIA TensorRT Inference Server provides a cloud inferencing
+solution optimized for NVIDIA GPUs. The server provides an inference
+service via an HTTP or gRPC endpoint, allowing remote clients to
+request inferencing for any model being managed by the server.
+
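For a concrete sense of the HTTP endpoint, the short sketch below asks a running server for its status, which lists the models it is serving. It is a minimal sketch only: the localhost address, the default HTTP port 8000, and the /api/status path are assumptions to check against the HTTP/GRPC API documentation::

  # Query a running inference server for its status over HTTP.
  # Assumes the server is reachable at localhost on the default HTTP
  # port (8000) and exposes the /api/status path from the HTTP/GRPC API.
  import requests

  resp = requests.get("http://localhost:8000/api/status")
  resp.raise_for_status()

  # The body is a human-readable (text protobuf) server status; printing
  # it shows the models the server is managing and their versions.
  print(resp.text)
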
+What's New In 0.11.0 Beta
+-------------------------
+
+* `Variable-size input and output tensor support
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_configuration.html#model-configuration>`_. Models
+  that support variable-size input tensors and produce variable-size
+  output tensors are now supported in the model configuration by using
+  a dimension size of -1 for those dimensions that can take on any
+  size.
+
+* `String datatype support
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/client.html#string-datatype>`_.
+  For TensorFlow models and custom backends, input and output tensors
+  can contain strings (see the sketch after this list).
+
+* Improved support for non-GPU systems. The inference server will run
+  correctly on systems that do not contain GPUs and that do not have
+  nvidia-docker or CUDA installed.
+
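To make the string datatype concrete, here is a rough sketch of packing a list of strings into the raw bytes of a string-typed input tensor. It assumes the length-prefixed layout described in the linked client documentation (each element is a 4-byte length followed by its bytes); treat the exact byte order as something to verify there::

  # Serialize a list of strings for a string-typed input tensor.
  # Assumed layout (per the client documentation): each element is a
  # 4-byte length prefix followed by the element's bytes.
  import struct

  def serialize_string_tensor(values):
      buf = bytearray()
      for v in values:
          b = v.encode("utf-8")
          buf += struct.pack("<I", len(b))  # length prefix (little-endian assumed)
          buf += b
      return bytes(buf)

  payload = serialize_string_tensor(["hello", "tensorrt", "server"])
  print(len(payload), payload)
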
+Features
+--------
+
+* `Multiple framework support
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_repository.html#framework-model-definition>`_. The
+  server can manage any number and mix of models (limited by system
+  disk and memory resources). Supports TensorRT, TensorFlow GraphDef,
+  TensorFlow SavedModel and Caffe2 NetDef model formats. Also supports
+  TensorFlow-TensorRT integrated models.
+
+* `Custom backend support
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_repository.html#custom-backends>`_. The inference server
+  allows individual models to be implemented with custom backends
+  instead of by a deep-learning framework. With a custom backend a
+  model can implement any logic desired, while still benefiting from
+  the GPU support, concurrent execution, dynamic batching and other
+  features provided by the server.
+
+* The inference server `monitors the model repository
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_repository.html#modifying-the-model-repository>`_
+  for any change and dynamically reloads the model(s) when necessary,
+  without requiring a server restart. Models and model versions can be
+  added and removed, and model configurations can be modified while
+  the server is running.
+
+* Multi-GPU support. The server can distribute inferencing across all
+  system GPUs.
+
+* `Concurrent model execution support
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_configuration.html?highlight=batching#instance-groups>`_. Multiple
+  models (or multiple instances of the same model) can run
+  simultaneously on the same GPU.
+
+* Batching support. For models that support batching, the server can
+  accept requests for a batch of inputs and respond with the
+  corresponding batch of outputs. The inference server also supports
+  `dynamic batching
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_configuration.html?highlight=batching#dynamic-batching>`_
+  where individual inference requests are dynamically combined
+  together to improve inference throughput. Dynamic batching is
+  transparent to the client requesting inference.
+
+* `Model repositories
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_repository.html#>`_
+  may reside on a locally accessible file system (e.g. NFS) or in
+  Google Cloud Storage.
+
+* Readiness and liveness `health endpoints
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/http_grpc_api.html#health>`_
+  suitable for any orchestration or deployment framework, such as
+  Kubernetes.
+
+* `Metrics
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/metrics.html>`_
+  indicating GPU utilization, server throughput, and server latency
+  (see the sketch after this list).
+
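As a sketch of how the health and metrics endpoints might be used together, the snippet below polls readiness and scrapes the Prometheus-format metrics. The ports (8000 for HTTP, 8002 for metrics), the endpoint paths, and the "nv_" metric-name prefix are assumptions to check against the linked health and metrics documentation::

  # Poll readiness, then scrape the metrics endpoint.
  # Assumed defaults: HTTP health at port 8000 (/api/health/ready) and
  # Prometheus-format metrics at port 8002 (/metrics); metric names are
  # assumed to start with "nv_".
  import requests

  ready = requests.get("http://localhost:8000/api/health/ready")
  print("server ready:", ready.status_code == 200)

  metrics = requests.get("http://localhost:8002/metrics")
  for line in metrics.text.splitlines():
      if line.startswith("nv_"):  # e.g. GPU utilization, request counts
          print(line)
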
 .. overview-end-marker-do-not-remove

+The current release of the TensorRT Inference Server is 0.11.0 beta and
+corresponds to the 19.02 release of the tensorrtserver container on
+`NVIDIA GPU Cloud (NGC) <https://ngc.nvidia.com>`_. The branch for
+this release is `r19.02
+<https://github.com/NVIDIA/tensorrt-inference-server/tree/r19.02>`_.
+
+Backwards Compatibility
+-----------------------
+
+The inference server is still in beta. As a result, we sometimes make
+non-backward-compatible changes. You must rebuild the client
+libraries and any client applications you use to talk to the inference
+server to make sure they stay in sync with the server. For the clients
+you must use the GitHub branch corresponding to the server.
+
+Compared to the r19.01 release, the 19.02 release has the following
+non-backward-compatible changes:
+
+* The inference request header for inputs and outputs no longer allows
+  the byte_size field. See InferRequestHeader::Input and
+  InferRequestHeader::Output in `api.proto
+  <https://github.com/NVIDIA/tensorrt-inference-server/blob/master/src/core/api.proto>`_.
+
+* The inference response header no longer returns the batch-1
+  byte_size field for each output. Instead the shape and byte-size for
+  the full output batch are returned (see the sketch after this
+  list). See InferResponseHeader::Output in `api.proto
+  <https://github.com/NVIDIA/tensorrt-inference-server/blob/master/src/core/api.proto>`_.
+
+* The inference response header reports the model version as a 64-bit
+  integer (previously reported as an unsigned 32-bit integer). See
+  InferResponseHeader.model_version in `api.proto
+  <https://github.com/NVIDIA/tensorrt-inference-server/blob/master/src/core/api.proto>`_,
+  InferRequest.model_version in `grpc_service.proto
+  <https://github.com/NVIDIA/tensorrt-inference-server/blob/master/src/core/grpc_server.proto>`_,
+  and ModelStatus.version_status in `server_status.proto
+  <https://github.com/NVIDIA/tensorrt-inference-server/blob/master/src/core/server_status.proto>`_.
+
+* For custom backends, the CustomGetOutputFn function signature has
+  changed to require the backend to report the shape of each computed
+  output. See CustomGetOutputFn_t in `custom.h
+  <https://github.com/NVIDIA/tensorrt-inference-server/blob/master/src/servables/custom/custom.h>`_.
+
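To illustrate the second change above: a client that previously read the batch-1 byte_size for an output can instead derive the byte size itself from the full-batch shape reported in InferResponseHeader::Output. The helper below is a hypothetical sketch of that arithmetic; the shape and datatype would come from the response header and the model configuration::

  # Hypothetical helper: compute an output's byte size from the
  # full-batch shape now reported in the response header, rather than
  # relying on the removed batch-1 byte_size field.
  import numpy as np

  def full_batch_byte_size(dims, dtype):
      """Byte size of an output whose full-batch shape is `dims`."""
      return int(np.prod(dims)) * np.dtype(dtype).itemsize

  # Example: a batch of 8 outputs, each 3x224x224 FP32.
  print(full_batch_byte_size([8, 3, 224, 224], np.float32))  # 4816896 bytes
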
+Documentation
+-------------
+
+The User Guide, Developer Guide, and API Reference `documentation
+<https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/index.html>`_
+provide guidance on installing, building and running the latest
+release of the TensorRT Inference Server.
+
+You can also view the documentation for the `master branch
+<https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-master-branch-guide/docs/index.html>`_
+and for `earlier releases
+<https://docs.nvidia.com/deeplearning/sdk/inference-server-archived/index.html>`_.
+
+The `Release Notes
+<https://docs.nvidia.com/deeplearning/sdk/inference-release-notes/index.html>`_
+and `Support Matrix
+<https://docs.nvidia.com/deeplearning/dgx/support-matrix/index.html>`_
+indicate the required versions of the NVIDIA Driver and CUDA, and also
+describe which GPUs are supported by the inference server.
+
+Contributing
+------------
+
+Contributions to TensorRT Inference Server are more than welcome. To
+contribute, make a pull request and follow the guidelines outlined in
+the `Contributing <CONTRIBUTING.md>`_ document.
+
+Reporting problems, asking questions
+------------------------------------
+
+We appreciate any feedback, questions or bug reports regarding this
+project. When help with code is needed, follow the process outlined in
+the Stack Overflow (https://stackoverflow.com/help/mcve)
+document. Ensure posted examples are:
+
+* minimal – use as little code as possible that still produces the
+  same problem
+
+* complete – provide all parts needed to reproduce the problem. Check
+  whether you can strip external dependencies and still show the
+  problem. The less time we spend reproducing problems, the more time
+  we have to fix them.
+
+* verifiable – test the code you're about to provide to make sure it
+  reproduces the problem. Remove all other problems that are not
+  related to your request/question.
+
 .. |License| image:: https://img.shields.io/badge/License-BSD3-lightgrey.svg
    :target: https://opensource.org/licenses/BSD-3-Clause

VERSION (+1 -1)

@@ -1 +1 @@
-0.11.0dev
+0.11.0

docs/client.rst (+13 -11)

@@ -88,23 +88,25 @@ In the client image you can find the example executables in
 
 If your host system is Ubuntu-16.04, an alternative to running the
 examples within the tensorrtserver_clients container is to instead
-copy the libraries and examples from the docker image to the host
-system::
+download the client libraries and examples from the `GitHub release
+page <https://github.com/NVIDIA/tensorrt-inference-server/releases>`_
+corresponding to the release you are interested in::
 
-  $ docker run -it --rm -v/tmp:/tmp/host tensorrtserver_clients
-  # cp /opt/tensorrtserver/bin/image_client /tmp/host/.
-  # cp /opt/tensorrtserver/bin/perf_client /tmp/host/.
-  # cp /opt/tensorrtserver/bin/simple_client /tmp/host/.
-  # cp /opt/tensorrtserver/pip/tensorrtserver-*.whl /tmp/host/.
-  # cp /opt/tensorrtserver/lib/librequest.* /tmp/host/.
+  $ mkdir tensorrtserver_clients
+  $ cd tensorrtserver_clients
+  $ wget https://github.com/NVIDIA/tensorrt-inference-server/archive/v0.11.0.clients.tar.gz
+  $ tar xzf v0.11.0.clients.tar.gz
 
-You can now access the files from /tmp on the host system. To run the
-C++ examples you must install some dependencies on your host system::
+You can now find the client example binaries in bin/, the C++
+libraries in lib/, and the Python client examples and wheel file in
+python/.
+
+To run the C++ examples you must install some dependencies on your
+Ubuntu-16.04 host system::
 
   $ apt-get install curl libcurl3-dev libopencv-dev libopencv-core-dev
 
 To run the Python examples you will need to additionally install the
-client whl file and some other dependencies::
+wheel file and some other dependencies::
 
   $ apt-get install python3 python3-pip
   $ pip3 install --user --upgrade tensorrtserver-*.whl numpy pillow
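Once the wheel is installed, a Python client can exercise a served model. The sketch below is adapted from the bundled simple_client example and assumes the example "simple" model from the repository's docs/examples model repository is being served (two INT32 inputs of 16 elements; outputs are their element-wise sum and difference). Treat the exact InferContext signature as an assumption to confirm against the python/ examples in the download::

  # Sketch adapted from the bundled simple_client example. Assumes a
  # server at localhost:8000 serving the example "simple" model; confirm
  # the exact API against the python/ examples shipped with the clients.
  import numpy as np
  from tensorrtserver.api import InferContext, ProtocolType

  ctx = InferContext("localhost:8000", ProtocolType.HTTP, "simple", None, False)

  input0 = np.arange(16, dtype=np.int32)
  input1 = np.ones(16, dtype=np.int32)

  result = ctx.run({"INPUT0": (input0,), "INPUT1": (input1,)},
                   {"OUTPUT0": InferContext.ResultFormat.RAW,
                    "OUTPUT1": InferContext.ResultFormat.RAW},
                   1)  # batch size 1

  print(result["OUTPUT0"][0])  # expected: input0 + input1
  print(result["OUTPUT1"][0])  # expected: input0 - input1
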

docs/model_configuration.rst (+1 -1)

@@ -214,7 +214,7 @@ For TensorFlow each value is in the tensorflow namespace. For example,
 tensorflow::DT_FLOAT is the 32-bit floating-point value.
 
 For Caffe2 each value is in the caffe2 namespace and is prepended with
-TensorProto_DataType_. For example, caffe2::TensorProto_DataType_FLOAT
+TensorProto\_DataType\_. For example, caffe2::TensorProto_DataType_FLOAT
 is the 32-bit floating-point datatype.
 
 For Numpy each value is in the numpy module. For example, numpy.float32
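For the NumPy side of this mapping, the sketch below pairs a few model-configuration datatype names with their numpy equivalents; the TYPE_* names are taken from the model configuration schema (model_config.proto) and the table here is illustrative rather than exhaustive::

  # Illustrative mapping from model-configuration datatype names to
  # numpy dtypes (TYPE_* names per model_config.proto; not exhaustive).
  import numpy as np

  CONFIG_DTYPE_TO_NUMPY = {
      "TYPE_BOOL": np.bool_,
      "TYPE_UINT8": np.uint8,
      "TYPE_INT32": np.int32,
      "TYPE_FP16": np.float16,
      "TYPE_FP32": np.float32,  # corresponds to tensorflow::DT_FLOAT above
  }

  for name, dtype in CONFIG_DTYPE_TO_NUMPY.items():
      print(name, "->", np.dtype(dtype).name, np.dtype(dtype).itemsize, "bytes")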
