
Commit 324486f

Author: David Goodwin
Update for 1.0.0/19.03 release
1 parent c1e7ec4

File tree: Dockerfile, README.rst, VERSION, docs/client.rst

4 files changed: +192 -13 lines changed


Dockerfile: +4 -4
@@ -97,8 +97,8 @@ RUN bash -c 'if [ "$BUILD_CLIENTS_ONLY" != "1" ]; then \
 ############################################################################
 FROM ${TENSORFLOW_IMAGE} AS trtserver_build
 
-ARG TRTIS_VERSION=1.0.0dev
-ARG TRTIS_CONTAINER_VERSION=19.03dev
+ARG TRTIS_VERSION=1.0.0
+ARG TRTIS_CONTAINER_VERSION=19.03
 ARG PYVER=3.5
 ARG BUILD_CLIENTS_ONLY=0
 
@@ -253,8 +253,8 @@ ENTRYPOINT ["/opt/tensorrtserver/nvidia_entrypoint.sh"]
 ############################################################################
 FROM ${BASE_IMAGE}
 
-ARG TRTIS_VERSION=1.0.0dev
-ARG TRTIS_CONTAINER_VERSION=19.03dev
+ARG TRTIS_VERSION=1.0.0
+ARG TRTIS_CONTAINER_VERSION=19.03
 ARG PYVER=3.5
 
 ENV TENSORRT_SERVER_VERSION ${TRTIS_VERSION}

README.rst: +177 -4
@@ -30,13 +30,186 @@
 NVIDIA TensorRT Inference Server
 ================================
 
-**NOTE: You are currently on the r19.03 branch which tracks
-stabilization towards the next release. This branch is not usable
-during stabilization.**
-
 .. overview-begin-marker-do-not-remove
 
+The NVIDIA TensorRT Inference Server provides a cloud inferencing
+solution optimized for NVIDIA GPUs. The server provides an inference
+service via an HTTP or gRPC endpoint, allowing remote clients to
+request inferencing for any model being managed by the server.
+
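As a quick illustration of the HTTP endpoint described in the overview above, the following minimal sketch checks server liveness and status from Python. It assumes a locally running server with the HTTP service on port 8000 and the /api/health/live and /api/status paths described in the HTTP API documentation for this release; adjust both if your deployment differs::

   # Minimal liveness/status probe against the inference server HTTP endpoint.
   # localhost:8000, /api/health/live and /api/status are assumptions taken
   # from the documented defaults for this release; adjust as needed.
   import urllib.request

   BASE = "http://localhost:8000"

   def get(path):
       with urllib.request.urlopen(BASE + path, timeout=5.0) as resp:
           return resp.status, resp.read().decode("utf-8", errors="replace")

   code, _ = get("/api/health/live")
   print("live:", code == 200)

   code, body = get("/api/status")
   print("status code:", code)
   print(body[:500])  # server and model status payload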
+What's New In 1.0.0
+-------------------
+
+* 1.0.0 is the first GA, non-beta, release of TensorRT Inference
+  Server. See below for information on backwards-compatibility
+  guarantees for this and future releases.
+
+* Added support for `stateful
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/models_and_schedulers.html#stateful-models>`_
+  models and backends that require multiple inference requests be
+  routed to the same model instance/batch slot. The new `sequence
+  batcher
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_configuration.html#sequence-batcher>`_
+  provides scheduling and batching capabilities for this class of
+  models.
+
+* Added `GRPC streaming protocol
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/http_grpc_api.html#stream-inference>`_
+  support for inference requests.
+
+* The HTTP front-end is now asynchronous to enable lower-latency and
+  higher-throughput handling of inference requests.
+
+* Enhanced perf_client to support stateful models and backends.
+
+
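The sequence batcher mentioned in the stateful-models item above works by pinning every request that carries the same correlation ID to a single batch slot until the sequence ends. The toy sketch below illustrates only that routing idea; it is not the server's implementation, and the class and parameter names are invented for the example::

   # Toy illustration of sequence-batcher style routing: requests that share a
   # correlation ID stay on one batch slot for the lifetime of the sequence.
   class SlotRouter:
       def __init__(self, num_slots):
           self.free = list(range(num_slots))  # slots with no active sequence
           self.assigned = {}                  # correlation_id -> slot

       def route(self, correlation_id, start=False, end=False):
           if start:
               if not self.free:
                   raise RuntimeError("no free slot; the request must wait")
               self.assigned[correlation_id] = self.free.pop()
           slot = self.assigned[correlation_id]
           if end:
               self.free.append(self.assigned.pop(correlation_id))
           return slot

   router = SlotRouter(num_slots=2)
   print(router.route(101, start=True))  # sequence 101 claims a slot
   print(router.route(202, start=True))  # sequence 202 claims the other slot
   print(router.route(101, end=True))    # same slot as 101's first request, then freed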
+Features
+--------
+
+* `Multiple framework support
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_repository.html#framework-model-definition>`_. The
+  server can manage any number and mix of models (limited by system
+  disk and memory resources). Supports TensorRT, TensorFlow GraphDef,
+  TensorFlow SavedModel and Caffe2 NetDef model formats. Also supports
+  TensorFlow-TensorRT integrated models. Variable-size input and
+  output tensors are allowed if supported by the framework. See
+  `Capabilities
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/capabilities.html#capabilities>`_
+  for detailed support information for each framework.
+
+* `Concurrent model execution support
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_configuration.html#instance-groups>`_. Multiple
+  models (or multiple instances of the same model) can run
+  simultaneously on the same GPU.
+
+* Batching support. For models that support batching, the server can
+  accept requests for a batch of inputs and respond with the
+  corresponding batch of outputs. The inference server also supports
+  multiple `scheduling and batching
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_configuration.html#scheduling-and-batching>`_
+  algorithms that combine individual inference requests together to
+  improve inference throughput. These scheduling and batching
+  decisions are transparent to the client requesting inference.
+
+* `Custom backend support
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_repository.html#custom-backends>`_. The inference server
+  allows individual models to be implemented with custom backends
+  instead of by a deep-learning framework. With a custom backend a
+  model can implement any logic desired, while still benefiting from
+  the GPU support, concurrent execution, dynamic batching and other
+  features provided by the server.
+
+* Multi-GPU support. The server can distribute inferencing across all
+  system GPUs.
+
+* The inference server `monitors the model repository
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_repository.html#modifying-the-model-repository>`_
+  for any change and dynamically reloads the model(s) when necessary,
+  without requiring a server restart. Models and model versions can be
+  added and removed, and model configurations can be modified while
+  the server is running.
+
+* `Model repositories
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_repository.html#>`_
+  may reside on a locally accessible file system (e.g. NFS) or in
+  Google Cloud Storage.
+
+* Readiness and liveness `health endpoints
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/http_grpc_api.html#health>`_
+  suitable for any orchestration or deployment framework, such as
+  Kubernetes.
+
+* `Metrics
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/metrics.html>`_
+  indicating GPU utilization, server throughput, and server latency.
+
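The health and metrics endpoints in the feature list above are plain HTTP, which makes them easy to wire into an orchestrator or a Prometheus scraper. A minimal sketch follows; the ports (8000 for the HTTP API, 8002 for metrics), the /api/health/ready and /metrics paths, and the nv_ metric-name prefix are assumptions based on the documented defaults for this release::

   # Readiness check plus a peek at the Prometheus-format metrics.
   # Ports, paths, and the "nv_" metric prefix are assumptions; verify them
   # against the metrics documentation for your deployment.
   import urllib.request

   def fetch(url):
       with urllib.request.urlopen(url, timeout=5.0) as resp:
           return resp.read().decode("utf-8", errors="replace")

   fetch("http://localhost:8000/api/health/ready")  # raises on a non-2xx response
   print("server reports ready")

   for line in fetch("http://localhost:8002/metrics").splitlines():
       if line.startswith("nv_"):  # inference-server metrics, e.g. GPU utilization
           print(line)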
 .. overview-end-marker-do-not-remove
 
+The current release of the TensorRT Inference Server is 1.0.0 and
+corresponds to the 19.03 release of the tensorrtserver container on
+`NVIDIA GPU Cloud (NGC) <https://ngc.nvidia.com>`_. The branch for
+this release is `r19.03
+<https://github.com/NVIDIA/tensorrt-inference-server/tree/r19.03>`_.
+
+Backwards Compatibility
+-----------------------
+
+This 19.03 release includes version 1.0.0 of the inference server.
+Starting with version 1.0.0 the following interfaces will maintain
+backwards compatibility. If you have model configuration files, custom
+backends, or clients that use the inference server HTTP or GRPC APIs
+(either directly or through the client libraries) from releases prior
+to 19.03 you should edit and rebuild those as necessary to match the
+version 1.0.0 APIs.
+
+These interfaces will maintain backwards compatibility for all future
+1.x.y releases:
+
+* Model configuration as defined in `model_config.proto
+  <https://github.com/NVIDIA/tensorrt-inference-server/blob/master/src/core/model_config.proto>`_.
+
+* The inference server HTTP and GRPC APIs as defined in `api.proto
+  <https://github.com/NVIDIA/tensorrt-inference-server/blob/master/src/core/api.proto>`_
+  and `grpc_service.proto
+  <https://github.com/NVIDIA/tensorrt-inference-server/blob/master/src/core/grpc_service.proto>`_.
+
+* The custom backend interface as defined in `custom.h
+  <https://github.com/NVIDIA/tensorrt-inference-server/blob/master/src/servables/custom/custom.h>`_.
+
+Documentation
+-------------
+
+The User Guide, Developer Guide, and API Reference `documentation
+<https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/index.html>`_
+provide guidance on installing, building and running the latest
+release of the TensorRT Inference Server.
+
+You can also view `earlier releases
+<https://docs.nvidia.com/deeplearning/sdk/inference-server-archived/index.html>`_.
+
+The `Release Notes
+<https://docs.nvidia.com/deeplearning/sdk/inference-release-notes/index.html>`_
+and `Support Matrix
+<https://docs.nvidia.com/deeplearning/dgx/support-matrix/index.html>`_
+indicate the required versions of the NVIDIA Driver and CUDA, and also
+describe which GPUs are supported by the inference server.
+
+Blog Posts
+^^^^^^^^^^
+
+* `NVIDIA TensorRT Inference Server Boosts Deep Learning Inference
+  <https://devblogs.nvidia.com/nvidia-serves-deep-learning-inference/>`_.
+
+* `GPU-Accelerated Inference for Kubernetes with the NVIDIA TensorRT
+  Inference Server and Kubeflow
+  <https://www.kubeflow.org/blog/nvidia_tensorrt/>`_.
+
+Contributing
+------------
+
+Contributions to TensorRT Inference Server are more than welcome. To
+contribute, make a pull request and follow the guidelines outlined in
+the `Contributing <CONTRIBUTING.md>`_ document.
+
+Reporting problems, asking questions
+------------------------------------
+
+We appreciate any feedback, questions or bug reports regarding this
+project. When help with code is needed, follow the process outlined in
+the Stack Overflow (https://stackoverflow.com/help/mcve)
+document. Ensure posted examples are:
+
+* minimal – use as little code as possible that still produces the
+  same problem
+
+* complete – provide all parts needed to reproduce the problem. Check
+  if you can strip external dependencies and still show the problem. The
+  less time we spend on reproducing problems, the more time we have to
+  fix them
+
+* verifiable – test the code you're about to provide to make sure it
+  reproduces the problem. Remove all other problems that are not
+  related to your request/question.
+
 .. |License| image:: https://img.shields.io/badge/License-BSD3-lightgrey.svg
    :target: https://opensource.org/licenses/BSD-3-Clause

VERSION: +1 -1
@@ -1 +1 @@
-1.0.0dev
+1.0.0

docs/client.rst: +10 -4
@@ -116,17 +116,23 @@ After untaring you can find the client example binaries in bin/,
 libraries in lib/, and Python client examples and wheel file in
 python/.
 
-To run the Python and C++ examples you must install some dependencies::
+To use the C++ libraries and examples you must install some
+dependencies::
 
   $ apt-get update
-  $ apt-get install curl libcurl3-dev libopencv-dev libopencv-core-dev
+  $ apt-get install curl libcurl3-dev
 
-To run the Python examples you will need to additionally install the
-wheel file and some other dependencies::
+The Python examples require that you additionally install the wheel
+file and some other dependencies::
 
   $ apt-get install python3 python3-pip
   $ pip3 install --user --upgrade tensorrtserver-*.whl numpy pillow
 
+The C++ image_client example uses OpenCV for image manipulation so for
+that example you must install the following::
+
+  $ apt-get install libopencv-dev libopencv-core-dev
+
 .. build-client-end-marker-do-not-remove
 
 .. _section-image_classification_example:
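Once the wheel and the dependencies shown above are installed, a short Python check can confirm that the client package imports and that a server is reachable. This is a minimal sketch; it assumes the wheel provides a tensorrtserver.api module (as used by the bundled Python examples) and that a server is listening on localhost:8000::

   # Sanity check: the client wheel imports and the server answers HTTP.
   # localhost:8000 and the /api/health/live path are assumptions; adjust
   # them for your deployment.
   import urllib.request

   import tensorrtserver.api  # installed from tensorrtserver-*.whl above

   with urllib.request.urlopen("http://localhost:8000/api/health/live",
                               timeout=5.0) as resp:
       print("client module loaded; server live:", resp.status == 200)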
