
Commit a11452c: Update README and versions for 19.06 release

Author: David Goodwin
Parent: db84540

3 files changed: +199 -9 lines changed

Dockerfile (+4 -4)
@@ -123,8 +123,8 @@ RUN python3 /workspace/onnxruntime/tools/ci_build/build.py --build_dir /workspac
 ############################################################################
 FROM ${TENSORFLOW_IMAGE} AS trtserver_build

-ARG TRTIS_VERSION=1.3.0dev
-ARG TRTIS_CONTAINER_VERSION=19.06dev
+ARG TRTIS_VERSION=1.3.0
+ARG TRTIS_CONTAINER_VERSION=19.06
 ARG PYVER=3.5

 # The TFServing release branch must match the TF release used by
@@ -267,8 +267,8 @@ ENTRYPOINT ["/opt/tensorrtserver/nvidia_entrypoint.sh"]
 ############################################################################
 FROM ${BASE_IMAGE}

-ARG TRTIS_VERSION=1.3.0dev
-ARG TRTIS_CONTAINER_VERSION=19.06dev
+ARG TRTIS_VERSION=1.3.0
+ARG TRTIS_CONTAINER_VERSION=19.06
 ARG PYVER=3.5

 ENV TENSORRT_SERVER_VERSION ${TRTIS_VERSION}
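
Both build stages consume these version arguments, so the bumped defaults are all a release build needs; they can also be overridden at build time. A minimal sketch, where only the ARG names come from this Dockerfile and the image tag and build context are hypothetical:

    # Hypothetical tag; the ARG defaults above already carry the 1.3.0 / 19.06
    # release values, shown here as explicit --build-arg overrides.
    docker build -t tensorrtserver:19.06 \
      --build-arg TRTIS_VERSION=1.3.0 \
      --build-arg TRTIS_CONTAINER_VERSION=19.06 \
      .
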

README.rst (+194 -4)
@@ -30,13 +30,203 @@
 NVIDIA TensorRT Inference Server
 ================================

-**NOTE: You are currently on the r19.06 branch which tracks
-stabilization towards the next release. This branch is not usable
-during stabilization.**
-
 .. overview-begin-marker-do-not-remove

+The NVIDIA TensorRT Inference Server provides a cloud inferencing
+solution optimized for NVIDIA GPUs. The server provides an inference
+service via an HTTP or gRPC endpoint, allowing remote clients to
+request inferencing for any model being managed by the server.
+
+What's New In 1.3.0
+-------------------
+
+* The `ONNX Runtime <https://github.com/Microsoft/onnxruntime>`_ is
+  now integrated into the inference server. `ONNX models
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_repository.html#onnx-models>`_
+  can now be used directly in a model repository.
+
+* The HTTP health port may be specified independently of the inference
+  and status HTTP port with the --http-health-port flag.
+
+* Fixed a bug in perf_client that caused high CPU usage, which could
+  lower the measured inference/sec in some cases.
+
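
The --http-health-port flag noted above moves only the health endpoints to a separate port. A minimal launch sketch; the container tag, the trtserver binary name, the --model-store flag, and the port numbers are assumptions based on typical usage of the 19.xx NGC containers, not something this commit specifies:

    # Assumed image tag, entrypoint binary, and ports; adjust to your environment.
    docker run --rm --runtime=nvidia \
      -p 8000:8000 -p 8001:8001 -p 8002:8002 -p 8080:8080 \
      -v /path/to/model/repository:/models \
      nvcr.io/nvidia/tensorrtserver:19.06-py3 \
      trtserver --model-store=/models --http-health-port=8080

With this arrangement liveness/readiness checks go to port 8080 while inference and status requests keep using the regular HTTP port.
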
+Features
+--------
+
+* `Multiple framework support
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_repository.html#framework-model-definition>`_. The
+  server can manage any number and mix of models (limited by system
+  disk and memory resources). Supports TensorRT, TensorFlow GraphDef,
+  TensorFlow SavedModel, ONNX and Caffe2 NetDef model formats. Also
+  supports TensorFlow-TensorRT integrated models. Variable-size input
+  and output tensors are allowed if supported by the framework. See
+  `Capabilities
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/capabilities.html#capabilities>`_
+  for detailed support information for each framework.
+
+* `Concurrent model execution support
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_configuration.html#instance-groups>`_. Multiple
+  models (or multiple instances of the same model) can run
+  simultaneously on the same GPU.
+
+* Batching support. For models that support batching, the server can
+  accept requests for a batch of inputs and respond with the
+  corresponding batch of outputs. The inference server also supports
+  multiple `scheduling and batching
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_configuration.html#scheduling-and-batching>`_
+  algorithms that combine individual inference requests together to
+  improve inference throughput. These scheduling and batching
+  decisions are transparent to the client requesting inference.
+
+* `Custom backend support
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_repository.html#custom-backends>`_. The inference server
+  allows individual models to be implemented with custom backends
+  instead of by a deep-learning framework. With a custom backend a
+  model can implement any logic desired, while still benefiting from
+  the GPU support, concurrent execution, dynamic batching and other
+  features provided by the server.
+
+* `Ensemble support
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/models_and_schedulers.html#ensemble-models>`_. An
+  ensemble represents a pipeline of one or more models and the
+  connection of input and output tensors between those models. A
+  single inference request to an ensemble will trigger the execution
+  of the entire pipeline.
+
+* Multi-GPU support. The server can distribute inferencing across all
+  system GPUs.
+
+* The inference server `monitors the model repository
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_repository.html#modifying-the-model-repository>`_
+  for any change and dynamically reloads the model(s) when necessary,
+  without requiring a server restart. Models and model versions can be
+  added and removed, and model configurations can be modified while
+  the server is running.
+
+* `Model repositories
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_repository.html#>`_
+  may reside on a locally accessible file system (e.g. NFS) or in
+  Google Cloud Storage.
+
+* Readiness and liveness `health endpoints
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/http_grpc_api.html#health>`_
+  suitable for any orchestration or deployment framework, such as
+  Kubernetes.
+
+* `Metrics
+  <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/metrics.html>`_
+  indicating GPU utilization, server throughput, and server latency.
+
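
The readiness/liveness and metrics features listed above are plain HTTP endpoints, so they can be probed with curl. A minimal sketch; the /api/health/* and /metrics paths and the default ports (8000 for HTTP, 8002 for metrics) are assumptions drawn from the linked HTTP API and metrics documentation rather than from this commit:

    # Liveness and readiness; a 200 response indicates live/ready (assumed paths and port).
    curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8000/api/health/live
    curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8000/api/health/ready

    # Prometheus-format metrics (GPU utilization, throughput, latency; assumed port).
    curl -s http://localhost:8002/metrics

The same two health URLs map directly onto Kubernetes livenessProbe and readinessProbe httpGet checks.
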
 .. overview-end-marker-do-not-remove

+The current release of the TensorRT Inference Server is 1.3.0 and
+corresponds to the 19.06 release of the tensorrtserver container on
+`NVIDIA GPU Cloud (NGC) <https://ngc.nvidia.com>`_. The branch for
+this release is `r19.06
+<https://github.com/NVIDIA/tensorrt-inference-server/tree/r19.06>`_.
+
+Backwards Compatibility
+-----------------------
+
+Continuing in version 1.3.0, the following interfaces maintain
+backwards compatibility with the 1.0.0 release. If you have model
+configuration files, custom backends, or clients that use the
+inference server HTTP or GRPC APIs (either directly or through the
+client libraries) from releases prior to 1.0.0 (19.03), you should edit
+and rebuild those as necessary to match the version 1.0.0 APIs.
+
+These interfaces will maintain backwards compatibility for all future
+1.x.y releases (see below for exceptions):
+
+* Model configuration as defined in `model_config.proto
+  <https://github.com/NVIDIA/tensorrt-inference-server/blob/master/src/core/model_config.proto>`_.
+
+* The inference server HTTP and GRPC APIs as defined in `api.proto
+  <https://github.com/NVIDIA/tensorrt-inference-server/blob/master/src/core/api.proto>`_
+  and `grpc_service.proto
+  <https://github.com/NVIDIA/tensorrt-inference-server/blob/master/src/core/grpc_service.proto>`_.
+
+* The custom backend interface as defined in `custom.h
+  <https://github.com/NVIDIA/tensorrt-inference-server/blob/master/src/backends/custom/custom.h>`_.
+
+As new features are introduced they may temporarily have beta status
+where they are subject to change in non-backwards-compatible
+ways. When they exit beta they will conform to the
+backwards-compatibility guarantees described above. Currently the
+following features are in beta:
+
+* In the model configuration defined in `model_config.proto
+  <https://github.com/NVIDIA/tensorrt-inference-server/blob/master/src/core/model_config.proto>`_
+  the sections related to model ensembling are currently in beta. In
+  particular, the ModelEnsembling message will potentially undergo
+  non-backwards-compatible changes.
+
+
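
For a concrete picture of the model configuration that model_config.proto defines, a model repository entry pairs a config.pbtxt with versioned model files. A minimal sketch; the layout follows the linked model-repository documentation, and the model name, file name, shapes, and data types below are hypothetical illustrations, not values from this commit:

    /models/
      mymodel/
        config.pbtxt
        1/
          model.plan

    # /models/mymodel/config.pbtxt (protobuf text format)
    name: "mymodel"
    platform: "tensorrt_plan"
    max_batch_size: 8
    input [
      {
        name: "input0"
        data_type: TYPE_FP32
        dims: [ 3, 224, 224 ]
      }
    ]
    output [
      {
        name: "output0"
        data_type: TYPE_FP32
        dims: [ 1000 ]
      }
    ]

The same repository structure applies whether the repository lives on a local file system or, as noted in the Features list, in Google Cloud Storage.
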
+Documentation
+-------------
+
+The User Guide, Developer Guide, and API Reference `documentation
+<https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/index.html>`_
+provide guidance on installing, building and running the latest
+release of the TensorRT Inference Server.
+
+You can also view the documentation for the `master branch
+<https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-master-branch-guide/docs/index.html>`_
+and for `earlier releases
+<https://docs.nvidia.com/deeplearning/sdk/inference-server-archived/index.html>`_.
+
+READMEs for deployment examples can be found in subdirectories of
+deploy/, for example, `deploy/single_server/README.rst
+<https://github.com/NVIDIA/tensorrt-inference-server/tree/master/deploy/single_server/README.rst>`_.
+
+The `Release Notes
+<https://docs.nvidia.com/deeplearning/sdk/inference-release-notes/index.html>`_
+and `Support Matrix
+<https://docs.nvidia.com/deeplearning/dgx/support-matrix/index.html>`_
+indicate the required versions of the NVIDIA Driver and CUDA, and also
+describe which GPUs are supported by the inference server.
+
+Other Documentation
+^^^^^^^^^^^^^^^^^^^
+
+* `Maximizing Utilization for Data Center Inference with TensorRT
+  Inference Server
+  <https://on-demand-gtc.gputechconf.com/gtcnew/sessionview.php?sessionName=s9438-maximizing+utilization+for+data+center+inference+with+tensorrt+inference+server>`_.
+
+* `NVIDIA TensorRT Inference Server Boosts Deep Learning Inference
+  <https://devblogs.nvidia.com/nvidia-serves-deep-learning-inference/>`_.
+
+* `GPU-Accelerated Inference for Kubernetes with the NVIDIA TensorRT
+  Inference Server and Kubeflow
+  <https://www.kubeflow.org/blog/nvidia_tensorrt/>`_.
+
+Contributing
+------------
+
+Contributions to TensorRT Inference Server are more than welcome. To
+contribute, make a pull request and follow the guidelines outlined in
+the `Contributing <CONTRIBUTING.md>`_ document.
+
+Reporting problems, asking questions
+------------------------------------
+
+We appreciate any feedback, questions or bug reports regarding this
+project. When help with code is needed, follow the process outlined in
+the Stack Overflow document (https://stackoverflow.com/help/mcve).
+Ensure posted examples are:
+
+* minimal – use as little code as possible that still produces the
+  same problem
+
+* complete – provide all parts needed to reproduce the problem. Check
+  if you can strip external dependencies and still show the problem. The
+  less time we spend on reproducing problems the more time we have to
+  fix them.
+
+* verifiable – test the code you're about to provide to make sure it
+  reproduces the problem. Remove all other problems that are not
+  related to your request/question.
+
 .. |License| image:: https://img.shields.io/badge/License-BSD3-lightgrey.svg
    :target: https://opensource.org/licenses/BSD-3-Clause

VERSION (+1 -1)

@@ -1 +1 @@
-1.3.0dev
+1.3.0
