
Commit 7012ff7

Update README and versions for 20.01 release
1 parent 7e08f20 commit 7012ff7

3 files changed: +216 -9 lines changed

Dockerfile (+4 -4)

@@ -185,8 +185,8 @@ RUN python3 /workspace/onnxruntime/tools/ci_build/build.py --build_dir /workspac
############################################################################
FROM ${BASE_IMAGE} AS trtserver_build

-ARG TRTIS_VERSION=1.10.0dev
-ARG TRTIS_CONTAINER_VERSION=20.01dev
+ARG TRTIS_VERSION=1.10.0
+ARG TRTIS_CONTAINER_VERSION=20.01

# libgoogle-glog0v5 is needed by caffe2 libraries.
# libcurl4-openSSL-dev is needed for GCS

@@ -351,8 +351,8 @@ ENTRYPOINT ["/opt/tensorrtserver/nvidia_entrypoint.sh"]
############################################################################
FROM ${BASE_IMAGE}

-ARG TRTIS_VERSION=1.10.0dev
-ARG TRTIS_CONTAINER_VERSION=20.01dev
+ARG TRTIS_VERSION=1.10.0
+ARG TRTIS_CONTAINER_VERSION=20.01

ENV TENSORRT_SERVER_VERSION ${TRTIS_VERSION}
ENV NVIDIA_TENSORRT_SERVER_VERSION ${TRTIS_CONTAINER_VERSION}

README.rst (+211 -4)

@@ -30,13 +30,220 @@
NVIDIA TensorRT Inference Server
================================

-**NOTE: You are currently on the r20.01 branch which tracks
-stabilization towards the next release. This branch is not usable
-during stabilization.**

.. overview-begin-marker-do-not-remove

The NVIDIA TensorRT Inference Server provides a cloud inferencing
solution optimized for NVIDIA GPUs. The server provides an inference
service via an HTTP or gRPC endpoint, allowing remote clients to
request inferencing for any model being managed by the server.

What's New In 1.10.0
--------------------

* Server status can be requested in JSON format using the HTTP/REST API. Use
  endpoint `/api/status?format=json` (a request sketch follows this list).

* The dynamic batcher now has an option to preserve the ordering of batched
  requests when there are multiple model instances. See
  `model_config.proto <https://github.com/NVIDIA/tensorrt-inference-server/blob/master/src/core/model_config.proto#L583>`_
  for more information.

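A quick way to exercise the new JSON status format is a plain HTTP
request. Below is a minimal sketch using Python and the ``requests``
package; the host and port (``localhost:8000``, the server's default
HTTP port) are assumptions and are not stated in this README::

  # Query server status as JSON, using the endpoint noted above.
  # localhost:8000 is an assumed default; adjust for your deployment.
  import requests

  resp = requests.get("http://localhost:8000/api/status",
                      params={"format": "json"})
  resp.raise_for_status()   # non-2xx means the status request failed
  print(resp.json())        # full server status rendered as JSON
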
Features
52+
--------
53+
54+
* `Multiple framework support
55+
<https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_repository.html#framework-model-definition>`_. The
56+
server can manage any number and mix of models (limited by system
57+
disk and memory resources). Supports TensorRT, TensorFlow GraphDef,
58+
TensorFlow SavedModel, ONNX, PyTorch, and Caffe2 NetDef model
59+
formats. Also supports TensorFlow-TensorRT integrated
60+
models. Variable-size input and output tensors are allowed if
61+
supported by the framework. See `Capabilities
62+
<https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/capabilities.html#capabilities>`_
63+
for detailed support information for each framework.
64+
65+
* `Concurrent model execution support
66+
<https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_configuration.html#instance-groups>`_. Multiple
67+
models (or multiple instances of the same model) can run
68+
simultaneously on the same GPU.
69+
70+
* Batching support. For models that support batching, the server can
71+
accept requests for a batch of inputs and respond with the
72+
corresponding batch of outputs. The inference server also supports
73+
multiple `scheduling and batching
74+
<https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_configuration.html#scheduling-and-batching>`_
75+
algorithms that combine individual inference requests together to
76+
improve inference throughput. These scheduling and batching
77+
decisions are transparent to the client requesting inference.
78+
79+
* `Custom backend support
80+
<https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_repository.html#custom-backends>`_. The inference server
81+
allows individual models to be implemented with custom backends
82+
instead of by a deep-learning framework. With a custom backend a
83+
model can implement any logic desired, while still benefiting from
84+
the GPU support, concurrent execution, dynamic batching and other
85+
features provided by the server.
86+
87+
* `Ensemble support
88+
<https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/models_and_schedulers.html#ensemble-models>`_. An
89+
ensemble represents a pipeline of one or more models and the
90+
connection of input and output tensors between those models. A
91+
single inference request to an ensemble will trigger the execution
92+
of the entire pipeline.
93+
94+
* Multi-GPU support. The server can distribute inferencing across all
95+
system GPUs.
96+
97+
* The inference server provides `multiple modes for model management
98+
<https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_management.html>`_. These
99+
model management modes allow for both implicit and explicit loading
100+
and unloading of models without requiring a server restart.
101+
102+
* `Model repositories
103+
<https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_repository.html#>`_
104+
may reside on a locally accessible file system (e.g. NFS), in Google
105+
Cloud Storage or in Amazon S3.
106+
107+
* Readiness and liveness `health endpoints
108+
<https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/http_grpc_api.html#health>`_
109+
suitable for any orchestration or deployment framework, such as
110+
Kubernetes.
111+
112+
* `Metrics
113+
<https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/metrics.html>`_
114+
indicating GPU utilization, server throughput, and server latency.
115+
116+
* `C library inferface
117+
<https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/library_api.html>`_
118+
allows the full functionality of the inference server to be included
119+
directly in an application.
120+
39121
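The readiness and liveness endpoints above can be exercised with plain
HTTP requests, which is also how an orchestration probe would use
them. A minimal sketch in Python with ``requests`` follows; the
``/api/health/live`` and ``/api/health/ready`` paths and the default
HTTP port 8000 come from the server's HTTP API documentation rather
than this README, so treat them as assumptions::

  # Liveness/readiness checks suitable for wiring into orchestration probes.
  # Paths and port are assumptions taken from the HTTP API docs.
  import requests

  BASE = "http://localhost:8000"

  def is_live():
      return requests.get(BASE + "/api/health/live").status_code == 200

  def is_ready():
      return requests.get(BASE + "/api/health/ready").status_code == 200

  print("live:", is_live(), "ready:", is_ready())
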
.. overview-end-marker-do-not-remove

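The metrics mentioned in the feature list are exposed in Prometheus
text format and can be scraped with a single GET. A minimal sketch,
assuming the server's default metrics port 8002 (an assumption, not
stated in this README)::

  # Fetch metrics in Prometheus text format (port 8002 is an assumed default).
  import requests

  metrics = requests.get("http://localhost:8002/metrics").text
  # Show the first few non-comment lines as a quick sanity check.
  lines = [l for l in metrics.splitlines() if l and not l.startswith("#")]
  print("\n".join(lines[:10]))
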
The current release of the TensorRT Inference Server is 1.10.0 and
corresponds to the 20.01 release of the tensorrtserver container on
`NVIDIA GPU Cloud (NGC) <https://ngc.nvidia.com>`_. The branch for
this release is `r20.01
<https://github.com/NVIDIA/tensorrt-inference-server/tree/r20.01>`_.

Backwards Compatibility
-----------------------

Continuing in the latest version, the following interfaces maintain
backwards compatibility with the 1.0.0 release. If you have model
configuration files, custom backends, or clients that use the
inference server HTTP or GRPC APIs (either directly or through the
client libraries) from releases prior to 1.0.0, you should edit
and rebuild those as necessary to match the version 1.0.0 APIs.

The following interfaces will maintain backwards compatibility for all
future 1.x.y releases (see below for exceptions):

* Model configuration as defined in `model_config.proto
  <https://github.com/NVIDIA/tensorrt-inference-server/blob/master/src/core/model_config.proto>`_.

* The inference server HTTP and GRPC APIs as defined in `api.proto
  <https://github.com/NVIDIA/tensorrt-inference-server/blob/master/src/core/api.proto>`_
  and `grpc_service.proto
  <https://github.com/NVIDIA/tensorrt-inference-server/blob/master/src/core/grpc_service.proto>`_,
  except as noted below.

* The V1 custom backend interface as defined in `custom.h
  <https://github.com/NVIDIA/tensorrt-inference-server/blob/master/src/backends/custom/custom.h>`_.

As new features are introduced they may temporarily have beta status
where they are subject to change in non-backwards-compatible
ways. When they exit beta they will conform to the
backwards-compatibility guarantees described above. Currently the
following features are in beta:

* The inference server library API as defined in `trtserver.h
  <https://github.com/NVIDIA/tensorrt-inference-server/blob/master/src/core/trtserver.h>`_
  is currently in beta and may undergo non-backwards-compatible
  changes.

* The inference server HTTP and GRPC APIs related to system and CUDA
  shared memory are currently in beta and may undergo
  non-backwards-compatible changes.

* The V2 custom backend interface as defined in `custom.h
  <https://github.com/NVIDIA/tensorrt-inference-server/blob/master/src/backends/custom/custom.h>`_
  is currently in beta and may undergo non-backwards-compatible
  changes.

* The C++ and Python client libraries are not strictly included in the
  inference server compatibility guarantees and so should be
  considered as having beta status.

Documentation
-------------

The User Guide, Developer Guide, and API Reference `documentation for
the current release
<https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/index.html>`_
provide guidance on installing, building, and running the TensorRT
Inference Server.

You can also view the `documentation for the master branch
<https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-master-branch-guide/docs/index.html>`_
and for `earlier releases
<https://docs.nvidia.com/deeplearning/sdk/inference-server-archived/index.html>`_.

An `FAQ
<https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-branch-guide/docs/faq.html>`_
provides answers for frequently asked questions.

READMEs for deployment examples can be found in subdirectories of
deploy/, for example, `deploy/single_server/README.rst
<https://github.com/NVIDIA/tensorrt-inference-server/tree/master/deploy/single_server/README.rst>`_.

The `Release Notes
<https://docs.nvidia.com/deeplearning/sdk/inference-release-notes/index.html>`_
and `Support Matrix
<https://docs.nvidia.com/deeplearning/dgx/support-matrix/index.html>`_
indicate the required versions of the NVIDIA Driver and CUDA, and also
describe which GPUs are supported by the inference server.

Other Documentation
^^^^^^^^^^^^^^^^^^^

* `Maximizing Utilization for Data Center Inference with TensorRT
  Inference Server
  <https://on-demand-gtc.gputechconf.com/gtcnew/sessionview.php?sessionName=s9438-maximizing+utilization+for+data+center+inference+with+tensorrt+inference+server>`_.

* `NVIDIA TensorRT Inference Server Boosts Deep Learning Inference
  <https://devblogs.nvidia.com/nvidia-serves-deep-learning-inference/>`_.

* `GPU-Accelerated Inference for Kubernetes with the NVIDIA TensorRT
  Inference Server and Kubeflow
  <https://www.kubeflow.org/blog/nvidia_tensorrt/>`_.

Contributing
------------

Contributions to TensorRT Inference Server are more than welcome. To
contribute, make a pull request and follow the guidelines outlined in
the `Contributing <CONTRIBUTING.md>`_ document.

Reporting problems, asking questions
------------------------------------

We appreciate any feedback, questions or bug reporting regarding this
project. When help with code is needed, follow the process outlined in
the Stack Overflow (https://stackoverflow.com/help/mcve)
document. Ensure posted examples are:

* minimal – use as little code as possible that still produces the
  same problem

* complete – provide all parts needed to reproduce the problem. Check
  if you can strip external dependencies and still show the problem. The
  less time we spend on reproducing problems the more time we have to
  fix them.

* verifiable – test the code you're about to provide to make sure it
  reproduces the problem. Remove all other problems that are not
  related to your request/question.

.. |License| image:: https://img.shields.io/badge/License-BSD3-lightgrey.svg
   :target: https://opensource.org/licenses/BSD-3-Clause

VERSION (+1 -1)

@@ -1 +1 @@
-1.10.0dev
+1.10.0
