
Commit c81fb75
Update README and versions for 20.07-v1 release
1 parent 186dcd5 commit c81fb75

File tree

3 files changed: +218 -8 lines changed


Dockerfile
+4 -4

@@ -150,8 +150,8 @@ FROM ${TENSORFLOW_IMAGE} AS trtserver_tf
 ############################################################################
 FROM ${BASE_IMAGE} AS trtserver_build

-ARG TRTIS_VERSION=1.15.0dev
-ARG TRTIS_CONTAINER_VERSION=20.07dev
+ARG TRTIS_VERSION=1.15.0
+ARG TRTIS_CONTAINER_VERSION=20.07

 # libgoogle-glog0v5 is needed by caffe2 libraries.
 # libcurl4-openSSL-dev is needed for GCS
@@ -333,8 +333,8 @@ ENTRYPOINT ["/opt/tritonserver/nvidia_entrypoint.sh"]
 ############################################################################
 FROM ${BASE_IMAGE}

-ARG TRTIS_VERSION=1.15.0dev
-ARG TRTIS_CONTAINER_VERSION=20.07dev
+ARG TRTIS_VERSION=1.15.0
+ARG TRTIS_CONTAINER_VERSION=20.07

 ENV TENSORRT_SERVER_VERSION ${TRTIS_VERSION}
 ENV NVIDIA_TENSORRT_SERVER_VERSION ${TRTIS_CONTAINER_VERSION}
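As a rough sketch of how the version build arguments above could be pinned
when building an image locally, the snippet below shells out to docker
build. The image tag and the use of a Python wrapper are illustrative
assumptions, not the repository's documented build procedure::

    # Build the server image, overriding the release-version build args
    # (TRTIS_VERSION and TRTIS_CONTAINER_VERSION) shown in the Dockerfile.
    import subprocess

    def build_image(version="1.15.0", container_version="20.07"):
        """Run docker build with the version build arguments set explicitly."""
        subprocess.run(
            ["docker", "build",
             "--build-arg", f"TRTIS_VERSION={version}",
             "--build-arg", f"TRTIS_CONTAINER_VERSION={container_version}",
             "-t", f"tritonserver:{container_version}", "."],
            check=True,
        )

    if __name__ == "__main__":
        build_image()
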

README.rst
+213 -3

@@ -30,13 +30,223 @@
 NVIDIA Triton Inference Server
 ==============================

-**NOTE: You are currently on the r20.07-v1 branch tracks
-stabilization towards the next release. This branch is not usable
-during stabilization.**
+**Triton V1 and V2: For the 20.07 release, a legacy V1 version of
+Triton will be released from this branch. The V1 version of Triton
+is deprecated and no releases beyond 20.07 are planned. Going
+forward Triton V2 will continue monthly releases as described on
+branch** `master
+<https://github.com/NVIDIA/triton-inference-server>`_.

 .. overview-begin-marker-do-not-remove

+NVIDIA Triton Inference Server provides a cloud inferencing solution
+optimized for NVIDIA GPUs. The server provides an inference service
+via an HTTP or GRPC endpoint, allowing remote clients to request
+inferencing for any model being managed by the server. For edge
+deployments, Triton Server is also available as a shared library with
+an API that allows the full functionality of the server to be included
+directly in an application.
+
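As a rough illustration of the HTTP endpoint mentioned above, the sketch
below polls the server's health from a remote client with plain Python. It
assumes a server reachable at localhost:8000 and the V1 REST paths
/api/health/live and /api/health/ready; treat the host, port, and paths as
assumptions to adjust for your deployment::

    # Minimal liveness/readiness probe against the V1 HTTP endpoint.
    import requests

    BASE_URL = "http://localhost:8000"  # assumed local, default-configured server

    def server_is_ready(base_url=BASE_URL, timeout=2.0):
        """Return True when the server reports both liveness and readiness."""
        for path in ("/api/health/live", "/api/health/ready"):
            response = requests.get(base_url + path, timeout=timeout)
            if response.status_code != 200:
                return False
        return True

    if __name__ == "__main__":
        print("server ready:", server_is_ready())
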
+What's New in 1.15.0
+--------------------
+
+* Support for the legacy V1 HTTP/REST, GRPC and corresponding client libraries
+  is released on GitHub branch ``r20.07-v1`` and as NGC container
+  ``20.07-v1-py3``.
+
+Features
+--------
+
+* `Multiple framework support
+  <https://docs.nvidia.com/deeplearning/triton-inference-server/master-v1-user-guide/docs/model_repository.html#framework-model-definition>`_. The
+  server can manage any number and mix of models (limited by system
+  disk and memory resources). Supports TensorRT, TensorFlow GraphDef,
+  TensorFlow SavedModel, ONNX, PyTorch, and Caffe2 NetDef model
+  formats. Also supports TensorFlow-TensorRT and ONNX-TensorRT
+  integrated models. Variable-size input and output tensors are
+  allowed if supported by the framework. See `Capabilities
+  <https://docs.nvidia.com/deeplearning/triton-inference-server/master-v1-user-guide/docs/capabilities.html#capabilities>`_
+  for detailed support information for each framework.
+
+* `Concurrent model execution support
+  <https://docs.nvidia.com/deeplearning/triton-inference-server/master-v1-user-guide/docs/model_configuration.html#instance-groups>`_. Multiple
+  models (or multiple instances of the same model) can run
+  simultaneously on the same GPU.
+
+* Batching support. For models that support batching, Triton Server
+  can accept requests for a batch of inputs and respond with the
+  corresponding batch of outputs. Triton Server also supports multiple
+  `scheduling and batching
+  <https://docs.nvidia.com/deeplearning/triton-inference-server/master-v1-user-guide/docs/model_configuration.html#scheduling-and-batching>`_
+  algorithms that combine individual inference requests together to
+  improve inference throughput. These scheduling and batching
+  decisions are transparent to the client requesting inference.
+
+* `Custom backend support
+  <https://docs.nvidia.com/deeplearning/triton-inference-server/master-v1-user-guide/docs/model_repository.html#custom-backends>`_. Triton
+  Server allows individual models to be implemented with custom
+  backends instead of by a deep-learning framework. With a custom
+  backend a model can implement any logic desired, while still
+  benefiting from the GPU support, concurrent execution, dynamic
+  batching and other features provided by the server.
+
+* `Ensemble support
+  <https://docs.nvidia.com/deeplearning/triton-inference-server/master-v1-user-guide/docs/models_and_schedulers.html#ensemble-models>`_. An
+  ensemble represents a pipeline of one or more models and the
+  connection of input and output tensors between those models. A
+  single inference request to an ensemble will trigger the execution
+  of the entire pipeline.
+
+* Multi-GPU support. Triton Server can distribute inferencing across
+  all system GPUs.
+
+* Triton Server provides `multiple modes for model management
+  <https://docs.nvidia.com/deeplearning/triton-inference-server/master-v1-user-guide/docs/model_management.html>`_. These
+  model management modes allow for both implicit and explicit loading
+  and unloading of models without requiring a server restart.
+
+* `Model repositories
+  <https://docs.nvidia.com/deeplearning/triton-inference-server/master-v1-user-guide/docs/model_repository.html#>`_
+  may reside on a locally accessible file system (e.g. NFS), in Google
+  Cloud Storage or in Amazon S3.
+
+* Readiness and liveness `health endpoints
+  <https://docs.nvidia.com/deeplearning/triton-inference-server/master-v1-user-guide/docs/http_grpc_api.html#health>`_
+  suitable for any orchestration or deployment framework, such as
+  Kubernetes.
+
+* `Metrics
+  <https://docs.nvidia.com/deeplearning/triton-inference-server/master-v1-user-guide/docs/metrics.html>`_
+  indicating GPU utilization, server throughput, and server latency
+  (see the sketch after this list).
+
+* `C library interface
+  <https://docs.nvidia.com/deeplearning/triton-inference-server/master-v1-user-guide/docs/library_api.html>`_
+  allows the full functionality of Triton Server to be included
+  directly in an application.
+
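As a companion to the Metrics item above, here is a small sketch that
scrapes the Prometheus metrics endpoint. The port 8002, the /metrics path,
and the nv_gpu_utilization metric name reflect common defaults but should be
treated as assumptions for your deployment::

    # Fetch the metrics text and print the GPU-utilization samples.
    import requests

    def fetch_gpu_utilization(metrics_url="http://localhost:8002/metrics"):
        """Return the metric lines that report GPU utilization."""
        text = requests.get(metrics_url, timeout=2.0).text
        return [line for line in text.splitlines()
                if line.startswith("nv_gpu_utilization")]

    if __name__ == "__main__":
        for line in fetch_gpu_utilization():
            print(line)
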
 .. overview-end-marker-do-not-remove

+The current release of the Triton Inference Server is 1.15.0 and
+corresponds to the 20.07 V1 release of the tensorrtserver container on
+`NVIDIA GPU Cloud (NGC) <https://ngc.nvidia.com>`_. The branch for
+this release is `r20.07-v1
+<https://github.com/NVIDIA/triton-inference-server/tree/r20.07-v1>`_.
+
+Backwards Compatibility
+-----------------------
+
+Continuing in the latest version, the following interfaces maintain
+backwards compatibility with the 1.0.0 release. If you have model
+configuration files, custom backends, or clients that use the
+inference server HTTP or GRPC APIs (either directly or through the
+client libraries) from releases prior to 1.0.0, you should edit
+and rebuild those as necessary to match the version 1.0.0 APIs.
+
+The following interfaces will maintain backwards compatibility for all
+future 1.x.y releases (see below for exceptions):
+
+* Model configuration as defined in `model_config.proto
+  <https://github.com/NVIDIA/triton-inference-server/blob/master/src/core/model_config.proto>`_
+  (a sketch of a configuration file follows this list).
+
+* The inference server HTTP and GRPC APIs as defined in `api.proto
+  <https://github.com/NVIDIA/triton-inference-server/blob/master/src/core/api.proto>`_
+  and `grpc_service.proto
+  <https://github.com/NVIDIA/triton-inference-server/blob/master/src/core/grpc_service.proto>`_,
+  except as noted below.
+
+* The V1 and V2 custom backend interfaces as defined in `custom.h
+  <https://github.com/NVIDIA/triton-inference-server/blob/master/src/backends/custom/custom.h>`_.
+
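To make the model configuration item above more concrete, the sketch below
creates a single model repository entry of the kind described by
model_config.proto. The model name, platform, and tensor shapes are made-up
placeholders; the real values depend on the model being served::

    # Lay out <repository>/<model>/config.pbtxt plus a version-1 directory.
    from pathlib import Path

    CONFIG_PBTXT = """\
    name: "example_model"
    platform: "tensorrt_plan"
    max_batch_size: 8
    input [
      {
        name: "input0"
        data_type: TYPE_FP32
        dims: [ 3, 224, 224 ]
      }
    ]
    output [
      {
        name: "output0"
        data_type: TYPE_FP32
        dims: [ 1000 ]
      }
    ]
    """

    def create_model_entry(repository="model_repository", model="example_model"):
        """Create the config file and the version subdirectory for one model."""
        model_dir = Path(repository) / model
        (model_dir / "1").mkdir(parents=True, exist_ok=True)  # model version 1
        (model_dir / "config.pbtxt").write_text(CONFIG_PBTXT)

    if __name__ == "__main__":
        create_model_entry()
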
+As new features are introduced, they may temporarily have beta status
+where they are subject to change in non-backwards-compatible
+ways. When they exit beta, they will conform to the
+backwards-compatibility guarantees described above. Currently the
+following features are in beta:
+
+* The inference server library API as defined in `trtserver.h
+  <https://github.com/NVIDIA/triton-inference-server/blob/master/src/core/trtserver.h>`_
+  is currently in beta and may undergo non-backwards-compatible
+  changes.
+
+* The C++ and Python client libraries are not strictly included in the
+  inference server compatibility guarantees and so should be
+  considered as beta status.
+
+Documentation
+-------------
+
+The User Guide, Developer Guide, and API Reference `documentation for
+the current release
+<https://docs.nvidia.com/deeplearning/triton-inference-server/master-v1-user-guide/docs/index.html>`_
+provide guidance on installing, building, and running Triton Inference
+Server.
+
+You can also view the `documentation for the master branch
+<https://docs.nvidia.com/deeplearning/triton-inference-server/master-v1-user-guide/docs/index.html>`_
+and for `earlier releases
+<https://docs.nvidia.com/deeplearning/sdk/inference-server-archived/index.html>`_.
+
+An `FAQ
+<https://docs.nvidia.com/deeplearning/triton-inference-server/master-v1-user-guide/docs/faq.html>`_
+provides answers for frequently asked questions.
+
+READMEs for deployment examples can be found in subdirectories of
+deploy/, for example, `deploy/single_server/README.rst
+<https://github.com/NVIDIA/triton-inference-server/tree/master/deploy/single_server/README.rst>`_.
+
+The `Release Notes
+<https://docs.nvidia.com/deeplearning/sdk/inference-release-notes/index.html>`_
+and `Support Matrix
+<https://docs.nvidia.com/deeplearning/dgx/support-matrix/index.html>`_
+indicate the required versions of the NVIDIA Driver and CUDA, and also
+describe which GPUs are supported by Triton Server.
+
+Presentations and Papers
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+* `High-Performance Inferencing at Scale Using the TensorRT Inference Server <https://developer.nvidia.com/gtc/2020/video/s22418>`_.
+
+* `Accelerate and Autoscale Deep Learning Inference on GPUs with KFServing <https://developer.nvidia.com/gtc/2020/video/s22459>`_.
+
+* `Deep into Triton Inference Server: BERT Practical Deployment on NVIDIA GPU <https://developer.nvidia.com/gtc/2020/video/s21736>`_.
+
+* `Maximizing Utilization for Data Center Inference with TensorRT
+  Inference Server
+  <https://on-demand-gtc.gputechconf.com/gtcnew/sessionview.php?sessionName=s9438-maximizing+utilization+for+data+center+inference+with+tensorrt+inference+server>`_.
+
+* `NVIDIA TensorRT Inference Server Boosts Deep Learning Inference
+  <https://devblogs.nvidia.com/nvidia-serves-deep-learning-inference/>`_.
+
+* `GPU-Accelerated Inference for Kubernetes with the NVIDIA TensorRT
+  Inference Server and Kubeflow
+  <https://www.kubeflow.org/blog/nvidia_tensorrt/>`_.
+
+Contributing
+------------
+
+Contributions to Triton Inference Server are more than welcome. To
+contribute, make a pull request and follow the guidelines outlined in
+the `Contributing <CONTRIBUTING.md>`_ document.
+
+Reporting problems, asking questions
+------------------------------------
+
+We appreciate any feedback, questions or bug reporting regarding this
+project. When help with code is needed, follow the process outlined in
+the Stack Overflow (https://stackoverflow.com/help/mcve)
+document. Ensure posted examples are:
+
+* minimal – use as little code as possible that still produces the
+  same problem
+
+* complete – provide all parts needed to reproduce the problem. Check
+  if you can strip external dependencies and still show the problem. The
+  less time we spend on reproducing problems, the more time we have to
+  fix them.
+
+* verifiable – test the code you're about to provide to make sure it
+  reproduces the problem. Remove all other problems that are not
+  related to your request/question.
+
 .. |License| image:: https://img.shields.io/badge/License-BSD3-lightgrey.svg
     :target: https://opensource.org/licenses/BSD-3-Clause

VERSION
+1 -1

@@ -1 +1 @@
-1.15.0dev
+1.15.0
