
Commit 60f7af6

Update README for 23.02 release (#5394)
* Update README for 23.02 release
* Update branch version in URL
1 parent 79f0c34 commit 60f7af6

2 files changed: +96, -6 lines


README.md (-6)

@@ -30,12 +30,6 @@
 [![License](https://img.shields.io/badge/License-BSD3-lightgrey.svg)](https://opensource.org/licenses/BSD-3-Clause)

-**LATEST RELEASE: You are currently on the main branch which tracks
-under-development progress towards the next release. The current release is
-version [2.31.0](https://github.com/triton-inference-server/server/tree/r23.02)
-and corresponds to the 23.02 container release on
-[NVIDIA GPU Cloud (NGC)](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tritonserver).**
-
 ----
 Triton Inference Server is an open source inference serving software that
 streamlines AI inferencing. Triton enables teams to deploy any AI model from

RELEASE.md (+96)

@@ -0,0 +1,96 @@
<!--
# Copyright 2018-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#  * Redistributions of source code must retain the above copyright
#    notice, this list of conditions and the following disclaimer.
#  * Redistributions in binary form must reproduce the above copyright
#    notice, this list of conditions and the following disclaimer in the
#    documentation and/or other materials provided with the distribution.
#  * Neither the name of NVIDIA CORPORATION nor the names of its
#    contributors may be used to endorse or promote products derived
#    from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-->

# Release Notes for 2.31.0

## New Features and Improvements

* Support for
  [ensemble models in Model Analyzer](https://github.com/triton-inference-server/model_analyzer/blob/r23.02/docs/config_search.md#ensemble-model-search).
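
  A minimal profiling sketch, assuming a model repository containing an
  ensemble named `my_ensemble_model` (both names are hypothetical; see the
  linked config-search docs for the full set of search options):

  ```bash
  # Profile the ensemble with Model Analyzer's profile subcommand
  model-analyzer profile \
      --model-repository /path/to/model_repository \
      --profile-models my_ensemble_model
  ```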

* Support for the GRPC Standard Health Check Protocol.
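
  One quick way to exercise this, assuming Triton's GRPC endpoint is on its
  default port 8001 and the standalone `grpc_health_probe` tool from the
  grpc-ecosystem project is installed:

  ```bash
  # Reports SERVING when the server passes the standard health check
  grpc_health_probe -addr=localhost:8001
  ```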

* Fixed intermittent hangs during model loading for the Python backend.

* Refer to the 23.02 column of the
  [Frameworks Support Matrix](https://docs.nvidia.com/deeplearning/frameworks/support-matrix/index.html)
  for container image versions on which the 23.02 inference server container is
  based.
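
  For reference, the matching 23.02 server container can be pulled from NGC:

  ```bash
  docker pull nvcr.io/nvidia/tritonserver:23.02-py3
  ```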
## Known Issues

* In some rare cases Triton might overwrite input tensors while they are still
  in use, which leads to corrupt input data being used for inference with
  TensorRT models. If you encounter accuracy issues with your TensorRT model,
  you can work around the issue by
  [enabling the output_copy_stream option](https://github.com/triton-inference-server/common/blob/r23.02/protobuf/model_config.proto#L843-L852)
  in your model's configuration.
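
  A minimal sketch of that workaround in the model's `config.pbtxt` (only the
  relevant `optimization` block is shown; the rest of the configuration is
  omitted):

  ```
  # Use a dedicated CUDA stream for output copies (see model_config.proto)
  optimization {
    cuda {
      output_copy_stream: true
    }
  }
  ```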

* Some systems which implement `malloc()` may not release memory back to the
  operating system right away, causing a false memory leak. This can be
  mitigated by using a different malloc implementation. Tcmalloc is installed
  in the Triton container and can be
  [used by specifying the library in LD_PRELOAD](https://github.com/triton-inference-server/server/blob/r23.02/docs/user_guide/model_management.md#model-control-mode-explicit).
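
  A hedged launch sketch (the library path is an assumption based on the
  Ubuntu packaging used by the Triton container; verify the path in your
  image first):

  ```bash
  # Preload tcmalloc so freed memory is returned to the OS more eagerly
  LD_PRELOAD=/usr/lib/$(uname -m)-linux-gnu/libtcmalloc.so.4:${LD_PRELOAD} \
      tritonserver --model-repository=/models
  ```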

* When using a custom operator for the PyTorch backend, the operator may not be
  loaded due to undefined Python library symbols. This can be worked around by
  [specifying the Python library in LD_PRELOAD](https://github.com/triton-inference-server/server/blob/r23.02/qa/L0_custom_ops/test.sh#L114-L117).
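
  A sketch of that workaround (the libpython path and version below are
  assumptions; locate the library actually shipped in your container first):

  ```bash
  # Find the Python shared library bundled with the container
  find / -name 'libpython3*.so*' 2>/dev/null
  # Preload it so the custom op's Python symbols resolve at load time
  LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0:${LD_PRELOAD} \
      tritonserver --model-repository=/models
  ```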

* Auto-complete may cause an increase in server start time. To avoid a start
  time increase, users can provide the full model configuration and launch the
  server with `--disable-auto-complete-config`.
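
  For example, assuming every model in the repository already has a complete
  `config.pbtxt`:

  ```bash
  # Skip config auto-completion entirely to shorten startup
  tritonserver --model-repository=/models --disable-auto-complete-config
  ```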

* Auto-complete does not support PyTorch models due to lack of metadata in the
  model. It can only verify that the number of inputs and the input names match
  what is specified in the model configuration. There is no model metadata
  about the number of outputs and datatypes. Related PyTorch bug:
  https://github.com/pytorch/pytorch/issues/38273

* The Perf Analyzer stability criteria have been changed, which may result in
  reporting instability for scenarios that were previously considered stable.
  This change has been made to improve the accuracy of Perf Analyzer results.
  If you observe an instability message, it can be resolved by increasing the
  `--measurement-interval` in the time windows mode or
  `--measurement-request-count` in the count windows mode.
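
  A sketch of both adjustments (`my_model` is hypothetical and the values are
  illustrative, not recommendations):

  ```bash
  # Time-windows mode: lengthen each measurement window (milliseconds)
  perf_analyzer -m my_model --measurement-interval 10000

  # Count-windows mode: require more requests per measurement window
  perf_analyzer -m my_model --measurement-mode count_windows \
      --measurement-request-count 500
  ```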

* Triton Client PIP wheels for Arm SBSA are not available from PyPI, so pip
  will install an incorrect Jetson version of the Triton Client library for
  Arm SBSA.

  The correct client wheel file can be pulled directly from the Arm SBSA SDK
  image and manually installed.
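
  A hedged sketch of pulling the wheel out of the SDK image (the image tag
  follows the NGC naming pattern; the wheel's location inside the image is an
  assumption, so list the filesystem to find it first):

  ```bash
  # Create (don't run) a container from the Arm SBSA SDK image
  docker create --name triton-sdk nvcr.io/nvidia/tritonserver:23.02-py3-sdk
  # List its filesystem to locate the client wheel (path varies by release)
  docker export triton-sdk | tar -tf - | grep 'tritonclient.*\.whl'
  # Copy the wheel out (substitute the path found above) and install it
  docker cp triton-sdk:/path/to/tritonclient.whl .
  docker rm triton-sdk
  pip install ./tritonclient*.whl
  ```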

* Traced models in PyTorch seem to create overflows when int8 tensor values are
  transformed to int32 on the GPU.

  Refer to https://github.com/pytorch/pytorch/issues/66930 for more information.

* Triton cannot retrieve GPU metrics with MIG-enabled GPU devices (A100 and A30).

* Triton metrics might not work if the host machine is running a separate DCGM
  agent on bare metal or in a container.
