Having made some improvements to the model's serving capabilities by enabling `dynamic batching` and the use of `multiple model instances`, the next step is to measure the impact of these features. To that end, the Triton Inference Server comes packaged with the [Performance Analyzer](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md), a tool specifically designed to measure the performance of models served by Triton Inference Server. For ease of use, it is recommended that users run this inside the same container used to run client code in Part 1 of this series.
```
docker run -it --net=host -v ${PWD}:/workspace/ nvcr.io/nvidia/tritonserver:yy.mm-py3-sdk bash
```
# Deploying ONNX, PyTorch and TensorFlow Models with the OpenVINO Backend
This README demonstrates how to deploy simple ONNX, PyTorch and TensorFlow models on Triton Inference Server using the [OpenVINO backend](https://github.com/triton-inference-server/openvino_backend).
## Deploying an ONNX Model
### 1. Build the model repository and download the ONNX model.
### 3. Place the `config.pbtxt` file in the model repository; the structure should look as follows:
```
model_repository
|
+-- densenet_onnx
    |
    +-- config.pbtxt
    +-- 1
        |
        +-- model.onnx
```
Note: This directory structure is how the Triton Inference Server reads the configuration and model files, and it must follow the required layout. Do not place any other folders or files in the model repository other than the needed model files.
### 6. Run the Triton Client in the same location as the `client.py` file, install dependencies, and query the server
Building a client requires three basic steps. First, we set up a connection with the Triton Inference Server. Second, we specify the names of the input and output layer(s) of our model. And last, we send an inference request to the Triton Inference Server.
```
docker run -it --rm --net=host -v ${PWD}:/workspace/ nvcr.io/nvidia/tritonserver:24.04-py3-sdk bash
```
The output format here is `<confidence_score>:<classification_index>`. To learn how to map these to the label names and more, refer to our [documentation](https://github.com/triton-inference-server/server/blob/main/docs/protocol/extension_classification.md). The client code above is available in `client.py`.
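For reference, the three steps above boil down to something like the following sketch. The tensor names, input shape, and `class_count` are placeholders, not necessarily what `client.py` uses; take the authoritative names and shapes from the model's `config.pbtxt`.

```
import numpy as np
import tritonclient.http as httpclient

# 1. Set up a connection with the Triton Inference Server.
client = httpclient.InferenceServerClient(url="localhost:8000")

# 2. Declare the input and output tensors. The names and shape below are
#    placeholders; use the ones defined in the model's config.pbtxt.
input_tensor = httpclient.InferInput("data_0", [1, 3, 224, 224], "FP32")
input_tensor.set_data_from_numpy(np.random.rand(1, 3, 224, 224).astype(np.float32))
# Request the top-3 results via the classification extension.
output_tensor = httpclient.InferRequestedOutput("fc6_1", class_count=3)

# 3. Send an inference request and read back the classifications.
response = client.infer(model_name="densenet_onnx",
                        inputs=[input_tensor], outputs=[output_tensor])
print(response.as_numpy("fc6_1"))
```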
## Deploying a PyTorch Model
### 1. Download and prepare the PyTorch model.
PyTorch models (.pt) will need to be converted to OpenVINO format. Create a `downloadAndConvert.py` file to download the PyTorch model and use the OpenVINO Model Converter to save a `model.xml` and `model.bin`:
```
import torchvision
import torch
import openvino as ov

# Download a pretrained ResNet-50 and convert it to an OpenVINO model
model = torchvision.models.resnet50(weights='DEFAULT')
ov_model = ov.convert_model(model)
ov.save_model(ov_model, 'model.xml')
```
Install the dependencies:
```
pip install openvino
pip install torchvision
```
Run `downloadAndConvert.py`
```
python3 downloadAndConvert.py
```
To convert your own PyTorch model, refer to [Converting a PyTorch Model](https://docs.openvino.ai/2024/openvino-workflow/model-preparation/convert-model-pytorch.html)
### 2. Create a new file named `config.pbtxt`
```
name: "resnet50"
backend: "openvino"
max_batch_size: 0
input [
  {
    name: "x"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
    reshape { shape: [ 1, 3, 224, 224 ] }
  }
]
output [
  {
    name: "x.45"
    data_type: TYPE_FP32
    dims: [ 1, 1000, 1, 1 ]
    reshape { shape: [ 1, 1000 ] }
  }
]
```
### 3. Place the `config.pbtxt` file in the model repository along with `model.xml` and `model.bin`; the folder structure should look as follows:
```
model_repository
|
+-- resnet50
    |
    +-- config.pbtxt
    +-- 1
        |
        +-- model.xml
        +-- model.bin
```
Note: This directory structure is how the Triton Inference Server reads the configuration and model files, and it must follow the required layout. Do not place any other folders or files in the model repository other than the needed model files.
In the `client.py` file, you’ll need to update the model input and output names to match those expected by the backend, as the model is slightly different from the one in the Triton tutorial. For example, change the original input name used in the PyTorch model (`input__0`) to the name used by the OpenVINO backend (`x`).
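For example, with the `tritonclient` HTTP API the renamed tensors would be declared roughly like this (shape taken from the `config.pbtxt` above; a sketch, not the exact contents of `client.py`):

```
import tritonclient.http as httpclient

# Names must match the OpenVINO backend's config.pbtxt:
# input "x" (instead of "input__0") and output "x.45".
input_tensor = httpclient.InferInput("x", [3, 224, 224], "FP32")
output_tensor = httpclient.InferRequestedOutput("x.45")
```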
### 6. Run the Triton Client in the same location as the `client.py` file, install dependencies, and query the server.
Building a client requires three basic steps. First, we set up a connection with the Triton Inference Server. Second, we specify the names of the input and output layer(s) of our model. And last, we send an inference request to the Triton Inference Server.
```
docker run -it --net=host -v ${PWD}:/workspace/ nvcr.io/nvidia/tritonserver:24.04-py3-sdk bash
```
## Deploying a TensorFlow Model

### 1. Download and prepare the TensorFlow model.

Export the TensorFlow model in SavedModel format:
```
docker run -it --gpus all -v ${PWD}:/workspace nvcr.io/nvidia/tensorflow:24.04-tf2-py3
```
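As a rough sketch, an `export.py` consistent with the ResNet50 configuration used below could look like this; the Keras model choice and the `model_savedmodel` output directory are assumptions, so adapt them to your own model:

```
import tensorflow as tf

# Download a pretrained ResNet50 and export it in SavedModel format.
# The output directory name is an assumption; reuse whatever path your
# export script writes to in the conversion step below.
model = tf.keras.applications.ResNet50(weights="imagenet")
tf.saved_model.save(model, "model_savedmodel")
```

Then run the script inside the container: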
```
python3 export.py
```
The model will need to be converted to OpenVINO format. Create a `convert.py` file to use the OpenVINO Model Converter to save a `model.xml` and `model.bin`:
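A minimal version, mirroring the PyTorch conversion above and assuming the SavedModel was exported to a `model_savedmodel` directory, could look like this:

```
import openvino as ov

# Convert the TensorFlow SavedModel to OpenVINO IR (model.xml + model.bin)
ov_model = ov.convert_model("model_savedmodel")
ov.save_model(ov_model, "model.xml")
```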
To convert your TensorFlow model, refer to [Converting a TensorFlow Model](https://docs.openvino.ai/2024/openvino-workflow/model-preparation/convert-model-tensorflow.html)
### 2. Create a new file named `config.pbtxt`
```pbtxt
name: "resnet50"
backend: "openvino"
max_batch_size: 0
input [
  {
    name: "input_1"
    data_type: TYPE_FP32
    dims: [ -1, 224, 224, 3 ]
  }
]
output [
  {
    name: "predictions"
    data_type: TYPE_FP32
    dims: [ -1, 1000 ]
  }
]
```
### 3. Place the `config.pbtxt` file in the model repository as well as the `model.xml` and `model.bin`; the structure should look as follows:
```
model_repository
|
+-- resnet50
    |
    +-- config.pbtxt
    +-- 1
        |
        +-- model.xml
        +-- model.bin
```
Note: This directory structure is how the Triton Inference Server reads the configuration and model files, and it must follow the required layout. Do not place any other folders or files in the model repository other than the needed model files.
### 6. Run the Triton Client in the same location as the `client.py` file, install dependencies, and query the server.
Building a client requires three basic steps. First, we set up a connection with the Triton Inference Server. Second, we specify the names of the input and output layer(s) of our model. And last, we send an inference request to the Triton Inference Server.
```
docker run -it --net=host -v ${PWD}:/workspace/ nvcr.io/nvidia/tritonserver:24.04-py3-sdk bash
```
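The client follows the same three steps sketched in the ONNX section; only the tensor names and the NHWC input shape change, per the `config.pbtxt` above (again a sketch, not the exact `client.py`):

```
import tritonclient.http as httpclient

# Names and layout come from the TensorFlow model's config.pbtxt:
# NHWC input "input_1" and output "predictions".
input_tensor = httpclient.InferInput("input_1", [1, 224, 224, 3], "FP32")
output_tensor = httpclient.InferRequestedOutput("predictions")
```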