Commit eacf310

PA Migration: Doc Updates (#105)

* PA Migration: Update docs

1 parent ea992c4 commit eacf310

File tree (2 files changed: +104, -104 lines)

  • Conceptual_Guide/Part_2-improving_resource_utilization
  • Quick_Deploy/OpenVINO

Conceptual_Guide/Part_2-improving_resource_utilization/README.md

Lines changed: 1 addition & 1 deletion
@@ -150,7 +150,7 @@ tritonserver --model-repository=/models

### Measuring Performance

-Having made some improvements to the model's serving capabilities by enabling `dynamic batching` and the use of `multiple model instances`, the next step is to measure the impact of these features. To that end, the Triton Inference Server comes packaged with the [Performance Analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md) which is a tool specifically designed to measure performance for Triton Inference Servers. For ease of use, it is recommended that users run this inside the same container used to run client code in Part 1 of this series.
+Having made some improvements to the model's serving capabilities by enabling `dynamic batching` and the use of `multiple model instances`, the next step is to measure the impact of these features. To that end, the Triton Inference Server comes packaged with the [Performance Analyzer](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md) which is a tool specifically designed to measure performance for Triton Inference Servers. For ease of use, it is recommended that users run this inside the same container used to run client code in Part 1 of this series.
```
docker run -it --net=host -v ${PWD}:/workspace/ nvcr.io/nvidia/tritonserver:yy.mm-py3-sdk bash
```
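
From inside that SDK container, a typical Performance Analyzer sweep looks like the following. This is only an illustration: the model name `text_recognition` and the concurrency range are assumptions, not part of this commit, so substitute the model deployed in this part of the tutorial.
```
perf_analyzer -m text_recognition --concurrency-range 2:16:2 --percentile=95
```
The report lists throughput and latency at each concurrency level, which makes it straightforward to compare runs with and without dynamic batching and multiple model instances.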

Quick_Deploy/OpenVINO/README.md

Lines changed: 103 additions & 103 deletions
@@ -28,136 +28,136 @@

# Deploying ONNX, PyTorch and TensorFlow Models with the OpenVINO Backend

This README demonstrates how to deploy simple ONNX, PyTorch and TensorFlow models on Triton Inference Server using the [OpenVINO backend](https://github.com/triton-inference-server/openvino_backend).


## Deploying an ONNX Model
### 1. Build the model repository and download the ONNX model.
```
mkdir -p model_repository/densenet_onnx/1
wget -O model_repository/densenet_onnx/1/model.onnx \
     https://contentmamluswest001.blob.core.windows.net/content/14b2744cf8d6418c87ffddc3f3127242/9502630827244d60a1214f250e3bbca7/08aed7327d694b8dbaee2c97b8d0fcba/densenet121-1.2.onnx
```

### 2. Create a new file named `config.pbtxt`
```
name: "densenet_onnx"
backend: "openvino"
default_model_filename: "model.onnx"
```

### 3. Place the `config.pbtxt` file in the model repository; the structure should look as follows:
```
model_repository
|
+-- densenet_onnx
    |
    +-- config.pbtxt
    +-- 1
        |
        +-- model.onnx
```

Note: This directory structure is how the Triton Inference Server reads the configuration and model files; it must follow the required layout. Do not place any other folders or files in the model repository other than the needed model files.

### 4. Run the Triton Inference Server
```
docker run --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 -v ${PWD}/model_repository:/models nvcr.io/nvidia/tritonserver:24.04-py3 tritonserver --model-repository=/models
```

### 5. Download the Triton Client code `client.py` from GitHub to a place you want to run the Triton Client from.
```
wget https://raw.githubusercontent.com/triton-inference-server/tutorials/main/Quick_Deploy/ONNX/client.py
```

### 6. Run the Triton Client in the same location as the `client.py` file, install dependencies, and query the server
Building a client requires three basic steps. First, we set up a connection with the Triton Inference Server. Second, we specify the names of the input and output layer(s) of our model. Last, we send an inference request to the Triton Inference Server.
```
docker run -it --rm --net=host -v ${PWD}:/workspace/ nvcr.io/nvidia/tritonserver:24.04-py3-sdk bash
```
```
pip install torchvision
wget -O img1.jpg "https://www.hakaimagazine.com/wp-content/uploads/header-gulf-birds.jpg"
python3 client.py
```
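
If you want to see what `client.py` is doing, the request flow can be sketched directly with the Triton HTTP client. The snippet below is a minimal illustration rather than the tutorial's `client.py`: it reads the input and output names from the server's model metadata instead of hard-coding them, and it sends random data in place of a preprocessed image.
```python
import numpy as np
import tritonclient.http as httpclient
from tritonclient.utils import triton_to_np_dtype

# 1. Set up a connection with the Triton Inference Server
client = httpclient.InferenceServerClient(url="localhost:8000")

# 2. Look up the model's input/output names instead of hard-coding them
meta = client.get_model_metadata("densenet_onnx")
inp, out = meta["inputs"][0], meta["outputs"][0]
shape = [d if isinstance(d, int) and d > 0 else 1 for d in inp["shape"]]

# 3. Build and send an inference request (random data stands in for a real image)
data = np.random.rand(*shape).astype(triton_to_np_dtype(inp["datatype"]))
infer_input = httpclient.InferInput(inp["name"], shape, inp["datatype"])
infer_input.set_data_from_numpy(data)

result = client.infer(
    "densenet_onnx",
    inputs=[infer_input],
    outputs=[httpclient.InferRequestedOutput(out["name"], class_count=5)],
)
print(result.as_numpy(out["name"]))
```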

### 7. Output
```
['11.549026:92' '11.232335:14' '7.528014:95' '6.923391:17' '6.576575:88']
```
The output format here is `<confidence_score>:<classification_index>`. To learn how to map these to the label names and more, refer to our [documentation](https://github.com/triton-inference-server/server/blob/main/docs/protocol/extension_classification.md). The client code above is available in `client.py`.
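
Each entry can be split on the `:` and, for an ImageNet classifier like this DenseNet, mapped to a human-readable class name. A small sketch of that post-processing (it assumes the standard 1000-class ImageNet labels and borrows torchvision's bundled category list purely for convenience):
```python
from torchvision.models import DenseNet121_Weights

# Triton's classification output is "<confidence_score>:<classification_index>"
raw = ['11.549026:92', '11.232335:14', '7.528014:95', '6.923391:17', '6.576575:88']
categories = DenseNet121_Weights.IMAGENET1K_V1.meta["categories"]

for entry in raw:
    score, index = entry.split(":")
    print(f"{categories[int(index)]}: {float(score):.3f}")
```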

## Deploying a PyTorch Model
### 1. Download and prepare the PyTorch model.
PyTorch models (.pt) need to be converted to OpenVINO format. Create a `downloadAndConvert.py` file to download the PyTorch model and use the OpenVINO Model Converter to save a `model.xml` and `model.bin`:
```
# Download the pretrained torchvision ResNet-50 and convert it to OpenVINO IR
import torchvision
import torch
import openvino as ov

model = torchvision.models.resnet50(weights='DEFAULT')
ov_model = ov.convert_model(model)
ov.save_model(ov_model, 'model.xml')
```

Install the dependencies:
```
pip install openvino
pip install torchvision
```

Run `downloadAndConvert.py`
```
python3 downloadAndConvert.py
```

To convert your own PyTorch model, refer to [Converting a PyTorch Model](https://docs.openvino.ai/2024/openvino-workflow/model-preparation/convert-model-pytorch.html)
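
Before placing the converted files in the model repository, it can help to sanity-check the IR locally with the OpenVINO runtime. This is a minimal sketch, not part of the tutorial; it assumes the ResNet-50 IR produced above accepts a 1x3x224x224 FP32 tensor and returns 1000 class scores:
```python
import numpy as np
import openvino as ov

# Compile the IR produced by downloadAndConvert.py on CPU
core = ov.Core()
compiled = core.compile_model("model.xml", "CPU")

# Run a dummy inference; for ResNet-50 the output shape should be (1, 1000)
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)
result = compiled(dummy)
print(result[compiled.output(0)].shape)
```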

### 2. Create a new file named `config.pbtxt`
```
name: "resnet50"
backend: "openvino"
max_batch_size : 0
input [
  {
    name: "x"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
    reshape { shape: [ 1, 3, 224, 224 ] }
  }
]
output [
  {
    name: "x.45"
    data_type: TYPE_FP32
    dims: [ 1, 1000, 1, 1 ]
    reshape { shape: [ 1, 1000 ] }
  }
]
```

### 3. Place the `config.pbtxt` file in the model repository as well as the `model.xml` and `model.bin`; the folder structure should look as follows:
```
model_repository
|
+-- resnet50
    |
    +-- config.pbtxt
    +-- 1
        |
        +-- model.xml
        +-- model.bin
```

Note: This directory structure is how the Triton Inference Server reads the configuration and model files; it must follow the required layout. Do not place any other folders or files in the model repository other than the needed model files.

### 4. Run the Triton Inference Server
```
docker run --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 -v ${PWD}/model_repository:/models nvcr.io/nvidia/tritonserver:24.04-py3 tritonserver --model-repository=/models
```

### 5. In another terminal, download the Triton Client code `client.py` from GitHub to the place you want to run the Triton Client from.
```
wget https://raw.githubusercontent.com/triton-inference-server/tutorials/main/Quick_Deploy/PyTorch/client.py
```

In the `client.py` file, you’ll need to update the model input and output names to match those expected by the backend, as the model is slightly different from the one in the Triton tutorial. For example, change the original input name used in the PyTorch model (`input__0`) to the name used by the OpenVINO backend (`x`).
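
Rather than guessing, you can ask the running server for the exact names it expects. The following is a small sketch (not part of the tutorial) using the Triton HTTP client; it assumes the server from step 4 is reachable on `localhost:8000` and the model is named `resnet50`:
```python
import tritonclient.http as httpclient

# Query the running Triton server for the model's input/output names and shapes
client = httpclient.InferenceServerClient(url="localhost:8000")
meta = client.get_model_metadata("resnet50")

print("inputs: ", [(t["name"], t["datatype"], t["shape"]) for t in meta["inputs"]])
print("outputs:", [(t["name"], t["datatype"], t["shape"]) for t in meta["outputs"]])
```
The reported names are the ones to pass to `httpclient.InferInput(...)` and `httpclient.InferRequestedOutput(...)` in `client.py`.
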
@@ -170,12 +170,12 @@ In the `client.py` file, you’ll need to update the model input and output name
### 6. Run the Triton Client in the same location as the `client.py` file, install dependencies, and query the server.
Building a client requires three basic steps. First, we set up a connection with the Triton Inference Server. Second, we specify the names of the input and output layer(s) of our model. Last, we send an inference request to the Triton Inference Server.
```
docker run -it --net=host -v ${PWD}:/workspace/ nvcr.io/nvidia/tritonserver:24.04-py3-sdk bash
```
```
pip install torchvision
wget -O img1.jpg "https://www.hakaimagazine.com/wp-content/uploads/header-gulf-birds.jpg"
python3 client.py
```

### 7. Output
@@ -192,99 +192,99 @@ Export the TensorFlow model in SavedModel format:
```
docker run -it --gpus all -v ${PWD}:/workspace nvcr.io/nvidia/tensorflow:24.04-tf2-py3
```
```
python3 export.py
```
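
The repository's `export.py` is not shown in this diff. Purely as a hypothetical illustration of what this step does, a script along the following lines would export a Keras ResNet-50 as a SavedModel directory (the model choice and output path are assumptions, not taken from the repository):
```python
# Hypothetical export script -- illustration only, not the repository's export.py.
# Saves a Keras ResNet-50 as a TensorFlow SavedModel directory.
import tensorflow as tf

model = tf.keras.applications.ResNet50(weights="imagenet")
tf.saved_model.save(model, "resnet50_savedmodel")
```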

The model will need to be converted to OpenVINO format. Create a `convert.py` file to use the OpenVINO Model Converter to save a `model.xml` and `model.bin`:
```
# Convert the exported SavedModel directory to OpenVINO IR
import openvino as ov

ov_model = ov.convert_model('path_to_saved_model_dir')
ov.save_model(ov_model, 'model.xml')
```

Install the dependencies:
```
pip install openvino
```

Run `convert.py`
```
python3 convert.py
```

To convert your TensorFlow model, refer to [Converting a TensorFlow Model](https://docs.openvino.ai/2024/openvino-workflow/model-preparation/convert-model-tensorflow.html)
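
The `name:` fields in the configuration below must match the tensor names inside the converted IR, so it is worth confirming them before writing `config.pbtxt`. A minimal sketch, assuming the `model.xml` produced by `convert.py` is in the current directory:
```python
import openvino as ov

# Read the IR and print the tensor names/shapes that config.pbtxt must reference
model = ov.Core().read_model("model.xml")
for inp in model.inputs:
    print("input: ", inp.get_any_name(), inp.get_partial_shape())
for out in model.outputs:
    print("output:", out.get_any_name(), out.get_partial_shape())
```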

### 2. Create a new file named `config.pbtxt`
```pbtxt
name: "resnet50"
backend: "openvino"
max_batch_size : 0
input [
  {
    name: "input_1"
    data_type: TYPE_FP32
    dims: [ -1, 224, 224, 3 ]
  }
]
output [
  {
    name: "predictions"
    data_type: TYPE_FP32
    dims: [ -1, 1000 ]
  }
]
```

### 3. Place the `config.pbtxt` file in the model repository as well as the `model.xml` and `model.bin`; the structure should look as follows:
```
model_repository
|
+-- resnet50
    |
    +-- config.pbtxt
    +-- 1
        |
        +-- model.xml
        +-- model.bin
```

Note: This directory structure is how the Triton Inference Server reads the configuration and model files; it must follow the required layout. Do not place any other folders or files in the model repository other than the needed model files.

### 4. Run the Triton Inference Server
```
docker run --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 -v ${PWD}/model_repository:/models nvcr.io/nvidia/tritonserver:24.04-py3 tritonserver --model-repository=/models
```

### 5. In another terminal, download the Triton Client code `client.py` from GitHub to the place you want to run the Triton Client from.
```
wget https://raw.githubusercontent.com/triton-inference-server/tutorials/main/Quick_Deploy/TensorFlow/client.py
```

### 6. Run the Triton Client in the same location as the `client.py` file, install dependencies, and query the server.
Building a client requires three basic steps. First, we set up a connection with the Triton Inference Server. Second, we specify the names of the input and output layer(s) of our model. Last, we send an inference request to the Triton Inference Server.
```
docker run -it --net=host -v ${PWD}:/workspace/ nvcr.io/nvidia/tritonserver:24.04-py3-sdk bash
```
```
pip install --upgrade tensorflow
pip install image
wget -O img1.jpg "https://www.hakaimagazine.com/wp-content/uploads/header-gulf-birds.jpg"
python3 client.py
```

### 7. Output
```
[b'0.301167:90' b'0.169790:14' b'0.161309:92' b'0.093105:94'
 b'0.058743:136' b'0.050185:11' b'0.033802:91' b'0.011760:88'
 b'0.008309:989' b'0.004927:95' b'0.004905:13' b'0.004095:317'
 b'0.004006:96' b'0.003694:12' b'0.003526:42' b'0.003390:313'
 ...
 b'0.000001:751' b'0.000001:685' b'0.000001:408' b'0.000001:116'
 b'0.000001:627' b'0.000001:933' b'0.000000:661' b'0.000000:148']
```
