Description
I'm running a BERT-base ONNX model with the TensorRT Docker container. My model has a dynamic batch size.
docker pull nvcr.io/nvidia/tensorrt:21.09-py3
docker run --gpus 1 -it --rm -v $PWD:/workspace nvcr.io/nvidia/tensorrt:21.09-py3
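For reference, a quick sanity check inside the container before building the engine (the trtexec path is an assumption based on the usual NGC TensorRT image layout):
nvidia-smi
which trtexec || ls /usr/src/tensorrt/bin/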
Run trtexec
trtexec --onnx=./data/bert/bert-base-128.onnx --useCudaGraph --iterations=1000 --workspace=10000 --fp16 --optShapes=input_mask:1x128,segment_ids:1x128,input_ids:1x128 --verbose
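The command above covers the 1x128 case. The remaining rows in the table below can be reproduced by building one engine over a min/opt/max batch range and overriding the input shapes per run; the min/opt/max values here are an assumption, shown only as a sketch:
trtexec --onnx=./data/bert/bert-base-128.onnx --workspace=10000 --fp16 --minShapes=input_ids:1x128,segment_ids:1x128,input_mask:1x128 --optShapes=input_ids:8x128,segment_ids:8x128,input_mask:8x128 --maxShapes=input_ids:32x128,segment_ids:32x128,input_mask:32x128 --saveEngine=bert-base-128.engine
trtexec --loadEngine=bert-base-128.engine --useCudaGraph --iterations=1000 --shapes=input_ids:32x128,segment_ids:32x128,input_mask:32x128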
Here is my table of throughput and latency for bert-base:
| Input shapes | Throughput (qps) | Avg E2E Latency (ms) |
|---|---|---|
| 1x128 | 1089.81 | 1.75901 |
| 2x128 | 921.662 | 2.09433 |
| 4x128 | 734.285 | 2.64538 |
| 8x128 | 480.935 | 4.04696 |
| 32x128 | 149.114 | 13.1884 |
Question: How should I interpret these results? Why does throughput decrease with increasing batch size?
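If it helps, my own reading so far (which may be wrong) is that trtexec reports qps per query, where one query is one batched inference; in that case per-sample throughput would be qps × batch size, e.g. 1089.81 × 1 ≈ 1090 samples/s at 1x128 versus 149.114 × 32 ≈ 4772 samples/s at 32x128.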