
Commit b5fa169

add int8 inference for trt-cpp (#272)
* add int8 inference for trt
* typo
1 parent 7a6f123 commit b5fa169

13 files changed (+457 -50 lines)

README.md (+3 -3)

@@ -4,10 +4,10 @@ My implementation of [BiSeNetV1](https://arxiv.org/abs/1808.00897) and [BiSeNetV
 
 
 mIOUs and fps on cityscapes val set:
-| none | ss | ssc | msf | mscf | fps(fp16/fp32) | link |
+| none | ss | ssc | msf | mscf | fps(fp32/fp16/int8) | link |
 |------|:--:|:---:|:---:|:----:|:---:|:----:|
-| bisenetv1 | 75.44 | 76.94 | 77.45 | 78.86 | 78/25 | [download](https://github.com/CoinCheung/BiSeNet/releases/download/0.0.0/model_final_v1_city_new.pth) |
-| bisenetv2 | 74.95 | 75.58 | 76.53 | 77.08 | 67/26 | [download](https://github.com/CoinCheung/BiSeNet/releases/download/0.0.0/model_final_v2_city.pth) |
+| bisenetv1 | 75.44 | 76.94 | 77.45 | 78.86 | 25/78/141 | [download](https://github.com/CoinCheung/BiSeNet/releases/download/0.0.0/model_final_v1_city_new.pth) |
+| bisenetv2 | 74.95 | 75.58 | 76.53 | 77.08 | 26/67/95 | [download](https://github.com/CoinCheung/BiSeNet/releases/download/0.0.0/model_final_v2_city.pth) |
 
 mIOUs on cocostuff val2017 set:
 | none | ss | ssc | msf | mscf | link |

tensorrt/CMakeLists.txt (+1 -1)

@@ -17,7 +17,7 @@ find_package(OpenCV REQUIRED)
 
 cuda_add_library(kernels STATIC kernels.cu)
 
-add_executable(segment segment.cpp trt_dep.cpp)
+add_executable(segment segment.cpp trt_dep.cpp read_img.cpp)
 target_include_directories(
     segment PUBLIC ${CUDA_INCLUDE_DIRS} ${CUDNN_INCLUDE_DIRS} ${OpenCV_INCLUDE_DIRS})
 target_link_libraries(

tensorrt/README.md (+11 -1)

@@ -38,14 +38,22 @@ This would generate a `./segment` in the `tensorrt/build` directory.
 
 
 #### 3. Convert onnx to tensorrt model
-If you can successfully compile the source code, you can parse the onnx model to tensorrt model like this:
+If you can successfully compile the source code, you can parse the onnx model to a tensorrt model with one of the following commands.
+For fp32, the command is:
 ```
 $ ./segment compile /path/to/onnx.model /path/to/saved_model.trt
 ```
 If your gpu supports acceleration with fp16 inference, you can add a `--fp16` option in this step:
 ```
 $ ./segment compile /path/to/onnx.model /path/to/saved_model.trt --fp16
 ```
+Building an int8 engine is also supported. First, make sure your gpu supports int8 inference, or your model will not be faster than with fp16/fp32. Then prepare a certain number of images for int8 calibration. In this example, I use the cityscapes train set for calibration. The command is:
+```
+$ rm calibrate_int8 # delete this if it exists
+$ ./segment compile /path/to/onnx.model /path/to/saved_model.trt --int8 /path/to/BiSeNet/datasets/cityscapes /path/to/BiSeNet/datasets/cityscapes/train.txt
+```
+With the above commands, a tensorrt engine named `saved_model.trt` will be generated.
+
 Note that I use the simplest method to parse the command line args, so please do **Not** change the order of the args in the above command.
 
 
@@ -74,6 +82,8 @@ Likewise, you do not need to worry about this anymore with version newer than 7.
 
 4. On my platform, after compiling with tensorrt, the model size of bisenetv1 is 29Mb(fp16) and 128Mb(fp32), and the size of bisenetv2 is 16Mb(fp16) and 42Mb(fp32). However, the fps of bisenetv1 is 68(fp16) and 23(fp32), while the fps of bisenetv2 is 59(fp16) and 21(fp32). It is obvious that bisenetv2 has fewer parameters than bisenetv1, but its speed is the other way around. I am not sure whether it is because tensorrt has a worse optimization strategy for some ops used in bisenetv2 (such as depthwise convolution) or because of the limitation of the gpu on different ops. Please tell me if you have a better idea on this.
 
+5. int8 mode is not always greatly faster than fp16 mode. For example, I tested with bisenetv1-cityscapes and tensorrt 8.2.5.1. With a v100 gpu and driver 515.65, the fp16/int8 fps is 185.89/186.85, while with a t4 gpu and driver 450.80, it is 78.77/142.31.
+
 
 ### Using python
 
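The `compile` subcommand itself lives in `segment.cpp`, which is not among the files shown in this section, so the following is only a minimal, hypothetical sketch of how the `--fp16`/`--int8` options would typically map onto TensorRT builder-config settings using the `batch_stream.hpp` and `entropy_calibrator.hpp` headers added later in this commit. The function name, the input blob name `input_image`, and the cache file name `calibrate_int8` are assumptions, not the repo's actual code.

```
// Hypothetical sketch only; not the repo's segment.cpp.
#include <string>
#include "NvInfer.h"
#include "batch_stream.hpp"
#include "entropy_calibrator.hpp"

void set_precision_flags(nvinfer1::IBuilderConfig* config, nvinfer1::Dims4 input_dims,
                         bool use_fp16, bool use_int8,
                         const std::string& data_root, const std::string& ann_file) {
    if (use_fp16) {
        config->setFlag(nvinfer1::BuilderFlag::kFP16);
    }
    if (use_int8) {
        config->setFlag(nvinfer1::BuilderFlag::kINT8);
        // one image per calibration batch; -1 lets BatchStream derive the batch
        // count from the number of lines in the annotation file
        BatchStream stream(1, -1, input_dims, data_root, ann_file);
        // "calibrate_int8" is the assumed calibration-cache file name,
        // "input_image" the assumed name of the network input blob
        auto* calibrator = new Int8EntropyCalibrator2<BatchStream>(
                stream, 0, "calibrate_int8", "input_image");
        // the calibrator must stay alive until the engine has been built
        config->setInt8Calibrator(calibrator);
    }
}
```

If a `calibrate_int8` cache file is present, `readCalibrationCache` in `entropy_calibrator.hpp` reuses the stored scales; deleting it before running forces a fresh calibration pass, which is why the README asks you to remove it first.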

tensorrt/batch_stream.hpp (new file, +148 lines)

#ifndef BATCH_STREAM_HPP
#define BATCH_STREAM_HPP


#include <string>
#include <sstream>
#include <fstream>
#include <iostream>
#include <vector>
#include <algorithm>
#include <numeric>
#include <opencv2/opencv.hpp>

#include "NvInfer.h"
#include "read_img.hpp"

using nvinfer1::Dims;
using nvinfer1::Dims3;
using nvinfer1::Dims4;


class IBatchStream
{
public:
    virtual void reset(int firstBatch) = 0;
    virtual bool next() = 0;
    virtual void skip(int skipCount) = 0;
    virtual float* getBatch() = 0;
    virtual int getBatchesRead() const = 0;
    virtual int getBatchSize() const = 0;
    virtual nvinfer1::Dims4 getDims() const = 0;
};


class BatchStream : public IBatchStream
{
public:
    BatchStream(int batchSize, int maxBatches, Dims indim,
            const std::string& dataRoot,
            const std::string& dataFile)
        : mBatchSize{batchSize}
        , mMaxBatches{maxBatches}
    {
        mDims = Dims3(indim.d[1], indim.d[2], indim.d[3]);

        readDataFile(dataFile, dataRoot);
        mSampleSize = std::accumulate(
                mDims.d, mDims.d + mDims.nbDims, 1, std::multiplies<int64_t>()) * sizeof(float);
        mData.resize(mSampleSize * mBatchSize);
    }

    void reset(int firstBatch) override
    {
        cout << "mBatchCount: " << mBatchCount << endl;
        mBatchCount = firstBatch;
    }

    bool next() override
    {
        if (mBatchCount >= mMaxBatches)
        {
            return false;
        }
        ++mBatchCount;
        return true;
    }

    void skip(int skipCount) override
    {
        mBatchCount += skipCount;
    }

    float* getBatch() override
    {
        int offset = mBatchCount * mBatchSize;
        for (int i{0}; i < mBatchSize; ++i) {
            int ind = offset + i;
            read_data(mPaths[ind], &mData[i * mSampleSize], mDims.d[1], mDims.d[2]);
        }
        return mData.data();
    }

    int getBatchesRead() const override
    {
        return mBatchCount;
    }

    int getBatchSize() const override
    {
        return mBatchSize;
    }

    nvinfer1::Dims4 getDims() const override
    {
        return Dims4{mBatchSize, mDims.d[0], mDims.d[1], mDims.d[2]};
    }

private:
    void readDataFile(const std::string& dataFilePath, const std::string& dataRootPath)
    {
        std::ifstream file(dataFilePath, std::ios::in);
        if (!file.is_open()) {
            cout << "file open failed: " << dataFilePath << endl;
            std::abort();
        }
        std::stringstream ss;
        file >> ss.rdbuf();
        file.close();

        std::string impth;
        int n_imgs = 0;
        while (std::getline(ss, impth)) ++n_imgs;
        ss.clear(); ss.seekg(0, std::ios::beg);
        if (n_imgs <= 0) {
            cout << "ann file is empty, cannot read image paths for int8 calibration: "
                << dataFilePath << endl;
            std::abort();
        }

        mPaths.resize(n_imgs);
        for (int i{0}; i < n_imgs; ++i) {
            std::getline(ss, impth, ',');
            mPaths[i] = dataRootPath + "/" + impth;
            std::getline(ss, impth);
        }
        if (mMaxBatches < 0) {
            mMaxBatches = n_imgs / mBatchSize - 1;
        }
        if (mMaxBatches <= 0) {
            cout << "must have at least 1 batch for calibration\n";
            std::abort();
        }
        cout << "mMaxBatches = " << mMaxBatches << endl;
    }


    int mBatchSize{0};
    int mBatchCount{0};
    int mMaxBatches{0};
    Dims3 mDims{};
    std::vector<string> mPaths;
    std::vector<float> mData;
    int mSampleSize{0};
};


#endif
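As a quick orientation for the class above, here is a small hypothetical driver (not part of the commit) showing how the `IBatchStream` protocol is consumed: `reset()` sets the starting batch index, then `next()`/`getBatch()` are alternated until `next()` returns false. The 3x1024x2048 input shape and the dataset paths are placeholder assumptions, and the snippet presumes the repo's `read_img.hpp` is on the include path.

```
// Hypothetical BatchStream driver; shape and paths are assumptions.
#include "batch_stream.hpp"

int main() {
    nvinfer1::Dims4 input_dims{1, 3, 1024, 2048};   // N, C, H, W
    BatchStream stream(1, -1, input_dims,
                       "/path/to/BiSeNet/datasets/cityscapes",
                       "/path/to/BiSeNet/datasets/cityscapes/train.txt");
    stream.reset(0);                        // start from the first batch
    while (stream.next()) {                 // false once mMaxBatches is reached
        float* batch = stream.getBatch();   // host buffer holding one preprocessed image
        // ... upload `batch` to the device and feed whatever consumes it ...
        (void)batch;
    }
    return 0;
}
```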

tensorrt/entropy_calibrator.hpp (new file, +160 lines)

/*
 * Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

#ifndef ENTROPY_CALIBRATOR_HPP
#define ENTROPY_CALIBRATOR_HPP

#include <algorithm>
#include <numeric>
#include <iterator>
#include "NvInfer.h"

//! \class EntropyCalibratorImpl
//!
//! \brief Implements common functionality for Entropy calibrators.
//!
template <typename TBatchStream>
class EntropyCalibratorImpl
{
public:
    EntropyCalibratorImpl(
        TBatchStream stream, int firstBatch, std::string cal_table_name, const char* inputBlobName, bool readCache = true)
        : mStream{stream}
        , mCalibrationTableName(cal_table_name)
        , mInputBlobName(inputBlobName)
        , mReadCache(readCache)
    {
        nvinfer1::Dims4 dims = mStream.getDims();
        mInputCount = std::accumulate(
                dims.d, dims.d + dims.nbDims, 1, std::multiplies<int64_t>());
        cout << "dims.nbDims: " << dims.nbDims << endl;
        for (int i{0}; i < dims.nbDims; ++i) {
            cout << dims.d[i] << ", ";
        }
        cout << endl;

        cudaError_t state;
        state = cudaMalloc(&mDeviceInput, mInputCount * sizeof(float));
        if (state) {
            cout << "allocate memory failed\n";
            std::abort();
        }
        cout << "mInputCount: " << mInputCount << endl;
        mStream.reset(firstBatch);
    }

    virtual ~EntropyCalibratorImpl()
    {
        cudaError_t state;
        state = cudaFree(mDeviceInput);
        if (state) {
            cout << "free memory failed\n";
            std::abort();
        }
    }

    int getBatchSize() const
    {
        return mStream.getBatchSize();
    }

    bool getBatch(void* bindings[], const char* names[], int nbBindings)
    {
        if (!mStream.next())
        {
            return false;
        }
        cudaError_t state;
        state = cudaMemcpy(mDeviceInput, mStream.getBatch(), mInputCount * sizeof(float), cudaMemcpyHostToDevice);
        if (state) {
            cout << "memory copy to device failed\n";
            std::abort();
        }
        assert(!strcmp(names[0], mInputBlobName));
        bindings[0] = mDeviceInput;
        return true;
    }

    const void* readCalibrationCache(size_t& length)
    {
        mCalibrationCache.clear();
        std::ifstream input(mCalibrationTableName, std::ios::binary);
        input >> std::noskipws;
        if (mReadCache && input.good())
        {
            std::copy(std::istream_iterator<char>(input), std::istream_iterator<char>(),
                    std::back_inserter(mCalibrationCache));
        }
        length = mCalibrationCache.size();
        return length ? mCalibrationCache.data() : nullptr;
    }

    void writeCalibrationCache(const void* cache, size_t length)
    {
        std::ofstream output(mCalibrationTableName, std::ios::binary);
        output.write(reinterpret_cast<const char*>(cache), length);
    }

private:
    TBatchStream mStream;
    size_t mInputCount;
    std::string mCalibrationTableName;
    const char* mInputBlobName;
    bool mReadCache{true};
    void* mDeviceInput{nullptr};
    std::vector<char> mCalibrationCache;
};

//! \class Int8EntropyCalibrator2
//!
//! \brief Implements Entropy calibrator 2.
//!  CalibrationAlgoType is kENTROPY_CALIBRATION_2.
//!
template <typename TBatchStream>
class Int8EntropyCalibrator2 : public nvinfer1::IInt8EntropyCalibrator2
{
public:
    Int8EntropyCalibrator2(
        TBatchStream stream, int firstBatch, const char* networkName, const char* inputBlobName, bool readCache = true)
        : mImpl(stream, firstBatch, networkName, inputBlobName, readCache)
    {
    }

    int getBatchSize() const noexcept override
    {
        return mImpl.getBatchSize();
    }

    bool getBatch(void* bindings[], const char* names[], int nbBindings) noexcept override
    {
        return mImpl.getBatch(bindings, names, nbBindings);
    }

    const void* readCalibrationCache(size_t& length) noexcept override
    {
        return mImpl.readCalibrationCache(length);
    }

    void writeCalibrationCache(const void* cache, size_t length) noexcept override
    {
        mImpl.writeCalibrationCache(cache, length);
    }

private:
    EntropyCalibratorImpl<TBatchStream> mImpl;
};

#endif // ENTROPY_CALIBRATOR_H
