Quantization converts the main operators in the network (Convolution, Pooling, Binary, etc.) from the original floating-point precision to int8 precision, reducing the model size and improving inference performance.

Note:
- For the KL quantization method, refer to: http://on-demand.gputechconf.com/gtc/2017/presentation/s7310-8-bit-inference-with-tensorrt.pdf
cd <path_to_tnn>/platforms/linux/
./build_quanttool.sh -c
The quantization tool binary is generated at: `<path_to_tnn>/platforms/linux/build/quantization_cmd`
./quantization_cmd [-h] [-p] [-m] [-i] [-b] [-w] [-n] [-s] [-c] <param>
option | mandatory | with value | description
---|---|---|---
-h, --help | | | Show the help message. |
-p, --proto | √ | √ | Path to the tnnproto model description file. |
-m, --model | √ | √ | Path to the tnnmodel model parameter file. |
-i, --input_path | √ | √ | Path to the folder of quantization input data. Supported formats: text files (suffix .txt) and common image files (suffix .jpg, .jpeg, .png, .bmp). All files in this folder are used as input. |
-b, --blob_method | | √ | Feature-map quantization method: 0 Min-Max (default); 2 KL. |
-w, --weight_method | | √ | Weight quantization method: 0 Min-Max (default); 1 ADMM. |
-n, --mean | | √ | Pre-processing: subtract the given mean from each channel of the input data; parameter format: 0.0,0.0,0.0 |
-s, --scale | | √ | Pre-processing: scale each channel of the input data; parameter format: 1.0,1.0,1.0 |
-c, --merge_channel | | | If set, the feature map is quantized over all channels together; otherwise each channel is quantized separately. |
The input folder must contain representative input data; otherwise the accuracy of the quantized model will be affected. Around 20 to 50 images is recommended.
The input data is preprocessed using the mean and scale parameters, according to the formula:
input_pre = (input - mean) * scale
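The per-channel pre-processing above can be sketched as follows (a minimal illustration, not part of the tool; the function name and nested-list layout are assumptions):

```python
def preprocess(pixels, mean, scale):
    """Apply (input - mean) * scale per channel, matching the -n/-s options.

    pixels is a nested list of shape [C][H][W]; mean and scale each hold
    one value per channel.
    """
    return [
        [[(v - m) * s for v in row] for row in channel]
        for channel, m, s in zip(pixels, mean, scale)
    ]

# Example: a 3-channel 1x1 input, each channel shifted to zero then scaled.
out = preprocess(
    [[[128.0]], [[64.0]], [[32.0]]],
    mean=[128.0, 64.0, 32.0],
    scale=[0.5, 0.5, 0.5],
)
```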
Two files will be generated in the directory where the command is executed:
- model_quantized.tnnproto -- the quantized model description file;
- model_quantized.tnnmodel -- the quantized model parameter file.
(1) The -n and -s parameters only take effect when the input is an image;
(2) When the input is an image, it is converted to RGB format internally before processing;
(3) When the input is a txt file, the data is stored in NCHW order as floats, one value per line, for a total of `N*C*H*W` lines. E.g.,
0.01
1.1
0.1
255.0
...
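A txt input in this layout can be produced by flattening a tensor in NCHW order, one value per line. A minimal sketch (the helper name and nested-list representation are illustrative, not part of TNN):

```python
def write_quant_input_txt(path, tensor_nchw):
    """Write an N*C*H*W nested-list tensor to a text file, one float per line,
    flattened in NCHW order as the quantization tool expects."""
    with open(path, "w") as f:
        for n in tensor_nchw:          # batch
            for c in n:                # channel
                for h in c:            # row
                    for value in h:    # column
                        f.write(f"{value}\n")

# Example: a 1x1x2x2 tensor produces a 4-line file.
tensor = [[[[0.01, 1.1], [0.1, 255.0]]]]
write_quant_input_txt("input_0.txt", tensor)
```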