Model Quantization

I. Why Quantization

Quantization converts the main operators in the network (Convolution, Pooling, Binary, etc.) from the original floating-point precision to int8 precision, which reduces the model size and improves inference performance.

Note:

  1. For the KL quantization method, see: http://on-demand.gputechconf.com/gtc/2017/presentation/s7310-8-bit-inference-with-tensorrt.pdf (a rough sketch of the idea is given below).
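The KL method chooses, for each feature map, the clipping threshold whose int8 distribution diverges least (in KL divergence) from the original float distribution. The snippet below is a minimal, unoptimized Python sketch of that idea based on the linked slides, not TNN's actual implementation; the bin counts, the helper name kl_calibrate_scale, and the numerical details are illustrative assumptions.

```python
# Sketch of KL (entropy) calibration: pick the |activation| clipping
# threshold whose 128-level quantized histogram is closest to the original.
# Illustrative only -- not TNN's implementation.
import numpy as np
from scipy.stats import entropy

def kl_calibrate_scale(activations, num_bins=2048, quant_levels=128):
    hist, edges = np.histogram(np.abs(activations), bins=num_bins)
    hist = hist.astype(np.float64)
    best_kl, best_i = np.inf, quant_levels
    for i in range(quant_levels, num_bins + 1):
        # Reference distribution P: clip everything beyond bin i into the last bin.
        p = hist[:i].copy()
        p[-1] += hist[i:].sum()
        # Candidate distribution Q: merge the i bins down to quant_levels levels,
        # then spread each level's mass back over its non-empty member bins.
        q = np.zeros(i)
        for idx in np.array_split(np.arange(i), quant_levels):
            nonzero = p[idx] > 0
            if nonzero.any():
                q[idx[nonzero]] = p[idx].sum() / nonzero.sum()
        kl = entropy(p / p.sum(), q / q.sum() + 1e-12)
        if kl < best_kl:
            best_kl, best_i = kl, i
    bin_width = edges[1] - edges[0]
    return (best_i + 0.5) * bin_width / 127.0  # int8 scale for the chosen threshold
```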

II. Compile

1. Build

cd <path_to_tnn>/platforms/linux/
./build_quanttool.sh -c

2. Output

Binary of the quantization tool: <path_to_tnn>/platforms/linux/build/quantization_cmd  

III. Usage

1. Command

./quantization_cmd [-h] [-p] [-m] [-i] [-b] [-w] [-n] [-s] [-c] <param>

2. Parameter Description

| option | mandatory | with value | description |
| :----- | :-------: | :--------: | :---------- |
| -h, --help | | | Print the command-line help. |
| -p, --proto | √ | √ | Specify the tnnproto model description file. |
| -m, --model | √ | √ | Specify the tnnmodel model parameter file. |
| -i, --input_path | √ | √ | Specify the path of the quantization input folder. Supported formats are:<br>• text files (suffix .txt)<br>• common image files (suffix .jpg, .jpeg, .png, .bmp)<br>All files under this directory are used as input. |
| -b, --blob_method | | √ | Feature map quantization method:<br>• 0 Min-Max method (default)<br>• 2 KL method |
| -w, --weight_method | | √ | Weight quantization method:<br>• 0 Min-Max method (default)<br>• 1 ADMM method |
| -n, --mean | | √ | Pre-processing: subtract the given per-channel mean from the input data, parameter format: 0.0,0.0,0.0 |
| -s, --scale | | √ | Pre-processing: multiply each input channel by the given scale, parameter format: 1.0,1.0,1.0 |
| -c, --merge_channel | | | Compute feature map quantization statistics over all channels together; otherwise each channel is quantized separately. |
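
For example, a typical invocation looks like the following; the model file names and input folder are placeholders, and the preprocessing values are the documented defaults:

./quantization_cmd -p model.tnnproto -m model.tnnmodel -i ./quant_input/ -b 2 -w 1 -n 0.0,0.0,0.0 -s 1.0,1.0,1.0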

3. Quantization Input

3.1 Select input data

The input folder should contain data representative of the model's actual inputs; otherwise the accuracy of the quantized model will suffer. Around 20 to 50 images is usually sufficient.

3.2 Input preprocessing

The input data is preprocessed using the mean and scale parameters according to the formula:
input_pre = (input - mean) * scale
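
As an illustration, here is a minimal numpy sketch of this per-channel preprocessing; the image shape and channel ordering are assumptions, not TNN internals:

```python
# Per-channel preprocessing sketch: input_pre = (input - mean) * scale.
# Shapes and values are illustrative only.
import numpy as np

mean = np.array([0.0, 0.0, 0.0], dtype=np.float32)    # values passed via -n
scale = np.array([1.0, 1.0, 1.0], dtype=np.float32)   # values passed via -s

image = np.random.randint(0, 256, (224, 224, 3)).astype(np.float32)  # H, W, C
input_pre = (image - mean) * scale  # mean/scale broadcast over the channel axis
```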

4. Quantization Output

Two files will be generated in the current directory where the command is executed:

  • model_quantized.tnnproto -- the quantized model description file;
  • model_quantized.tnnmodel -- the quantized model parameter file;

5. Note

(1) The -n and -s parameters only take effect when the input is an image.
(2) When the input is an image, it is converted to RGB format internally before processing.
(3) When the input is a txt file, the data must be stored in NCHW order as float values, one value per line, for a total of N\*C\*H\*W lines (a script sketch for producing such a file follows the example). E.g.,

0.01
1.1
0.1
255.0
...
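
For reference, a minimal numpy sketch that writes a tensor in this format (the shape and file name are hypothetical):

```python
# Write a float tensor to the one-value-per-line txt format described above:
# NCHW order, N*C*H*W lines in total. Shape and file name are illustrative.
import numpy as np

data = np.random.rand(1, 3, 224, 224).astype(np.float32)  # N, C, H, W
with open("calibration_0.txt", "w") as f:
    for value in data.reshape(-1):  # flatten in NCHW (C-contiguous) order
        f.write(f"{value:.6f}\n")
```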