The aim of this laboratory is to check the work of the Apache TVM model fine-tuning feature.

- Go through the Auto-tuning a Convolutional Network for x86 CPU tutorial.
- Go to the `fine_tuning_experiments` script and check what it does.
- The class `TVMFineTunedModel` inherits the following methods from `TVMModel` from the `l04_tvm` assignment: `preprocess_input`, `postprocess_outputs`, `prepare_model`, `run_inference`.
- Use the `TVMModel` implemented in the L04 assignment, or implement the above-mentioned methods yourself.
- [5pt] Implement the `tune_kernels` method:
  - Use the `get_tuner` method for each task from `tasks`,
  - Use `len(task.config_space)` as `n_trial` for `tuner.tune`,
  - Use `measure_option` in the tuning method,
  - Use the `autotvm.callback.progress_bar` and `autotvm.callback.log_to_file` callbacks (use `self.optlogpath` as the log path),
  - Add early stopping after 20% of `n_trial` trials with no improvement.
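The steps above can be sketched as follows. This is only a sketch under the assumptions stated in the task: `get_tuner` and `self.optlogpath` come from the assignment's class skeleton, and the `autotvm` calls follow the x86 auto-tuning tutorial; the exact wiring in your scaffolding may differ.

```python
def early_stopping_trials(n_trial, fraction=0.2):
    """Number of no-improvement trials (20% of n_trial) before stopping."""
    return max(int(n_trial * fraction), 1)


def tune_kernels(self, tasks, measure_option):
    """Sketch of per-kernel tuning; not the reference implementation."""
    from tvm import autotvm  # imported here so the sketch stays self-contained

    for i, task in enumerate(tasks):
        # The assignment's get_tuner picks a tuner (e.g. XGBTuner) per task.
        tuner = self.get_tuner(task)
        # Cover the whole config space, but stop early after 20% of
        # n_trial trials without improvement.
        n_trial = len(task.config_space)
        tuner.tune(
            n_trial=n_trial,
            early_stopping=early_stopping_trials(n_trial),
            measure_option=measure_option,
            callbacks=[
                autotvm.callback.progress_bar(
                    n_trial, prefix=f"[Task {i + 1}/{len(tasks)}]"
                ),
                autotvm.callback.log_to_file(str(self.optlogpath)),
            ],
        )
```

Note that `early_stopping` is passed as an absolute trial count, which is why the 20% fraction is converted with the small helper above.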
- [6pt] Implement the `tune_graph` method:
  - Focus on the `nn.conv2d` operator only (use `relay.op.get`),
  - Use `PBQPTuner` as the tuning executor,
  - Use the `mod`, `self.input_name`, `self.input_shape`, `self.optlogpath`, `self.target` variables from `tune_kernels` and `optimize_model` to set up the executor,
  - Use `benchmark_layout_transform` for setting up benchmarks (use `min_exec_num` 5),
  - Run the executor,
  - Save tuning results to the `self.graphoptlogpath` file.
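A minimal sketch of this method, assuming the attributes named above (`self.input_name`, `self.input_shape`, `self.optlogpath`, `self.graphoptlogpath`, `self.target`) exist on the class as the task describes:

```python
def tune_graph(self, mod):
    """Sketch of graph-level tuning with PBQPTuner (x86 tutorial style)."""
    from tvm import relay
    from tvm.autotvm.graph_tuner import PBQPTuner

    # Only nn.conv2d layouts are considered, as required by the task.
    target_ops = [relay.op.get("nn.conv2d")]
    executor = PBQPTuner(
        mod["main"],
        {self.input_name: self.input_shape},
        str(self.optlogpath),  # kernel tuning records from tune_kernels
        target_ops,
        self.target,
    )
    # Measure layout-transformation costs between candidate schedules.
    executor.benchmark_layout_transform(min_exec_num=5)
    executor.run()
    # Persist the graph-level schedule choices.
    executor.write_opt_sch2record_file(str(self.graphoptlogpath))
```

`write_opt_sch2record_file` is the graph tuner's standard way of saving its results to a log file, which matches the "save tuning results to `self.graphoptlogpath`" requirement.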
- [4pt] Finish the implementation of model fine-tuning:
  - Extract tasks from the initially optimized module using `autotvm.task.extract_from_program` (focus on the `nn.conv2d` operator only),
  - Tune kernels using the `tune_kernels` method,
  - Tune the whole graph using the `tune_graph` method,
  - Compile and save the model library - use the `autotvm.apply_graph_best` method to load the log from `self.graphoptlogpath`.
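The overall flow could look roughly like this. The `measure_option` settings and the `self.modellibpath` output attribute are hypothetical placeholders for illustration only; use whatever your scaffolding provides.

```python
def fine_tune_model(self, mod, params):
    """Sketch of the end-to-end fine-tuning flow; names are assumptions."""
    import tvm
    from tvm import autotvm, relay

    # 1. Extract nn.conv2d tuning tasks from the optimized module.
    tasks = autotvm.task.extract_from_program(
        mod["main"],
        target=self.target,
        params=params,
        ops=(relay.op.get("nn.conv2d"),),
    )
    # Hypothetical local measurement setup; tune to your machine.
    measure_option = autotvm.measure_option(
        builder=autotvm.LocalBuilder(),
        runner=autotvm.LocalRunner(number=10, repeat=1),
    )
    # 2. Tune each kernel, then 3. tune the whole graph.
    self.tune_kernels(tasks, measure_option)
    self.tune_graph(mod)
    # 4. Compile with the best graph-level records and save the library.
    with autotvm.apply_graph_best(str(self.graphoptlogpath)):
        with tvm.transform.PassContext(opt_level=3):
            lib = relay.build(mod, target=self.target, params=params)
    lib.export_library(str(self.modellibpath))  # hypothetical output path
```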
- Run benchmarks using:

  ```bash
  python3 -m dl_in_iot_course.l06_tvm_fine_tuning.fine_tuning_experiments \
      --fp32-model-path models/pet-dataset-tensorflow.fp32.tflite \
      --dataset-root build/pet-dataset/ \
      --results-path build/fine-tuning-results
  ```

  NOTE: The fine-tuning takes quite a long time.
- [2pt] In the directory for the assignment's summary, include:
  - Kernel log file (`pet-dataset-tensorflow.fp32.tvm-tune.kernellog`),
  - Graph log file (`pet-dataset-tensorflow.fp32.tvm-tune.graphlog`).
- Write a very brief summary:
  - [1pt] Compare the inference time between the fine-tuned model and the FP32 model with NCHW layout, opt level 3, and the `llvm -mcpu=core-avx2` target. At least a very slight improvement should be observed.
There should be no need for additional imports. The blocks of code to implement (three in total) should take at most around 20 lines.

Additional factors:

- [2pt] Git history quality