Description
When running the Llama3.2-1B fp16 .pte (converted from the BF16 Llama-3.2-1B checkpoint with -d fp16), we hit the following issue:
Error log:
I 00:00:00.013206 executorch:main.cpp:69] Resetting threadpool with num threads = 6
I 00:00:00.027952 executorch:runner.cpp:67] Creating LLaMa runner: model_path=llama3_2_fp16_org.pte, tokenizer_path=../tokenizer.model
E 00:00:00.728030 executorch:XNNCompiler.cpp:635] Failed to create multiply node 266 with code: xnn_status_invalid_parameter
E 00:00:00.728090 executorch:XNNPACKBackend.cpp:106] XNNCompiler::compileModel failed: 0x1
E 00:00:00.728099 executorch:method.cpp:110] Init failed for backend XnnpackBackend: 0x1
E 00:00:00.771031 executorch:XNNCompiler.cpp:635] Failed to create multiply node 266 with code: xnn_status_invalid_parameter
E 00:00:00.771096 executorch:XNNPACKBackend.cpp:106] XNNCompiler::compileModel failed: 0x1
E 00:00:00.771104 executorch:method.cpp:110] Init failed for backend XnnpackBackend: 0x1
Convert command:
python -m examples.models.llama.export_llama \
  --model "llama3_2" \
  --checkpoint "/model_convert/Llama-3.2-1B/original/consolidated_00.pth" \
  --params "/Llama-3.2-1B/original/params.json" \
  --use_sdpa_with_kv_cache \
  -X --xnnpack-extended-ops \
  --output_name "llama3_2_fp16_direct_convert_runtime.pte" \
  -kv -d fp16 \
  --max_seq_length 256
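For reference, the runtime invocation that produces the log above is roughly the following sketch; the flag names follow the stock executorch llama_main runner, while the exact paths, thread count, and prompt are assumptions inferred from the log:

# paths, thread count, and prompt are assumptions; adjust to your setup
./cmake-out/examples/models/llama/llama_main \
  --model_path=llama3_2_fp16_org.pte \
  --tokenizer_path=../tokenizer.model \
  --cpu_threads=6 \
  --prompt="Once upon a time"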