
Model can't run inference for Llama3.2-1B when using -d fp16 to convert the .pte #9534


Description

@WeiMa01

When running the Llama3.2-1B fp16 .pte, which was converted from the BF16 Llama3.2-1B checkpoint with -d fp16, we hit the following error.

Error log:
I 00:00:00.013206 executorch:main.cpp:69] Resetting threadpool with num threads = 6
I 00:00:00.027952 executorch:runner.cpp:67] Creating LLaMa runner: model_path=llama3_2_fp16_org.pte, tokenizer_path=../tokenizer.model
E 00:00:00.728030 executorch:XNNCompiler.cpp:635] Failed to create multiply node 266 with code: xnn_status_invalid_parameter
E 00:00:00.728090 executorch:XNNPACKBackend.cpp:106] XNNCompiler::compileModel failed: 0x1
E 00:00:00.728099 executorch:method.cpp:110] Init failed for backend XnnpackBackend: 0x1
E 00:00:00.771031 executorch:XNNCompiler.cpp:635] Failed to create multiply node 266 with code: xnn_status_invalid_parameter
E 00:00:00.771096 executorch:XNNPACKBackend.cpp:106] XNNCompiler::compileModel failed: 0x1
E 00:00:00.771104 executorch:method.cpp:110] Init failed for backend XnnpackBackend: 0x1

Convert command:
python -m examples.models.llama.export_llama --model "llama3_2" --checkpoint "/model_convert/Llama-3.2-1B/original/consolidated_00.pth" --params "/Llama-3.2-1B/original/params.json" --use_sdpa_with_kv_cache -X --xnnpack-extended-ops --output_name "llama3_2_fp16_direct_convert_runtime.pte" -kv -d fp16 --max_seq_length 256
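For reference, the runtime invocation that produced the log above was roughly the following. This is a sketch reconstructed from the log (model_path and tokenizer_path are taken from it); the llama_main binary location and the --prompt value are assumptions and may differ in your build.

./cmake-out/examples/models/llama/llama_main --model_path=llama3_2_fp16_org.pte --tokenizer_path=../tokenizer.model --prompt="Once upon a time"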

cc @digantdesai @mcr229 @cbilgin

Labels

module: xnnpack (Issues related to xnnpack delegation and the code under backends/xnnpack/)
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
