Describe the bug
When a user passes e.g. --checkpoint DEFAULT_W4A16, the precision override
happens after argument parsing (in export_main, via determine_precision_from_checkpoint).
However, validate_precision_runtime() is called during parse_args(), at which point
parsed_args.precision still holds the model's DEFAULT_PRECISION (e.g. w4), not w4a16.
This causes a false "Model does not support runtime onnxruntime_genai with precision w4"
error even though w4a16 + onnxruntime_genai is a valid combination.
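The ordering problem can be illustrated with a minimal, self-contained sketch. The names determine_precision_from_checkpoint and validate_precision_runtime come from the report; the scaffolding around them (the supported-combinations table and the simplified parse_args flow) is an assumption for illustration, not the actual qai_hub_models code:

```python
# Hypothetical stand-ins illustrating the ordering bug: validation runs
# during argument parsing, before the checkpoint-derived precision override.
SUPPORTED = {"w4": {"genie"}, "w4a16": {"genie", "onnxruntime_genai"}}
DEFAULT_PRECISION = "w4"

def determine_precision_from_checkpoint(checkpoint: str):
    # e.g. "DEFAULT_W4A16" -> "w4a16" (simplified stand-in)
    if checkpoint.startswith("DEFAULT_"):
        return checkpoint.removeprefix("DEFAULT_").lower()
    return None

def validate_precision_runtime(precision: str, runtime: str) -> bool:
    return runtime in SUPPORTED.get(precision, set())

def parse_args(checkpoint: str, runtime: str):
    precision = DEFAULT_PRECISION  # the override has not happened yet
    ok_at_parse_time = validate_precision_runtime(precision, runtime)  # False: checks w4
    # export_main applies the override only after parsing -- too late
    # for the check above to see the real precision.
    precision = determine_precision_from_checkpoint(checkpoint) or precision
    ok_after_override = validate_precision_runtime(precision, runtime)  # True: checks w4a16
    return ok_at_parse_time, ok_after_override

# parse_args("DEFAULT_W4A16", "onnxruntime_genai") -> (False, True)
```

The sketch shows why the combination is rejected: at the moment validate_precision_runtime() runs, precision is still the model default (w4), so the otherwise valid w4a16 + onnxruntime_genai pair never gets checked.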
To Reproduce
Steps to reproduce the behavior:
1. Run the export for the Llama 3.2 1B Instruct model with the onnxruntime-genai target:
python -m qai_hub_models.models.llama_v3_2_1b_instruct.export \
    --device "SA7255P ADP" \
    --skip-inferencing \
    --skip-profiling \
    --output-dir ./ \
    --context-length 1024 \
    --checkpoint DEFAULT_W4A16 \
    --target-runtime onnxruntime_genai
2. The export fails with an invalid precision error:
Model does not support runtime onnxruntime_genai with precision w4. These combinations are supported:
w4: genie
w4a16: genie, onnxruntime_genai
Expected behavior
The export should succeed with the DEFAULT_W4A16 checkpoint.
Stack trace
python -m qai_hub_models.models.llama_v3_2_1b_instruct.export \
    --device "SA7255P ADP" \
    --skip-inferencing \
    --skip-profiling \
    --output-dir ./ \
    --context-length 1024 \
    --checkpoint DEFAULT_W4A16 \
    --target-runtime onnxruntime_genai
/mnt/disk1/qaihub/venv/lib/python3.10/site-packages/qai_hub_models/models/_shared/llm/model.py:91: FutureWarning: aimet_common package is deprecated since v2.20 and will be deleted in the future releases. Please directly import 👉 aimet_onnx.common 👈 instead.
import aimet_common.quantsim as qs
Unable to import cvxpy
Model does not support runtime onnxruntime_genai with precision w4. These combinations are supported:
w4: genie
w4a16: genie, onnxruntime_genai
Host configuration:
- OS and version: Ubuntu 20.04
- Browser: Chrome
- QAI-Hub-Models version: 0.50
- QAI-Hub client version: 0.47