dineshchitlangia left a comment
Overall, LGTM.
Minor question pls:
export_manager.change_weight_export(export_weight_q_node=True)
In your change, how are we accounting for this weight export?
Last time I discussed this with @fxmarty, he mentioned that it is always preferable to export only Integer Weights -> DQ rather than Float Weights -> Q -> DQ, so the latter option has been completely removed in this PR. This is also related to the changes in #82, where we assume that the weights have been exported without the Q node.
Thanks for the clarity, @Giuseppe5. I do not have write privileges to merge your changes, so you will have to wait a bit longer until someone can merge them.
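To illustrate why the two export styles are interchangeable at inference time, here is a minimal pure-Python sketch. It is not the Brevitas or optimum-amd implementation: the helper names `quantize` and `dequantize`, the per-tensor `scale`, the zero point, and the int8 range are all assumptions chosen for illustration.

```python
def quantize(weights, scale, zero_point=0, qmin=-128, qmax=127):
    """Q step: map float weights to integers clamped to [qmin, qmax]."""
    return [max(qmin, min(qmax, round(w / scale) + zero_point)) for w in weights]


def dequantize(int_weights, scale, zero_point=0):
    """DQ step: map stored integers back to floats."""
    return [(q - zero_point) * scale for q in int_weights]


# Hypothetical weights and scale, chosen to be exactly representable.
float_weights = [0.5, -0.25, 0.125, 1.0]
scale = 0.125

# Float Weights -> Q -> DQ: the graph stores floats and quantizes at run time.
path_q_dq = dequantize(quantize(float_weights, scale), scale)

# Integer Weights -> DQ: Q is applied once at export, only integers are stored.
stored_int_weights = quantize(float_weights, scale)
path_dq_only = dequantize(stored_int_weights, scale)

print(stored_int_weights)
print(path_q_dq == path_dq_only)
```

Both paths produce the same dequantized values; the DQ-only form stores the already-quantized integers and drops the Q node from the exported graph.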
from optimum.amd.brevitas.accelerate_utils import calc_cpu_device_map, calc_gpu_device_map, offload_model, remove_hooks
from optimum.amd.brevitas.data_utils import compute_perplexity, get_dataset_for_model
from optimum.exporters.onnx import onnx_export_from_model
from optimum.amd.brevitas.export import export_quantized_model
- from optimum.amd.brevitas.export import export_quantized_model
+ from optimum.amd.brevitas.export import export_to_onnx
Can we keep the word ONNX in the name to make it explicit? Other name suggestions: quantized_model_to_onnx or save_quantized_model_as_onnx.
I opted to keep it as similar as possible to the original name, so it became:
onnx_export_from_quantized_model
Could you also document the same in
Co-authored-by: Mohit Sharma <mohit21sharma.ms@gmail.com>
* Feat: export dq only
* fix
* fix
* Code review
* Docs: update documentation
* Formatting
* Apply suggestions from code review

Co-authored-by: Mohit Sharma <mohit21sharma.ms@gmail.com>

---------

Co-authored-by: Mohit Sharma <mohit21sharma.ms@gmail.com>
Supersedes #94
The idea is to export only Integer weights + DQ. For this, we need to use PyTorch 2.2+ because of a bug in how constant values are handled at export time.
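One way to guard against an older PyTorch is a small version check before exporting. The sketch below is an assumption and not part of this PR: `meets_min_version` is a hypothetical helper that compares only the major and minor components of a version string.

```python
import re


def meets_min_version(version, minimum=(2, 2)):
    """Return True if a 'major.minor[...]' version string is >= minimum."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    # Compare as integer tuples so that 2.10 sorts after 2.2.
    return (int(match.group(1)), int(match.group(2))) >= tuple(minimum)


# Example version strings; local build suffixes such as "+cu121" are ignored.
print(meets_min_version("2.2.0+cu121"))
print(meets_min_version("2.1.2"))
```

With PyTorch installed, the string to pass would be `torch.__version__`.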