Commit a492fa9
Ensure removal of temp files on error in ONNX INT4 quantization (#1359)
### What does this PR do?
Type of change: Minor bug fix
- Wrap the quantization steps in a try-finally block so that temp files
are removed even when ONNX INT4 quantization fails.
- To avoid duplicating cleanup logic in the awq_lite() and awq_clip()
methods, add a utility, _remove_augmented_onnx(), that removes the
augmented ONNX file and its data file and handles any exceptions raised
during removal.
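
The cleanup pattern described above can be sketched roughly as follows. This is an illustrative sketch, not the code from this PR: `remove_augmented_onnx_sketch` and `quantize_sketch` are hypothetical names, and the `_data` suffix for the companion data file is an assumption made for the example.

```python
import logging
import os

logger = logging.getLogger(__name__)


def remove_augmented_onnx_sketch(onnx_path: str) -> None:
    """Best-effort removal of a temp ONNX file and its data file (hypothetical helper)."""
    # Assumption: the data file sits next to the model as "<model path>_data".
    for path in (onnx_path, onnx_path + "_data"):
        try:
            os.remove(path)
        except FileNotFoundError:
            pass  # nothing to clean up
        except OSError as exc:
            # Log instead of raising so cleanup never masks the original error.
            logger.warning("Could not remove %s: %s", path, exc)


def quantize_sketch(workdir: str, fail: bool = False) -> None:
    """Stand-in for a quantization routine that writes temp artifacts."""
    augmented_path = os.path.join(workdir, "augmented.onnx")
    try:
        # Create the temp artifacts a real pipeline would produce.
        for path in (augmented_path, augmented_path + "_data"):
            with open(path, "wb") as f:
                f.write(b"\0")
        if fail:
            raise RuntimeError("simulated quantization failure")
    finally:
        # Runs on success and on error alike.
        remove_augmented_onnx_sketch(augmented_path)
```

Because the helper swallows and logs removal errors, an exception raised inside the `try` body still propagates to the caller unchanged.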
### Testing
- Ran ONNX INT4 awq-lite and awq-clip quantization locally with a
Llama 1B model.
### Before your PR is "*Ready for review*"
Make sure you read and follow [Contributor
guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)
and your commits are signed (`git commit -s -S`).
Make sure you read and follow the [Security Best
Practices](https://github.com/NVIDIA/Model-Optimizer/blob/main/SECURITY.md#security-coding-practices-for-contributors)
(e.g. avoiding hardcoded `trust_remote_code=True`, `torch.load(...,
weights_only=False)`, `pickle`, etc.).
- Is this change backward compatible?: ✅ / ❌ / N/A <!--- If ❌, explain
why. -->
- If you copied code from any other sources or added a new PIP
dependency, did you follow guidance in `CONTRIBUTING.md`: ✅ / ❌ / N/A
<!--- Mandatory -->
- Did you write any new necessary tests?: ✅ / ❌ / N/A <!--- Mandatory
for new features or examples. -->
- Did you update
[Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?:
✅ / ❌ / N/A <!--- Only for new features, API changes, critical bug fixes
or backward incompatible changes. -->
### Additional Information
<!-- E.g. related issue. -->
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Refactor**
  * Improved reliability of the quantization pipeline by ensuring
    temporary conversion artifacts are always removed, making cleanup more
    robust.
  * Consolidated handling of external-data companions and added safer
    deletion behavior that logs failures instead of raising errors.
  * Ensured consistent session teardown and forced memory collection to
    reduce resource leakage and intermittent errors during model conversion.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
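
As a rough illustration of the teardown ordering mentioned in the summary, the sketch below releases a session-like object and forces a garbage collection before deleting the file it had open. `FakeSession` and `run_and_cleanup` are hypothetical names invented for this example; the sketch does not use a real inference runtime and is not taken from the PR.

```python
import gc
import os


class FakeSession:
    """Hypothetical stand-in for an inference session that keeps a file open."""

    def __init__(self, path: str) -> None:
        self._handle = open(path, "rb")

    def __del__(self) -> None:
        self._handle.close()


def run_and_cleanup(path: str) -> int:
    """Use the session, then tear it down before removing the temp file."""
    session = None
    try:
        session = FakeSession(path)
        return os.path.getsize(path)
    finally:
        # Drop the reference first, then collect, so the file handle is
        # released before the file is deleted.
        session = None
        gc.collect()
        os.remove(path)
```

Releasing the session before removal matters most on platforms where an open file cannot be deleted.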
---------
Signed-off-by: vipandya <vipandya@nvidia.com>
1 parent c07ac21 commit a492fa9
1 file changed
Lines changed: 437 additions & 408 deletions