Commit a492fa9

Ensure removal of temp files on error in ONNX INT4 quantization (#1359)
### What does this PR do?

Type of change: Minor bug fix

- Put quantization steps inside try-finally to ensure removal of temp files on error in ONNX INT4 quantization.
- To avoid redundancy between the `awq_lite()` and `awq_clip()` methods, created a utility `_remove_augmented_onnx()` for exception-handling based removal of the augmented ONNX file and its data file.

### Testing

- Locally performed ONNX INT4 awq-lite and awq-clip quantization with Llama 1B model.

### Before your PR is "*Ready for review*"

Make sure you read and follow [Contributor guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md) and your commits are signed (`git commit -s -S`).

Make sure you read and follow the [Security Best Practices](https://github.com/NVIDIA/Model-Optimizer/blob/main/SECURITY.md#security-coding-practices-for-contributors) (e.g. avoiding hardcoded `trust_remote_code=True`, `torch.load(..., weights_only=False)`, `pickle`, etc.).

- Is this change backward compatible?: ✅ / ❌ / N/A <!--- If ❌, explain why. -->
- If you copied code from any other sources or added a new PIP dependency, did you follow guidance in `CONTRIBUTING.md`: ✅ / ❌ / N/A <!--- Mandatory -->
- Did you write any new necessary tests?: ✅ / ❌ / N/A <!--- Mandatory for new features or examples. -->
- Did you update [Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?: ✅ / ❌ / N/A <!--- Only for new features, API changes, critical bug fixes or backward incompatible changes. -->

### Additional Information

<!-- E.g. related issue. -->

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

* **Refactor**
  * Improved reliability of the quantization pipeline by ensuring temporary conversion artifacts are always removed, making cleanup more robust.
  * Consolidated handling of external-data companions and added safer deletion behavior that logs failures instead of raising errors.
  * Ensured consistent session teardown and forced memory collection to reduce resource leakage and intermittent errors during model conversion.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: vipandya <vipandya@nvidia.com>
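
The cleanup pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the names `remove_augmented_onnx` and `quantize_with_cleanup`, the `augmented.onnx` file name, and the `_data` suffix for the external-data companion are all assumptions made for the example.

```python
import gc
import logging
import os

logger = logging.getLogger(__name__)


def remove_augmented_onnx(onnx_path: str) -> None:
    """Best-effort removal of a temporary ONNX file and its data file.

    Assumption for this sketch: the external-data companion is stored next
    to the model as "<onnx_path>_data".
    """
    for path in (onnx_path, onnx_path + "_data"):
        try:
            if os.path.exists(path):
                os.remove(path)
        except OSError as exc:
            # Log the failure instead of raising, so cleanup problems never
            # mask the original quantization error.
            logger.warning("Could not remove %s: %s", path, exc)


def quantize_with_cleanup(quantize_step, tmp_dir: str):
    """Run a (hypothetical) quantization step with guaranteed cleanup."""
    augmented_path = os.path.join(tmp_dir, "augmented.onnx")
    try:
        # The callable is expected to write the augmented model to
        # augmented_path and return the quantized result.
        return quantize_step(augmented_path)
    finally:
        # Runs on success and on error alike.
        gc.collect()
        remove_augmented_onnx(augmented_path)
```

Because the removal lives in `finally`, an exception raised by the quantization step still propagates to the caller, but the temporary files are deleted first. Sharing one helper between both code paths is what avoids duplicating the deletion logic.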
1 parent c07ac21 commit a492fa9

1 file changed

Lines changed: 437 additions & 408 deletions

0 commit comments