Description
Is your feature request related to a problem? Please describe.
Internally, our torch/ONNX export functions execute multiple forward passes whose purpose is to cache quant metadata.
Describe the solution you'd like
We have flags that we use to enable/disable caching. The idea would be to always enable caching (in eval mode) and remove the need for the extra forward passes.
We should also check whether always-on caching has any meaningful impact on execution time.
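To make the proposal concrete, here is a rough, framework-free sketch of the idea. Everything in it is hypothetical and only for illustration: `ToyQuantLayer`, `export`, the `cache_in_eval` flag, and the metadata fields are not the library's actual classes or flags. The layer stores its quant metadata on every eval-mode forward pass, so the export can reuse it and only falls back to an extra forward pass when nothing was cached. The last lines show one way to compare execution time with and without caching using `timeit`.

```python
import timeit


class ToyQuantLayer:
    """Hypothetical stand-in for a quantized layer (not a real library class)."""

    def __init__(self, bit_width=8):
        self.bit_width = bit_width
        self.training = True         # mirrors the torch.nn.Module.training flag
        self.cached_metadata = None  # filled in by eval-mode forward passes

    def eval(self):
        self.training = False
        return self

    def train(self):
        self.training = True
        self.cached_metadata = None  # drop stale metadata when training resumes
        return self

    def forward(self, values, cache_in_eval=True):
        # Symmetric per-tensor scale derived from the input range.
        max_abs = max(abs(v) for v in values) or 1.0
        scale = max_abs / (2 ** (self.bit_width - 1) - 1)
        quantized = [round(v / scale) for v in values]
        # Proposed behaviour: cache whenever the layer is in eval mode.
        if cache_in_eval and not self.training:
            self.cached_metadata = {"scale": scale, "bit_width": self.bit_width}
        return [q * scale for q in quantized]


def export(layer, example_input=None):
    """Hypothetical export: reuse cached metadata, else run one extra pass."""
    layer.eval()
    extra_passes = 0
    if layer.cached_metadata is None:
        if example_input is None:
            raise ValueError("no cached metadata and no example input")
        layer.forward(example_input)
        extra_passes += 1
    return dict(layer.cached_metadata), extra_passes


data = [0.5, -1.27, 0.3]
layer = ToyQuantLayer().eval()
layer.forward(data)  # ordinary inference already populates the cache
metadata, extra_passes = export(layer)  # no extra forward pass needed here

# Rough timing comparison of forward passes with and without caching.
with_cache = timeit.timeit(lambda: layer.forward(data), number=1000)
without_cache = timeit.timeit(
    lambda: layer.forward(data, cache_in_eval=False), number=1000
)
print(f"extra passes: {extra_passes}, "
      f"cached: {with_cache:.4f}s, uncached: {without_cache:.4f}s")
```

The timings from a toy like this say nothing about the real layers; the same comparison would need to be run on actual models before drawing any conclusion.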