Differentiable “Self-Compression” as an Optional Training-Time Feature #3810
I wanted to float an idea for an experimental training-time compression feature that could sit alongside the existing PTQ and QAT workflows in NNCF. The core idea is self-compression: instead of manually configuring mixed precision, sparsity schedules, or multi-stage compression pipelines, the model learns its own optimal bit-widths and channel usage during training via gradients.

What this adds (at a high level)

Implementation-wise, this could live as a new
From a user perspective, this becomes a more “set-and-forget” option:

Does this sound like something that could be worked on?
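To make the idea concrete, here is a minimal sketch of the core mechanism: a fake-quantizer whose bit-width is a continuous learnable parameter, with a straight-through estimator (STE) so gradients flow through rounding and a small penalty term that pushes precision down where accuracy permits. All names (`LearnableBitQuantizer`, `bit_penalty`, etc.) are hypothetical and are not existing NNCF API.

```python
# Sketch of a learnable-bit-width fake quantizer (illustrative, not NNCF API).
import torch
import torch.nn as nn


class STERound(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return torch.round(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: pass gradients unchanged through rounding.
        return grad_output


class LearnableBitQuantizer(nn.Module):
    """Fake-quantize a tensor with a continuous, learnable effective bit-width."""

    def __init__(self, init_bits=6.0, min_bits=2.0, max_bits=8.0):
        super().__init__()
        self.bits = nn.Parameter(torch.tensor(init_bits))
        self.min_bits, self.max_bits = min_bits, max_bits

    def forward(self, x):
        bits = self.bits.clamp(self.min_bits, self.max_bits)
        # The number of quantization levels is differentiable w.r.t. `bits`.
        levels = torch.pow(2.0, bits) - 1.0
        scale = x.detach().abs().max().clamp_min(1e-8)
        return STERound.apply(x / scale * levels) / levels * scale

    def bit_penalty(self):
        # Add `lambda * bit_penalty()` to the task loss so gradients trade
        # precision against accuracy.
        return self.bits.clamp(self.min_bits, self.max_bits)


# Usage: the bit-width receives a gradient from an ordinary training loss.
quant = LearnableBitQuantizer()
w = torch.randn(16, 16, requires_grad=True)
loss = (quant(w) - w).pow(2).mean() + 1e-3 * quant.bit_penalty()
loss.backward()
assert quant.bits.grad is not None  # bit-width is trained via gradients
```

The same pattern extends to channel usage: a learnable per-channel gate, relaxed to be differentiable, can prune channels the loss does not need.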
Replies: 2 comments 1 reply
@ljaljushkin, please take a look.
Greetings, @stktyagi!
Thank you for the idea, it looks interesting!
Gradient-based methods, unfortunately, can be quite costly in terms of both time and memory. We are focusing on training-free methods now.
We have other potential directions.
Sparsity in NNCF, for example, is currently magnitude-based only; there is no 2:4 or m:n structured sparsity yet, which is an opportunity for contribution.
It w…
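For readers unfamiliar with the 2:4 pattern mentioned above, here is a minimal sketch of magnitude-based m:n structured pruning: within every contiguous group of n weights along the input dimension, the m largest-magnitude weights are kept and the rest are zeroed. This is an illustrative sketch, not NNCF code.

```python
# Sketch of a magnitude-based 2:4 (m:n) structured sparsity mask.
import torch


def mn_sparsity_mask(weight: torch.Tensor, m: int = 2, n: int = 4) -> torch.Tensor:
    """Return a 0/1 mask keeping the top-m magnitudes in each group of n."""
    out_features, in_features = weight.shape
    assert in_features % n == 0, "input dim must be divisible by the group size"
    # View the weights as groups of n along the input dimension.
    groups = weight.abs().reshape(out_features, in_features // n, n)
    # Indices of the m largest magnitudes within each group.
    topk = groups.topk(m, dim=-1).indices
    mask = torch.zeros_like(groups)
    mask.scatter_(-1, topk, 1.0)
    return mask.reshape(out_features, in_features)


w = torch.randn(8, 16)
mask = mn_sparsity_mask(w)
sparse_w = w * mask
# Exactly 2 of every 4 consecutive weights survive, i.e. 50% sparsity.
assert (mask.reshape(8, 4, 4).sum(-1) == 2).all()
```

A training-time version would typically recompute the mask periodically and apply it with a straight-through estimator so pruned weights can still receive gradients.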