Replies: 1 comment
@ljaljushkin, please take a look.
I wanted to float an idea for an experimental training-time compression feature that could sit alongside existing PTQ and QAT workflows in NNCF.
The core idea is self-compression: instead of manually configuring mixed precision, sparsity schedules, or multi-stage compression pipelines, the model learns its own optimal bit-widths and channel usage during training via gradients.
**What this adds (at a high level)**
**Learnable bit-widths**
Introduce a differentiable quantizer where the bit-depth ($b$) is a trainable parameter. This gives users an automated alternative to hand-crafted mixed-precision setups.
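To make this concrete, here is a minimal sketch of what such a quantizer could look like (my illustration, not existing NNCF code): the bit-width is stored as a continuous parameter, and a straight-through estimator lets gradients reach both the weights and the bit-width.

```python
import torch
from torch import nn


class RoundSTE(torch.autograd.Function):
    """Round in the forward pass; pass gradients through unchanged (straight-through)."""

    @staticmethod
    def forward(ctx, x):
        return torch.round(x)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output


class DifferentiableQuantizer(nn.Module):
    """Uniform fake-quantizer whose bit-width is itself a trainable parameter."""

    def __init__(self, init_bits: float = 8.0):
        super().__init__()
        self.bits = nn.Parameter(torch.tensor(float(init_bits)))

    def forward(self, w: torch.Tensor) -> torch.Tensor:
        # Clamp to keep this sketch numerically safe; a fuller (per-channel)
        # version could let bit-widths shrink toward zero, eliminating channels.
        bits = RoundSTE.apply(self.bits.clamp(min=2.0, max=8.0))
        q_max = 2.0 ** (bits - 1.0) - 1.0               # signed symmetric range
        scale = w.abs().max().clamp(min=1e-8) / q_max
        w_q = torch.clamp(RoundSTE.apply(w / scale), -q_max, q_max)
        return w_q * scale                              # fake-quantized weights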
**Single unified compression objective**
Add a `SelfCompressionLoss` term that penalizes total network bit-count. This naturally pushes the optimizer toward both quantization and pruning in one pass (a sketch of such a loss follows after this list).

**A modern alternative to deprecated structural pruning**
Channel-level elimination happens implicitly through gradients rather than explicit pruning schedules, which feels more aligned with current training practices.
**Size-targeted optimization**
Users can control aggressiveness via a single size/memory penalty ($\gamma$), letting the model discover the best weight-to-bit tradeoff on its own.
**Minimal disruption to existing workflows**
This could be opt-in, experimental, and designed to coexist cleanly with PTQ and QAT.
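As a sketch of the unified objective and the single $\gamma$ knob mentioned above (again illustrative; `SelfCompressionLoss` is a proposed name, not an existing NNCF class), the loss can simply be the total bit-count of all quantized weight tensors:

```python
import torch
from torch import nn


class SelfCompressionLoss(nn.Module):
    """Penalty on the total bit-count of all quantized weight tensors.

    `quantized_weights` is a list of (DifferentiableQuantizer, weight) pairs,
    reusing the quantizer sketch above.  In a per-channel variant, channels
    whose learned bit-width reaches zero would be pruned implicitly.
    """

    def __init__(self, quantized_weights):
        super().__init__()
        self.quantized_weights = quantized_weights

    def forward(self) -> torch.Tensor:
        return sum(
            q.bits.clamp(min=0.0) * w.numel()
            for q, w in self.quantized_weights
        )
```

The whole compression objective then reduces to `loss = task_loss + gamma * compression_loss()`, with $\gamma$ as the single size/memory penalty.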
Implementation-wise, this could live as a new `DifferentiableQuantizer` and a corresponding `CompressionAlgorithm`. From a user perspective, this becomes a more “set-and-forget” option:
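For illustration only (wiring together the two sketches above, not a proposed final API), a training step gains just one extra loss term:

```python
import torch
import torch.nn.functional as F
from torch import nn

# Reusing the DifferentiableQuantizer / SelfCompressionLoss sketches above.
layer = nn.Linear(256, 128)
quantizer = DifferentiableQuantizer(init_bits=8.0)
compression_loss = SelfCompressionLoss([(quantizer, layer.weight)])

optimizer = torch.optim.Adam(
    list(layer.parameters()) + list(quantizer.parameters()), lr=1e-3
)
gamma = 1e-6  # the single size/memory penalty the user controls

x, target = torch.randn(32, 256), torch.randn(32, 128)

optimizer.zero_grad()
out = F.linear(x, quantizer(layer.weight), layer.bias)       # quantize on the fly
loss = F.mse_loss(out, target) + gamma * compression_loss()  # task + size penalty
loss.backward()                                               # gradients also reach the bit-width
optimizer.step()
```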
Does this sound like something that could be worked on?