new architecture for auto_round by n1ck-guo · Pull Request #1542 · intel/auto-round

n1ck-guo · 2026-03-13T02:08:50Z

Description

Compressor: The main entry point, responsible for orchestrating the workflow, invoking the different algorithms, and handling model persistence. Supports block-wise or layer-wise quantization strategies. The primary subclasses are TuneCompressor and ZeroShotCompressor.
Calibration: Handles the calibration process (work in progress).
Context: Manages shared configuration and model state throughout the quantization pipeline, providing centralized control so that modules do not depend on each other directly.
- ModelContext: Handles model loading and tracks model state and the related configuration.
- CompressContext: Stores shared compression settings such as low_cpu_mem_usage and enable_torch_compile.
Algorithms: Concrete quantization and weight-transformation implementations.
- Quantization: Various quantization algorithms, including AutoRound, RTN, and OptRTN.
- Transform: Weight-transformation algorithms such as the Hadamard transform.
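The Context layer is described as shared state with centralized control, and the file summary lists auto_round/context/base.py as a "simple singleton context base". The sketch below shows one way such a base could work; it is a hypothetical illustration, not the PR's code. The class names mirror the description, but the fields, `get_context`, and `reset` are assumptions.

```python
class BaseContext:
    """Hypothetical singleton base: each subclass has exactly one shared instance."""

    _instances: dict = {}

    def __new__(cls, *args, **kwargs):
        # Create the instance on first use only; later calls return the same object.
        if cls not in BaseContext._instances:
            instance = super().__new__(cls)
            instance._initialized = False
            BaseContext._instances[cls] = instance
        return BaseContext._instances[cls]

    @classmethod
    def get_context(cls):
        """Return the shared instance, or None if it was never created."""
        return BaseContext._instances.get(cls)

    @classmethod
    def reset(cls):
        """Drop the shared instance, e.g. between two compression runs or tests."""
        BaseContext._instances.pop(cls, None)


class CompressContext(BaseContext):
    def __init__(self, low_cpu_mem_usage=False, enable_torch_compile=False):
        if self._initialized:
            return  # keep the settings from the first construction
        self.low_cpu_mem_usage = low_cpu_mem_usage
        self.enable_torch_compile = enable_torch_compile
        self._initialized = True


class ModelContext(BaseContext):
    def __init__(self, model_name=None):
        if self._initialized:
            return
        self.model_name = model_name
        self._initialized = True
```

Any module can call `CompressContext.get_context()` instead of receiving settings through constructor chains, which is the decoupling the description aims for. The trade-off is global state, hence the explicit `reset()`.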

Usage of the new API:

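# AutoRoundConfig and Compressor also need to be imported; their module paths are not shown in this description.
from auto_round.algorithms.rotation import HadamardConfig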

quant_cfg  = AutoRoundConfig(bits=4, group_size=128, iters=200)
had_cfg_1  = HadamardConfig(hadamard_type="hadamard",        block_size=32)
had_cfg_2  = HadamardConfig(hadamard_type="random_hadamard", block_size=64, random_seed=True)

compressor = Compressor(
    config=[quant_cfg, had_cfg_1, had_cfg_2], 
    model="facebook/opt-125m",
    scheme="MXFP4",
    format="auto_round",
)

model, layer_config = compressor.quantize_and_save(
    output_dir="./output",
)
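The description does not say how `Compressor` orders a mixed `config` list such as `[quant_cfg, had_cfg_1, had_cfg_2]`. A minimal sketch, assuming rotation/transform configs run in the order the user gives them and before a single quantization config. `QuantizationConfig`, `TransformConfig`, and `plan_pipeline` are hypothetical stand-ins, not the real `AutoRoundConfig`/`HadamardConfig` classes.

```python
from dataclasses import dataclass


@dataclass
class QuantizationConfig:  # hypothetical stand-in for e.g. AutoRoundConfig
    bits: int = 4
    group_size: int = 128


@dataclass
class TransformConfig:  # hypothetical stand-in for e.g. HadamardConfig
    hadamard_type: str = "hadamard"
    block_size: int = 32


def plan_pipeline(configs):
    """Order a mixed config list: transforms first (user order), then quantization."""
    unknown = [c for c in configs if not isinstance(c, (TransformConfig, QuantizationConfig))]
    if unknown:
        raise TypeError(f"unsupported config type(s): {[type(c).__name__ for c in unknown]}")
    transforms = [c for c in configs if isinstance(c, TransformConfig)]
    quant = [c for c in configs if isinstance(c, QuantizationConfig)]
    if len(quant) > 1:
        raise ValueError("expected at most one quantization config")
    # Weights are rotated before they are quantized, so transforms come first.
    return transforms + quant
```

Validating the list up front means a misspelled or duplicated config fails before any model is loaded.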

Type of Change

Related Issues

Fixes or relates to #

Checklist Before Submitting

My code has been tested locally.
Documentation has been updated as needed.
New or updated tests are included where applicable.

Signed-off-by: n1ck-guo <heng.guo@intel.com>

Copilot

Pull request overview

Refactors AutoRound toward a new “context + compressor + algorithm” architecture, introducing new compressors_new/ and context/ modules and updating scheme parsing/export helpers to support the new flow.

Changes:

Added new context singletons (ModelContext, CompressContext) and a new compressors_new implementation path.
Expanded scheme parsing to reconcile bits/data_type and support user overrides + AutoScheme integration.
Added new calibration utilities and algorithm scaffolding for quantization backends (AutoRound/RTN).
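The overview mentions reconciling `bits` with `data_type` during scheme parsing but does not show the rule. One plausible rule, sketched under the assumption that a data type may carry a trailing bit-width suffix (as in `"int4"`) that must agree with an explicit `bits`. The function name and return shape are hypothetical.

```python
import re

# A lowercase name followed only by digits, e.g. "int4" or "mx_fp4".
# Names with digits inside them (e.g. "fp8_e4m3") deliberately do not match.
_SUFFIX = re.compile(r"^(?P<base>[a-z_]+?)(?P<bits>\d+)$")


def reconcile_bits(data_type, bits=None):
    """Return (base_data_type, bits), inferring or checking bits from a numeric suffix."""
    m = _SUFFIX.match(data_type)
    if m is None:
        if bits is None:
            raise ValueError(f"cannot infer bits from data_type {data_type!r}")
        return data_type, bits
    suffix_bits = int(m.group("bits"))
    if bits is not None and bits != suffix_bits:
        raise ValueError(f"bits={bits} conflicts with data_type {data_type!r}")
    return m.group("base"), suffix_bits
```

Raising on a conflict, rather than silently preferring one value, surfaces user mistakes at parse time.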

Reviewed changes

Copilot reviewed 26 out of 26 changed files in this pull request and generated 18 comments.


File	Description
auto_round/utils/model.py	Avoids runtime import cycles via `TYPE_CHECKING` for `QuantizationScheme`.
auto_round/schemes.py	Adds scheme override + parsing helpers and bits/dtype reconciliation.
auto_round/formats.py	Switches divisibility checks to global supported-layer constants.
auto_round/context/model_context.py	Introduces model lifecycle/loading + AMP setup and forward-hook management.
auto_round/context/compress_context.py	Introduces device/device_map and memory-usage knobs as shared context.
auto_round/context/base.py	Adds simple singleton context base.
auto_round/context/__init__.py	Package init for the new `context` module.
auto_round/compressors_new/utils.py	New utility module (layer config, gguf mapping, caching helpers, forward helpers).
auto_round/compressors_new/shard_writer.py	New shard-based saver with optional safetensors support.
auto_round/compressors_new/config.py	Introduces extra/legacy config dataclasses for the new compressor path.
auto_round/compressors_new/base.py	New “BaseCompressor” implementation wiring contexts, formats, caching, quant loop.
auto_round/compressors_new/__init__.py	Package init for `compressors_new`.
auto_round/compressors/utils.py	Extends legacy layer-config resolution to include safetensors-only tensors and skip missing modules.
auto_round/calibration/utils.py	Adds helpers for “early stop” caching and input reshaping for block tuning.
auto_round/calibration/__init__.py	Package init for `calibration`.
auto_round/algorithms/quantization/rtn/rtn.py	Adds placeholder RTN quantization module file.
auto_round/algorithms/quantization/rtn/config.py	Adds RTN algorithm config stub.
auto_round/algorithms/quantization/rtn/__init__.py	Package init for RTN quantization.
auto_round/algorithms/quantization/base.py	Adds base quantization class stub.
auto_round/algorithms/quantization/auto_round/quantize.py	Adds new AutoRound quantizer implementation (algorithm object).
auto_round/algorithms/quantization/auto_round/config.py	Adds new AutoRound algorithm config.
auto_round/algorithms/quantization/auto_round/__init__.py	Package init for AutoRound quantization algorithm.
auto_round/algorithms/quantization/__init__.py	Package init for quantization algorithms.
auto_round/algorithms/base.py	Adds base algorithm stub.
auto_round/algorithms/alg_config.py	Adds base algorithm config stub.
auto_round/algorithms/__init__.py	Package init for algorithms.
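The table lists `shard_writer.py` as a "shard-based saver with optional safetensors support". Its internals are not shown. A minimal sketch of the grouping step a sharded saver typically needs, assuming tensors are assigned greedily in order under a byte budget. The file-name pattern and `weight_map` follow the common Hugging Face `model.safetensors.index.json` convention. All function names are hypothetical.

```python
def plan_shards(tensor_sizes, max_shard_bytes):
    """Greedily group (name, nbytes) pairs into shards of at most max_shard_bytes.

    A single tensor larger than the budget gets a shard of its own.
    """
    shards, current, current_bytes = [], [], 0
    for name, nbytes in tensor_sizes:
        if current and current_bytes + nbytes > max_shard_bytes:
            shards.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += nbytes
    if current:
        shards.append(current)
    return shards


def shard_filenames(num_shards, prefix="model", ext="safetensors"):
    # e.g. model-00001-of-00003.safetensors
    return [f"{prefix}-{i:05d}-of-{num_shards:05d}.{ext}" for i in range(1, num_shards + 1)]


def build_weight_map(shards):
    """Map each tensor name to the shard file that holds it (the index's weight_map)."""
    names = shard_filenames(len(shards))
    return {tensor: names[i] for i, shard in enumerate(shards) for tensor in shard}
```

Because packing is greedy and in order, a writer can flush each shard as soon as it fills up, which keeps peak memory close to one shard.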

auto_round/compressors_new/utils.py

auto_round/compressors_new/base.py

auto_round/compressors_new/shard_writer.py

auto_round/algorithms/quantization/base.py

auto_round/context/model_context.py

auto_round/algorithms/quantization/auto_round/quantize.py

auto_round/algorithms/quantization/auto_round/config.py

auto_round/context/model.py

auto_round/schemes.py

auto_round/algorithms/quantization/auto_round/quantize.py

wenhuach21 · 2026-03-13T02:16:59Z

If there is already an algorithm folder, what is the purpose of the compressor folder?

auto_round/compressors_new/base.py

…uo/new_ar_arch

Signed-off-by: n1ck-guo <heng.guo@intel.com>

…uo/new_ar_arch

auto_round/compressors_new/base.py

auto_round/algorithms/quantization/auto_round/quantize.py

auto_round/algorithms/alg_config.py

auto_round/compressors_new/config.py

auto_round/algorithms/quantization/auto_round/quantize.py

…uo/new_ar_arch

Signed-off-by: n1ck-guo <heng.guo@intel.com>

for more information, see https://pre-commit.ci

Signed-off-by: n1ck-guo <heng.guo@intel.com>

…uo/new_ar_arch

Signed-off-by: n1ck-guo <heng.guo@intel.com>

yiliu30

Do we have any E2E tests for sequential quantizers?

auto_round/compressors_new/base.py

yiliu30 · 2026-04-10T07:50:54Z

auto_round/algorithms/__init__.py

@@ -0,0 +1,13 @@
+# Copyright (c) 2026 Intel Corporation


Remove this folder?

Signed-off-by: n1ck-guo <heng.guo@intel.com>

…uo/new_ar_arch

n1ck-guo · 2026-04-10T08:24:37Z

/azp run Unit-Test-CUDA-AutoRound

azure-pipelines · 2026-04-10T08:24:47Z

Azure Pipelines successfully started running 1 pipeline(s).

Signed-off-by: n1ck-guo <heng.guo@intel.com>

wenhuach21 · 2026-04-10T08:51:00Z

auto_round/algorithms/quantization/rtn/quantizer.py

+        self._immediate_pack_and_save_module(name)
+
+    def _immediate_pack_and_save_module(self, module_name):
+        shard_writer = ShardWriter.get_shard_writer()


Could packing and saving be decoupled from the quantization process?
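One way to answer this question is a callback (observer) pattern. In this approach, the quantizer only announces each finished layer, and a separate consumer decides whether and how to pack and save it. The sketch below is hypothetical: `Quantizer`, `add_layer_callback`, and `PackAndSave` are illustrative names, and the "quantization" is a toy rounding step.

```python
from typing import Callable, Dict, List


class Quantizer:
    """Hypothetical quantizer that quantizes and notifies, but never saves."""

    def __init__(self):
        self._on_layer_done: List[Callable[[str, dict], None]] = []

    def add_layer_callback(self, fn):
        self._on_layer_done.append(fn)

    def quantize_layer(self, name, weight):
        # Toy stand-in for a real algorithm: round each weight to an integer.
        result = {"qweight": [round(w) for w in weight]}
        for fn in self._on_layer_done:
            fn(name, result)
        return result


class PackAndSave:
    """Hypothetical consumer that handles layers as they are reported."""

    def __init__(self):
        self.saved: Dict[str, dict] = {}

    def __call__(self, name, result):
        # A real pipeline would pack tensors here and hand them to a shard writer.
        self.saved[name] = result
```

With this split, immediate saving becomes a pipeline choice (register the callback or not) instead of a method baked into every quantizer, and quantizers can be unit-tested without a writer.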

wenhuach21 · 2026-04-10T08:55:13Z

auto_round/algorithms/quantization/sign_round/config.py

+        enable_norm_bias_tuning (bool): Whether to enable fast norm/layer_bias tuning
+    """
+
+    _alg_cls = "SignRoundQuantizer"


Is there a better way to map these two? Would it be better to provide a clear function that developers are required to implement?
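A sketch of the "clear function developers must implement" option: each config declares an abstract `build_algorithm()`, so a missing mapping fails at instantiation instead of at a string lookup. `AlgorithmConfig`, `build_algorithm`, and the fields shown are hypothetical; `SignRoundQuantizer` here is a stub, not the PR's class.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


class AlgorithmConfig(ABC):
    @abstractmethod
    def build_algorithm(self):
        """Return the algorithm object configured by this config."""


class SignRoundQuantizer:  # hypothetical stub for the real quantizer class
    def __init__(self, config):
        self.config = config


@dataclass
class SignRoundConfig(AlgorithmConfig):
    iters: int = 200
    enable_norm_bias_tuning: bool = False

    def build_algorithm(self):
        # If a module-level import would create a cycle, import the class lazily here.
        return SignRoundQuantizer(self)
```

Compared with `_alg_cls = "SignRoundQuantizer"`, the mapping is explicit, navigable in an IDE, and checked by Python itself: a subclass that forgets `build_algorithm` cannot be instantiated.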

wenhuach21

Thank you very much for the great effort!

wenhuach21 · 2026-04-10T08:56:03Z

auto_round/algorithms/quantization/sign_round/config.py

+        dynamic_max_gap: int = -1,
+        enable_quanted_input: bool = True,
+        optimizer: str = None,
+        enable_adam: bool = False,


As Adam is decoupled, could we remove this argument from the config?
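If `enable_adam` is removed, existing callers could be kept working for a release with a deprecation shim that maps the flag onto `optimizer`. A hypothetical sketch: `TuningConfig` is not the real SignRound config. The alias defaults to `None` rather than `False` so the shim can tell whether the caller passed it explicitly.

```python
import warnings
from dataclasses import dataclass
from typing import Optional


@dataclass
class TuningConfig:  # hypothetical config, not the real SignRound config
    optimizer: Optional[str] = None
    enable_adam: Optional[bool] = None  # deprecated alias, kept only for compatibility

    def __post_init__(self):
        if self.enable_adam is None:
            return
        warnings.warn("enable_adam is deprecated; use optimizer='adam'", DeprecationWarning)
        if self.enable_adam:
            if self.optimizer not in (None, "adam"):
                raise ValueError(f"enable_adam=True conflicts with optimizer={self.optimizer!r}")
            self.optimizer = "adam"
        # After translation, only `optimizer` carries the choice.
        self.enable_adam = None
```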

wenhuach21 · 2026-04-10T08:58:27Z

auto_round/algorithms/quantization/base.py

+    # Subclasses that support diffusion models should override this with the
+    # appropriate output key mapping, e.g.:
+    #   DIFFUSION_OUTPUT_CONFIGS = {"FluxTransformerBlock": ["encoder_hidden_states", "hidden_states"]}
+    DIFFUSION_OUTPUT_CONFIGS: dict = {}


This argument should be added to the AutoRound interface instead of this one.

wenhuach21 · 2026-04-10T09:00:19Z

auto_round/algorithms/quantization/base.py

+
+    @property
+    def amp_dtype(self):
+        import torch


AMP is only used by tuning algorithms, so it would be better to refine this. There is no need to refine it in this PR.

wenhuach21 · 2026-04-10T09:04:45Z

auto_round/algorithms/quantization/base.py

+
+        return getattr(self.model_context, "amp_dtype", torch.float32)
+
+    def _register_act_max_hook(self, model):


We should provide an interface for customized hooks instead of registering act_max_hook by default, since most algorithms do not need it.
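A sketch of the opt-in hook interface suggested here. The base quantizer requests no hooks, and an algorithm that needs activation maxima asks for them explicitly. `get_hooks` and `ActMaxQuantizer` are hypothetical names. The hook signature `(module, inputs, output)` matches what PyTorch passes to `register_forward_hook`. In a real compressor, each `(name, fn)` pair would be registered on `model.get_submodule(name)`. Here, plain lists stand in for tensors so the logic can run without PyTorch.

```python
from typing import Callable, List, Tuple

Hook = Tuple[str, Callable]  # (module name, forward-hook function)


class BaseQuantizer:
    def get_hooks(self) -> List[Hook]:
        """Override to request forward hooks; by default none are registered."""
        return []


class ActMaxQuantizer(BaseQuantizer):
    """Hypothetical algorithm that needs the max |activation| seen by each layer."""

    def __init__(self, layer_names):
        self.layer_names = layer_names
        self.act_max = {}

    def get_hooks(self):
        return [(name, self._make_act_max_hook(name)) for name in self.layer_names]

    def _make_act_max_hook(self, name):
        def hook(module, inputs, output):
            # inputs[0] stands in for the layer's input tensor.
            current = max(abs(x) for x in inputs[0])
            self.act_max[name] = max(self.act_max.get(name, 0.0), current)

        return hook
```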

wenhuach21 · 2026-04-10T09:11:52Z

auto_round/algorithms/quantization/base.py

+
+    @torch.inference_mode()
+    def _quantize_embedding_layer(self):
+        """Quantizes embedding layers in the model according to the configuration.


To align this function with the others, it should be changed to _quantize_embedding_layer(self, layer) and designed to be overridden by subclasses. If that is difficult, feel free to support it in the future.
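The requested shape is a template method. The base class owns the loop over embedding layers and delegates the per-layer work to `_quantize_embedding_layer(self, layer)`, which subclasses override. This is a hypothetical sketch. The public method name, the dict of layers, and the toy rounding on plain lists are assumptions.

```python
class BaseQuantizer:
    def quantize_embedding_layers(self, layers):
        """Iterate over {name: layer}; the per-layer step is delegated."""
        return {name: self._quantize_embedding_layer(layer) for name, layer in layers.items()}

    def _quantize_embedding_layer(self, layer):
        # Default: toy round-to-nearest on a plain list of weights.
        return [round(w) for w in layer]


class ScaledEmbeddingQuantizer(BaseQuantizer):
    """Hypothetical subclass that overrides only the per-layer step."""

    def __init__(self, scale):
        self.scale = scale

    def _quantize_embedding_layer(self, layer):
        # Round onto a grid of step `scale` instead of integers.
        return [round(w / self.scale) * self.scale for w in layer]
```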

wenhuach21 · 2026-04-10T09:14:29Z

auto_round/algorithms/quantization/base.py

+        output keys.  Subclasses override ``DIFFUSION_OUTPUT_CONFIGS`` to add
+        support for new diffusion architectures.
+        """
+        output = defaultdict(list)


I prefer to move this one to utils and decouple the quantizer from model types.
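A sketch of what moving the diffusion output mapping into utils could look like. A module-level registry keyed by block class name replaces the `DIFFUSION_OUTPUT_CONFIGS` class attribute, so the quantizer no longer needs to know model types. The registry entry is the example from the diff. `register_diffusion_outputs` and `collect_block_outputs` are hypothetical names, and the collection logic assumes each block returns a tuple in the registered key order.

```python
from collections import defaultdict

# Hypothetical utils-level registry: block class name -> ordered output keys.
DIFFUSION_OUTPUT_CONFIGS = {
    "FluxTransformerBlock": ["encoder_hidden_states", "hidden_states"],
}


def register_diffusion_outputs(block_cls_name, output_keys):
    """Add support for a new diffusion block without touching any quantizer."""
    DIFFUSION_OUTPUT_CONFIGS[block_cls_name] = list(output_keys)


def collect_block_outputs(block_cls_name, batch_outputs):
    """Group each batch's output tuple under the keys registered for this block type."""
    keys = DIFFUSION_OUTPUT_CONFIGS.get(block_cls_name)
    if keys is None:
        raise KeyError(f"no output mapping registered for {block_cls_name!r}")
    output = defaultdict(list)
    for outputs in batch_outputs:
        if len(outputs) != len(keys):
            raise ValueError(f"expected {len(keys)} outputs for {block_cls_name!r}, got {len(outputs)}")
        for key, value in zip(keys, outputs):
            output[key].append(value)
    return dict(output)
```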

auto_round/algorithms/quantization/config.py

n1ck-guo · 2026-04-11T05:18:45Z

This PR will not make any further feature changes. I will collect all relevant comments and then modify them in future PRs.

n1ck-guo · 2026-04-11T09:52:12Z

/azp run Unit-Test-CUDA-AutoRound

azure-pipelines · 2026-04-11T09:52:23Z

Azure Pipelines successfully started running 1 pipeline(s).

…uo/new_ar_arch

Signed-off-by: n1ck-guo <heng.guo@intel.com>

…ntext init
- _hardware_setup: apply act-quantize/alg-ext guard before compile_func, matching _resolve_block_forward() and old-arch behavior. On HPU where enable_torch_compile stays True for FP8_STATIC, this avoids creating a compiled graph that wastes ~264 MB of HPU memory.
- ModelContext.__init__: gc.collect + malloc_trim after model/tokenizer loading to reclaim C heap fragmentation (~96 MB).

Signed-off-by: n1ck-guo <heng.guo@intel.com>
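The "gc.collect + malloc_trim" step can be done from Python through `ctypes`. The sketch below is a hypothetical re-implementation, not the PR's helper. It runs the garbage collector, then calls glibc's `malloc_trim(0)`, which returns 1 if memory was released back to the OS. On platforms without that symbol (e.g. macOS or Windows), it does nothing and returns False.

```python
import ctypes
import ctypes.util
import gc


def force_trim_malloc() -> bool:
    """Collect garbage, then ask glibc to release free heap pages to the OS.

    Returns True only if malloc_trim reports that memory was released.
    """
    gc.collect()  # free unreachable Python objects first so their memory is reusable
    libc_name = ctypes.util.find_library("c")
    if libc_name is None:
        return False
    try:
        trim = ctypes.CDLL(libc_name).malloc_trim
    except (OSError, AttributeError):
        return False  # no glibc malloc_trim on this platform
    trim.argtypes = [ctypes.c_size_t]
    trim.restype = ctypes.c_int
    return bool(trim(0))  # pad=0: keep no extra slack at the top of the heap
```

Calling this right after model/tokenizer loading matches the commit's intent: loading creates many short-lived allocations, and trimming afterwards lowers the RSS baseline for the rest of the run.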

…init reorder
- Add _force_trim_malloc() in device.py that unconditionally calls malloc_trim(0), bypassing the counter-based throttle in _maybe_trim_malloc() which was skipping critical lifecycle trim points
- ClearMemory HPU path: replace _maybe_trim_malloc() with _force_trim_malloc() so heap pages are reclaimed before each MemoryMonitor RSS sample, preventing inflated peak_ram readings
- ModelContext._load_model: add gc.collect + _force_trim_malloc before llm_load_model to reclaim temporary HTTP/config objects from is_mllm_model/is_diffusion_model/AutoConfig.from_pretrained calls
- ModelContext.__init__: use _force_trim_malloc at end so the trim actually fires (previously _maybe_trim_malloc was a no-op at counter=1)
- BaseCompressor.__init__: reorder context creation so ModelContext (large model allocation) is created before CompressContext (small), matching OLD arch allocation order to reduce heap fragmentation
- BaseCompressor.post_init: add gc.collect + _force_trim_malloc after the five init phases to start quantize loop from tighter baseline
- CalibCompressor.quantize: use _force_trim_malloc at loop start
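The commit explains that a counter-throttled trim was a no-op on its first call, so one-off lifecycle points never trimmed. The sketch below models that behavior. `MallocTrimmer`, its `interval` of 4, and the injectable `trim_fn` are hypothetical; the real throttle interval is not stated. In practice `trim_fn` would call glibc's `malloc_trim(0)`.

```python
class MallocTrimmer:
    """Hypothetical model of a throttled trim next to an unconditional one."""

    def __init__(self, interval=4, trim_fn=None):
        self.interval = interval
        self.counter = 0
        self.trim_count = 0
        self._trim_fn = trim_fn or (lambda: None)

    def maybe_trim(self) -> bool:
        # Throttled: only every `interval`-th call trims, so call #1 is a no-op.
        self.counter += 1
        if self.counter % self.interval != 0:
            return False
        self._trim()
        return True

    def force_trim(self) -> None:
        # Unconditional: for one-off points such as after model loading or before an RSS sample.
        self._trim()

    def _trim(self):
        self.trim_count += 1
        self._trim_fn()
```

Throttling suits hot loops, where trimming on every iteration would be costly. Lifecycle points run once, so they need the forced variant.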

xin3he

LGTM, please get the approval from Wenhua and Liang.

init

7698b93

Signed-off-by: n1ck-guo <heng.guo@intel.com>

n1ck-guo requested review from Copilot, lkk12014402, lvliang-intel, wenhuach21 and xin3he March 13, 2026 02:08

n1ck-guo added the draft label Mar 13, 2026

Copilot started reviewing on behalf of n1ck-guo March 13, 2026 02:09

n1ck-guo added the engineering label Mar 13, 2026

Copilot AI reviewed Mar 13, 2026

wenhuach21 reviewed Mar 13, 2026

auto_round/compressors_new/base.py (outdated, resolved)

n1ck-guo requested review from WeiweiZhang1 and yiliu30 and removed request for xin3he March 13, 2026 05:31

n1ck-guo added 3 commits March 13, 2026 14:00

Merge branch 'main' of https://github.com/intel/auto-round into hengg…

75b4141

…uo/new_ar_arch

update

ca17097

Signed-off-by: n1ck-guo <heng.guo@intel.com>

Merge branch 'main' of https://github.com/intel/auto-round into hengg…

a092e37

…uo/new_ar_arch

lvliang-intel reviewed Mar 16, 2026

auto_round/compressors_new/base.py (outdated, resolved)

lvliang-intel reviewed Mar 16, 2026

auto_round/algorithms/quantization/auto_round/quantize.py (outdated, resolved)

lvliang-intel reviewed Mar 16, 2026

auto_round/algorithms/alg_config.py (resolved)

lvliang-intel reviewed Mar 16, 2026

auto_round/compressors_new/config.py (resolved)

lvliang-intel reviewed Mar 16, 2026

auto_round/algorithms/quantization/auto_round/quantize.py (outdated, resolved)

Merge branch 'main' of https://github.com/intel/auto-round into hengg…

cec4ce4

…uo/new_ar_arch

chensuyue added this to the 0.12.0 milestone Mar 16, 2026

This was referenced Mar 17, 2026

decouple quanitzers #787

Open

Refactor collection for v0.13.0 release #1134

Open

n1ck-guo and others added 3 commits March 17, 2026 17:02

update

e265b8f

Signed-off-by: n1ck-guo <heng.guo@intel.com>

merge main

868a82d

Signed-off-by: n1ck-guo <heng.guo@intel.com>

[pre-commit.ci] auto fixes from pre-commit.com hooks

9dc930c

for more information, see https://pre-commit.ci

performance

0025256

Signed-off-by: n1ck-guo <heng.guo@intel.com>

n1ck-guo added the ready only add when the PR is ready to merge label Apr 8, 2026

n1ck-guo requested review from lkk12014402 and lvliang-intel April 8, 2026 07:56

n1ck-guo added 5 commits April 9, 2026 10:21

fix

1831126

Signed-off-by: n1ck-guo <heng.guo@intel.com>

fix

8873eca

Signed-off-by: n1ck-guo <heng.guo@intel.com>

fix

a1a4244

Signed-off-by: n1ck-guo <heng.guo@intel.com>

preformance

bd75536

Signed-off-by: n1ck-guo <heng.guo@intel.com>

Merge branch 'main' of https://github.com/intel/auto-round into hengg…

f2940bd

…uo/new_ar_arch

n1ck-guo requested a review from wenhuach21 April 10, 2026 07:19

sync

e4ce420

Signed-off-by: n1ck-guo <heng.guo@intel.com>

yiliu30 reviewed Apr 10, 2026

n1ck-guo added 2 commits April 10, 2026 16:06

fix

1286749

Signed-off-by: n1ck-guo <heng.guo@intel.com>

Merge branch 'main' of https://github.com/intel/auto-round into hengg…

9914306

…uo/new_ar_arch

performance

5c212b5

Signed-off-by: n1ck-guo <heng.guo@intel.com>

wenhuach21 reviewed Apr 10, 2026

n1ck-guo added 6 commits April 11, 2026 18:06

Merge branch 'main' of https://github.com/intel/auto-round into hengg…

550158b

…uo/new_ar_arch

performance

4806d5a

Signed-off-by: n1ck-guo <heng.guo@intel.com>

fix

1f1fbd9

Signed-off-by: n1ck-guo <heng.guo@intel.com>

update

e4fdfe6

Signed-off-by: n1ck-guo <heng.guo@intel.com>

xin3he reviewed Apr 13, 2026