
[Compressors] Refactor compressors, remove sparsity & CompressedLinear#610

Open
kylesayrs wants to merge 1 commit into main from kylesayrs/compressor-refactor-claude

Conversation

@kylesayrs
Collaborator

@kylesayrs kylesayrs commented Mar 2, 2026

Purpose

  • Remove complexity related to supporting sparse compression
  • Remove complexity related to CompressedLinear
  • Define an easy-to-use API for both module and state-dict compression/decompression
  • Prepare to support distributed parallel compression

Corequisites

Entrypoints

Compressed Tensors has the following entrypoints into compression:

1. ModelCompressor.from_compression_config (Transformers)
   Purpose: loads an HF::CompressedTensorsConfig representing a model config; used when loading a compressed model with transformers for inference.
   Used code paths: ModelCompressor.quantization_config, ModelCompressor.compress_model, ModelCompressor.decompress_model

2. ModelCompressor.from_compression_config (vLLM - Cutlass24)
   Purpose: loads a dict[str, Any] representing a model config. The layer is decompressed using our compressor, then recompressed using ops.cutlass_sparse_compress into a format better suited to the inference kernels.
   Used code paths: ModelCompressor.sparsity_config.format, ModelCompressor.sparsity_compressor.decompress_weight

3. ModelCompressor.from_pretrained_model (LLM Compressor)
   Purpose: compresses a model before saving it so that the data format and disk footprint are optimal for use by inference kernels.
   Used code paths: ModelCompressor.compress_model, ModelCompressor.update_config

This PR removes support for (2), as vLLM will no longer support 2:4 sparsity in the future. The functionality of the other two entrypoints remains unchanged.
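For intuition, the compress/decompress round trip these entrypoints drive can be modeled with a toy symmetric int8 scheme (purely illustrative: the function names and packing here are made up and are not the real compressed-tensors formats):

```python
def compress_state_dict(state_dict):
    """Toy symmetric int8 'compression': store a per-tensor scale plus
    rounded integer values. Illustrative only -- not a real format."""
    compressed = {}
    for name, values in state_dict.items():
        scale = max(abs(v) for v in values) or 1.0
        compressed[name] = {
            "scale": scale,
            "q": [round(v / scale * 127) for v in values],
        }
    return compressed


def decompress_state_dict(compressed):
    """Invert the toy compression; the result is lossy (quantized)."""
    return {
        name: [q * entry["scale"] / 127 for q in entry["q"]]
        for name, entry in compressed.items()
    }


original = {"linear.weight": [0.5, -1.0, 0.25]}
restored = decompress_state_dict(compress_state_dict(original))
```

The round trip is lossy, which is why a model that has been decompressed is quantization-dequantization (qdq) equivalent rather than bit-identical to the original weights.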

Changes

  • Simplify compressors
    • Remove concept of "quantization" and "sparsity" compressors
    • Each format has exactly one compressor, and vice versa. Compressors define which quantization schemes they can support, and modules are compressed using whichever compressor supports them, in order of a defined priority.
    • Modules can be compressed via Compressor.compress_module() if the format is known, or compress_module() if the format should be inferred
  • Remove sparsity
    • Remove sparsity compressors
    • Deprecate sparsity-related config arguments
    • Remove (very) out of date examples referring to sparsity
  • Remove CompressedLinear
    • Instead, the ModelCompressor adds a pre_forward hook to the model which triggers decompression on the first forward pass
    • The model's status is then changed to QuantizationStatus.DECOMPRESSED for efficient inference
    • Add new QuantizationStatus.DECOMPRESSED, which defines what was previously implicitly defined: the state where CompressedLinear had decompressed itself, and runs forward passes without any weight qdq.
      • As a side note, I believe that the original CompressedLinear was actually broken, in that it did not actually perform activation quantization
      • This status is distinct from QuantizationStatus.FROZEN in that FROZEN will still perform weight qdq during forward pass (in order to create correct emulation), but DECOMPRESSED does not need to perform additional weight qdq (because the weight has already been qdqed permanently). See the documentation for QuantizationStatus
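The first-forward decompression hook described above can be sketched without torch (all names here are hypothetical stand-ins, not the actual ModelCompressor implementation):

```python
class Module:
    """Minimal stand-in for a torch.nn.Module with pre-forward hooks."""

    def __init__(self):
        self._pre_hooks = []
        self.status = "COMPRESSED"

    def register_forward_pre_hook(self, hook):
        self._pre_hooks.append(hook)
        # returning a removal callable mirrors torch's RemovableHandle
        return lambda: self._pre_hooks.remove(hook)

    def forward(self, x):
        for hook in list(self._pre_hooks):  # copy: hooks may self-remove
            hook(self, x)
        return x


def add_decompression_hook(model):
    """Decompress on the first forward pass, then remove the hook so
    later passes run without any decompression or weight qdq."""

    def hook(module, args):
        # placeholder for real weight decompression
        module.status = "DECOMPRESSED"
        remove()  # the hook is removed after firing once

    remove = model.register_forward_pre_hook(hook)


model = Module()
add_decompression_hook(model)
model.forward(0)  # first pass triggers decompression
```

After the first forward pass the module carries no hook overhead, which is the behavior CompressedLinear previously provided as a dedicated module class.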

Testing

Follow-ups

  • Add up-to-date documentation for compressed tensors
  • Add distributed parallelized compression
  • Add dequantize() method implementation on transformers
  • (optional) greater cleanup for removing sparsity

@mergify

mergify bot commented Mar 2, 2026

The quality checks have failed. Please run make style and make quality under the root directory to address the lint failures. You will need to install the dev optional install to get the required linting packages.


@mergify

mergify bot commented Mar 5, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @kylesayrs.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase and documentation labels and removed the quality-failed and needs-rebase labels Mar 5, 2026
@kylesayrs kylesayrs force-pushed the kylesayrs/compressor-refactor-claude branch from 071b11b to e41e147 Compare March 5, 2026 23:32
@kylesayrs kylesayrs changed the title [WIP] Refactor compressors [Compressors] Refactor compressors, remove sparsity & CompressedLinear Mar 6, 2026



@mergify mergify bot added the needs-rebase label Mar 9, 2026
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
@kylesayrs kylesayrs force-pushed the kylesayrs/compressor-refactor-claude branch from 215e6cf to 9e7435b Compare March 9, 2026 19:41
@mergify mergify bot removed the needs-rebase label Mar 9, 2026

Collaborator

@brian-dellabetta brian-dellabetta Mar 10, 2026


can we either change these to git mv'ed files rather than deleted/created, or do the quantized_compressors folder re-org in a separate PR? This diff is rather unwieldy in its current form


Collaborator

@brian-dellabetta brian-dellabetta left a comment


I discussed this in a screenshare with Kyle. I like the changes and the code looks a lot cleaner, but there's a lot in this PR. It would be good to run e2e and example tests.

Approving with a handful of nits

This method iterates over the dense_weight_generator and
updates the corresponding weights in the model. If a parameter
name does not exist in the model, it will be skipped.
The hook automatically removes itself after decompression, allowing the model
Collaborator


nit -- this tripped me up a bit when reviewing. I don't see any code in the hook to do this, but it does live on .decompress_model.

Suggested change
The hook automatically removes itself after decompression, allowing the model
The hook is automatically removed after decompression, allowing the model

Collaborator Author


This is called out in the code comment

# decompress_model already removes the hook via remove_decompression_hook

return state_dict

@classmethod
def match(cls, module_type: type, scheme: QuantizationScheme) -> bool:
Collaborator


nit -- can we prefix with is_ to indicate it works a little different than our other match_ functions and returns a bool?

Suggested change
def match(cls, module_type: type, scheme: QuantizationScheme) -> bool:
def is_match(cls, module_type: type, scheme: QuantizationScheme) -> bool:

Collaborator


+1


# in compressed mode, the weight is already compressed and quantized so we don't
# need to run fake quantization
# TODO: remove this line, as this is already guarded by `set_forward_quantized`
Collaborator


☝️


# force zero points during initialization
force_zero_point = config.quantization_status != QuantizationStatus.COMPRESSED
# TODO: remove zero points from initialization
Collaborator


you had this as a TODO on another line. I think this is better served as a good first issue than a TODO

Collaborator Author


It can be both

Collaborator


rebase needed?

Collaborator


this for example is a clear case where git mv should be done. It is helpful to retain the git history as much as possible

Collaborator

@rahul-tuli rahul-tuli left a comment


The diff looks good pending merge conflicts; however, I agree that this is too big a change to review in one PR.

return state_dict

@classmethod
def match(cls, module_type: type, scheme: QuantizationScheme) -> bool:
Collaborator


nit: maybe rename to is_match?

return state_dict

@classmethod
def match(cls, module_type: type, scheme: QuantizationScheme) -> bool:
Collaborator


+1


Labels

documentation, needs-rebase


3 participants