[Compressors] Remove sparse compression by kylesayrs · Pull Request #2452 · vllm-project/llm-compressor

kylesayrs · 2026-03-07T03:47:55Z

Purpose

Remove sparsity tests which fail now that sparse compression is no longer supported

Corequisites

[Compressors] Refactor compressors, remove sparsity & CompressedLinear compressed-tensors#610

Testing

https://github.com/neuralmagic/llm-compressor-testing/actions/runs/22814276395

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

github-actions · 2026-03-07T03:48:02Z

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite, please only add the label once the PR is code complete and local testing has been performed.

gemini-code-assist · 2026-03-07T03:48:19Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request deprecates and removes all support for sparse24 compression within the llmcompressor library. The changes involve removing the sparse24 compression logic, refactoring the compress_module function to a more centralized location, and updating the model compression utilities to reflect that sparse compression is no longer supported. This streamlines the codebase by removing an unsupported feature and simplifies the compression pipeline.

Highlights

Removed Sparse24 Compression: The sparse24 compression scheme and all related logic have been removed from the codebase, including the local compress_module definition in llmcompressor.entrypoints.model_free.lifecycle.
Refactored Compression Utility: The compress_module function was moved from llmcompressor.entrypoints.model_free.lifecycle to compressed_tensors.compressors, centralizing its definition.
Updated Model Compressor: The get_model_compressor utility now explicitly warns that sparse compression is no longer supported by compressed-tensors and has removed all logic for handling sparsity configurations.
Streamlined Testing: Extensive tests related to sparse24 compression, sparse model reloading, and compressor stacking have been removed or simplified to reflect the deprecation of sparse compression.
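
As a rough illustration of the "warn and ignore" behavior described for get_model_compressor, here is a minimal sketch. It is not the llm-compressor implementation: the function name get_model_compressor_sketch, its parameters, its return value, and the warning text are all hypothetical, and it assumes the warning is only raised when a caller still passes a sparsity configuration.

```python
import warnings
from typing import Any, Optional


def get_model_compressor_sketch(
    model: Any, sparsity_config: Optional[Any] = None
) -> dict:
    """Hypothetical stand-in; not the real get_model_compressor."""
    if sparsity_config is not None:
        # Assumption: warn only when sparsity is explicitly requested.
        warnings.warn(
            "Sparse compression is no longer supported by compressed-tensors; "
            "ignoring the sparsity configuration.",
            UserWarning,
            stacklevel=2,
        )
    # The sparsity configuration is dropped regardless of what was passed in.
    return {"model": model, "sparsity_config": None}


# Usage: capture the warning instead of printing it.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = get_model_compressor_sketch(
        "dummy-model", sparsity_config={"format": "sparse-24-bitmask"}
    )
```

Under these assumptions, callers that still pass a sparsity configuration keep working but get a visible signal that the setting has no effect.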


Changelog

src/llmcompressor/entrypoints/model_free/lifecycle.py
- Removed BaseCompressor and _get_quant_compression_format imports.
- Removed compress_module from the module's __all__ export list.
- Deleted the compress_module function definition.
src/llmcompressor/entrypoints/model_free/process.py
- Added compress_module import from compressed_tensors.compressors.
- Removed compress_module import from llmcompressor.entrypoints.model_free.lifecycle.
src/llmcompressor/transformers/compression/compressed_tensors_utils.py
- Removed get_state_dict_offloaded_model and SparsityConfigMetadata imports.
- Modified get_model_compressor to remove all sparse compression handling logic and issue a warning.
- Removed sparsity_config_or_format argument from ModelCompressor.from_pretrained_model calls.
tests/llmcompressor/transformers/compression/test_compress_tensor_utils.py
- Removed imports related to sparse compression (math, ModelCompressor, CompressionFormat, BitmaskConfig, DenseSparsityConfig).
- Added infer_set_module_formats import.
- Deleted test_sparse_model_reload, test_dense_model_save, test_compressor_stacking, test_sparse_24_compressor_is_lossless, and test_disable_sparse_compression_flag test functions.
- Simplified test_quant_model_reload parametrization and logic.
- Removed _make_24_sparse helper function.
- Updated DummyLinearModel initialization for weight_scale and zero_point.
- Refactored test_correct_compressor_inferred to remove sparse compression checks and use infer_set_module_formats.

Activity

No activity has been recorded for this pull request yet.


¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request removes support for sparse24 compression, which is a significant change. The modifications are consistent across the codebase, including removing logic from the main library and updating/removing relevant tests. The changes are well-aligned with the goal of deprecating this feature. I've found a minor issue in the test suite that could be improved for clarity.

gemini-code-assist · 2026-03-07T03:49:44Z

tests/llmcompressor/transformers/compression/test_compress_tensor_utils.py

 @pytest.mark.parametrize(
-    "quant_style, quant_type, is_24, expected_quant_compressor, "
-    "expected_sparsity_compressor",
+    "quant_style,quant_type,is_24,expected_format",
    [
-        ("W8A8", "int", False, "int-quantized", "dense"),
-        ("W4A16", "int", False, "pack-quantized", "dense"),
-        ("W8A16", "int", False, "pack-quantized", "dense"),
-        ("W8A8", "int", True, "int-quantized", "sparse-24-bitmask"),
-        ("W4A16", "int", True, "marlin-24", "dense"),
-        ("W8A16", "int", True, "marlin-24", "dense"),
-        ("W8A8", "float", False, "float-quantized", "dense"),
-        ("W8A16", "float", False, "naive-quantized", "dense"),
-        ("W8A8", "float", True, "float-quantized", "sparse-24-bitmask"),
-        ("W8A16", "float", True, "naive-quantized", "dense"),
+        ("W8A8", "int", False, "int-quantized"),
+        ("W4A16", "int", False, "pack-quantized"),
+        ("W8A16", "int", False, "pack-quantized"),
+        ("W8A8", "float", False, "float-quantized"),
+        ("W8A16", "float", False, "naive-quantized"),
+        ("W8A16", "float", True, "naive-quantized"),
    ],
 )
 def test_correct_compressor_inferred(
    quant_style,
    quant_type,
    is_24,
-    expected_quant_compressor,
-    expected_sparsity_compressor,
+    expected_format,
 ):


The is_24 parameter appears to be a leftover from refactoring and is no longer used within the test body. This can be confusing for future readers of the code. To improve clarity and maintainability, it should be removed from the test function signature and the pytest.mark.parametrize decorator. The test cases should also be updated to remove the now-redundant parameter and a duplicate test case.

@pytest.mark.parametrize(
    "quant_style,quant_type,expected_format",
    [
        ("W8A8", "int", "int-quantized"),
        ("W4A16", "int", "pack-quantized"),
        ("W8A16", "int", "pack-quantized"),
        ("W8A8", "float", "float-quantized"),
        ("W8A16", "float", "naive-quantized"),
    ],
)
def test_correct_compressor_inferred(
    quant_style,
    quant_type,
    expected_format,
):
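
To see why the suggestion mentions a duplicate test case, the following sketch takes the six parameter tuples from the updated diff, drops the is_24 column, and removes repeats while preserving order. The variable names old_cases and new_cases are illustrative only and do not appear in the PR.

```python
# The six cases from the updated parametrization (is_24 is the third field).
old_cases = [
    ("W8A8", "int", False, "int-quantized"),
    ("W4A16", "int", False, "pack-quantized"),
    ("W8A16", "int", False, "pack-quantized"),
    ("W8A8", "float", False, "float-quantized"),
    ("W8A16", "float", False, "naive-quantized"),
    ("W8A16", "float", True, "naive-quantized"),
]

# Drop is_24, then deduplicate; dict.fromkeys keeps first-seen order.
new_cases = list(
    dict.fromkeys(
        (quant_style, quant_type, expected_format)
        for quant_style, quant_type, _is_24, expected_format in old_cases
    )
)

for case in new_cases:
    print(case)
```

The last two original cases differ only in is_24, so they collapse into a single ("W8A16", "float", "naive-quantized") entry, leaving the five cases listed in the suggestion.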

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

brian-dellabetta

🧹

mergify · 2026-03-10T17:18:36Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @kylesayrs.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

HDCharles · 2026-03-11T21:20:10Z

tests/llmcompressor/transformers/compression/decompression_configs/fp8_dynamic.yaml

 test_type: "regression"
-compressed_model_stub: "nm-testing/tinyllama-fp8-dynamic-compressed"
-skeleton_model_stub: "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"
+compressed_model_stub: "nm-testing/tinyllama-fp8-dynamic-compressed"


Why are these all showing up as line changes? Did you remove the newline?
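
One plausible explanation, sketched below with Python's difflib, is that the deleted skeleton_model_stub line was the last line of the file and the new last line was saved without a trailing newline. That is an assumption about the file contents, not something the diff confirms. Under that assumption, a line-based diff treats the unchanged-looking compressed_model_stub line as removed and re-added.

```python
import difflib

# Assumed old file: three lines, each terminated by a newline.
old_lines = [
    'test_type: "regression"\n',
    'compressed_model_stub: "nm-testing/tinyllama-fp8-dynamic-compressed"\n',
    'skeleton_model_stub: "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"\n',
]
# Assumed new file: skeleton line deleted, final line has no trailing newline.
new_lines = [
    'test_type: "regression"\n',
    'compressed_model_stub: "nm-testing/tinyllama-fp8-dynamic-compressed"',
]

# Keep only the hunk body (skip the ---/+++ file headers and the @@ header).
diff_body = [
    line
    for line in difflib.unified_diff(old_lines, new_lines)
    if not line.startswith(("---", "+++", "@@"))
]

for line in diff_body:
    print(repr(line))
```

The printed hunk has one context line, two removals, and one addition, the same shape as the diff quoted above. Restoring the final newline would reduce it to a single removed line.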

HDCharles

Seems fine, though you should fix those newlines.

initial commit

b9272df

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

kylesayrs requested review from HDCharles, brian-dellabetta and dsikka as code owners March 7, 2026 03:47

gemini-code-assist bot reviewed Mar 7, 2026


kylesayrs mentioned this pull request Mar 7, 2026

[Compressors] Refactor compressors, remove sparsity & CompressedLinear vllm-project/compressed-tensors#610

Open

kylesayrs changed the title ~~[Compressors] Remove sparse24 compression~~ [Compressors] Remove sparse compression Mar 7, 2026

kylesayrs added 2 commits March 7, 2026 00:48

use new name

e6a9c97

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

update test

166729a

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

brian-dellabetta approved these changes Mar 10, 2026


mergify bot added the needs-rebase label Mar 10, 2026

rahul-tuli approved these changes Mar 11, 2026


HDCharles reviewed Mar 11, 2026


HDCharles approved these changes Mar 11, 2026


kylesayrs wants to merge 3 commits into main from kylesayrs/remove-sparse-compression

4 participants
