
Add actorder support for GPTQ block quantization#2616

Open
rk119 wants to merge 16 commits into vllm-project:main from rk119:actorder-for-block

Conversation


@rk119 rk119 commented Apr 14, 2026

SUMMARY:

  • Reuse g_idx for block quantization, remove has_gidx, and simplify the g_idx save logic, enabling weight activation ordering for fp8 block quantization (addresses #2587).

TEST PLAN:

  • Add more relevant recipes in the tests to ensure that the changes do not introduce inconsistencies or errors.

rk119 added 5 commits April 14, 2026 14:54
Signed-off-by: Riffat Khan <riffatk342@gmail.com>
Signed-off-by: Riffat Khan <riffatk342@gmail.com>
Signed-off-by: Riffat Khan <riffatk342@gmail.com>
Signed-off-by: Riffat Khan <riffatk342@gmail.com>
Signed-off-by: Riffat Khan <riffatk342@gmail.com>
Contributor

coderabbitai Bot commented Apr 14, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: db9fdd78-180b-4c66-8eaf-4fbc1293aea4

You can disable this status message by setting reviews.review_status to false in the CodeRabbit configuration file.


Walkthrough

The PR extends GPTQ's g_idx (group index) handling to support the BLOCK quantization strategy alongside the existing GROUP/TENSOR_GROUP strategies. Column indexing logic is consolidated by using g_idx for all strategies, enabling weight activation ordering for fp8 block quantization. New test recipes validate block-based 8-bit configurations.
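
As a rough illustration, the consolidated indexing could look like the following minimal sketch, assuming compressed-tensors' QuantizationStrategy enum and a quant_args object carrying group_size and block_structure; the actual helper in gptq_quantize.py may differ:

    import torch
    from compressed_tensors.quantization import QuantizationStrategy

    def compute_g_idx(strategy, quant_args, num_columns):
        # Pick the per-column divisor: group_size for GROUP/TENSOR_GROUP,
        # the block column width (block_structure[1]) for BLOCK.
        if strategy in (QuantizationStrategy.GROUP, QuantizationStrategy.TENSOR_GROUP):
            divisor = quant_args.group_size
        else:  # QuantizationStrategy.BLOCK
            divisor = quant_args.block_structure[1]
        # Map each weight column to its group/block index; activation ordering
        # permutes this mapping rather than recomputing block columns.
        return torch.arange(num_columns) // divisor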

Changes

GPTQ Quantization Logic (src/llmcompressor/modifiers/gptq/gptq_quantize.py):
Extended g_idx generation and persistence to the BLOCK strategy. Changed the computation to use a divisor taken from group_size (GROUP/TENSOR_GROUP) or block_structure[1] (BLOCK). Updated BLOCK quantization to derive block_column_idx from g_idx instead of recalculating it. Reworked activation-ordering restoration with a g_idx_to_save variable, saving g_idx only when actorder == ActivationOrdering.GROUP.

GPTQ Block Quantization Tests (tests/llmcompressor/transformers/gptq/test_gptq_oneshot.py):
Added two new GPTQModifier recipe variants for block-based quantization with block_structure=[2, 8] and 8-bit precision: one without activation ordering and one with ActivationOrdering.WEIGHT. Updated the test assertion to allow either 4 or 8 bits. A sketch of one such recipe follows.
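
For concreteness, the actorder variant might look roughly like this; a hedged sketch using llm-compressor's GPTQModifier and compressed-tensors' scheme/args classes, where the exact targets and field values in the test file may differ:

    from compressed_tensors.quantization import (
        ActivationOrdering,
        QuantizationArgs,
        QuantizationScheme,
        QuantizationStrategy,
    )
    from llmcompressor.modifiers.quantization import GPTQModifier

    recipe_modifier_block_actorder_weight = GPTQModifier(
        ignore=["lm_head"],
        config_groups={
            "group_0": QuantizationScheme(
                targets=["Linear"],
                weights=QuantizationArgs(
                    num_bits=8,
                    symmetric=True,
                    strategy=QuantizationStrategy.BLOCK,
                    block_structure=[2, 8],  # 2x8 weight blocks
                    actorder=ActivationOrdering.WEIGHT,  # the new actorder path
                ),
            )
        },
    )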

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

gptq, fp8, refactor, enhancement

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

  • Title check: Passed. The title "Add actorder support for GPTQ block quantization" directly summarizes the main change: enabling activation ordering for block quantization in GPTQ.
  • Linked Issues check: Passed. The PR fully addresses issue #2587 by consolidating block indexing to use g_idx, enabling activation ordering for block quantization without duplicating logic.
  • Out of Scope Changes check: Passed. All changes are directly within scope: the gptq_quantize.py modifications support block quantization with actorder, and the test updates validate the new functionality.
  • Docstring Coverage: Passed. No functions found in the changed files to evaluate docstring coverage; skipping the docstring coverage check.
  • Description check: Passed. The pull request description clearly describes the changeset: reusing g_idx for block quantization, removing has_gidx, and simplifying the g_idx save logic, which aligns with the actual code changes.


Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: this is required to complete the testing suite; please only add the label once the PR is code complete and local testing has been performed.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces support for the BLOCK quantization strategy within the GPTQ modifier. Key changes include updating the quantize_weight function to handle block-based divisors for group indices, refactoring the activation ordering logic to accommodate block structures, and ensuring group indices are correctly saved when required. Additionally, new test cases were added to verify the block quantization variant with different activation orderings. I have no feedback to provide as there are no review comments to evaluate.

@HDCharles HDCharles added the ready (When a PR is ready for review) label Apr 14, 2026
Contributor

mergify Bot commented Apr 14, 2026

The quality checks have failed. Please run make style and make quality under the root directory to address the lint failures. You will need the dev optional install to get the required linting packages: https://github.com/vllm-project/llm-compressor/blob/main/CONTRIBUTING.md


if not has_gidx:
    g_idx = None
if actorder == ActivationOrdering.GROUP:
Collaborator

@HDCharles HDCharles Apr 14, 2026


We can simplify this: only group act order saves g_idx, so we can just check for that.
No need to check twice; just do this check on line 287 and remove g_idx_to_save.
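
A minimal sketch of the suggested shape, assuming the names from this thread (actorder, g_idx, q_param_dict); not the exact final diff:

    from compressed_tensors.quantization import ActivationOrdering

    def maybe_save_g_idx(q_param_dict, actorder, g_idx):
        # Only GROUP activation ordering persists g_idx, so a single check
        # at the save site replaces the separate g_idx_to_save variable.
        if actorder == ActivationOrdering.GROUP:
            q_param_dict["weight_g_idx"] = g_idx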

Author


Yep, done!

@rk119 rk119 marked this pull request as ready for review April 14, 2026 13:31
Comment thread src/llmcompressor/modifiers/gptq/gptq_quantize.py Outdated
@coderabbitai coderabbitai Bot added the enhancement (New feature or request), gptq (For any PR / issue related to GPTQ support), fp8 (For any issue / PR related to FP8 support), and Refactor (Code cleanup and/or improvements to existing features) labels Apr 14, 2026
Comment thread src/llmcompressor/modifiers/gptq/gptq_quantize.py
Comment thread src/llmcompressor/modifiers/gptq/gptq_quantize.py
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
tests/llmcompressor/transformers/gptq/test_gptq_oneshot.py (1)

180-188: ⚠️ Potential issue | 🟡 Minor

The assertions don't pin the new BLOCK behavior yet.

Relaxing this to 4 or 8 means a deserialization fallback to any other 8-bit scheme would still pass. For the BLOCK recipes, assert weight_args.strategy and weight_args.block_structure too so this test actually verifies the path added by the PR.

As per coding guidelines, "tests/**/*.py: Ensure PyTest tests are clear, comprehensive, and cover edge cases for quantization scenarios."
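
In code, the stricter check could look something like this sketch, assuming the test's existing weight_args object and the [2, 8] block recipes above:

    from compressed_tensors.quantization import QuantizationStrategy

    def assert_block_recipe(weight_args):
        # Pin the BLOCK path rather than accepting any 8-bit scheme.
        assert weight_args.num_bits == 8
        assert weight_args.strategy == QuantizationStrategy.BLOCK
        assert weight_args.block_structure == [2, 8]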

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/llmcompressor/transformers/gptq/test_gptq_oneshot.py` around lines 180
- 188, The test currently only checks num_bits for weight_args allowing 4 or 8
which doesn't ensure the new BLOCK recipe path; update the assertions after
obtaining weight_args (from quantization_config.config_groups["group_0"]) to
also assert that weight_args.strategy equals the BLOCK strategy name used in
your codebase (e.g., "BLOCK") and that weight_args.block_structure matches the
expected structure (e.g., a tuple/list or specific object describing block dims)
so the test verifies both strategy and block_structure in addition to num_bits;
locate symbols quantization_config, quant_scheme, weight_args, QuantizationArgs
and add assertions for weight_args.strategy and weight_args.block_structure
accordingly.
src/llmcompressor/modifiers/gptq/gptq_quantize.py (1)

263-287: ⚠️ Potential issue | 🟠 Major

BLOCK strategy with activation ordering creates an untested, unsupported code path with cross-repository incompatibility.

When strategy == QuantizationStrategy.BLOCK and actorder == ActivationOrdering.GROUP, the code saves weight_g_idx (line 275). However:

  1. Incompatibility with compressed-tensors initialize.py: g_idx is only registered as a parameter for GROUP/TENSOR_GROUP strategies; BLOCK never registers {base_name}_g_idx, so when loading, this saved g_idx cannot be restored to a module parameter.

  2. Missing g_idx handling in calibration flatten: _flatten_weight() in src/llmcompressor/observers/helpers.py:75-80 only applies g_idx reordering for GROUP/TENSOR_GROUP strategies. For BLOCK, g_idx is ignored during flattening, even though it is passed from imatrix.py:165.

  3. No test coverage: There are no tests combining BLOCK strategy with any ActivationOrdering, making this a completely untested path.

This indicates a design decision was incomplete: either BLOCK should not save g_idx when using activation ordering, or initialize.py and _flatten_weight should both be updated to support g_idx for BLOCK (with proper divisor-based grouping).
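
The first option would amount to roughly the following sketch (names taken from this review; the second option would instead require the cross-repo changes described above):

    from compressed_tensors.quantization import ActivationOrdering, QuantizationStrategy

    def should_save_g_idx(strategy, actorder):
        # compressed-tensors only registers {base_name}_g_idx for GROUP and
        # TENSOR_GROUP, so skip persisting g_idx under BLOCK to avoid a
        # parameter that cannot be restored on load.
        return (
            actorder == ActivationOrdering.GROUP
            and strategy != QuantizationStrategy.BLOCK
        )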

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/llmcompressor/modifiers/gptq/gptq_quantize.py` around lines 263 - 287,
The BLOCK strategy currently writes weight_g_idx when actorder ==
ActivationOrdering.GROUP (in gptq_quantize.py) which is incompatible because
compressed-tensors initialize.py does not register a {base_name}_g_idx for BLOCK
and _flatten_weight (helpers._flatten_weight) ignores g_idx for BLOCK; either
stop emitting weight_g_idx for QuantizationStrategy.BLOCK when using
ActivationOrdering.GROUP (remove the g_idx_to_save assignment and related
q_param_dict entry) or fully add BLOCK support: update initialize.py to register
{base_name}_g_idx for BLOCK with the correct divisor/grouping logic, extend
helpers._flatten_weight to apply g_idx reordering for BLOCK like
GROUP/TENSOR_GROUP, ensure imatrix.py passes g_idx consistently, and add tests
covering BLOCK combined with ActivationOrdering variations to validate
serialization and calibration paths.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/llmcompressor/transformers/gptq/test_gptq_oneshot.py`:
- Around line 99-127: The two test recipes recipe_modifier_full_block and
recipe_modifier_block_actorder_weight currently construct QuantizationArgs
without specifying type, so they default to QuantizationType.INT; update both
QuantizationArgs in the GPTQModifier config_groups to include
type=QuantizationType.FLOAT so the tests exercise FP8 block quantization
behavior (locate the QuantizationArgs instances inside GPTQModifier for group_0
targeting "re:.*model.layers.2.self_attn.q_proj$" and add the type field).
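
Sketched out, the suggested fix is just adding the type field; QuantizationType.FLOAT is the compressed-tensors enum value, and the remaining fields follow the block recipes above:

    from compressed_tensors.quantization import (
        QuantizationArgs,
        QuantizationStrategy,
        QuantizationType,
    )

    weights = QuantizationArgs(
        num_bits=8,
        type=QuantizationType.FLOAT,  # defaults to INT when omitted
        strategy=QuantizationStrategy.BLOCK,
        block_structure=[2, 8],
    )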


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 5cf4ada2-ab75-44d8-b52d-d5b43b6187e9

📥 Commits

Reviewing files that changed from the base of the PR and between 9d328ed and 89642b2.

📒 Files selected for processing (2)
  • src/llmcompressor/modifiers/gptq/gptq_quantize.py
  • tests/llmcompressor/transformers/gptq/test_gptq_oneshot.py

Comment thread tests/llmcompressor/transformers/gptq/test_gptq_oneshot.py Outdated
@HDCharles
Collaborator

See the quality checks instructions and the bot comments; otherwise this looks pretty good, just some minor fixes.

We should also run some evals for this; once you have the changes in, I can help you with that. You can reach out to me on the vLLM Slack for easier coordination.

rk119 added 3 commits April 14, 2026 18:08
Signed-off-by: Riffat Khan <riffatk342@gmail.com>
Signed-off-by: Riffat Khan <riffatk342@gmail.com>
Signed-off-by: Riffat Khan <riffatk342@gmail.com>
Signed-off-by: Riffat Khan <riffatk342@gmail.com>
@mergify mergify Bot removed the quality-failed label Apr 14, 2026
Signed-off-by: Riffat Khan <riffatk342@gmail.com>
@HDCharles HDCharles self-assigned this Apr 15, 2026
@mergify mergify Bot added the two-reviews (When a PR requires two reviews) label Apr 22, 2026
Contributor

mergify Bot commented Apr 22, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🔴 Require two reviews

This rule is failing: PRs labelled "two-reviews" must have at least two approving reviews before merging.

Waiting for:

  • #approved-reviews-by >= 2
  • #changes-requested-reviews-by = 0


Labels

enhancement (New feature or request), fp8 (For any issue / PR related to FP8 support), gptq (For any PR / issue related to GPTQ support), ready (When a PR is ready for review), Refactor (Code cleanup and/or improvements to existing features), two-reviews (When a PR requires two reviews)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add weight activation ordering for fp8 block (#2587)

2 participants