Add actorder support for GPTQ block quantization #2616
rk119 wants to merge 16 commits into vllm-project:main
Conversation
Signed-off-by: Riffat Khan <riffatk342@gmail.com>
Important: Review skipped. Auto incremental reviews are disabled on this repository; please check the settings in the CodeRabbit UI or the ⚙️ Run configuration. Configuration used: Path: .coderabbit.yaml | Review profile: CHILL | Plan: Pro
Walkthrough: The PR extends GPTQ's block quantization path with activation-ordering (actorder) support.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ Passed checks (5 passed)
👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review. Note: This is required to complete the testing suite; please only add the label once the PR is code complete and local testing has been performed.
Code Review
This pull request introduces support for the BLOCK quantization strategy within the GPTQ modifier. Key changes include updating the quantize_weight function to handle block-based divisors for group indices, refactoring the activation ordering logic to accommodate block structures, and ensuring group indices are correctly saved when required. Additionally, new test cases were added to verify the block quantization variant with different activation orderings. I have no feedback to provide as there are no review comments to evaluate.
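To make the divisor-based group indices concrete, here is a small illustrative sketch (not the PR's exact code; the helper name `make_g_idx` is hypothetical) of how a single divisor maps weight columns to group indices for both the GROUP and BLOCK strategies:

```python
import torch

# Hypothetical helper: assign each weight column a group index using one
# divisor, which would be the group size for GROUP or the block width for BLOCK.
def make_g_idx(num_columns: int, divisor: int) -> torch.Tensor:
    return torch.arange(num_columns) // divisor

# With a (128, 128) block structure, columns 0-127 map to block column 0,
# columns 128-255 to block column 1, and so on.
print(make_g_idx(512, 128))  # 512 entries: 128 zeros, 128 ones, 128 twos, 128 threes
```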
The quality checks have failed. Please run …
```python
if not has_gidx:
    g_idx = None
if actorder == ActivationOrdering.GROUP:
```
We can simplify this: only group act order saves g_idx, so we can just check for that.
No need to check twice; just do this line on line 287 and remove g_idx_to_save (see the sketch below).
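A minimal sketch of the suggested simplification, assuming the names from the diff hunk above (the enum here is a stand-in for compressed-tensors' `ActivationOrdering`, and `g_idx_to_persist` is a hypothetical helper):

```python
from enum import Enum
from typing import Optional

class ActivationOrdering(str, Enum):  # stand-in for the compressed-tensors enum
    GROUP = "group"
    WEIGHT = "weight"

def g_idx_to_persist(actorder: Optional[ActivationOrdering], g_idx):
    # One check replaces the has_gidx flag plus the second actorder test:
    # only group activation ordering ever persists g_idx.
    return g_idx if actorder == ActivationOrdering.GROUP else None
```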
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
tests/llmcompressor/transformers/gptq/test_gptq_oneshot.py (1)
180-188: ⚠️ Potential issue | 🟡 Minor. The assertions don't pin the new BLOCK behavior yet.
Relaxing this to `4 or 8` means a deserialization fallback to any other 8-bit scheme would still pass. For the BLOCK recipes, assert `weight_args.strategy` and `weight_args.block_structure` too, so this test actually verifies the path added by the PR. As per coding guidelines: "tests/**/*.py: Ensure PyTest tests are clear, comprehensive, and cover edge cases for quantization scenarios."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/llmcompressor/transformers/gptq/test_gptq_oneshot.py` around lines 180-188: the test currently only checks num_bits for weight_args, allowing 4 or 8, which doesn't ensure the new BLOCK recipe path. Update the assertions after obtaining weight_args (from quantization_config.config_groups["group_0"]) to also assert that weight_args.strategy equals the BLOCK strategy name used in the codebase (e.g., "BLOCK") and that weight_args.block_structure matches the expected structure (e.g., a tuple/list describing block dims), so the test verifies strategy and block_structure in addition to num_bits. Locate the symbols quantization_config, quant_scheme, weight_args, and QuantizationArgs, and add assertions for weight_args.strategy and weight_args.block_structure accordingly.
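A minimal sketch of what the strengthened assertions could look like; the `WeightArgs` stand-in, the strategy string, and the block dims here are assumptions for illustration, not the test's real types or values:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class WeightArgs:  # hypothetical stand-in for the deserialized weight args
    num_bits: int
    strategy: str
    block_structure: List[int]

def check_block_recipe(weight_args: WeightArgs) -> None:
    # Pin the BLOCK path itself, not just the bit width, so a fallback to a
    # different 8-bit scheme cannot silently pass.
    assert weight_args.num_bits in (4, 8)
    assert weight_args.strategy == "block"            # assumed strategy name
    assert weight_args.block_structure == [128, 128]  # assumed block dims

check_block_recipe(WeightArgs(num_bits=8, strategy="block", block_structure=[128, 128]))
```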
src/llmcompressor/modifiers/gptq/gptq_quantize.py (1)
263-287: ⚠️ Potential issue | 🟠 Major. BLOCK strategy with activation ordering creates an untested, unsupported code path with cross-repository incompatibility.
When `strategy == QuantizationStrategy.BLOCK` and `actorder == ActivationOrdering.GROUP`, the code saves `weight_g_idx` (line 275). However:
- Incompatibility with compressed-tensors initialize.py: g_idx is only registered as a parameter for GROUP/TENSOR_GROUP strategies; BLOCK never registers `{base_name}_g_idx`, so the saved g_idx cannot be restored to a module parameter when loading.
- Missing g_idx handling in calibration flatten: `_flatten_weight()` in src/llmcompressor/observers/helpers.py:75-80 only applies g_idx reordering for GROUP/TENSOR_GROUP strategies. For BLOCK, g_idx is ignored during flattening, even though it is passed from imatrix.py:165.
- No test coverage: there are no tests combining the BLOCK strategy with any ActivationOrdering, making this a completely untested path.
This indicates an incomplete design decision: either BLOCK should not save g_idx when using activation ordering, or initialize.py and _flatten_weight should both be updated to support g_idx for BLOCK (with proper divisor-based grouping).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/llmcompressor/modifiers/gptq/gptq_quantize.py` around lines 263 - 287, The BLOCK strategy currently writes weight_g_idx when actorder == ActivationOrdering.GROUP (in gptq_quantize.py) which is incompatible because compressed-tensors initialize.py does not register a {base_name}_g_idx for BLOCK and _flatten_weight (helpers._flatten_weight) ignores g_idx for BLOCK; either stop emitting weight_g_idx for QuantizationStrategy.BLOCK when using ActivationOrdering.GROUP (remove the g_idx_to_save assignment and related q_param_dict entry) or fully add BLOCK support: update initialize.py to register {base_name}_g_idx for BLOCK with the correct divisor/grouping logic, extend helpers._flatten_weight to apply g_idx reordering for BLOCK like GROUP/TENSOR_GROUP, ensure imatrix.py passes g_idx consistently, and add tests covering BLOCK combined with ActivationOrdering variations to validate serialization and calibration paths.
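If the first option (stop emitting g_idx for BLOCK) were taken, the gating could look roughly like this sketch; the enum and function names are stand-ins for illustration, not the repo's actual code:

```python
from enum import Enum

class QuantizationStrategy(str, Enum):  # stand-in for the compressed-tensors enum
    GROUP = "group"
    TENSOR_GROUP = "tensor_group"
    BLOCK = "block"

# Per the comment above, only these strategies register a {base_name}_g_idx
# parameter on load, so only they should serialize one.
G_IDX_STRATEGIES = {QuantizationStrategy.GROUP, QuantizationStrategy.TENSOR_GROUP}

def should_save_g_idx(strategy: QuantizationStrategy, actorder_is_group: bool) -> bool:
    return actorder_is_group and strategy in G_IDX_STRATEGIES
```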
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/llmcompressor/transformers/gptq/test_gptq_oneshot.py`:
- Around line 99-127: The two test recipes recipe_modifier_full_block and
recipe_modifier_block_actorder_weight currently construct QuantizationArgs
without specifying type, so they default to QuantizationType.INT; update both
QuantizationArgs in the GPTQModifier config_groups to include
type=QuantizationType.FLOAT so the tests exercise FP8 block quantization
behavior (locate the QuantizationArgs instances inside GPTQModifier for group_0
targeting "re:.*model.layers.2.self_attn.q_proj$" and add the type field).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 5cf4ada2-ab75-44d8-b52d-d5b43b6187e9
📒 Files selected for processing (2)
src/llmcompressor/modifiers/gptq/gptq_quantize.py
tests/llmcompressor/transformers/gptq/test_gptq_oneshot.py
See the quality checks instructions and bot comments; otherwise this looks pretty good, just some minor fixes. We should also run some evals for this; once you have the changes in, I can help you with that. You can reach out to me on vLLM Slack for easier coordination.
Signed-off-by: Riffat Khan <riffatk342@gmail.com>
Merge Protections: Your pull request matches the following merge protections and will not be merged until they are valid.
🔴 Require two reviews: this rule is failing. PRs labelled "two-reviews" must have at least two approving reviews before merging.
SUMMARY:
Depends on: Restrict group activation ordering to group quantization strategies compressed-tensors#682
Closes: Add weight activation ordering for fp8 block #2587
Replace the separate `block_column_idx` calculation in block quantization by reusing `g_idx`, first getting the divisor from either the group size value or the block width value. This should automatically ensure the use of actorder (see the sketch below).
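A rough sketch of the divisor selection this describes, using the names from the summary rather than the PR's exact code:

```python
# Hypothetical: pick the single column divisor that both GROUP and BLOCK
# grouping share, so the same g_idx path (and therefore actorder) serves both.
def get_divisor(quant_args) -> int:
    if quant_args.strategy == "block":
        # block_structure is (rows, cols); columns are grouped by block width.
        return quant_args.block_structure[1]
    return quant_args.group_size
```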
Remove `has_gidx` and simplify the logic to save `g_idx` when `ActivationOrdering.GROUP` is opted for.
TEST PLAN: