
[AWQ] [gemma3] remove input layernorm mapping #2571

Open
brian-dellabetta wants to merge 3 commits into main from bdellabe/bugfix-gemma3-awq

Conversation

brian-dellabetta (Collaborator) commented Apr 6, 2026

SUMMARY:
Resolves #2522

Gemma3 applies an RMSNorm (q_norm, k_norm) to the outputs of its q_proj/k_proj layers; smoothing those projections tends to degrade performance below plain round-to-nearest quantization (see results here).

This PR drops that mapping from Gemma3 (it was initially inherited from Gemma2, which has no RMSNorms on those outputs) and adds a code comment explaining why.
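
For illustration, a minimal sketch of the resulting Gemma3 mapping list. It assumes the AWQMapping(smooth_layer, balance_layers) layout from src/llmcompressor/modifiers/awq/mappings.py, and the balance targets for the feed-forward entries are inferred from the default mapping list rather than copied from the diff:

```python
# A sketch, not the verbatim diff: AWQMapping is assumed to take a
# smooth-layer regex and a list of balance-layer regexes.
from llmcompressor.modifiers.awq.mappings import AWQMapping

_gemma3_mappings = [
    # No input_layernorm -> q/k/v_proj entry: Gemma3 applies RMSNorms
    # (q_norm, k_norm) to the q_proj/k_proj outputs, and smoothing
    # through them degrades accuracy below round-to-nearest.
    AWQMapping("re:.*v_proj$", ["re:.*o_proj$"]),
    AWQMapping(
        "re:.*pre_feedforward_layernorm$",
        ["re:.*gate_proj$", "re:.*up_proj$"],
    ),
    AWQMapping("re:.*up_proj$", ["re:.*down_proj$"]),
]
```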

TEST PLAN:

Summary by CodeRabbit

  • New Features
    • Added support for Gemma3 model variants.
  • Bug Fixes
    • Updated layer mapping configurations for optimized support of Gemma2 and Gemma3 models.

Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>
github-actions Bot commented Apr 6, 2026

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite; please only add the label once the PR is code complete and local testing has been performed.

coderabbitai Bot (Contributor) commented Apr 6, 2026

📝 Walkthrough

Updated AWQ mapping definitions to distinguish between Gemma2 and Gemma3 model variants. Renamed the existing Gemma mapping list to Gemma2, created a new Gemma3-specific mapping with adjusted layer configurations, and updated the model registry accordingly to use model-specific mappings.

Changes

Cohort / File(s): AWQ Gemma Mapping Updates (src/llmcompressor/modifiers/awq/mappings.py)
Summary: Renamed _gemma_mappings to _gemma2_mappings; introduced _gemma3_mappings with a modified layer smoothing strategy that removes input_layernorm balancing against the Q/K/V projections, retains the v_proj → o_proj mapping, and adds explicit pre_feedforward_layernorm and up_proj smoothing layers; updated the model registry to assign Gemma2 and Gemma3 models to their respective mapping lists.
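
Continuing the sketch above, the registry change described here might look as follows; the AWQ_MAPPING_REGISTRY name and the architecture keys are assumptions about mappings.py, not verbatim from the diff:

```python
# Hypothetical keys; the real registry may use different architecture
# names or include further Gemma3 variants.
AWQ_MAPPING_REGISTRY = {
    # ...other architectures...
    "Gemma2ForCausalLM": _gemma2_mappings,  # renamed from _gemma_mappings
    "Gemma3ForCausalLM": _gemma3_mappings,  # new Gemma3-specific list
}
```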

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Gemma's quantized dreams now split in two,
Gemma2 and Gemma3, each with their own view,
With layers remapped for GQA grace,
Smoother scaling finds its rightful place,
Quality blooms where mappings align,
AWQ whispers, "Now you'll quantize just fine!" 🌟

🚥 Pre-merge checks | ✅ 5 passed

  • Description Check: ✅ Passed. Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title clearly and specifically describes the main change: removing the input layernorm mapping for Gemma3 AWQ quantization.
  • Linked Issues Check: ✅ Passed. The PR addresses issue #2522 by removing the problematic input_layernorm → q/k/v_proj mapping that was causing AWQ performance degradation on Gemma3 models.
  • Out of Scope Changes Check: ✅ Passed. All changes are directly scoped to the AWQ Gemma3 mapping issue; no unrelated modifications were introduced.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate; docstring coverage check skipped.



coderabbitai Bot (Contributor) left a comment

🧹 Nitpick comments (1)
src/llmcompressor/modifiers/awq/mappings.py (1)

109-123: The Gemma3 mappings exclude all input_layernorm smoothing intentionally; if testing supports it, consider whether v_proj could be re-included.

The comment correctly identifies that Gemma3's q_proj and k_proj have corresponding RMSNorm applied to their outputs (q_norm, k_norm), which degrade performance when smoothed. The implementation conservatively excludes the entire input_layernorm → q/k/v mapping. However, since only q_norm and k_norm are mentioned as problematic, you may want to test whether including input_layernorm → v_proj smoothing could improve accuracy without the norm-related degradation. The v_proj → o_proj mapping is still present, so partial smoothing is feasible.
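
A minimal sketch of how that experiment could be wired up, assuming the same regex conventions as the rest of mappings.py (untested, illustrative only):

```python
# Hypothetical partial smoothing: balance input_layernorm against v_proj
# only, keeping q_proj/k_proj excluded because of their output RMSNorms.
_gemma3_mappings_with_v_smoothing = [
    AWQMapping("re:.*input_layernorm$", ["re:.*v_proj$"]),
    *_gemma3_mappings,  # the mappings from this PR
]
```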

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/llmcompressor/modifiers/awq/mappings.py` around lines 109 - 123, The
current _gemma3_mappings conservatively excludes the entire input_layernorm →
q/k/v mapping even though only q_norm and k_norm are known to cause
RMSNorm-related degradation; update the mapping list to allow testing of
input_layernorm → v_proj smoothing by reintroducing an AWQMapping that maps
"re:.*input_layernorm$" (or the existing regex used elsewhere) to
["re:.*v_proj$"] while keeping mappings for q/k excluded, i.e., modify
_gemma3_mappings (and/or AWQMapping entries referencing v_proj/o_proj) so v_proj
is allowed for smoothing without changing the q/k exclusions.


📥 Commits

Reviewing files that changed from the base of the PR and between a65b5dd and d8d2220.

📒 Files selected for processing (1)
  • src/llmcompressor/modifiers/awq/mappings.py

gemini-code-assist Bot (Contributor) left a comment

Code Review

This pull request updates the AWQ mappings in src/llmcompressor/modifiers/awq/mappings.py by renaming the Gemma mappings to _gemma2_mappings and introducing a specific _gemma3_mappings configuration to address performance degradation issues related to RMSNorm in Gemma3. The reviewer suggested clarifying the shared architectural features between Gemma2 and Gemma3 in the code comments to improve maintainability.

Comment thread: src/llmcompressor/modifiers/awq/mappings.py
dsikka (Collaborator) commented Apr 6, 2026

@coderabbitai review latest changes

dsikka (Collaborator) commented Apr 7, 2026

@Mergifyio refresh

mergify Bot (Contributor) commented Apr 7, 2026

refresh

✅ Pull request refreshed



Development

Successfully merging this pull request may close these issues.

[Bug]: Evaluate AWQ and GPTQ for Gemma
