
[AWQ] [gemma3] remove input layernorm mapping #2571

Open
brian-dellabetta wants to merge 3 commits into main from bdellabe/bugfix-gemma3-awq

Conversation

brian-dellabetta (Collaborator) commented Apr 6, 2026

SUMMARY:
Resolves #2522

Gemma3 applies an RMSNorm (q_norm, k_norm) to the outputs of its q_proj/k_proj layers; smoothing those projections tends to degrade performance below plain round-to-nearest quantization (see results here).

This PR drops that mapping from Gemma3 (it was initially inherited from Gemma2, which has no RMSNorms on those outputs) and adds a code comment explaining why.
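
For illustration, a minimal sketch of the resulting Gemma3 mapping list. It assumes the AWQMapping(smooth_layer, balance_layers) layout from src/llmcompressor/modifiers/awq/mappings.py, and the balance targets for the feed-forward entries are inferred from the default mapping list rather than copied from the diff:

```python
# A sketch, not the verbatim diff: AWQMapping is assumed to take a
# smooth-layer regex and a list of balance-layer regexes.
from llmcompressor.modifiers.awq.mappings import AWQMapping

_gemma3_mappings = [
    # No input_layernorm -> q/k/v_proj entry: Gemma3 applies RMSNorms
    # (q_norm, k_norm) to the q_proj/k_proj outputs, and smoothing
    # through them degrades accuracy below round-to-nearest.
    AWQMapping("re:.*v_proj$", ["re:.*o_proj$"]),
    AWQMapping(
        "re:.*pre_feedforward_layernorm$",
        ["re:.*gate_proj$", "re:.*up_proj$"],
    ),
    AWQMapping("re:.*up_proj$", ["re:.*down_proj$"]),
]
```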

TEST PLAN:

Summary by CodeRabbit

  • New Features
    • Added support for Gemma3 model variants.
  • Bug Fixes
    • Updated layer mapping configurations for optimized support of Gemma2 and Gemma3 models.

Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>
github-actions Bot commented Apr 6, 2026

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite; please only add the label once the PR is code complete and local testing has been performed.

coderabbitai Bot (Contributor) commented Apr 6, 2026

📝 Walkthrough

Updated AWQ mapping definitions to distinguish between Gemma2 and Gemma3 model variants. Renamed the existing Gemma mapping list to Gemma2, created a new Gemma3-specific mapping with adjusted layer configurations, and updated the model registry accordingly to use model-specific mappings.

Changes

Cohort / File(s): AWQ Gemma Mapping Updates (src/llmcompressor/modifiers/awq/mappings.py)
Summary: Renamed _gemma_mappings to _gemma2_mappings; introduced _gemma3_mappings with a modified layer smoothing strategy that removes input_layernorm balancing against the Q/K/V projections, retains the v_proj → o_proj mapping, and adds explicit pre_feedforward_layernorm and up_proj smoothing layers; updated the model registry to assign Gemma2 and Gemma3 models to their respective mapping lists.
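
Continuing the sketch above, the registry change described here might look as follows; the AWQ_MAPPING_REGISTRY name and the architecture keys are assumptions about mappings.py, not verbatim from the diff:

```python
# Hypothetical keys; the real registry may use different architecture
# names or include further Gemma3 variants.
AWQ_MAPPING_REGISTRY = {
    # ...other architectures...
    "Gemma2ForCausalLM": _gemma2_mappings,  # renamed from _gemma_mappings
    "Gemma3ForCausalLM": _gemma3_mappings,  # new Gemma3-specific list
}
```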

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Gemma's quantized dreams now split in two,
Gemma2 and Gemma3, each with their own view,
With layers remapped for GQA grace,
Smoother scaling finds its rightful place,
Quality blooms where mappings align,
AWQ whispers, "Now you'll quantize just fine!" 🌟

🚥 Pre-merge checks | ✅ 5 passed

  • Description Check: ✅ Passed. Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title clearly and specifically describes the main change: removing the input layernorm mapping for Gemma3 AWQ quantization.
  • Linked Issues Check: ✅ Passed. The PR addresses issue #2522 by removing the problematic input_layernorm → q/k/v_proj mapping that was causing AWQ performance degradation on Gemma3 models.
  • Out of Scope Changes Check: ✅ Passed. All changes are directly scoped to the AWQ Gemma3 mapping issue; no unrelated modifications were introduced.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate; docstring coverage check skipped.



coderabbitai Bot (Contributor) left a comment

🧹 Nitpick comments (1)
src/llmcompressor/modifiers/awq/mappings.py (1)

109-123: The Gemma3 mappings exclude all input_layernorm smoothing intentionally; if testing supports it, consider whether v_proj could be re-included.

The comment correctly identifies that Gemma3's q_proj and k_proj have corresponding RMSNorm applied to their outputs (q_norm, k_norm), which degrade performance when smoothed. The implementation conservatively excludes the entire input_layernorm → q/k/v mapping. However, since only q_norm and k_norm are mentioned as problematic, you may want to test whether including input_layernorm → v_proj smoothing could improve accuracy without the norm-related degradation. The v_proj → o_proj mapping is still present, so partial smoothing is feasible.
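
A minimal sketch of how that experiment could be wired up, assuming the same regex conventions as the rest of mappings.py (untested, illustrative only):

```python
# Hypothetical partial smoothing: balance input_layernorm against v_proj
# only, keeping q_proj/k_proj excluded because of their output RMSNorms.
_gemma3_mappings_with_v_smoothing = [
    AWQMapping("re:.*input_layernorm$", ["re:.*v_proj$"]),
    *_gemma3_mappings,  # the mappings from this PR
]
```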

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/llmcompressor/modifiers/awq/mappings.py` around lines 109 - 123, The
current _gemma3_mappings conservatively excludes the entire input_layernorm →
q/k/v mapping even though only q_norm and k_norm are known to cause
RMSNorm-related degradation; update the mapping list to allow testing of
input_layernorm → v_proj smoothing by reintroducing an AWQMapping that maps
"re:.*input_layernorm$" (or the existing regex used elsewhere) to
["re:.*v_proj$"] while keeping mappings for q/k excluded, i.e., modify
_gemma3_mappings (and/or AWQMapping entries referencing v_proj/o_proj) so v_proj
is allowed for smoothing without changing the q/k exclusions.


📥 Commits

Reviewing files that changed from the base of the PR and between a65b5dd and d8d2220.

📒 Files selected for processing (1)
  • src/llmcompressor/modifiers/awq/mappings.py

gemini-code-assist Bot (Contributor) left a comment

Code Review

This pull request updates the AWQ mappings in src/llmcompressor/modifiers/awq/mappings.py by renaming the Gemma mappings to _gemma2_mappings and introducing a specific _gemma3_mappings configuration to address performance degradation issues related to RMSNorm in Gemma3. The reviewer suggested clarifying the shared architectural features between Gemma2 and Gemma3 in the code comments to improve maintainability.

Comment thread: src/llmcompressor/modifiers/awq/mappings.py
dsikka (Collaborator) commented Apr 6, 2026

@coderabbitai review latest changes

dsikka (Collaborator) commented Apr 7, 2026

@Mergifyio refresh

mergify Bot (Contributor) commented Apr 7, 2026

refresh

✅ Pull request refreshed



Development

Successfully merging this pull request may close these issues.

[Bug]: Evaluate AWQ and GPTQ for Gemma
