[AWQ] [gemma3] remove input layernorm mapping #2571
brian-dellabetta wants to merge 3 commits into main from
Conversation
Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>
👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review. Note: this label is required to run the full testing suite, so add it only once the PR is code complete and local testing has been performed.
📝 Walkthrough

Updated AWQ mapping definitions to distinguish between the Gemma2 and Gemma3 model variants: renamed the existing Gemma mapping list to Gemma2, created a new Gemma3-specific mapping with adjusted layer configurations, and updated the model registry to use model-specific mappings.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 5 passed
🧹 Nitpick comments (1)
src/llmcompressor/modifiers/awq/mappings.py (1)
109-123: The Gemma3 mappings intentionally exclude all input_layernorm smoothing; if testing supports it, consider whether v_proj could be re-included.

The comment correctly identifies that Gemma3's q_proj and k_proj have corresponding RMSNorms applied to their outputs (q_norm, k_norm), which degrade performance when smoothed. The implementation conservatively excludes the entire input_layernorm → q/k/v mapping. However, since only q_norm and k_norm are flagged as problematic, you may want to test whether including input_layernorm → v_proj smoothing could improve accuracy without the norm-related degradation. The v_proj → o_proj mapping is still present, so partial smoothing is feasible.
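The reviewer's suggested partial smoothing could look like the sketch below. The `AWQMapping` dataclass is a stand-in for the library's class, and the regex targets follow the patterns quoted in the comment above; whether this variant actually improves accuracy would need to be confirmed by testing.

```python
from dataclasses import dataclass
from typing import List

# Stand-in for llmcompressor's AWQMapping; field names are illustrative.
@dataclass
class AWQMapping:
    smooth_layer: str
    balance_layers: List[str]

# Suggested variant: smooth input_layernorm into v_proj only, since
# v_proj has no output RMSNorm; q_proj/k_proj remain excluded.
_gemma3_mappings = [
    AWQMapping("re:.*input_layernorm$", ["re:.*v_proj$"]),
    AWQMapping("re:.*v_proj$", ["re:.*o_proj$"]),
]

balanced = {b for m in _gemma3_mappings for b in m.balance_layers}
print("re:.*q_proj$" in balanced, "re:.*v_proj$" in balanced)  # False True
```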
📒 Files selected for processing (1)
src/llmcompressor/modifiers/awq/mappings.py
Code Review
This pull request updates the AWQ mappings in src/llmcompressor/modifiers/awq/mappings.py by renaming the Gemma mappings to _gemma2_mappings and introducing a specific _gemma3_mappings configuration to address performance degradation issues related to RMSNorm in Gemma3. The reviewer suggested clarifying the shared architectural features between Gemma2 and Gemma3 in the code comments to improve maintainability.
@coderabbitai review latest changes

@Mergifyio refresh
✅ Pull request refreshed |
SUMMARY:
Resolves #2522
Gemma3 applies an RMSNorm to the outputs of its q_proj/k_proj layers (q_norm, k_norm); smoothing through these norms tends to degrade performance below the round-to-nearest baseline (see results here).
This PR drops that mapping for Gemma3 (it was initially inherited from Gemma2, which has no RMSNorms on those outputs) and adds a comment explaining why.
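A toy illustration (not from the PR itself) of why a downstream RMSNorm interacts badly with smoothing: AWQ-style smoothing applies per-channel scales, but RMSNorm's denominator mixes all channels, so scaling before the norm is not equivalent to scaling after it, and the intended scales are distorted.

```python
import math

def rms_norm(xs, eps=1e-6):
    # Plain RMSNorm over a vector (no learned weight).
    rms = math.sqrt(sum(x * x for x in xs) / len(xs) + eps)
    return [x / rms for x in xs]

x = [1.0, -2.0, 0.5, 3.0]
s = [0.5, 2.0, 1.5, 0.8]  # hypothetical per-channel smoothing scales

scaled_then_norm = rms_norm([xi * si for xi, si in zip(x, s)])
norm_then_scaled = [yi * si for yi, si in zip(rms_norm(x), s)]

# The two disagree: per-channel scaling does not commute with RMSNorm.
print(scaled_then_norm == norm_then_scaled)  # False
```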
TEST PLAN: