
Conversation


@Rohan-Bierneni Rohan-Bierneni commented Nov 5, 2025

Description

Fixes a bug in attentions.py where query_norm and key_norm were first initialized to None and only later assigned nnx modules. Because of the initial None assignment, nnx treated query_norm and key_norm as static attributes and raised an error when the modules were assigned. This PR fixes that issue.
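A minimal sketch of the failure mode and the fix (toy names, not the actual MaxText code; the NNX behavior is as described above):

```python
from flax import nnx


class SimpleAttention(nnx.Module):
  """Toy stand-in for the attention layer; hypothetical, for illustration only."""

  def __init__(self, features: int, use_qk_norm: bool, rngs: nnx.Rngs):
    # Buggy pattern (per the description above): initializing to None first
    # leads NNX to treat the attribute as static, so the later module
    # assignment raises an error like
    # "Cannot assign data value of type '<class ...RMSNorm'>'".
    #
    #   self.query_norm = None
    #   if use_qk_norm:
    #       self.query_norm = nnx.RMSNorm(features, rngs=rngs)
    #
    # Fixed pattern: assign the submodule (or None) exactly once.
    self.query_norm = nnx.RMSNorm(features, rngs=rngs) if use_qk_norm else None
    self.key_norm = nnx.RMSNorm(features, rngs=rngs) if use_qk_norm else None

  def __call__(self, query, key):
    if self.query_norm is not None:
      query = self.query_norm(query)
    if self.key_norm is not None:
      key = self.key_norm(key)
    return query, key
```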

Also added an AOT test that exercises this code path to prevent such issues from recurring (see the sketch after the snippet below):

if self.use_qk_norm and not is_llama4_decoder_block:
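A hedged sketch of what such an AOT-style regression test can look like, reusing the toy SimpleAttention above (the real test lives in MaxText and likely differs):

```python
import jax
import jax.numpy as jnp
from flax import nnx


def test_qk_norm_attention_compiles():
  """AOT-compile the qk-norm path so trace-time bugs surface even on CPU."""
  attn = SimpleAttention(features=128, use_qk_norm=True, rngs=nnx.Rngs(0))
  query = jnp.zeros((1, 8, 128))
  key = jnp.zeros((1, 8, 128))

  # Standard NNX functional pattern: split into graphdef + state, then
  # merge back inside the jitted function.
  graphdef, state = nnx.split(attn)

  def forward(state, query, key):
    module = nnx.merge(graphdef, state)
    return module(query, key)

  # lower(...).compile() performs ahead-of-time compilation without
  # executing the computation, so no accelerator is needed.
  jax.jit(forward).lower(state, query, key).compile()
```

Because the broken initialization fails as soon as the module is constructed and traced, a test like this catches the bug on a plain CPU VM.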

FIXES: #2602

Tests

Training ran successfully with the changes from this PR: https://paste.googleplex.com/6601423567585280

The AOT test also passes locally on a CPU VM: https://paste.googleplex.com/5073430399549440

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.


@parambole parambole left a comment


LGTM


@shuningjin shuningjin left a comment


Do you know why this was not caught in the existing tests, e.g., the one related to qwen3next? (non-urgent, can discuss later)

@eitanporat

> Do you know why this was not caught in the existing tests, e.g., the one related to qwen3next? (non-urgent, can discuss later)

I think it happens in the use_qk_norm branch of the code, and I don't see use_qk_norm set in the config src/MaxText/configs/models/qwen3-next-80b-a3b.yml.
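To make this concrete with the toy SimpleAttention from the PR description above (hypothetical, not the MaxText test): if a model config does not enable use_qk_norm, the norm submodules are never constructed, so the faulty initialization path is never exercised by that model's tests.

```python
from flax import nnx

# With qk-norm disabled, as in the qwen3-next config, the guarded branch
# is never taken and the bad initialization would go unnoticed.
attn = SimpleAttention(features=128, use_qk_norm=False, rngs=nnx.Rngs(0))
assert attn.query_norm is None and attn.key_norm is None
```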

@SurbhiJainUSC

> Do you know why this was not caught in the existing tests, e.g., the one related to qwen3next? (non-urgent, can discuss later)

This issue was also caught by Airflow tests: https://b.corp.google.com/issues/447464486#comment10


@bvandermoon bvandermoon left a comment


What error were you seeing without this? Mind adding it to the PR description?

Commits:
  • redo qk norm initialization
  • remove testing code change
  • Add aot test for non-llama qk norm

@Rohan-Bierneni Rohan-Bierneni force-pushed the rbierneni-qwen3-next-fullattention branch from 042c583 to 9d13e0d on November 5, 2025 23:48
@hengtaoguo

Feel free to add this information to your PR description. I will close my identical fix, and let's merge yours.

  • This is breaking our Gemma3/Qwen3 XLML tests and decoding utils for these two model families.

Fixes: b/458142671

@Rohan-Bierneni Rohan-Bierneni self-assigned this Nov 6, 2025
@copybara-service copybara-service bot merged commit b8fb668 into main Nov 6, 2025
48 of 52 checks passed
@copybara-service copybara-service bot deleted the rbierneni-qwen3-next-fullattention branch on November 6, 2025 16:45


Development

Successfully merging this pull request may close this issue:

Cannot assign data value of type '<class 'MaxText.layers.normalizations.RMSNorm'>'
