Stabilize Qwen3 MoE LoRA mixed-scale test #1640

Open
taivu1998 wants to merge 1 commit into NovaSky-AI:main from taivu1998:tdv/issue-1604-qwen3-lora-flake

@taivu1998

Summary

  • Stabilizes the flaky Qwen3 MoE LoRA comparison reported in #1604 ("Flaky test_qwen3_moe_layer_lora, replace np.allclose?") by making the test input deterministic.
  • Replaces the fixed absolute tolerance in the flaky assertion with a local mixed-scale assertion helper that keeps the existing relative tolerance but scales the absolute tolerance with the dynamic range of the output.
  • Adds explicit non-finite checks and a richer failure message so future failures report the adapter, sample, dynamic range, and effective tolerance.
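A minimal sketch of what a mixed-scale assertion along these lines could look like, assuming NumPy arrays on both sides. The PR text does not include the helper's code, so the signature, the keyword names (rtol, base_atol, scale_atol, context) and every default value below are hypothetical. Only the idea comes from the description: a base absolute tolerance plus a term proportional to the output's dynamic range, an unchanged relative tolerance, explicit non-finite checks, and a failure message that reports the dynamic range and effective tolerance.

```python
import numpy as np


def assert_allclose_mixed_scale(
    actual,
    expected,
    *,
    rtol=1e-5,        # hypothetical: stands in for the test's existing rtol
    base_atol=1e-3,   # hypothetical floor, matching the old fixed atol
    scale_atol=1e-6,  # hypothetical factor applied to the dynamic range
    context="",
):
    """Hypothetical sketch of a mixed-scale allclose check.

    Passes when, elementwise,
        |actual - expected| <= base_atol + scale_atol * R + rtol * |expected|
    where R = max(|expected|) is the dynamic range of the expected output.
    """
    actual = np.asarray(actual, dtype=np.float64)
    expected = np.asarray(expected, dtype=np.float64)
    if actual.shape != expected.shape:
        raise AssertionError(f"[{context}] shape {actual.shape} != {expected.shape}")

    # Fail loudly on NaN/inf instead of letting them interact with the tolerance.
    if not np.all(np.isfinite(actual)):
        raise AssertionError(f"[{context}] actual contains non-finite values")
    if not np.all(np.isfinite(expected)):
        raise AssertionError(f"[{context}] expected contains non-finite values")

    dynamic_range = float(np.max(np.abs(expected))) if expected.size else 0.0
    effective_atol = base_atol + scale_atol * dynamic_range

    abs_err = np.abs(actual - expected)
    allowed = effective_atol + rtol * np.abs(expected)
    if np.any(abs_err > allowed):
        # Flat index of the element that overshoots its allowance the most.
        worst = int(np.argmax(abs_err - allowed))
        raise AssertionError(
            f"[{context}] max |err|={abs_err.max():.3e} (worst flat index {worst}), "
            f"dynamic range={dynamic_range:.3e}, "
            f"effective atol={effective_atol:.3e}, rtol={rtol:.1e}"
        )


# Example: a 2e-3 discrepancy next to a 2e4 entry passes (effective atol = 0.021),
# whereas np.allclose with a fixed atol=1e-3 would reject it.
assert_allclose_mixed_scale(
    np.array([2e4, 0.002]), np.array([2e4, 0.0]), context="adapter=0 sample=1"
)
```

In a test, context can carry the adapter and sample indices so a failure points straight at the offending slice. Using max(|expected|) as the dynamic range is one simple choice; a per-row maximum or a high percentile would be reasonable alternatives if a single outlier should not loosen the tolerance for the whole tensor.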

Root Cause

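Issue #1604 reports that tests/tx/models/test_qwen3.py::test_qwen3_moe_layer_lora compares tensors whose values range from small magnitudes up to roughly 1e4. The JAX fused-LoRA and merged-weight paths are mathematically equivalent but perform their floating-point operations in a different order, and the rounding error in a small-magnitude entry scales with the magnitude of the values it was computed from rather than with its own size. At those entries the relative term (rtol * |expected|) is negligible, so a fixed atol=1e-3 can reject outputs that differ only by rounding noise when the layer output has a large dynamic range.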
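A toy float32 illustration of this mechanism. The numbers are chosen for the example and are not taken from the test: near 2e4, adjacent float32 values are 2**-9 (about 1.95e-3) apart, so two association orders of the same tiny sum can disagree by a full step. A fixed atol=1e-3 rejects that difference, while an absolute tolerance scaled by the magnitude (the 1e-6 factor here is a hypothetical choice) accepts it.

```python
import numpy as np

big = np.float32(20000.0)  # toy magnitude, same order as the ~1e4 outputs
small = np.float32(9e-4)   # toy small contribution

# Adjacent float32 values in [16384, 32768) are 2**-9 apart, so any result
# built on top of `big` is only resolved to about that step.
ulp = np.spacing(big)

# The same quantity (2 * small) computed with two association orders,
# a stand-in for the fused-LoRA vs merged-weight paths.
path_a = ((big + small) + small) - big  # each add is below half a step, so it rounds away
path_b = (big + (small + small)) - big  # the combined add rounds up to one full step

print(np.allclose(path_a, path_b, atol=1e-3))                      # False
print(np.allclose(path_a, path_b, atol=1e-3 + 1e-6 * float(big)))  # True
```

Neither path is "wrong": both carry an error bounded by the spacing at the magnitude of their inputs, which is why the tolerance has to track that magnitude.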

Changes

  • Seed only this test's input with a local torch.Generator, so the previously random input is reproducible without changing global RNG state.
  • Add assert_allclose_mixed_scale for this activation-level comparison.
  • Keep the new assertion local to the Qwen3 MoE LoRA test; production MoE/LoRA code is unchanged.
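The seeding change can be sketched as follows, assuming the input is drawn with torch.randn. The helper name make_lora_test_input, the seed, and the shape are hypothetical, since the PR text does not show them. Calling torch.manual_seed would also make the input reproducible, but it reseeds the global RNG for every later test in the same process; a private torch.Generator avoids that side effect.

```python
import torch


def make_lora_test_input(seed: int, shape=(2, 4, 8)) -> torch.Tensor:
    # Hypothetical helper: draw the test input from a private generator so the
    # values are reproducible and the global torch RNG is never advanced.
    gen = torch.Generator().manual_seed(seed)
    return torch.randn(*shape, generator=gen)


global_state = torch.get_rng_state()
x1 = make_lora_test_input(seed=1234)
x2 = make_lora_test_input(seed=1234)

assert torch.equal(x1, x2)                               # same seed, same input
assert torch.equal(global_state, torch.get_rng_state())  # global RNG untouched
```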

Validation

  • git diff --check -- tests/tx/models/test_qwen3.py
  • uv run --with ruff ruff check tests/tx/models/test_qwen3.py
  • .venv/bin/python -m pytest -q tests/tx/models/test_qwen3.py::test_qwen3_moe_layer_lora (3 passed)
  • OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES .venv/bin/python -m pytest -q --forked tests/tx/models/test_qwen3.py::test_qwen3_moe_layer_lora (3 passed)

Fixes #1604

taivu1998 marked this pull request as ready for review May 11, 2026 03:11

gemini-code-assist (bot) left a comment

Code Review

This pull request introduces a new utility function, assert_allclose_mixed_scale, to the Qwen3 model tests for more precise numerical comparisons using a combination of base and scale-dependent absolute tolerances. Additionally, it updates test_qwen3_moe_layer_lora to use a seeded random generator for input data, ensuring test reproducibility. I have no feedback to provide.

Development

Successfully merging this pull request may close these issues.

Flaky test_qwen3_moe_layer_lora, replace np.allclose?
