[model] fix: adapt MCore dev attention gates#4326

Open
yaoyu-33 wants to merge 2 commits into main from yuya/mcore-dev-autofix-20260612-pr4324

@yaoyu-33

Contributor

Summary

Fixes the MCore dev bump failure from #4324.

Target: dev

Classification: MCore broke Bridge

Root cause: The MCore dev commit range for the 2026-06-12 automated bump adds native head-wise attention gate support. MCore now passes head_wise_gate= into the attention QKV helpers and rejects configurations that enable both head_wise_attn_gate and attention_output_gate. Bridge's DiT attention override did not accept the new keyword, and the Step-3.5 conversion still requested the older expanded attention_output_gate layout even when native scalar gates were available.
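
A minimal sketch of the two failure modes described in the root cause. Every class and function here (`UpdatedBaseAttention`, `StaleOverride`, `get_qkv`, `validate_gates`) is a hypothetical stand-in, not the real MCore or Bridge code; only the keyword `head_wise_gate` and the two config flag names come from this PR.

```python
# Hypothetical stand-ins for illustration; not the real MCore or Bridge classes.
class UpdatedBaseAttention:
    def get_qkv(self, hidden_states, head_wise_gate=False):
        return ("qkv", head_wise_gate)

    def forward(self, hidden_states):
        # The updated caller always forwards the new keyword.
        return self.get_qkv(hidden_states, head_wise_gate=False)


class StaleOverride(UpdatedBaseAttention):
    # Written before the keyword existed, so the signature lacks it.
    def get_qkv(self, hidden_states):
        return ("qkv", None)


def validate_gates(head_wise_attn_gate, attention_output_gate):
    # Assumed shape of the mutual-exclusion check; the real check may differ.
    if head_wise_attn_gate and attention_output_gate:
        raise ValueError(
            "head_wise_attn_gate and attention_output_gate are mutually exclusive"
        )


def describe_failure(attn):
    # Report whether calling forward() trips over the unexpected keyword.
    try:
        attn.forward("x")
    except TypeError as exc:
        return type(exc).__name__
    return "ok"
```

In this sketch the stale override fails with a `TypeError` as soon as the base class forwards the keyword, which is the kind of breakage an override with an outdated signature would hit.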

Fix

  • Accept head_wise_gate in the DiT self/cross attention QKV helpers, while keeping the unsupported head_wise_gate=True case guarded.
  • Preserve compatibility with older MCore cross-attention signatures that do not accept head_wise_gate.
  • Use MCore's native head_wise_attn_gate layout for Step-3.5 when the active MCore TransformerConfig supports it, while preserving the older attention_output_gate fallback otherwise.
  • Add focused unit coverage for the MCore dev keyword path, legacy MCore signature path, and Step-3.5 provider guard.
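
One way the keyword handling in the first two bullets could look, as a sketch only. The class names, the `make_override` helper, the `accepts_keyword` helper, and the method name `get_query_key_value_tensors` are assumptions for illustration; the actual Bridge implementation may be structured differently. The standard-library `inspect.signature` call is used to detect whether the parent accepts the keyword.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if func takes `name` explicitly or via **kwargs."""
    params = inspect.signature(func).parameters
    if name in params:
        return True
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Hypothetical parents standing in for older and newer MCore signatures.
class LegacyCrossAttention:
    def get_query_key_value_tensors(self, hidden_states, key_value_states):
        return ("legacy", hidden_states, key_value_states)


class DevCrossAttention:
    def get_query_key_value_tensors(
        self, hidden_states, key_value_states, head_wise_gate=False
    ):
        return ("dev", hidden_states, key_value_states, head_wise_gate)


def make_override(base):
    """Build a hypothetical override on top of either parent."""

    class DiTCrossAttentionSketch(base):
        def get_query_key_value_tensors(
            self, hidden_states, key_value_states, head_wise_gate=False
        ):
            if head_wise_gate:
                # Guard: true gates are not supported by this override yet.
                raise NotImplementedError("head_wise_gate=True is not supported")
            parent = super().get_query_key_value_tensors
            if accepts_keyword(parent, "head_wise_gate"):
                return parent(
                    hidden_states, key_value_states, head_wise_gate=head_wise_gate
                )
            # Legacy signature: drop the keyword the parent does not know.
            return parent(hidden_states, key_value_states)

    return DiTCrossAttentionSketch
```

Checking the parent signature at call time keeps a single override working against both signatures, at the cost of a small reflection step on each call; caching the result would be a reasonable refinement.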

Guards

Added:

  • DiT self-attention head_wise_gate=True guard with TODO to remove when DiT attention supports MCore head_wise_attn_gate rows.
  • DiT cross-attention legacy-signature guard with TODO to remove when Megatron-Core main exposes the head_wise_gate keyword.
  • Step-3.5 provider capability guard with TODO to remove when Megatron-Core main exposes head_wise_attn_gate and Bridge no longer needs the attention_output_gate fallback.

Removed: none.
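
The Step-3.5 provider capability guard listed above could be sketched as follows. This assumes the config class is a dataclass so that `dataclasses.fields` can be used for detection; `LegacyConfig`, `DevConfig`, `supports_field`, and `gate_kwargs` are hypothetical names, and the real guard may detect the capability another way.

```python
from dataclasses import dataclass, fields


# Hypothetical config classes standing in for older and newer TransformerConfig.
@dataclass
class LegacyConfig:
    attention_output_gate: bool = False


@dataclass
class DevConfig:
    attention_output_gate: bool = False
    head_wise_attn_gate: bool = False


def supports_field(config_cls, name):
    """Return True if the dataclass declares a field called `name`."""
    return any(f.name == name for f in fields(config_cls))


def gate_kwargs(config_cls):
    """Pick the gate layout the active config class can express."""
    # TODO: drop the fallback once every supported config has the native field.
    if supports_field(config_cls, "head_wise_attn_gate"):
        return {"head_wise_attn_gate": True}
    return {"attention_output_gate": True}
```

Only one of the two flags is ever set, so the result stays compatible with a config that rejects enabling both at once.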

Validation

CW interactive, containerized:

  • uv run --no-sync python -m pytest tests/unit_tests/diffusion/model/common/test_dit_attention.py tests/unit_tests/models/stepfun/test_step35_bridge.py -q -> 67 passed, 35 warnings
  • UV_NO_SYNC=1 uv run pre-commit run --all-files -> passed all hooks

Local focused static checks:

  • .venv/bin/python -m py_compile <edited files> -> passed
  • .venv/bin/ruff check <edited files> -> passed

dimapihtar and others added 2 commits June 12, 2026 06:53
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
@copy-pr-bot

copy-pr-bot Bot commented Jun 12, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@yaoyu-33

Contributor Author

/ok to test 08e24e7

@yaoyu-33 yaoyu-33 added the area:model, bug, full-test-suite, and needs-review labels Jun 12, 2026