Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
…e/transformers into fix-aligned-data-ptr-grouped-mm
  else:
      # (S, input_dim) @ grouped (num_experts, output_dim, input_dim).T -> (S, output_dim)
-     out = _grouped_mm(input, weight.transpose(-2, -1), offs=offs)
+     out = _grouped_mm(input, weight.transpose(-2, -1).contiguous(), offs=offs)
Adding .contiguous() after .transpose(-2, -1) in _grouped_linear ensures the weight tensor's memory layout is contiguous before passing it to _grouped_mm, fixing RuntimeError: expected data_ptr to be aligned to 16 bytes. Alternatively, we could maybe enforce Forced16BytesAlignment in the weight converter, as is done for MoE — that way no other model would hit this issue?
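To see why the transpose breaks the alignment expectation, here is a minimal plain-Python sketch of torch's stride semantics (names and shapes are illustrative, not taken from the PR): `transpose(-2, -1)` only swaps shape/stride metadata without copying data, so the resulting view is no longer row-major contiguous.

```python
# Hedged sketch: model torch's row-major stride layout in plain Python.

def row_major_strides(shape):
    """Strides (in elements) of a contiguous row-major tensor of this shape."""
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides

def is_contiguous(shape, strides):
    return list(strides) == row_major_strides(shape)

# An illustrative (num_experts, output_dim, input_dim) weight, stored contiguously.
shape = (4, 8, 16)
strides = row_major_strides(shape)                 # [128, 16, 1]

# transpose(-2, -1) swaps the last two dims WITHOUT copying: only the
# metadata changes, so the underlying data is no longer contiguous.
t_shape = (shape[0], shape[2], shape[1])           # (4, 16, 8)
t_strides = [strides[0], strides[2], strides[1]]   # [128, 1, 16]

assert is_contiguous(shape, strides)
assert not is_contiguous(t_shape, t_strides)
# .contiguous() materializes a fresh row-major copy, which is what restores
# the layout (and hence the pointer alignment) that _grouped_mm expects.
```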
  partial_rotary_factor = config.rope_parameters.get("partial_rotary_factor", 1.0)
  dim = getattr(config, "head_dim", None) or config.hidden_size // config.num_attention_heads
+ dim = int(dim * partial_rotary_factor)  # Mistral4 doesn't apply RoPE to the full attention head
This was failing test_model_rope_scaling_frequencies with AssertionError: The values for attribute 'shape' do not match: torch.Size([1, 64]) != torch.Size([1, 128]). Mistral4 does not apply RoPE to the full attention head (cf. qk_rope and qk_nope).
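The shape mismatch above is consistent with a partial rotary factor of 0.5 on a 128-dim head. A quick hedged arithmetic sketch (the config values are illustrative, not the actual Mistral4 ones):

```python
# Hedged sketch of the partial-RoPE dim computation; values are illustrative.
hidden_size = 4096
num_attention_heads = 32
partial_rotary_factor = 0.5  # assumed: RoPE applied to half the head dim

head_dim = hidden_size // num_attention_heads    # 128 (full head)
rotary_dim = int(head_dim * partial_rotary_factor)  # 64 (RoPE'd slice)

# Matches the failing assertion: the RoPE frequencies have shape [1, 64],
# not [1, 128], because only rotary_dim of each head gets rotated.
assert (head_dim, rotary_dim) == (128, 64)
```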
@@ -44,12 +44,21 @@
  class Mistral4ModelTester(CausalLMModelTester):
test_model_is_small fails because the config inherited from the common tester made the tiny test model too large (1,233,664 params); it needs to be < 1,000,000.
  _supports_flex_attn = True
- _can_compile_fullgraph = True
+ _can_compile_fullgraph = False
This hits a TorchInductor error, so full-graph compilation is disabled for now 🙈
…e/transformers into fix-aligned-data-ptr-grouped-mm
- cache_position = torch.arange(query_states.shape[2], device=query_states.device) + past_seen_tokens
+ position_ids = kwargs.get("position_ids")
+ if position_ids is None:
+     position_ids = torch.arange(query_states.shape[2], device=query_states.device) + past_seen_tokens
We need to reuse RoPE's position_ids: because we recompute torch.arange(seq_len) every time, we otherwise end up with positions that differ from RoPE's position_ids for those tokens, which breaks generation.
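A hedged plain-Python illustration of the divergence: with left padding, the position_ids RoPE uses skip over pad tokens, while a naively recomputed arange does not, so the two disagree on real-token positions (the values here are illustrative, not from the PR):

```python
# Hedged sketch: recomputed arange vs. RoPE's actual position_ids.
seq_len = 5
past_seen_tokens = 0

# Illustrative left-padded prompt: 3 pad tokens, then 2 real tokens.
# RoPE's position_ids (derived from the attention mask) keep pads at 0
# and start counting at the first real token.
rope_position_ids = [0, 0, 0, 0, 1]

# Naive recomputation ignores padding and counts every slot.
recomputed = [past_seen_tokens + i for i in range(seq_len)]  # [0, 1, 2, 3, 4]

# The real tokens get different positions under the two schemes,
# so queries/keys are rotated inconsistently and generation degrades.
assert rope_position_ids != recomputed
```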
[For maintainers] Suggested jobs to run (before merge):
run-slow: auto, mistral4
This comment contains models: ["models/auto", "models/mistral4"]
CI Results / Commit Info
Model CI Report: ❌ 2 new failed tests from this PR 😭
#44825