Qwen3omni video encoder #2582

eitanporat · 2025-11-02T19:53:28Z

Description

Qwen video encoder on static shapes B x C x T x H x W

Tests

Comparing against the torch implementation on random input

Checklist

Before submitting this PR, please make sure (put X in square brackets):

I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
I have necessary comments in my code, particularly in hard-to-understand areas.
I have run end-to-end tests tests and provided workload links above if applicable.
I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

hengtaoguo

Thank you for the great contribution Eitan! Left a few comments and we can sync later.

I see you have two PRs for video and audio encoders separately, but still I can see some audio related components in this PR. Which PR do you want to merge first?

src/MaxText/configs/models/qwen3-omni-30b-a3b.yml

src/MaxText/configs/base.yml

src/MaxText/layers/embeddings.py

src/MaxText/layers/attentions.py

src/MaxText/layers/embeddings.py

src/MaxText/layers/qwen3.py

tests/check_qwen_vision_encoder.py

eitanporat · 2025-11-03T10:34:37Z

I built my qwen3 video branch on top of the audio branch. If you want I can rebuild my commits again. Both are passing my tests right now.

aireenmei

A couple comments on the test

tests/check_qwen_vision_encoder.py

hengtaoguo · 2025-11-04T00:29:38Z

Thanks Eitan! Please run pip install pre-commit to install precommit hooks to guard the code style. Also you may need to squash all your commits into one for merging.

Hi @SamuelMarks , do we have any handy ways to auto-correct the pylint issue? Running bash code_style.sh will introduce a bunch of other changes.

SamuelMarks · 2025-11-04T02:16:26Z

@hengtaoguo Yeah we're going to delete code_style.sh soon, in favour of just running:

pre-commit run --all-files

tests/multimodal_test_utils.py

eitanporat · 2025-11-04T09:51:16Z

@hengtaoguo Yeah we're going to delete code_style.sh soon, in favour of just running:
pre-commit run --all-files

done

eitanporat · 2025-11-04T11:07:13Z

Thanks Eitan! Please run pip install pre-commit to install precommit hooks to guard the code style. Also you may need to squash all your commits into one for merging.

Hi @SamuelMarks , do we have any handy ways to auto-correct the pylint issue? Running bash code_style.sh will introduce a bunch of other changes.

Some pre-existing files are not passing the pre-commit.

hengtaoguo

Thanks for the great work! Some final remarks before merging:

Run your check_qwen3 unit tests and let us know if they pass.
Resolve all pylint/pyink issues.
Sync to head, resolve all conflicts and squash into one commit.
Also, you may need to include the checklist in your PR description. It is a default template when you initialize a PR in MaxText.

src/MaxText/layers/attentions.py

tests/check_qwen_vision_encoder.py

src/MaxText/layers/decoders.py

aireenmei

Just some minor comments. And since check_qwen3_vision_encoder.py is run locally. Could you share the output?

src/MaxText/layers/qwen3.py

eitanporat · 2025-11-05T11:27:01Z

Output of the tests

test_attention_is_jittable (main.TestQwen3OmniMoeVisionAttention.test_attention_is_jittable)
Test that attention is JIT-compilable. ... ok
test_attention_output_matches_torch (main.TestQwen3OmniMoeVisionAttention.test_attention_output_matches_torch)
Test that JAX vision attention output matches PyTorch implementation. ... ok
test_vision_encoder_single_image (main.TestQwen3OmniMoeVisionEncoderEndToEnd.test_vision_encoder_single_image)
Test full vision encoder with single image matches PyTorch. ... ok
test_mlp_is_jittable (main.TestQwen3OmniMoeVisionMLP.test_mlp_is_jittable)
Test that MLP is JIT-compilable. ... ok
test_mlp_output_matches_torch (main.TestQwen3OmniMoeVisionMLP.test_mlp_output_matches_torch)
Test that JAX MLP output matches PyTorch implementation. ... ok
test_patch_embed_is_jittable (main.TestQwen3OmniMoeVisionPatchEmbed.test_patch_embed_is_jittable)
Test that patch embed is JIT-compilable. ... ok
test_patch_embed_output_matches_torch (main.TestQwen3OmniMoeVisionPatchEmbed.test_patch_embed_output_matches_torch)
Test that JAX patch embed output matches PyTorch implementation. ... ok
test_patch_merger_is_jittable (main.TestQwen3OmniMoeVisionPatchMerger.test_patch_merger_is_jittable)
Test that patch merger is JIT-compilable. ... ok
test_patch_merger_output_matches_torch_with_postshuffle (main.TestQwen3OmniMoeVisionPatchMerger.test_patch_merger_output_matches_torch_with_postshuffle)
Test patch merger with postshuffle_norm matches PyTorch. ... ok
test_patch_merger_output_matches_torch_without_postshuffle (main.TestQwen3OmniMoeVisionPatchMerger.test_patch_merger_output_matches_torch_without_postshuffle)
Test patch merger without postshuffle_norm matches PyTorch. ... ok
test_pos_embed_interpolate_matches_torch (main.TestQwen3OmniMoeVisionPosEmbedInterpolate.test_pos_embed_interpolate_matches_torch)
Test that JAX position embedding interpolation matches PyTorch encoder. ... ok
test_pos_embed_interpolate_multiple_images (main.TestQwen3OmniMoeVisionPosEmbedInterpolate.test_pos_embed_interpolate_multiple_images)
Test position embedding interpolation with multiple images/videos. ... ok
test_grid_based_embedding_matches_torch (main.TestQwen3OmniMoeVisionRotaryEmbedding.test_grid_based_embedding_matches_torch)
Test that JAX grid-based rotary embedding matches PyTorch implementation. ... ok
test_rotation_application_matches_torch (main.TestQwen3OmniMoeVisionRotaryEmbedding.test_rotation_application_matches_torch)
Test that applying rotary embedding to Q/K tensors matches PyTorch. ... ok

Ran 14 tests in 59.427s

OK

src/MaxText/layers/decoders.py

hengtaoguo · 2025-11-05T21:46:46Z

src/MaxText/layers/qwen3.py

+    hs = config.hidden_size_for_vit
+    self.spatial_merge_size = config.spatial_merge_size_for_vit
+
+    self.pos_embed_interpolate = Qwen3OmniMoeVisionPosEmbedInterpolate(


pylint error: Undefined variable 'Qwen3OmniMoeVisionPosEmbedInterpolate' (undefined-variable)

You should import from embeddings.Qwen3OmniMoeVisionPosEmbedInterpolate

eitanporat requested review from A9isha, NuojCheng, RissyRan, SurbhiJainUSC, aireenmei, bvandermoon, gagika, gobbleturk, hengtaoguo, jiangjy1982, khatwanimohit, parambole, richjames0, shralex, shuningjin, suexu1025 and vipannalla as code owners November 2, 2025 19:53

eitanporat force-pushed the qwen3omni_video_encoder branch from b3ef961 to 2d621cd Compare November 2, 2025 19:57

hengtaoguo added the gemini-review label Nov 3, 2025

hengtaoguo requested changes Nov 3, 2025

View reviewed changes

eitanporat force-pushed the qwen3omni_video_encoder branch 2 times, most recently from 06d4a81 to 7f69b41 Compare November 3, 2025 10:08

eitanporat force-pushed the qwen3omni_video_encoder branch 3 times, most recently from 3d812e8 to ec2d56f Compare November 3, 2025 17:08

aireenmei reviewed Nov 3, 2025

View reviewed changes

tests/check_qwen_vision_encoder.py Show resolved Hide resolved

tests/check_qwen_vision_encoder.py Show resolved Hide resolved

SamuelMarks reviewed Nov 4, 2025

View reviewed changes

tests/multimodal_test_utils.py Outdated Show resolved Hide resolved

eitanporat force-pushed the qwen3omni_video_encoder branch from 2daef26 to a33f86d Compare November 4, 2025 09:50

eitanporat force-pushed the qwen3omni_video_encoder branch 2 times, most recently from f34fb59 to c79b379 Compare November 4, 2025 11:06

eitanporat force-pushed the qwen3omni_video_encoder branch 6 times, most recently from 13afa52 to 40411d0 Compare November 4, 2025 19:08

hengtaoguo approved these changes Nov 4, 2025

View reviewed changes

src/MaxText/layers/attentions.py Show resolved Hide resolved

tests/check_qwen_vision_encoder.py Show resolved Hide resolved

src/MaxText/layers/decoders.py Outdated Show resolved Hide resolved

hengtaoguo self-assigned this Nov 4, 2025

hengtaoguo added gemini-review and removed gemini-review labels Nov 4, 2025

aireenmei approved these changes Nov 4, 2025

View reviewed changes

src/MaxText/layers/qwen3.py Show resolved Hide resolved

src/MaxText/layers/qwen3.py Show resolved Hide resolved

src/MaxText/layers/qwen3.py Outdated Show resolved Hide resolved

eitanporat force-pushed the qwen3omni_video_encoder branch from 40411d0 to 9ccc1f6 Compare November 5, 2025 11:28

eitanporat requested a review from NicoGrande as a code owner November 5, 2025 11:28

eitanporat force-pushed the qwen3omni_video_encoder branch 2 times, most recently from 2515f9c to b883fe8 Compare November 5, 2025 16:36

eitanporat commented Nov 5, 2025

View reviewed changes

src/MaxText/layers/decoders.py Show resolved Hide resolved

eitanporat force-pushed the qwen3omni_video_encoder branch 3 times, most recently from 3f71239 to 5a56d2d Compare November 5, 2025 21:05

hengtaoguo reviewed Nov 5, 2025

View reviewed changes

Add Qwen3 Omni Vision Encoder

9f26995

eitanporat force-pushed the qwen3omni_video_encoder branch from 5a56d2d to 9f26995 Compare November 6, 2025 13:38

Qwen3omni video encoder #2582

Are you sure you want to change the base?

Qwen3omni video encoder #2582

Uh oh!

Conversation

eitanporat commented Nov 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Tests

Checklist

Uh oh!

hengtaoguo left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

eitanporat commented Nov 3, 2025

Uh oh!

aireenmei left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

hengtaoguo commented Nov 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

SamuelMarks commented Nov 4, 2025

Uh oh!

Uh oh!

eitanporat commented Nov 4, 2025

Uh oh!

eitanporat commented Nov 4, 2025

Uh oh!

hengtaoguo left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

aireenmei left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

eitanporat commented Nov 5, 2025

Uh oh!

Uh oh!

hengtaoguo Nov 5, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

eitanporat commented Nov 2, 2025 •

edited

Loading

hengtaoguo commented Nov 4, 2025 •

edited

Loading

hengtaoguo left a comment •

edited

Loading