Skip to content

Conversation

@eitanporat
Copy link
Collaborator

@eitanporat eitanporat commented Nov 2, 2025

Description

Qwen video encoder on static shapes B x C x T x H x W

Tests

Comparing against the torch implementation on random input

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

Copy link
Collaborator

@hengtaoguo hengtaoguo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the great contribution Eitan! Left a few comments and we can sync later.

I see you have two PRs for video and audio encoders separately, but still I can see some audio related components in this PR. Which PR do you want to merge first?

@eitanporat eitanporat force-pushed the qwen3omni_video_encoder branch 2 times, most recently from 06d4a81 to 7f69b41 Compare November 3, 2025 10:08
@eitanporat
Copy link
Collaborator Author

I built my qwen3 video branch on top of the audio branch. If you want I can rebuild my commits again. Both are passing my tests right now.

@eitanporat eitanporat force-pushed the qwen3omni_video_encoder branch 3 times, most recently from 3d812e8 to ec2d56f Compare November 3, 2025 17:08
Copy link
Collaborator

@aireenmei aireenmei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple comments on the test

@hengtaoguo
Copy link
Collaborator

hengtaoguo commented Nov 4, 2025

Thanks Eitan! Please run pip install pre-commit to install precommit hooks to guard the code style. Also you may need to squash all your commits into one for merging.

Hi @SamuelMarks , do we have any handy ways to auto-correct the pylint issue? Running bash code_style.sh will introduce a bunch of other changes.

@SamuelMarks
Copy link
Collaborator

@hengtaoguo Yeah we're going to delete code_style.sh soon, in favour of just running:

pre-commit run --all-files

@eitanporat eitanporat force-pushed the qwen3omni_video_encoder branch from 2daef26 to a33f86d Compare November 4, 2025 09:50
@eitanporat
Copy link
Collaborator Author

@hengtaoguo Yeah we're going to delete code_style.sh soon, in favour of just running:

pre-commit run --all-files

done

@eitanporat eitanporat force-pushed the qwen3omni_video_encoder branch 2 times, most recently from f34fb59 to c79b379 Compare November 4, 2025 11:06
@eitanporat
Copy link
Collaborator Author

Thanks Eitan! Please run pip install pre-commit to install precommit hooks to guard the code style. Also you may need to squash all your commits into one for merging.

Hi @SamuelMarks , do we have any handy ways to auto-correct the pylint issue? Running bash code_style.sh will introduce a bunch of other changes.

Some pre-existing files are not passing the pre-commit.

@eitanporat eitanporat force-pushed the qwen3omni_video_encoder branch 6 times, most recently from 13afa52 to 40411d0 Compare November 4, 2025 19:08
Copy link
Collaborator

@hengtaoguo hengtaoguo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the great work! Some final remarks before merging:

  • Run your check_qwen3 unit tests and let us know if they pass.
  • Resolve all pylint/pyink issues.
  • Sync to head, resolve all conflicts and squash into one commit.
  • Also, you may need to include the checklist in your PR description. It is a default template when you initialize a PR in MaxText.

Copy link
Collaborator

@aireenmei aireenmei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some minor comments. And since check_qwen3_vision_encoder.py is run locally. Could you share the output?

@eitanporat
Copy link
Collaborator Author

Output of the tests

test_attention_is_jittable (main.TestQwen3OmniMoeVisionAttention.test_attention_is_jittable)
Test that attention is JIT-compilable. ... ok
test_attention_output_matches_torch (main.TestQwen3OmniMoeVisionAttention.test_attention_output_matches_torch)
Test that JAX vision attention output matches PyTorch implementation. ... ok
test_vision_encoder_single_image (main.TestQwen3OmniMoeVisionEncoderEndToEnd.test_vision_encoder_single_image)
Test full vision encoder with single image matches PyTorch. ... ok
test_mlp_is_jittable (main.TestQwen3OmniMoeVisionMLP.test_mlp_is_jittable)
Test that MLP is JIT-compilable. ... ok
test_mlp_output_matches_torch (main.TestQwen3OmniMoeVisionMLP.test_mlp_output_matches_torch)
Test that JAX MLP output matches PyTorch implementation. ... ok
test_patch_embed_is_jittable (main.TestQwen3OmniMoeVisionPatchEmbed.test_patch_embed_is_jittable)
Test that patch embed is JIT-compilable. ... ok
test_patch_embed_output_matches_torch (main.TestQwen3OmniMoeVisionPatchEmbed.test_patch_embed_output_matches_torch)
Test that JAX patch embed output matches PyTorch implementation. ... ok
test_patch_merger_is_jittable (main.TestQwen3OmniMoeVisionPatchMerger.test_patch_merger_is_jittable)
Test that patch merger is JIT-compilable. ... ok
test_patch_merger_output_matches_torch_with_postshuffle (main.TestQwen3OmniMoeVisionPatchMerger.test_patch_merger_output_matches_torch_with_postshuffle)
Test patch merger with postshuffle_norm matches PyTorch. ... ok
test_patch_merger_output_matches_torch_without_postshuffle (main.TestQwen3OmniMoeVisionPatchMerger.test_patch_merger_output_matches_torch_without_postshuffle)
Test patch merger without postshuffle_norm matches PyTorch. ... ok
test_pos_embed_interpolate_matches_torch (main.TestQwen3OmniMoeVisionPosEmbedInterpolate.test_pos_embed_interpolate_matches_torch)
Test that JAX position embedding interpolation matches PyTorch encoder. ... ok
test_pos_embed_interpolate_multiple_images (main.TestQwen3OmniMoeVisionPosEmbedInterpolate.test_pos_embed_interpolate_multiple_images)
Test position embedding interpolation with multiple images/videos. ... ok
test_grid_based_embedding_matches_torch (main.TestQwen3OmniMoeVisionRotaryEmbedding.test_grid_based_embedding_matches_torch)
Test that JAX grid-based rotary embedding matches PyTorch implementation. ... ok
test_rotation_application_matches_torch (main.TestQwen3OmniMoeVisionRotaryEmbedding.test_rotation_application_matches_torch)
Test that applying rotary embedding to Q/K tensors matches PyTorch. ... ok


Ran 14 tests in 59.427s

OK

@eitanporat eitanporat force-pushed the qwen3omni_video_encoder branch from 40411d0 to 9ccc1f6 Compare November 5, 2025 11:28
@eitanporat eitanporat force-pushed the qwen3omni_video_encoder branch 2 times, most recently from 2515f9c to b883fe8 Compare November 5, 2025 16:36
@eitanporat eitanporat force-pushed the qwen3omni_video_encoder branch 3 times, most recently from 3f71239 to 5a56d2d Compare November 5, 2025 21:05
hs = config.hidden_size_for_vit
self.spatial_merge_size = config.spatial_merge_size_for_vit

self.pos_embed_interpolate = Qwen3OmniMoeVisionPosEmbedInterpolate(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pylint error: Undefined variable 'Qwen3OmniMoeVisionPosEmbedInterpolate' (undefined-variable)

You should import from embeddings.Qwen3OmniMoeVisionPosEmbedInterpolate

@eitanporat eitanporat force-pushed the qwen3omni_video_encoder branch from 5a56d2d to 9f26995 Compare November 6, 2025 13:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants