Qwen3omni video encoder #2582
Thank you for the great contribution, Eitan! I left a few comments; we can sync later.
I see you have separate PRs for the video and audio encoders, but I still see some audio-related components in this PR. Which one do you want to merge first?
I built my Qwen3 video branch on top of the audio branch. If you want, I can rebuild my commits. Both currently pass my tests.
A couple of comments on the test.
Thanks Eitan! Please run …
Hi @SamuelMarks, do we have any handy way to auto-correct the pylint issues? Running …
@hengtaoguo Yeah, we're going to delete … pre-commit run --all-files
Done.
Some pre-existing files do not pass the pre-commit checks.
Thanks for the great work! Some final remarks before merging:
- Run your check_qwen3 unit tests and let us know if they pass.
- Resolve all pylint/pyink issues.
- Sync to HEAD, resolve all conflicts, and squash into one commit.
- Also, you may need to include the checklist in your PR description. It is part of the default template when you open a PR in MaxText.
Just some minor comments. Since check_qwen3_vision_encoder.py is run locally, could you share the output?
Output of the tests:
test_attention_is_jittable (__main__.TestQwen3OmniMoeVisionAttention.test_attention_is_jittable)
Ran 14 tests in 59.427s
OK
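A jittability test such as test_attention_is_jittable usually runs the same function eagerly and under jax.jit and checks that the two results agree. The sketch below illustrates that pattern; toy_attention is a hypothetical single-head scaled dot-product attention standing in for the real vision attention module, whose interface is not shown in this thread.

```python
import unittest

import jax
import jax.numpy as jnp


def toy_attention(q, k, v):
    """Hypothetical stand-in for the vision attention: single-head scaled dot-product attention."""
    scores = q @ k.swapaxes(-1, -2) / jnp.sqrt(q.shape[-1])
    return jax.nn.softmax(scores, axis=-1) @ v


class TestToyAttention(unittest.TestCase):

    def test_attention_is_jittable(self):
        # Same random inputs for the eager and the jitted call.
        key_q, key_k, key_v = jax.random.split(jax.random.PRNGKey(0), 3)
        q = jax.random.normal(key_q, (2, 8, 16))
        k = jax.random.normal(key_k, (2, 8, 16))
        v = jax.random.normal(key_v, (2, 8, 16))

        eager = toy_attention(q, k, v)
        jitted = jax.jit(toy_attention)(q, k, v)  # fails if the function cannot be traced

        self.assertEqual(jitted.shape, (2, 8, 16))
        self.assertTrue(bool(jnp.allclose(eager, jitted, atol=1e-5)))
```

Because the check traces the function with jax.jit, Python control flow on traced values or data-dependent output shapes would make it fail, which is one reason static input shapes are convenient for this encoder.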
hs = config.hidden_size_for_vit
self.spatial_merge_size = config.spatial_merge_size_for_vit

self.pos_embed_interpolate = Qwen3OmniMoeVisionPosEmbedInterpolate(
pylint error: Undefined variable 'Qwen3OmniMoeVisionPosEmbedInterpolate' (undefined-variable)
You should import Qwen3OmniMoeVisionPosEmbedInterpolate from embeddings (i.e., use embeddings.Qwen3OmniMoeVisionPosEmbedInterpolate).
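For context, a module named like Qwen3OmniMoeVisionPosEmbedInterpolate presumably resamples a learned grid of position embeddings to each input's patch grid. The sketch below shows bilinear interpolation of a square (G*G, D) embedding table using align-corners sampling. The function name interpolate_pos_embed, the square-grid layout, and the sampling convention are assumptions for illustration, not confirmed details of the MaxText or Hugging Face implementation.

```python
import jax.numpy as jnp


def interpolate_pos_embed(pos_embed, grid_h, grid_w):
    """Bilinearly resample a (G*G, D) table of learned position embeddings to (grid_h*grid_w, D).

    Hypothetical helper: assumes the table is a row-major square G x G grid and
    uses align-corners sampling (first/last targets land exactly on the grid edges).
    """
    num_pos, dim = pos_embed.shape
    g = int(round(num_pos ** 0.5))
    if g * g != num_pos:
        raise ValueError("expected a square grid of position embeddings")
    table = pos_embed.reshape(g, g, dim)

    # Fractional source coordinates for every target row and column.
    h = jnp.linspace(0.0, g - 1, grid_h)
    w = jnp.linspace(0.0, g - 1, grid_w)
    h0 = jnp.floor(h).astype(jnp.int32)
    w0 = jnp.floor(w).astype(jnp.int32)
    h1 = jnp.minimum(h0 + 1, g - 1)
    w1 = jnp.minimum(w0 + 1, g - 1)
    dh = (h - h0)[:, None, None]  # (grid_h, 1, 1) weight toward the lower neighbour row
    dw = (w - w0)[None, :, None]  # (1, grid_w, 1) weight toward the right neighbour column

    # Gather the four neighbours; broadcasting gives (grid_h, grid_w, dim) each.
    top_left = table[h0[:, None], w0[None, :]]
    top_right = table[h0[:, None], w1[None, :]]
    bottom_left = table[h1[:, None], w0[None, :]]
    bottom_right = table[h1[:, None], w1[None, :]]

    out = ((1 - dh) * (1 - dw) * top_left + (1 - dh) * dw * top_right
           + dh * (1 - dw) * bottom_left + dh * dw * bottom_right)
    return out.reshape(grid_h * grid_w, dim)
```

When the target grid matches the stored G x G grid, the table is returned unchanged; other grid sizes get a smooth blend of the four nearest learned embeddings.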
Description
Qwen video encoder for static input shapes B x C x T x H x W (batch, channels, time, height, width).
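The B x C x T x H x W layout suggests the usual first step of a ViT-style video encoder: cutting the clip into non-overlapping spatiotemporal patches. The sketch below is a hypothetical patchify_video helper in JAX. The defaults patch_size=16 and temporal_patch_size=2, and the row-major (time, height, width) patch order, are assumptions chosen for illustration, not values taken from this PR.

```python
import jax.numpy as jnp


def patchify_video(video, patch_size=16, temporal_patch_size=2):
    """Split a static (B, C, T, H, W) video into flattened spatiotemporal patches.

    Hypothetical helper. Returns shape
    (B, (T/tp) * (H/p) * (W/p), C * tp * p * p) with tp = temporal_patch_size, p = patch_size.
    """
    b, c, t, h, w = video.shape
    if t % temporal_patch_size or h % patch_size or w % patch_size:
        raise ValueError("T, H and W must be divisible by the patch sizes")
    gt, gh, gw = t // temporal_patch_size, h // patch_size, w // patch_size

    # Split each of T, H, W into (grid index, offset inside the patch).
    x = video.reshape(b, c, gt, temporal_patch_size, gh, patch_size, gw, patch_size)
    # Move grid axes (gt, gh, gw) in front of the per-patch axes (c, tp, p, p).
    x = x.transpose(0, 2, 4, 6, 1, 3, 5, 7)
    return x.reshape(b, gt * gh * gw, c * temporal_patch_size * patch_size * patch_size)
```

Every size is read from the static array shape, so the reshapes are fixed at trace time and the helper can be wrapped in jax.jit. Applying one linear projection to each flattened patch is equivalent to a 3D convolution whose stride equals its kernel size.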
Tests
Compared against the PyTorch implementation on random inputs.
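A parity test of this kind feeds the same random input to the reference and to the JAX port and compares outputs within a float32 tolerance. The sketch below shows the harness shape only. To stay self-contained it uses a NumPy function as a stand-in for the PyTorch reference, and the names reference_gelu_tanh and assert_matches_reference, the tolerances, and the tanh-approximate GELU example are all hypothetical.

```python
import numpy as np
import jax
import jax.numpy as jnp


def reference_gelu_tanh(x: np.ndarray) -> np.ndarray:
    """NumPy stand-in for a reference (e.g. PyTorch) op: tanh-approximate GELU."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def assert_matches_reference(jax_fn, ref_fn, shape, seed=0, rtol=1e-5, atol=1e-5):
    """Run both implementations on the same random float32 input and compare outputs.

    Returns the maximum absolute difference for logging.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape).astype(np.float32)
    expected = ref_fn(x)
    actual = np.asarray(jax.jit(jax_fn)(jnp.asarray(x)))
    np.testing.assert_allclose(actual, expected, rtol=rtol, atol=atol)
    return float(np.max(np.abs(actual - expected)))


# Example: a B x C x T x H x W-shaped random input, as in the description.
max_err = assert_matches_reference(
    lambda x: jax.nn.gelu(x, approximate=True), reference_gelu_tanh, (2, 3, 4, 8, 8))
```

For a full encoder, the same harness applies once the reference weights are copied into the JAX parameters, so both sides compute the same function.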
Checklist
Before submitting this PR, please make sure (put X in square brackets):
gemini-review label.