
Add Qwen2 VL #2604

Open
samudraneel05 wants to merge 12 commits into keras-team:master from
samudraneel05:qwen2-vl

Conversation

@samudraneel05

Description of the change

Added Qwen2-VL, with notebooks documenting output matching and numerics verification on the 2B-parameter version. Open for review and feedback!

Reference

Fixes #2323
Hugging Face Link: Link

Colab Notebook

Overall check with numerics: here
Tokenizer comparison with Hugging Face: here
Preprocessor comparison: here

Checklist

  • I have added all the necessary unit tests for my change.
  • I have verified that my change does not break existing code and works with all backends (TensorFlow, JAX, and PyTorch).
  • My PR is based on the latest changes of the main branch (if unsure, rebase the code).
  • I have followed the Keras Hub Model contribution guidelines in making these changes.
  • I have followed the Keras Hub API design guidelines in making these changes.
  • I have signed the Contributor License Agreement.

@gemini-code-assist
Contributor

Summary of Changes

Hello @samudraneel05, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly expands Keras Hub's capabilities by integrating the Qwen2-VL multimodal model. It provides a complete framework for handling both visual and textual inputs, allowing for advanced vision-language tasks. The changes encompass the core model architecture, data preprocessing, and tools for converting existing models, making it easier for users to leverage this powerful new model.

Highlights

  • New Model Integration: Introduced the Qwen2-VL multimodal model, including its backbone, causal language model, preprocessor, image converter, and tokenizer, enabling vision-language capabilities within Keras Hub.
  • Vision-Language Processing: Implemented a comprehensive vision processing pipeline, featuring smart image resizing, normalization, patch embedding via a 3D Vision Encoder with Rotary Position Embeddings (RoPE), and a PatchMerger to integrate vision features into the text sequence.
  • Hugging Face Compatibility: Added a dedicated conversion script and updated the preset loader to facilitate seamless conversion of Qwen2-VL model weights and tokenizers from Hugging Face, ensuring interoperability.
  • Robust Testing and Verification: Included extensive unit tests for all new components, covering backbone functionality, vision input handling, image preprocessing, and tokenizer accuracy, along with a checkpoint conversion verification script.
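The PatchMerger mentioned in the highlights can be illustrated with a minimal NumPy sketch (the function name, dimensions, and the random projection are illustrative stand-ins, not the PR's actual Keras implementation): it concatenates merge_size² adjacent patch features and projects them to the language model's hidden size, so the vision encoder emits merge_size²-fold fewer tokens into the text sequence.

```python
import numpy as np

def patch_merger(patch_features, merge_size=2, hidden_dim=8):
    """Toy PatchMerger sketch: concatenate merge_size**2 adjacent patch
    features, then project to the text hidden size. Shapes and the
    grouping-by-row-order are illustrative only."""
    num_patches, vision_dim = patch_features.shape
    group = merge_size ** 2
    # Group adjacent patches and flatten their features together.
    merged = patch_features.reshape(num_patches // group, group * vision_dim)
    # Stand-in for the learned MLP projection to the LM hidden size.
    rng = np.random.default_rng(0)
    w = rng.standard_normal((group * vision_dim, hidden_dim)) * 0.02
    return merged @ w

feats = np.ones((16, 4))  # 16 patches, vision feature dim 4
out = patch_merger(feats)
print(out.shape)          # (4, 8): 4x fewer tokens, hidden_dim wide
```

The token-count reduction is what lets a high-resolution image fit into the LM's context without an excessively long sequence.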


Changelog
  • keras_hub/api/layers/__init__.py
    • Imported Qwen2VLImageConverter to expose it in the API.
  • keras_hub/api/models/__init__.py
    • Imported Qwen2VLBackbone, Qwen2VLCausalLM, Qwen2VLCausalLMPreprocessor, and Qwen2VLTokenizer to make them accessible via the models API.
  • keras_hub/api/tokenizers/__init__.py
    • Imported Qwen2VLTokenizer to expose it in the tokenizers API.
  • keras_hub/src/models/qwen2_vl/__init__.py
    • Initialized the Qwen2-VL model components and registered presets for the Qwen2VLBackbone.
  • keras_hub/src/models/qwen2_vl/qwen2_vl_backbone.py
    • Implemented the Qwen2VLBackbone class, which combines a 3D Vision Encoder with a Qwen2 causal language model decoder and handles vision token replacement.
  • keras_hub/src/models/qwen2_vl/qwen2_vl_backbone_test.py
    • Added unit tests for the Qwen2VLBackbone, covering basic functionality, vision input handling, and model saving.
  • keras_hub/src/models/qwen2_vl/qwen2_vl_causal_lm.py
    • Implemented the Qwen2VLCausalLM for end-to-end causal vision-language modeling, including generate() and call_with_cache() methods for autoregressive inference.
  • keras_hub/src/models/qwen2_vl/qwen2_vl_causal_lm_preprocessor.py
    • Implemented the Qwen2VLCausalLMPreprocessor to handle tokenization, image preprocessing, and assembly of inputs for the Qwen2-VL model.
  • keras_hub/src/models/qwen2_vl/qwen2_vl_causal_lm_preprocessor_test.py
    • Added unit tests for the Qwen2VLCausalLMPreprocessor, covering text-only and image-inclusive preprocessing scenarios.
  • keras_hub/src/models/qwen2_vl/qwen2_vl_image_converter.py
    • Implemented the Qwen2VLImageConverter for smart-resizing, normalizing, and converting images/video frames into flat patch tensors for the vision encoder.
  • keras_hub/src/models/qwen2_vl/qwen2_vl_image_converter_test.py
    • Added unit tests for the Qwen2VLImageConverter, including smart-resize logic, aspect ratio handling, and image normalization.
  • keras_hub/src/models/qwen2_vl/qwen2_vl_presets.py
    • Added a placeholder file for Qwen2-VL preset configurations.
  • keras_hub/src/models/qwen2_vl/qwen2_vl_tokenizer.py
    • Implemented the Qwen2VLTokenizer based on QwenTokenizer, including logic to resolve vision-related special token IDs.
  • keras_hub/src/models/qwen2_vl/qwen2_vl_vision_encoder.py
    • Implemented the Qwen2VLVisionEncoder, a 3D ViT with Rotary Position Embeddings (RoPE) and a PatchMerger for processing vision inputs.
  • keras_hub/src/models/qwen2_vl/qwen2_vl_vision_encoder_test.py
    • Added unit tests for the Qwen2VLVisionEncoder, verifying patch merging, configuration roundtrip, and rotary embeddings.
  • keras_hub/src/utils/transformers/convert_qwen2_vl.py
    • Added a new conversion script to port Hugging Face Qwen2-VL weights and tokenizer configurations to Keras Hub format.
  • keras_hub/src/utils/transformers/preset_loader.py
    • Updated the PresetLoader to include the new convert_qwen2_vl module, enabling loading of Qwen2-VL presets.
  • tools/checkpoint_conversion/convert_qwen2_vl_checkpoints.py
    • Added a script to convert and verify Hugging Face Qwen2-VL checkpoints to Keras Hub format, including tokenizer, preprocessor, and backbone output verification.
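For reviewers unfamiliar with the smart-resize step the image converter changelog mentions, here is a hedged sketch of the rule as used in the reference Qwen2-VL processor (the default factor and pixel bounds are assumptions and may differ from this PR): each side is rounded to a multiple of patch_size × merge_size, and the total pixel count is kept within a [min_pixels, max_pixels] window while roughly preserving aspect ratio.

```python
import math

def smart_resize(height, width, factor=28, min_pixels=56 * 56,
                 max_pixels=14 * 14 * 4 * 1280):
    """Sketch of smart-resize: round each side to a multiple of `factor`
    (patch size x spatial merge size), then rescale if the pixel count
    falls outside [min_pixels, max_pixels]. Defaults are assumed from
    the reference processor, not taken from this PR."""
    h = round(height / factor) * factor
    w = round(width / factor) * factor
    if h * w > max_pixels:
        beta = math.sqrt((height * width) / max_pixels)
        h = math.floor(height / beta / factor) * factor
        w = math.floor(width / beta / factor) * factor
    elif h * w < min_pixels:
        beta = math.sqrt(min_pixels / (height * width))
        h = math.ceil(height * beta / factor) * factor
        w = math.ceil(width * beta / factor) * factor
    return h, w

h, w = smart_resize(1080, 1920)
print(h, w)  # both sides divisible by 28, h * w within the pixel budget
```

Rounding to multiples of the patch-times-merge factor guarantees the resized image tiles exactly into the vision encoder's patch grid.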
Activity
  • The pull request introduces the Qwen2-VL model, with the author, samudraneel05, providing detailed documentation and verification notebooks for output matching and numerics.
  • The author has confirmed that all necessary unit tests are added, existing code remains functional across all backends (TensorFlow, JAX, PyTorch), and Keras Hub's model and API design guidelines have been followed.
  • The inclusion of multiple Colab notebooks for numerics verification, tokenizer comparison, and preprocessor comparison demonstrates a thorough approach to validating the new model's implementation.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces the Qwen2-VL multimodal model, including its backbone, causal LM task, preprocessor, image converter, and tokenizer. The implementation is comprehensive and well-structured, with thorough testing and a detailed checkpoint conversion script. The code largely adheres to the repository's style guide, particularly in its backend-agnostic implementation and modular design. I have two main suggestions for improvement: one is to populate the presets file to enable from_preset() functionality and testing, and the other is to refactor some duplicated code for scattering vision embeddings to improve maintainability. Overall, this is a high-quality contribution.

Comment on lines +201 to +219
# Scatter vision features into image placeholder positions.
if img_embeddings is not None:
    image_mask = ops.equal(
        token_ids,
        ops.cast(self.backbone.image_token_id, token_ids.dtype),
    )
    batch_size = ops.shape(x)[0]
    seq_len = ops.shape(x)[1]
    x_flat = ops.reshape(x, (-1, self.backbone.hidden_dim))
    mask_flat = ops.reshape(image_mask, (-1,))
    vision_indices = ops.where(mask_flat)
    if isinstance(vision_indices, (list, tuple)):
        vision_indices = vision_indices[0]
    vision_indices = ops.reshape(vision_indices, (-1, 1))
    vision_indices = ops.cast(vision_indices, "int32")
    x_flat = ops.scatter_update(x_flat, vision_indices, img_embeddings)
    x = ops.reshape(
        x_flat, (batch_size, seq_len, self.backbone.hidden_dim)
    )
Contributor


medium

The logic for scattering vision features into the text embeddings is duplicated between this method (call_with_cache) and Qwen2VLBackbone.call. To improve maintainability and adhere to the principles of modularity and reusability, consider refactoring this logic into a helper method within the Qwen2VLBackbone class. This helper could take the text embeddings, token IDs, and vision features as input and return the updated text embeddings. Both Qwen2VLBackbone.call and Qwen2VLCausalLM.call_with_cache could then call this shared method.

References
  1. The style guide emphasizes modularity and reusability. Refactoring duplicated code into a shared helper method aligns with these key principles. (link)
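A minimal sketch of the shared helper the reviewer suggests, using NumPy in place of keras.ops for illustration (the helper name and signature are hypothetical, not code from the PR):

```python
import numpy as np

def inject_vision_embeddings(x, token_ids, img_embeddings, image_token_id):
    """Hypothetical shared helper: scatter vision features into the
    positions of `token_ids` holding the image placeholder token,
    mirroring the logic duplicated between Qwen2VLBackbone.call and
    Qwen2VLCausalLM.call_with_cache."""
    batch_size, seq_len, hidden_dim = x.shape
    x_flat = x.reshape(-1, hidden_dim).copy()
    mask_flat = (token_ids == image_token_id).reshape(-1)
    # NumPy equivalent of ops.scatter_update on the flat sequence:
    # one vision feature row per placeholder position, in order.
    x_flat[np.where(mask_flat)[0]] = img_embeddings
    return x_flat.reshape(batch_size, seq_len, hidden_dim)

x = np.zeros((1, 4, 2))
token_ids = np.array([[5, 99, 99, 7]])  # 99 = image placeholder id
img = np.ones((2, 2))                   # one row per image token
out = inject_vision_embeddings(x, token_ids, img, image_token_id=99)
print(out[0, 1])  # [1. 1.] — placeholder slots now hold vision features
```

With such a helper, both call sites collapse to a single line, and any future fix (e.g. dtype handling of the indices) lands in one place.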

@sachinprasadhs
Collaborator

My bad, I got confused by the GitHub handle names in the comment where I mentioned Qwen2-VL.
Since the original issue assignee created PR #2599 before this one, can you please close this PR and let him finish his, as he is the original assignee?
Since you already have one PR open for the Omni model, you can focus on that completely.

Sorry again for the confusion and inconvenience.

@samudraneel05
Author

I've reached out to the original issue assignee to see if we can do a best-of-both-worlds model addition. I'll be closing this PR.

The Omni model PR is, and has been, ready for review for a while!


