Add T5Gemma2 model to Hub#2619

Open
laxmareddyp wants to merge 11 commits intokeras-team:masterfrom
laxmareddyp:t5gemma2_model

Conversation


@laxmareddyp laxmareddyp commented Mar 2, 2026

Description of the change

This model implementation references issue #2613.

Foundation Model: T5Gemma 1 is based on the Gemma 2 framework, whereas T5Gemma 2 is built using the Gemma 3 architecture.
Tied Embeddings: Unlike the original T5Gemma, which uses separate word embeddings for the encoder and decoder, T5Gemma 2 ties all word embeddings (encoder input, decoder input, and decoder output) to reduce parameter count and memory footprint.
Merged Attention: T5Gemma 2 features a "merged attention" module that unifies the decoder's self-attention and cross-attention into a single joint module, whereas T5Gemma 1 maintains them as separate sub-layers.
Multimodality: T5Gemma 2 is natively multimodal and includes a frozen SigLIP vision encoder, allowing it to process images and text together. T5Gemma 1 is a text-only model.
Context Window: T5Gemma 2 supports a much larger context window of up to 128K tokens.
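
The tied-embedding scheme can be sketched framework-agnostically: a single matrix serves as the encoder input lookup, the decoder input lookup, and (transposed) the decoder output projection. A minimal NumPy illustration with toy sizes; all names here are hypothetical, not the real config:

```python
import numpy as np

vocab_size, hidden_dim = 100, 16          # toy sizes, not the real config
rng = np.random.default_rng(0)
shared_embedding = rng.normal(size=(vocab_size, hidden_dim))

encoder_ids = np.array([1, 2, 3])
decoder_ids = np.array([4, 5])

# Encoder and decoder inputs index the same matrix.
encoder_inputs = shared_embedding[encoder_ids]    # (3, hidden_dim)
decoder_inputs = shared_embedding[decoder_ids]    # (2, hidden_dim)

# The decoder output head reuses the same matrix transposed, so no
# separate (vocab_size, hidden_dim) output weight is stored.
decoder_hidden = decoder_inputs                   # stand-in for decoder output
logits = decoder_hidden @ shared_embedding.T      # (2, vocab_size)
```

Only one `(vocab_size, hidden_dim)` parameter block exists in this scheme, which is the source of the parameter-count savings described above.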

Model Numerics Verification screenshots:

Param count difference (this is expected and not a bug):

KerasHub counts encoder embedding + decoder embedding as separate weight matrices.

HF shares a single nn.Embedding instance for encoder/decoder/lm_head and counts it once.
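
The discrepancy is just the shared embedding matrix being counted twice versus once. With hypothetical sizes:

```python
vocab_size, hidden_dim = 32_000, 1_024    # hypothetical sizes for illustration
embedding_params = vocab_size * hidden_dim

# HF: one shared nn.Embedding for encoder, decoder, and lm_head, counted once.
hf_count = embedding_params
# KerasHub: encoder and decoder embeddings counted as separate weight matrices.
keras_hub_count = 2 * embedding_params

# The reported difference is exactly one extra copy of the embedding matrix.
assert keras_hub_count - hf_count == embedding_params
```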

Screenshot 2026-03-06 at 8 43 34 AM

Colab Notebook

Checklist

  • I have added all the necessary unit tests for my change.
  • I have verified that my change does not break existing code and works with all backends (TensorFlow, JAX, and PyTorch).
  • My PR is based on the latest changes of the main branch (if unsure, rebase the code).
  • I have followed the Keras Hub Model contribution guidelines in making these changes.
  • I have followed the Keras Hub API design guidelines in making these changes.
  • I have signed the Contributor License Agreement.

@github-actions github-actions bot added the Gemma Gemma model specific issues label Mar 2, 2026
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates the T5Gemma2 model into the Keras Hub, providing a more advanced and versatile encoder-decoder architecture. The new model leverages the Gemma 3 framework, featuring innovations like merged attention and tied embeddings to enhance efficiency and performance. Its native multimodal capabilities and expanded context window significantly broaden its application scope compared to previous versions, enabling more complex and diverse tasks.

Highlights

  • Introduction of T5Gemma2 Model: The pull request introduces the T5Gemma2 model, a new sequence-to-sequence language model based on the Gemma 3 architecture, offering significant advancements over its predecessor, T5Gemma1.
  • Architectural Enhancements: Key architectural improvements include tied word embeddings across encoder input, decoder input, and decoder output, and a novel 'merged attention' module in the decoder that unifies self-attention and cross-attention into a single computation.
  • Multimodality and Context Window: T5Gemma2 is designed to be natively multimodal, incorporating a frozen SigLIP vision encoder for combined image and text processing, and supports a substantially larger context window of up to 128K tokens.
  • Comprehensive Implementation: The implementation includes dedicated components for the T5Gemma2 backbone, encoder and decoder layers, attention mechanisms, MLP, tokenizer, and a sequence-to-sequence language model, along with corresponding unit tests.


Changelog
  • keras_hub/api/models/__init__.py
    • Imported T5Gemma2Backbone, T5Gemma2Seq2SeqLM, T5Gemma2Seq2SeqLMPreprocessor, and T5Gemma2Tokenizer to the models API.
  • keras_hub/api/tokenizers/__init__.py
    • Imported T5Gemma2Tokenizer to the tokenizers API.
  • keras_hub/src/models/t5gemma2/__init__.py
    • Added initialization file for the T5Gemma2 model directory, including preset registration for T5Gemma2Backbone.
  • keras_hub/src/models/t5gemma2/t5gemma2_attention.py
    • Added T5Gemma2Attention class for self-attention with RoPE, Q/K normalization, and Grouped Query Attention (GQA).
    • Added T5Gemma2MergedAttention class, which fuses self-attention and cross-attention for the decoder by concatenating K/V pairs, a key architectural difference from T5Gemma1.
  • keras_hub/src/models/t5gemma2/t5gemma2_backbone.py
    • Added T5Gemma2Backbone class, implementing the encoder-decoder architecture with Gemma3-style Q/K normalization and per-layer-type sliding window attention patterns.
    • Included token embeddings, encoder layers, decoder layers, and RMS normalization for the backbone.
    • Defined the functional model structure for encoder and decoder inputs and outputs.
  • keras_hub/src/models/t5gemma2/t5gemma2_backbone_test.py
    • Added unit tests for T5Gemma2Backbone, covering basic functionality, asymmetrical configurations, model saving, and preset loading.
  • keras_hub/src/models/t5gemma2/t5gemma2_decoder.py
    • Added T5Gemma2DecoderLayer class, which utilizes T5Gemma2MergedAttention to combine self-attention and cross-attention.
    • Implemented causal and cross-attention mask creation for the decoder layer.
    • Integrated RMS normalization and MLP components within the decoder layer.
  • keras_hub/src/models/t5gemma2/t5gemma2_encoder.py
    • Added T5Gemma2EncoderLayer class, incorporating T5Gemma2Attention for bidirectional self-attention and a feed-forward network (MLP).
    • Implemented attention mask creation for the encoder layer.
    • Integrated RMS normalization and MLP components within the encoder layer.
  • keras_hub/src/models/t5gemma2/t5gemma2_layers.py
    • Added t5gemma2_kernel_initializer function for consistent weight initialization.
    • Added T5Gemma2MLP class, implementing the feed-forward network block with gated GELU activation and dropout.
  • keras_hub/src/models/t5gemma2/t5gemma2_presets.py
    • Added a placeholder file for T5Gemma2 backbone presets, to be populated with checkpoint information.
  • keras_hub/src/models/t5gemma2/t5gemma2_seq_2_seq_lm.py
    • Added T5Gemma2Seq2SeqLM class, an end-to-end model for sequence-to-sequence language modeling.
    • Implemented call_encoder, call_decoder_with_cache, and _build_cache methods to support autoregressive generation.
    • Provided a generate_step method for compilable generation, including handling of stop tokens.
  • keras_hub/src/models/t5gemma2/t5gemma2_seq_2_seq_lm_preprocessor.py
    • Added T5Gemma2Seq2SeqLMPreprocessor class, designed for use with T5Gemma2Seq2SeqLM to process raw strings into tokenized inputs.
    • Implemented call method for training data preprocessing and generate_preprocess for inference preprocessing, handling encoder and decoder sequences.
  • keras_hub/src/models/t5gemma2/t5gemma2_seq_2_seq_lm_test.py
    • Added unit tests for T5Gemma2Seq2SeqLM, verifying basic functionality, encoder calls, cache building, and model saving.
  • keras_hub/src/models/t5gemma2/t5gemma2_tokenizer.py
    • Added T5Gemma2Tokenizer class, a SentencePiece-based tokenizer specifically configured for T5Gemma2 models, including its special tokens.
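
The "merged attention" described in the changelog can be sketched with NumPy: the decoder projects one set of queries, then attends in a single softmax over the concatenation of its self-attention K/V (from decoder states) and cross-attention K/V (from encoder states), rather than running two separate attention sub-layers. A single-head sketch under toy sizes; all names are illustrative:

```python
import numpy as np

def merged_attention(q, self_k, self_v, cross_k, cross_v):
    """Single joint attention over decoder (self) and encoder (cross) K/V."""
    k = np.concatenate([self_k, cross_k], axis=0)   # (T_dec + T_enc, d)
    v = np.concatenate([self_v, cross_v], axis=0)
    scores = q @ k.T / np.sqrt(q.shape[-1])         # (T_q, T_dec + T_enc)
    scores -= scores.max(axis=-1, keepdims=True)    # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                              # (T_q, d)

rng = np.random.default_rng(0)
d, t_dec, t_enc = 8, 4, 6
out = merged_attention(
    rng.normal(size=(t_dec, d)),                               # queries
    rng.normal(size=(t_dec, d)), rng.normal(size=(t_dec, d)),  # self K/V
    rng.normal(size=(t_enc, d)), rng.normal(size=(t_enc, d)),  # cross K/V
)
```

The real layer additionally applies causal and cross-attention masks, GQA head grouping, and RoPE; this sketch only shows the K/V concatenation that distinguishes the merged module from T5Gemma1's separate sub-layers.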

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces the T5Gemma2 model, a significant and well-structured contribution that aligns with the repository's modular design. The implementation correctly follows naming conventions and architectural patterns.

However, there are several critical issues to address:

  1. Missing Tests: A number of new modules (t5gemma2_tokenizer, t5gemma2_seq_2_seq_lm_preprocessor, t5gemma2_attention, t5gemma2_decoder, t5gemma2_encoder, t5gemma2_layers) are missing their corresponding test files (_test.py). This is a violation of the repository's testing requirements (Style Guide, line 406) and is critical for ensuring code quality and correctness.
  2. Unrunnable Examples: The docstring examples for T5Gemma2Tokenizer and T5Gemma2Seq2SeqLM use a preset that is not yet available, which will lead to errors for users. The examples should be updated to be runnable or clearly marked as placeholders.
  3. PR Description Discrepancy: The description mentions that T5Gemma2 is natively multimodal with a vision encoder, but the current implementation appears to be text-only. It would be helpful to clarify this in the description.

Please address the missing tests and unrunnable examples to finalize this contribution.

@sachinprasadhs sachinprasadhs added the new model For PRs that contribute a new model to the Keras Hub registry. label Mar 3, 2026
@laxmareddyp laxmareddyp added the kokoro:force-run Runs Tests on GPU label Mar 6, 2026
@kokoro-team kokoro-team removed the kokoro:force-run Runs Tests on GPU label Mar 6, 2026
@laxmareddyp laxmareddyp added the kokoro:force-run Runs Tests on GPU label Mar 6, 2026
@kokoro-team kokoro-team removed the kokoro:force-run Runs Tests on GPU label Mar 6, 2026
@laxmareddyp laxmareddyp marked this pull request as ready for review March 6, 2026 17:08
@divyashreepathihalli
Collaborator

can you please address the gemini review comments and once done, please resolve the comments you have addressed

@laxmareddyp
Collaborator Author

laxmareddyp commented Mar 6, 2026

can you please address the gemini review comments and once done, please resolve the comments you have addressed

Those are related to creating test files and were not all required. I have included the necessary test files to cover the code. Thanks

@divyashreepathihalli
Collaborator

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces the T5Gemma2 model, a significant architectural update from T5Gemma1, featuring merged attention and multimodal capabilities. The implementation is comprehensive, including the backbone, task model, preprocessor, tokenizer, and a detailed checkpoint conversion script with numerical verification. The code is well-structured and adheres to the repository's patterns.

However, the pull request is missing the required Colab notebooks for numerical validation, as specified in the repository's contribution guidelines (rule #516). Please add links to Colabs demonstrating numerical equivalence for the backbone, preprocessor, and the end-to-end task model against the original implementation.

I have also found one potential correctness issue in the decoder's sliding window attention implementation, which I've detailed in a specific comment.

Comment on lines +197 to +199

    sliding_mask = (
        q_indices[:, None] - self.sliding_window
    ) <= kv_indices[None, :]
Severity: high

The implementation of the causal sliding window mask appears to be off by one. The current condition (q_indices[:, None] - self.sliding_window) <= kv_indices[None, :] creates a window of size self.sliding_window + 1, as it allows a token to attend to self.sliding_window previous tokens plus itself.

To ensure the window size is exactly self.sliding_window, the condition should be adjusted.

Suggested change

    - sliding_mask = (
    -     q_indices[:, None] - self.sliding_window
    - ) <= kv_indices[None, :]
    + sliding_mask = (
    +     q_indices[:, None] - (self.sliding_window - 1)
    + ) <= kv_indices[None, :]
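
The off-by-one is easy to verify directly: combined with the causal mask, the original condition lets a query attend to sliding_window previous tokens plus itself (a window of sliding_window + 1), while subtracting one restores a window of exactly sliding_window. A NumPy check (variable names mirror the snippet above):

```python
import numpy as np

def attended_counts(sliding_window, seq_len, fixed):
    """Count how many positions each query can attend under the masks."""
    q_indices = np.arange(seq_len)[:, None]
    kv_indices = np.arange(seq_len)[None, :]
    offset = sliding_window - 1 if fixed else sliding_window
    sliding_mask = (q_indices - offset) <= kv_indices
    causal_mask = kv_indices <= q_indices
    return (sliding_mask & causal_mask).sum(axis=-1)

# With sliding_window=4, the original mask lets later tokens see 5 positions,
# while the corrected mask caps the window at exactly 4.
original = attended_counts(4, 8, fixed=False)  # [1 2 3 4 5 5 5 5]
corrected = attended_counts(4, 8, fixed=True)  # [1 2 3 4 4 4 4 4]
```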

@divyashreepathihalli divyashreepathihalli Mar 6, 2026

Please address this comment.
