Add T5Gemma2 model to Hub by laxmareddyp · Pull Request #2619 · keras-team/keras-hub

laxmareddyp · 2026-03-02T22:51:30Z

Description of the change

This model implementation references issue #2613.

Foundation Model: T5Gemma 1 is based on the Gemma 2 framework, whereas T5Gemma 2 is built using the Gemma 3 architecture.
Tied Embeddings: Unlike the original T5Gemma, which uses separate word embeddings for the encoder and decoder, T5Gemma 2 ties all word embeddings (encoder input, decoder input, and decoder output) to reduce parameter count and memory footprint.
Merged Attention: T5Gemma 2 features a "merged attention" module that unifies the decoder's self-attention and cross-attention into a single joint module, whereas T5Gemma 1 maintains them as separate sub-layers.
Multimodality: T5Gemma 2 is natively multimodal and includes a frozen SigLIP vision encoder, allowing it to process images and text together. T5Gemma 1 is a text-only model.
Context Window: T5Gemma 2 supports a much larger context window of up to 128K tokens.
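
As a rough illustration of the merged attention idea in the list above, the NumPy sketch below concatenates the decoder's own keys/values with the encoder's keys/values and applies one softmax over the joint sequence. The function name `merged_attention`, the shapes, and the single-head simplification are assumptions made for illustration only; this is not the code added in this PR.

```python
import numpy as np


def merged_attention(q, dec_k, dec_v, enc_k, enc_v):
    """Single-head sketch: one softmax over decoder + encoder keys.

    q, dec_k, dec_v: (dec_len, dim); enc_k, enc_v: (enc_len, dim).
    """
    dec_len, dim = q.shape
    enc_len = enc_k.shape[0]

    # Join the self-attention and cross-attention keys/values.
    k = np.concatenate([dec_k, enc_k], axis=0)  # (dec_len + enc_len, dim)
    v = np.concatenate([dec_v, enc_v], axis=0)

    # Causal mask over decoder positions; encoder positions stay visible.
    causal = np.tril(np.ones((dec_len, dec_len), dtype=bool))
    visible = np.ones((dec_len, enc_len), dtype=bool)
    mask = np.concatenate([causal, visible], axis=1)

    scores = (q @ k.T) / np.sqrt(dim)
    scores = np.where(mask, scores, -1e9)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


# Hypothetical toy sizes: 3 decoder tokens, 5 encoder tokens, dim 4.
rng = np.random.default_rng(0)
q = rng.normal(size=(3, 4))
dec_k, dec_v = rng.normal(size=(3, 4)), rng.normal(size=(3, 4))
enc_k, enc_v = rng.normal(size=(5, 4)), rng.normal(size=(5, 4))
out, weights = merged_attention(q, dec_k, dec_v, enc_k, enc_v)
```

In this sketch each decoder query distributes a single set of attention weights across both its own past tokens and all encoder tokens, instead of running two separate sub-layers.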

Model Numerics Verification screenshots:

Param count difference (this is expected and not a bug):

KerasHub counts encoder embedding + decoder embedding as separate weight matrices.

HF shares a single nn.Embedding instance for encoder/decoder/lm_head and counts it once.
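
A back-of-the-envelope sketch of why the two totals can differ, assuming the only discrepancy is the embedding matrix being counted once versus twice. The helper `embedding_params` and the sizes are hypothetical and are not taken from any real checkpoint.

```python
def embedding_params(vocab_size, hidden_dim, copies):
    """Parameters reported when `copies` same-shaped embeddings are counted."""
    return vocab_size * hidden_dim * copies


# Hypothetical sizes chosen only for illustration.
vocab_size, hidden_dim = 1000, 64

counted_once = embedding_params(vocab_size, hidden_dim, copies=1)
counted_twice = embedding_params(vocab_size, hidden_dim, copies=2)

# The gap is exactly one vocab_size * hidden_dim matrix.
difference = counted_twice - counted_once
```

Under this assumption, the reported gap between the two frameworks should equal one embedding matrix, which is an easy sanity check against the screenshots.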

Colab Notebook

Checklist

I have added all the necessary unit tests for my change.
I have verified that my change does not break existing code and works with all backends (TensorFlow, JAX, and PyTorch).
My PR is based on the latest changes of the main branch (if unsure, rebase the code).
I have followed the Keras Hub Model contribution guidelines in making these changes.
I have followed the Keras Hub API design guidelines in making these changes.
I have signed the Contributor License Agreement.

gemini-code-assist · 2026-03-02T22:51:56Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates the T5Gemma2 model into the Keras Hub, providing a more advanced and versatile encoder-decoder architecture. The new model leverages the Gemma 3 framework, featuring innovations like merged attention and tied embeddings to enhance efficiency and performance. Its native multimodal capabilities and expanded context window significantly broaden its application scope compared to previous versions, enabling more complex and diverse tasks.

Highlights

Introduction of T5Gemma2 Model: The pull request introduces the T5Gemma2 model, a new sequence-to-sequence language model based on the Gemma 3 architecture, offering significant advancements over its predecessor, T5Gemma1.
Architectural Enhancements: Key architectural improvements include tied word embeddings across encoder input, decoder input, and decoder output, and a novel 'merged attention' module in the decoder that unifies self-attention and cross-attention into a single computation.
Multimodality and Context Window: T5Gemma2 is designed to be natively multimodal, incorporating a frozen SigLIP vision encoder for combined image and text processing, and supports a substantially larger context window of up to 128K tokens.
Comprehensive Implementation: The implementation includes dedicated components for the T5Gemma2 backbone, encoder and decoder layers, attention mechanisms, MLP, tokenizer, and a sequence-to-sequence language model, along with corresponding unit tests.

Changelog

keras_hub/api/models/__init__.py
- Imported T5Gemma2Backbone, T5Gemma2Seq2SeqLM, T5Gemma2Seq2SeqLMPreprocessor, and T5Gemma2Tokenizer to the models API.
keras_hub/api/tokenizers/__init__.py
- Imported T5Gemma2Tokenizer to the tokenizers API.
keras_hub/src/models/t5gemma2/__init__.py
- Added initialization file for the T5Gemma2 model directory, including preset registration for T5Gemma2Backbone.
keras_hub/src/models/t5gemma2/t5gemma2_attention.py
- Added T5Gemma2Attention class for self-attention with RoPE, Q/K normalization, and Grouped Query Attention (GQA).
- Added T5Gemma2MergedAttention class, which fuses self-attention and cross-attention for the decoder by concatenating K/V pairs, a key architectural difference from T5Gemma1.
keras_hub/src/models/t5gemma2/t5gemma2_backbone.py
- Added T5Gemma2Backbone class, implementing the encoder-decoder architecture with Gemma3-style Q/K normalization and per-layer-type sliding window attention patterns.
- Included token embeddings, encoder layers, decoder layers, and RMS normalization for the backbone.
- Defined the functional model structure for encoder and decoder inputs and outputs.
keras_hub/src/models/t5gemma2/t5gemma2_backbone_test.py
- Added unit tests for T5Gemma2Backbone, covering basic functionality, asymmetrical configurations, model saving, and preset loading.
keras_hub/src/models/t5gemma2/t5gemma2_decoder.py
- Added T5Gemma2DecoderLayer class, which utilizes T5Gemma2MergedAttention to combine self-attention and cross-attention.
- Implemented causal and cross-attention mask creation for the decoder layer.
- Integrated RMS normalization and MLP components within the decoder layer.
keras_hub/src/models/t5gemma2/t5gemma2_encoder.py
- Added T5Gemma2EncoderLayer class, incorporating T5Gemma2Attention for bidirectional self-attention and a feed-forward network (MLP).
- Implemented attention mask creation for the encoder layer.
- Integrated RMS normalization and MLP components within the encoder layer.
keras_hub/src/models/t5gemma2/t5gemma2_layers.py
- Added t5gemma2_kernel_initializer function for consistent weight initialization.
- Added T5Gemma2MLP class, implementing the feed-forward network block with gated GELU activation and dropout.
keras_hub/src/models/t5gemma2/t5gemma2_presets.py
- Added a placeholder file for T5Gemma2 backbone presets, to be populated with checkpoint information.
keras_hub/src/models/t5gemma2/t5gemma2_seq_2_seq_lm.py
- Added T5Gemma2Seq2SeqLM class, an end-to-end model for sequence-to-sequence language modeling.
- Implemented call_encoder, call_decoder_with_cache, and _build_cache methods to support autoregressive generation.
- Provided a generate_step method for compilable generation, including handling of stop tokens.
keras_hub/src/models/t5gemma2/t5gemma2_seq_2_seq_lm_preprocessor.py
- Added T5Gemma2Seq2SeqLMPreprocessor class, designed for use with T5Gemma2Seq2SeqLM to process raw strings into tokenized inputs.
- Implemented call method for training data preprocessing and generate_preprocess for inference preprocessing, handling encoder and decoder sequences.
keras_hub/src/models/t5gemma2/t5gemma2_seq_2_seq_lm_test.py
- Added unit tests for T5Gemma2Seq2SeqLM, verifying basic functionality, encoder calls, cache building, and model saving.
keras_hub/src/models/t5gemma2/t5gemma2_tokenizer.py
- Added T5Gemma2Tokenizer class, a SentencePiece-based tokenizer specifically configured for T5Gemma2 models, including its special tokens.
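
For readers unfamiliar with the gated GELU feed-forward block mentioned for t5gemma2_layers.py in the changelog above, here is a minimal NumPy sketch. The function names, the tanh approximation of GELU, the weight shapes, and the omission of dropout are assumptions for illustration; the actual T5Gemma2MLP class in the PR may differ.

```python
import numpy as np


def gelu_tanh(x):
    """Tanh approximation of GELU (assumed variant for this sketch)."""
    inner = np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)
    return 0.5 * x * (1.0 + np.tanh(inner))


def gated_gelu_mlp(x, w_gate, w_up, w_down):
    """Gated feed-forward: down(gelu(gate(x)) * up(x)), without dropout."""
    gate = gelu_tanh(x @ w_gate)  # (tokens, intermediate_dim)
    up = x @ w_up                 # (tokens, intermediate_dim)
    return (gate * up) @ w_down   # (tokens, hidden_dim)


# Hypothetical toy sizes: 2 tokens, hidden_dim 8, intermediate_dim 16.
rng = np.random.default_rng(0)
x = rng.normal(size=(2, 8))
w_gate = rng.normal(size=(8, 16))
w_up = rng.normal(size=(8, 16))
w_down = rng.normal(size=(16, 8))
y = gated_gelu_mlp(x, w_gate, w_up, w_down)
```

The gate branch decides, per intermediate unit, how much of the linear "up" projection passes through before the result is projected back to the hidden size.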

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request introduces the T5Gemma2 model, a significant and well-structured contribution that aligns with the repository's modular design. The implementation correctly follows naming conventions and architectural patterns.

However, there are several critical issues to address:

Missing Tests: A number of new modules (t5gemma2_tokenizer, t5gemma2_seq_2_seq_lm_preprocessor, t5gemma2_attention, t5gemma2_decoder, t5gemma2_encoder, t5gemma2_layers) are missing their corresponding test files (_test.py). This is a violation of the repository's testing requirements (Style Guide, line 406) and is critical for ensuring code quality and correctness.
Unrunnable Examples: The docstring examples for T5Gemma2Tokenizer and T5Gemma2Seq2SeqLM use a preset that is not yet available, which will lead to errors for users. The examples should be updated to be runnable or clearly marked as placeholders.
PR Description Discrepancy: The description mentions that T5Gemma2 is natively multimodal with a vision encoder, but the current implementation appears to be text-only. It would be helpful to clarify this in the description.

Please address the missing tests and unrunnable examples to finalize this contribution.

keras_hub/src/models/t5gemma2/t5gemma2_attention.py

keras_hub/src/models/t5gemma2/t5gemma2_decoder.py

keras_hub/src/models/t5gemma2/t5gemma2_encoder.py

keras_hub/src/models/t5gemma2/t5gemma2_layers.py

keras_hub/src/models/t5gemma2/t5gemma2_seq_2_seq_lm.py

keras_hub/src/models/t5gemma2/t5gemma2_seq_2_seq_lm_preprocessor.py

keras_hub/src/models/t5gemma2/t5gemma2_tokenizer.py

divyashreepathihalli · 2026-03-06T19:11:47Z

Can you please address the Gemini review comments and, once done, resolve the comments you have addressed?

laxmareddyp · 2026-03-06T19:17:17Z

> Can you please address the Gemini review comments and, once done, resolve the comments you have addressed?

Those comments ask for additional test files, which are not required. I have included the test files necessary to cover the code. Thanks.

divyashreepathihalli · 2026-03-06T19:36:03Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces the T5Gemma2 model, a significant architectural update from T5Gemma1, featuring merged attention and multimodal capabilities. The implementation is comprehensive, including the backbone, task model, preprocessor, tokenizer, and a detailed checkpoint conversion script with numerical verification. The code is well-structured and adheres to the repository's patterns.

However, the pull request is missing the required Colab notebooks for numerical validation, as specified in the repository's contribution guidelines (rule #516). Please add links to Colabs demonstrating numerical equivalence for the backbone, preprocessor, and the end-to-end task model against the original implementation.

I have also found one potential correctness issue in the decoder's sliding window attention implementation, which I've detailed in a specific comment.
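
The kind of numerical-equivalence check requested above can be sketched as follows. The helper `check_numerics`, the tolerances, and the stand-in arrays are assumptions for illustration; in a real Colab the two arrays would be outputs of the original implementation and the KerasHub port on identical inputs.

```python
import numpy as np


def check_numerics(reference, candidate, atol=1e-5, rtol=1e-5):
    """Return (matches, max_abs_diff) for two output arrays."""
    reference = np.asarray(reference, dtype="float32")
    candidate = np.asarray(candidate, dtype="float32")
    max_abs_diff = float(np.max(np.abs(reference - candidate)))
    matches = bool(np.allclose(candidate, reference, atol=atol, rtol=rtol))
    return matches, max_abs_diff


# Stand-in arrays, not real model outputs.
reference = np.array([[0.10, -0.25, 0.40]], dtype="float32")
candidate = reference + np.float32(1e-7)
matches, max_abs_diff = check_numerics(reference, candidate)
```

Reporting the maximum absolute difference alongside the pass/fail flag makes it easier to judge whether a mismatch is a precision effect or a real conversion bug.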

gemini-code-assist · 2026-03-06T19:39:24Z

keras_hub/src/models/t5gemma2/t5gemma2_decoder.py

+            sliding_mask = (
+                q_indices[:, None] - self.sliding_window
+            ) <= kv_indices[None, :]


The implementation of the causal sliding window mask appears to be off by one. The current condition (q_indices[:, None] - self.sliding_window) <= kv_indices[None, :] creates a window of size self.sliding_window + 1, as it allows a token to attend to self.sliding_window previous tokens plus itself.

To ensure the window size is exactly self.sliding_window, the condition should be adjusted.

Suggested change

-            sliding_mask = (
-                q_indices[:, None] - self.sliding_window
-            ) <= kv_indices[None, :]
+            sliding_mask = (
+                q_indices[:, None] - (self.sliding_window - 1)
+            ) <= kv_indices[None, :]
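
The difference between the two conditions can be checked with a small NumPy sketch. It assumes the sliding mask is combined with a causal mask, and the helper `causal_sliding_mask` and its `offset` parameter are hypothetical names, not code from the PR. Whether the window is meant to include the current token is a convention the thread does not settle.

```python
import numpy as np


def causal_sliding_mask(seq_len, offset):
    """Query i may attend to keys j with i - offset <= j <= i."""
    q_indices = np.arange(seq_len)
    kv_indices = np.arange(seq_len)
    causal = kv_indices[None, :] <= q_indices[:, None]
    sliding = (q_indices[:, None] - offset) <= kv_indices[None, :]
    return causal & sliding


seq_len, sliding_window = 8, 3

# Condition as written in the PR: offset equals sliding_window.
current = causal_sliding_mask(seq_len, offset=sliding_window)
# Condition proposed in the review: offset equals sliding_window - 1.
suggested = causal_sliding_mask(seq_len, offset=sliding_window - 1)

# Number of keys visible to the last query under each condition.
current_visible = int(current[-1].sum())
suggested_visible = int(suggested[-1].sum())
```

With these toy sizes the last query sees `sliding_window + 1` keys under the current condition and exactly `sliding_window` keys under the suggested one, which is the off-by-one the review describes.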

References

Your goal is to critically test the logic. Actively search for and point out failing edge cases, race conditions, or unhandled exceptions in the implementation.

Please address this comment.

Add initial Text focused T5Gemma2 model files

f91b26e

github-actions bot added the "Gemma" label (Gemma model specific issues) Mar 2, 2026

gemini-code-assist bot reviewed Mar 2, 2026

laxmareddyp added 2 commits March 2, 2026 15:42

add t5gemma2 converter

8a284d4

Add checkpoint conversion script

0118760

sachinprasadhs added the "new model" label (for PRs that contribute a new model to the Keras Hub registry) Mar 3, 2026

laxmareddyp added 7 commits March 2, 2026 23:26

Fix numerics mismatch

199aa40

Merge branch 'keras-team:master' into t5gemma2_model

b199cb8

Add vision tower and Numeric verification Fixes

b27b7ea

Merge branch 'keras-team:master' into t5gemma2_model

35e0c56

Implement passing dummy image inputs when text-only inference for multimodel variants

febfc0c

Add preprocessor test file

e447454

Fix backbone tests

30993c6

laxmareddyp added the "kokoro:force-run" label (runs tests on GPU) Mar 6, 2026

kokoro-team removed the "kokoro:force-run" label (runs tests on GPU) Mar 6, 2026

Fix preprocessor test

342bf0b

laxmareddyp added the "kokoro:force-run" label (runs tests on GPU) Mar 6, 2026

kokoro-team removed the "kokoro:force-run" label (runs tests on GPU) Mar 6, 2026

laxmareddyp marked this pull request as ready for review March 6, 2026 17:08

laxmareddyp requested review from divyashreepathihalli and sachinprasadhs March 6, 2026 17:08

gemini-code-assist bot reviewed Mar 6, 2026

Add T5Gemma2 model to Hub #2619
laxmareddyp wants to merge 11 commits into keras-team:master from laxmareddyp:t5gemma2_model

4 participants
