
[Testing] Missing unit tests for Embedder.encode_vision multimodal path #544

@mohamed-717-os

Description


The Embedder module in gemma/gm/nn/_modules.py implements an encode_vision method that acts as the bridge for multimodal inference: it projects visual features (e.g., from SigLIP) into the Transformer's embedding space by applying RMSNorm followed by an Einsum projection.
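Written out per visual token, the computation is roughly the following. This assumes the Gemma-style RMSNorm convention (a learned scale γ applied as 1 + γ, with a small ε such as 1e-6); the exact epsilon and scale parametrization should be checked against the module source. Here d_v is vision_proj_dim, d is embed_dim, γ corresponds to mm_soft_embedding_norm and W to mm_input_projection:

```latex
\hat{x} = \frac{x}{\sqrt{\frac{1}{d_v}\sum_{j=1}^{d_v} x_j^{2} + \epsilon}} \odot (1 + \gamma),
\qquad
y = \hat{x}\,W,
\qquad
x, \gamma \in \mathbb{R}^{d_v},\; W \in \mathbb{R}^{d_v \times d},\; y \in \mathbb{R}^{d}
```

So a test only needs to confirm that the last axis changes from d_v to d, and that both parameters exist when vision_proj_dim is set.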

Currently, there are no dedicated unit tests for this path, as noted by the TODO at line 74 of gemma/gm/nn/_modules_test.py.

Goal:

  • Add unit tests that cover encode_vision.
  • Verify that initializing the Embedder with vision_proj_dim correctly creates the mm_input_projection and mm_soft_embedding_norm parameters.
  • Verify that encode_vision projects visual tokens of size vision_proj_dim to the model's embed_dim (output shape [..., embed_dim]).
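A minimal sketch of what such tests could check, written against a hypothetical NumPy stand-in (ToyEmbedder) rather than the real Flax module so that it runs on its own. The stand-in assumes the structure described above (RMSNorm with a zero-initialized scale, then an einsum projection with no bias). The nested parameter names `scale` and `w`, the einsum string, and the epsilon value are assumptions, not confirmed from gemma/gm/nn/_modules.py.

```python
import numpy as np

EPS = 1e-6  # assumed RMSNorm epsilon


class ToyEmbedder:
    """Hypothetical NumPy stand-in for the vision path of the Embedder.

    Not the real Flax module: it only mirrors the described structure so the
    test logic can be shown end to end.
    """

    def __init__(self, vocab_size, embed_dim, vision_proj_dim=None, seed=0):
        rng = np.random.default_rng(seed)
        self.params = {"input_embedding": rng.normal(size=(vocab_size, embed_dim))}
        if vision_proj_dim is not None:
            # Vision parameters exist only when vision_proj_dim is given.
            self.params["mm_soft_embedding_norm"] = {"scale": np.zeros(vision_proj_dim)}
            self.params["mm_input_projection"] = {
                "w": rng.normal(size=(vision_proj_dim, embed_dim))
            }

    def encode_vision(self, x):
        scale = self.params["mm_soft_embedding_norm"]["scale"]
        w = self.params["mm_input_projection"]["w"]
        # RMSNorm over the last (feature) axis, scale applied as (1 + scale).
        var = np.mean(np.square(x), axis=-1, keepdims=True)
        normed = x / np.sqrt(var + EPS) * (1.0 + scale)
        # Project vision_proj_dim -> embed_dim for every token.
        return np.einsum("...tm,md->...td", normed, w)


def test_vision_params_created_only_when_requested():
    assert "mm_input_projection" not in ToyEmbedder(10, 8).params
    params = ToyEmbedder(10, 8, vision_proj_dim=4).params
    assert params["mm_input_projection"]["w"].shape == (4, 8)
    assert params["mm_soft_embedding_norm"]["scale"].shape == (4,)


def test_output_projected_to_embed_dim():
    emb = ToyEmbedder(vocab_size=10, embed_dim=8, vision_proj_dim=4)
    x = np.arange(1.0, 25.0).reshape(2, 3, 4)  # (batch, tokens, vision_proj_dim)
    assert emb.encode_vision(x).shape == (2, 3, 8)


def test_rmsnorm_makes_output_scale_invariant():
    # RMSNorm divides out the per-token magnitude, so rescaling the input
    # should leave the projected output (almost) unchanged.
    emb = ToyEmbedder(vocab_size=10, embed_dim=8, vision_proj_dim=4)
    x = np.arange(1.0, 25.0).reshape(2, 3, 4)
    assert np.allclose(emb.encode_vision(3.0 * x), emb.encode_vision(x))
```

In the real suite, the parameter tree would come from initializing the Flax Embedder instead of the stand-in, and the same three checks (parameter presence and shapes, output shape, invariance to rescaling the input) should carry over, assuming the projection has no bias term.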

Metadata


Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
