feat: Add Matryoshka support when loading a model#4170

Open
Samoed wants to merge 7 commits into main from mrl

Samoed (Member) commented Feb 25, 2026

Closes #2832

Add Matryoshka support. ModelMeta.embedding_dim now also accepts a list, so a model can declare several supported embedding dimensions. Example:

import mteb

model = mteb.get_model("jinaai/jina-embeddings-v5-text-nano", embed_dim=32)
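For context, Matryoshka (MRL) embeddings are usually consumed by keeping the first embed_dim components of the full vector and re-normalizing. The sketch below illustrates that idea on plain Python lists. truncate_embedding is a hypothetical helper, not part of mteb, and whether the wrappers in this PR truncate in exactly this way is an assumption.

```python
import math
from typing import List, Sequence


def truncate_embedding(vector: Sequence[float], embed_dim: int) -> List[float]:
    """Keep the first `embed_dim` components and rescale to unit length.

    Hypothetical helper for illustration only; not the mteb implementation.
    """
    if not 0 < embed_dim <= len(vector):
        raise ValueError(f"embed_dim must be in 1..{len(vector)}, got {embed_dim}")
    head = list(vector[:embed_dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # An all-zero prefix cannot be normalized; return it unchanged.
        return head
    return [x / norm for x in head]


full = [3.0, 4.0, 12.0, 0.0]
print(truncate_embedding(full, 2))  # [0.6, 0.8]
```

With embed_dim=32, the same operation would keep the first 32 components of each embedding.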

Samoed added the enhancement label Feb 25, 2026
isaac-chung (Collaborator) left a comment

Overall looks good. One small thing, and how would this be tested?

"""
meta = get_model_meta(model_name, revision).model_copy(deep=True)
- model = meta.load_model(device=device, **kwargs)
+ model = meta.load_model(device=device, embed_dim=embed_dim, **kwargs)

What do you think about letting embed_dim be part of kwargs?

Samoed (Member, Author) replied

I've added it to the arguments of get_model to make this feature more visible.
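The trade-off in this thread can be sketched as follows: an explicit embed_dim parameter appears in the function signature (and therefore in help text and autocompletion) and can be validated before loading, while a value passed through **kwargs is invisible to callers inspecting the signature. Every name below (SUPPORTED_DIMS, get_model_explicit, get_model_via_kwargs) is hypothetical and only illustrates the design choice; none of it is the mteb code.

```python
import inspect
from typing import Any, Dict, Optional

# Hypothetical list of dimensions a Matryoshka model supports.
SUPPORTED_DIMS = (32, 64, 128)


def get_model_explicit(
    model_name: str, embed_dim: Optional[int] = None, **kwargs: Any
) -> Dict[str, Any]:
    """Variant with a visible, validated `embed_dim` argument."""
    if embed_dim is not None and embed_dim not in SUPPORTED_DIMS:
        raise ValueError(f"embed_dim={embed_dim} not in {SUPPORTED_DIMS}")
    return {"model_name": model_name, "embed_dim": embed_dim, **kwargs}


def get_model_via_kwargs(model_name: str, **kwargs: Any) -> Dict[str, Any]:
    """Variant where `embed_dim` is just another forwarded keyword."""
    return {"model_name": model_name, **kwargs}


# The explicit variant advertises the feature in its signature.
print("embed_dim" in inspect.signature(get_model_explicit).parameters)    # True
print("embed_dim" in inspect.signature(get_model_via_kwargs).parameters)  # False
```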

github-actions (Contributor) commented

This pull request has been automatically marked as stale due to inactivity.

github-actions bot added the stale label Mar 14, 2026
KennethEnevoldsen (Contributor) left a comment

Ah, great addition! Can we add the following tests:

  1. Ensure that it works with experiments.
  2. Ensure that the metadata saved when running a model corresponds to the embedding dimension actually used. Currently I think it will save a list.
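A rough sketch of what the second check could look like, assuming the metadata holds either a single dimension or a list of supported ones. MetaSketch and with_used_dim are hypothetical stand-ins, not mteb classes, and defaulting to the largest dimension when embed_dim is omitted is an assumption made only for this example.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Union


@dataclass(frozen=True)
class MetaSketch:
    """Hypothetical stand-in for model metadata."""

    name: str
    embedding_dim: Union[int, List[int], None]


def with_used_dim(meta: MetaSketch, embed_dim: Optional[int] = None) -> MetaSketch:
    """Return metadata whose embedding_dim is the single dimension in use."""
    supported = meta.embedding_dim
    if not isinstance(supported, list):
        # Non-Matryoshka model: only the declared dimension is valid.
        if embed_dim is not None and embed_dim != supported:
            raise ValueError(f"{meta.name} only supports embedding_dim={supported}")
        return meta
    # Assumption: fall back to the largest dimension when none is requested.
    used = max(supported) if embed_dim is None else embed_dim
    if used not in supported:
        raise ValueError(f"embed_dim={used} not in supported dims {supported}")
    return replace(meta, embedding_dim=used)


def test_saved_metadata_uses_requested_dim() -> None:
    meta = MetaSketch("hypothetical/mrl-model", [32, 64, 128])
    saved = with_used_dim(meta, embed_dim=32)
    # The saved value should be the int that was used, not the list.
    assert saved.embedding_dim == 32
    assert isinstance(saved.embedding_dim, int)


test_saved_metadata_uses_requested_dim()
```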

array_framework: Literal["numpy", "torch"] = "numpy",
dtype: torch.dtype | np.floating = np.float32,
- embed_dim: int = _EMBEDDING_DIM,
+ embed_dim: int | None = None,

Why the change here?

Samoed (Member, Author) replied

To match the signature of SentenceTransformerWrapper.

KennethEnevoldsen changed the title from "add Matryoshka support" to "feat: Add Matryoshka support when loading a model" Mar 14, 2026
github-actions bot removed the stale label Mar 15, 2026
Labels: enhancement
Projects: None yet
Development: Successfully merging this pull request may close the issue "Handle MRL".
3 participants