feat: add extract_submodel parameter to build_encoder_backbone#1838

Draft
oliverholworthy wants to merge 2 commits into main from oholworthy/extract_submodel

Conversation


@oliverholworthy oliverholworthy commented Apr 14, 2026

What does this PR do?

Add an extract_submodel parameter to build_encoder_backbone for generic VLM text-backbone extraction via a dotted attribute path, removing the need for model-specific extraction code such as _from_vlm_checkpoint.

Changelog

  • Add extract_submodel: str | None parameter to build_encoder_backbone() in the generic (non-SUPPORTED_BACKBONES) code path
  • When set, walks the dotted attribute path to extract a submodel after loading (e.g. "language_model" extracts the text backbone from a VLM)
  • Different VLM families use different attribute names (.language_model, .text_model, .model.language_model) — the dotted-path approach handles this via config rather than per-architecture code
  • Add tests/unit_tests/_transformers/test_retrieval.py — new test module for build_encoder_backbone
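The dotted-path walk described in the changelog can be sketched as follows. The helper name and the dummy objects are illustrative only, not the PR's actual code, but the mechanism (one getattr per path segment) is what the description outlines:

```python
from types import SimpleNamespace

def extract_by_path(model, path):
    """Resolve a dotted attribute path like "model.language_model"
    against a loaded checkpoint, one getattr per segment."""
    submodel = model
    for attr in path.split("."):
        submodel = getattr(submodel, attr)
    return submodel

# Dummy VLM-shaped object standing in for a loaded checkpoint.
text_backbone = SimpleNamespace(config={"hidden_size": 64})
vlm = SimpleNamespace(model=SimpleNamespace(language_model=text_backbone))

assert extract_by_path(vlm, "model.language_model") is text_backbone
```

Because the path is plain data, supporting a new VLM family with a different attribute layout is a config change, not a code change.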

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?

Additional Information

The parameter flows through the existing YAML config system via **kwargs passthrough:

model:
  _target_: nemo_automodel.NeMoAutoModelBiEncoder.from_pretrained
  pretrained_model_name_or_path: mistralai/Ministral-3-3B-Base-2512
  extract_submodel: language_model
  pooling: avg
  l2_normalize: true

extract_submodel is a named parameter on build_encoder_backbone, so it is consumed there and not forwarded to HF's from_pretrained.
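The consumption pattern can be illustrated with a small sketch. The signature and the stub loader below are hypothetical stand-ins, not the PR's actual implementation; the point is that a named keyword parameter never lands in **kwargs, so it cannot leak into the HF passthrough:

```python
from types import SimpleNamespace

def _load_pretrained(name, **hf_kwargs):
    """Stand-in for HF from_pretrained, recording the kwargs it receives."""
    text = SimpleNamespace(config={"name": name})
    return SimpleNamespace(language_model=text, received_kwargs=hf_kwargs)

def build_encoder_backbone(name, *, extract_submodel=None, **kwargs):
    # extract_submodel is a named keyword, so it is consumed here and
    # never appears in **kwargs (the passthrough to the loader).
    model = _load_pretrained(name, **kwargs)
    if extract_submodel is not None:
        for attr in extract_submodel.split("."):
            model = getattr(model, attr)
    return model

full = build_encoder_backbone("tiny", torch_dtype="bfloat16")
assert full.received_kwargs == {"torch_dtype": "bfloat16"}

backbone = build_encoder_backbone("tiny", extract_submodel="language_model")
assert backbone.config == {"name": "tiny"}
```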

Companion to #1837 (is_causal refactor). These two PRs are independent and can be reviewed/merged in any order.

@copy-pr-bot

copy-pr-bot bot commented Apr 14, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.


@oliverholworthy oliverholworthy force-pushed the oholworthy/extract_submodel branch 2 times, most recently from ff5428d to e965d38 Compare April 15, 2026 09:59
Add a dotted attribute path parameter for extracting a submodel from a
loaded checkpoint. This enables generic VLM text backbone extraction
without model-specific code:

  build_encoder_backbone("mistralai/Ministral-3-3B-Base-2512",
                         task="embedding",
                         extract_submodel="language_model")

Different VLM families use different attribute names (.language_model,
.text_model, .model.language_model) — the dotted-path approach handles
this via config rather than per-architecture code.

Validates that the extracted submodel has a .config attribute (i.e. is a
PreTrainedModel), raising ValueError if not. Includes round-trip tests
with both a tiny Mistral3 VLM config and the real Ministral-3-3B weights.

Signed-off-by: Oliver Holworthy <1216955+oliverholworthy@users.noreply.github.com>
@oliverholworthy oliverholworthy force-pushed the oholworthy/extract_submodel branch from e965d38 to bba8298 Compare April 15, 2026 14:53
@akoumpa

akoumpa commented Apr 19, 2026

/ok to test b44785f
