feat: add extract_submodel parameter to build_encoder_backbone #1838
Draft
oliverholworthy wants to merge 2 commits into `main` from
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
Force-pushed from ff5428d to e965d38.
Add a dotted attribute path parameter for extracting a submodel from a
loaded checkpoint. This enables generic VLM text backbone extraction
without model-specific code:
    build_encoder_backbone("mistralai/Ministral-3-3B-Base-2512",
                           task="embedding",
                           extract_submodel="language_model")
Different VLM families use different attribute names (.language_model,
.text_model, .model.language_model); the dotted-path approach handles
this via config rather than per-architecture code.
Validates that the extracted submodel has a .config attribute (used as a
check that it is a PreTrainedModel), raising ValueError if not. Includes
round-trip tests with both a tiny Mistral3 VLM config and the real
Ministral-3-3B weights.
Signed-off-by: Oliver Holworthy <1216955+oliverholworthy@users.noreply.github.com>
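A minimal sketch of the dotted-path lookup and `.config` check described in the commit message, assuming the path is resolved with plain `getattr` one segment at a time. `extract_submodel_by_path` is a hypothetical helper name, not the function added in this PR, and `SimpleNamespace` objects stand in for loaded Hugging Face models.

```python
from functools import reduce
from types import SimpleNamespace
from typing import Any


def extract_submodel_by_path(model: Any, path: str) -> Any:
    """Hypothetical helper: resolve a dotted attribute path on a loaded model."""
    # "model.language_model" becomes getattr(getattr(model, "model"), "language_model").
    # In this sketch a missing segment simply raises AttributeError.
    submodel = reduce(getattr, path.split("."), model)
    # Validation as described in the PR: a PreTrainedModel carries a .config.
    if not hasattr(submodel, "config"):
        raise ValueError(
            f"extract_submodel={path!r} resolved to {type(submodel).__name__}, "
            "which has no .config attribute (expected a PreTrainedModel)"
        )
    return submodel


# Stand-ins for two VLM layouts that nest the text backbone differently.
text_backbone = SimpleNamespace(config={"model_type": "mistral"})
flat_vlm = SimpleNamespace(language_model=text_backbone)
nested_vlm = SimpleNamespace(model=SimpleNamespace(language_model=text_backbone))

assert extract_submodel_by_path(flat_vlm, "language_model") is text_backbone
assert extract_submodel_by_path(nested_vlm, "model.language_model") is text_backbone
```

Because the path is plain data, supporting another layout (such as `.text_model`) only means passing a different string rather than adding per-architecture code.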
Force-pushed from e965d38 to bba8298.
Contributor comment: `/ok to test b44785f`
What does this PR do?
Add an `extract_submodel` parameter to `build_encoder_backbone` for generic VLM text backbone extraction via a dotted attribute path, replacing the need for model-specific extraction code like `_from_vlm_checkpoint`.
Changelog
- `extract_submodel: str | None` parameter on `build_encoder_backbone()` in the generic (non-`SUPPORTED_BACKBONES`) code path
- Takes a dotted attribute path resolved on the loaded checkpoint (e.g. `"language_model"` extracts the text backbone from a VLM)
- Different VLM families use different attribute names (`.language_model`, `.text_model`, `.model.language_model`); the dotted-path approach handles this via config rather than per-architecture code
- `tests/unit_tests/_transformers/test_retrieval.py`: new test module for `build_encoder_backbone`

Before your PR is "Ready for review"
Pre checks:
Additional Information
The parameter flows through the existing YAML config system via `**kwargs` passthrough: `extract_submodel` is a named parameter on `build_encoder_backbone`, so it is consumed there and not forwarded to HF's `from_pretrained`.

Companion to #1837 (`is_causal` refactor). These two PRs are independent and can be reviewed and merged in any order.
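The `**kwargs` passthrough from the YAML config can be illustrated with a hypothetical stand-in for `build_encoder_backbone`. Only `task` and `extract_submodel` come from this PR; `model_name_or_path` and the `torch_dtype` config key are assumptions for illustration. Python binds any keyword that matches a named parameter before collecting the rest into `**kwargs`, so only the leftover keys would reach `from_pretrained`.

```python
from typing import Any, Dict, Optional, Tuple


def build_encoder_backbone_sketch(
    model_name_or_path: str,  # hypothetical name for the first argument
    task: str,
    extract_submodel: Optional[str] = None,
    **kwargs: Any,
) -> Tuple[Optional[str], Dict[str, Any]]:
    """Hypothetical stand-in: report what would happen instead of loading."""
    # extract_submodel is bound to the named parameter above, so it can never
    # appear in **kwargs; everything left over is what from_pretrained would get.
    from_pretrained_kwargs = dict(kwargs)
    return extract_submodel, from_pretrained_kwargs


# Keys as they might arrive from a parsed YAML section (illustrative values).
config = {
    "model_name_or_path": "mistralai/Ministral-3-3B-Base-2512",
    "task": "embedding",
    "extract_submodel": "language_model",
    "torch_dtype": "bfloat16",
}
submodel_path, forwarded = build_encoder_backbone_sketch(**config)
```

Here `submodel_path` is `"language_model"` and `forwarded` holds only `torch_dtype`, which is why no extra filtering is needed before the `from_pretrained` call.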