feat: add optional embed_model to SemanticDoubleMergingSplitterNodeParser by MkDev11 · Pull Request #20748 · run-llama/llama_index

MkDev11 · 2026-02-19T14:24:24Z

Description

Adds optional embedding-model support to SemanticDoubleMergingSplitterNodeParser so users can chunk text in any language supported by their chosen embedding model (e.g. via Hugging Face / sentence-transformers) without depending on spaCy. When embed_model is set, similarity is computed with BaseEmbedding.get_text_embedding_batch and similarity() instead of spaCy; when it is unset, the existing spaCy + LanguageConfig behavior is unchanged. No new dependencies are added to llama-index-core; users who want Hugging Face embeddings install an embedding integration themselves (e.g. llama-index-embeddings-huggingface).
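For reviewers who want a feel for the embedding path, here is a minimal, dependency-free sketch of the idea described above: embed all sentences in one batch call, then score neighbouring sentences. It is not the code in this PR. `ToyEmbedding`, `cosine_similarity`, and `adjacent_similarities` are hypothetical names made up for illustration; only the method name `get_text_embedding_batch` comes from the description, and cosine similarity is assumed as the metric.

```python
import math
from typing import List


class ToyEmbedding:
    """Hypothetical stand-in for an embedding model (not a real model).

    It mimics only the one method named in the PR description and maps
    each text to a 26-dimensional vector of letter counts (a-z).
    """

    def get_text_embedding_batch(self, texts: List[str]) -> List[List[float]]:
        vectors = []
        for text in texts:
            counts = [0.0] * 26
            for ch in text.lower():
                if "a" <= ch <= "z":
                    counts[ord(ch) - ord("a")] += 1.0
            vectors.append(counts)
        return vectors


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def adjacent_similarities(sentences: List[str], embed_model) -> List[float]:
    """Embed every sentence in a single batch, then compare each sentence
    with the one that follows it (assumed metric: cosine similarity)."""
    vectors = embed_model.get_text_embedding_batch(sentences)
    return [
        cosine_similarity(vectors[i], vectors[i + 1])
        for i in range(len(vectors) - 1)
    ]


if __name__ == "__main__":
    scores = adjacent_similarities(
        ["The cat sat.", "The cat sat down.", "Quarterly revenue grew."],
        ToyEmbedding(),
    )
    print(scores)
```

A splitter built on this idea would compare such scores against its merging thresholds. With the change in this PR, the object passed as embed_model would be a real BaseEmbedding rather than a toy class like the one above.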

Closes #15041

New Package?

No

Version Bump?

No

Type of Change

New feature (non-breaking change which adds functionality)

How Has This Been Tested?

I added new unit tests to cover this change

Tests added: test_embed_model_path_returns_nodes, test_embed_model_similarity_in_range, and test_embed_model_single_sentence_document in tests/node_parser/test_semantic_double_merging_splitter.py (they use MockEmbedding; no spaCy required).

Run: cd llama-index-core && python3 -m pytest tests/node_parser/test_semantic_double_merging_splitter.py -v

Suggested Checklist:

I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
I have added Google Colab support for the newly added notebooks.
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
I ran uv run make format; uv run make lint to appease the lint gods

MkDev11 · 2026-02-19T14:32:24Z

@AstraBert can you please review the PR and let me know your feedback?

feat(node_parser): add optional embed_model to SemanticDoubleMergingS…

75167bc

…plitterNodeParser (Closes run-llama#15041)

dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Feb 19, 2026

Merge branch 'main' into feature/15041-embedding-double-merging-splitter

044bb1b

MkDev11 wants to merge 2 commits into run-llama:main from MkDev11:feature/15041-embedding-double-merging-splitter
