Arnej/add field splitter 10 by arnej27959 · Pull Request #36143 · vespa-engine/vespa

arnej27959 · 2026-03-10T09:10:16Z

Note that I chose different semantics for NEAR with asymmetric children, but otherwise I don’t know any reason why we can’t use this.

… instances After view resolution, query terms may reference multiple index fields. Downstream blueprint building prefers single-field terms. FieldSplitter bridges this gap by splitting multi-field terms into OR of per-field copies early in the query processing pipeline. Handles all query node types: simple terms, Phrase, Equiv, SameElement, NEAR/ONEAR, and multi-term nodes (WeightedSet, DotProduct, WandTerm, InTerm, WordAlternatives). Falls back to the original tree on errors. Add debug logging in BlueprintBuilder for empty blueprint cases. Includes comprehensive unit tests for all node types and edge cases.

Phrase children (individual words) don't carry field metadata - they inherit the field from their parent Phrase node. When a Phrase is nested inside an Equiv, Near, ONear, or SameElement, _force_field_id is set, and visiting phrase children with it set would cause field lookup to fail. Fix by saving/clearing/restoring _force_field_id around the child visit, consistent with the same pattern used elsewhere in the visitor.

When a SameElement has multiple fields but children have different field sets, split the SameElement into per-field copies without forcing the children, letting each child handle its own field splitting independently.

Wire up FieldSplitter::split_terms in the query build step, placed after field resolution (ResolveViewVisitor) and before ranking term extraction (NeedsRankingVisitor). This ensures multi-field query terms are normalized into per-field instances before the rest of the matching pipeline sees them, enabling field-specific optimizations.

Copilot

Pull request overview

This PR adds a FieldSplitter utility to the Vespa search query processing pipeline. It transforms query trees so that multi-field terms (terms that can match multiple fields, e.g., via view resolution) are split into separate per-field term instances connected with OR nodes. The splitter runs after ResolveViewVisitor resolves views to fields and before match data allocation, ensuring each term operates on exactly one field for simplified query execution.

Changes:

Introduces FieldSplitter with a visitor-based tree transformation that splits multi-field query terms into per-field instances connected by OR nodes, handling various node types including terms, phrases, NEAR/ONEAR, Equiv, SameElement, and multi-term nodes.
Integrates the field splitter into the query processing pipeline in Query::buildTree() after view resolution.
Adds debug logging to the blueprint builder for EmptyBlueprint creation cases.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
`searchcore/src/vespa/searchcore/proton/matching/field_splitter.h`	New header declaring the `FieldSplitter` struct with a `split_terms` static method
`searchcore/src/vespa/searchcore/proton/matching/field_splitter.cpp`	New 824-line implementation with `FieldSplitterVisitor` handling all query node types
`searchcore/src/vespa/searchcore/proton/matching/query.cpp`	Integrates `FieldSplitter::split_terms` into `Query::buildTree` after view resolution
`searchcore/src/vespa/searchcore/proton/matching/blueprintbuilder.cpp`	Adds debug log statements for `EmptyBlueprint` creation cases
`searchcore/src/vespa/searchcore/proton/matching/CMakeLists.txt`	Adds `field_splitter.cpp` to the build
`searchcore/src/tests/proton/matching/field_splitter_test.cpp`	New comprehensive test file with 20+ test cases
`searchcore/src/tests/proton/matching/CMakeLists.txt`	Adds new test executable to the build

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-10T09:21:53Z

+    void splitSameElementByFieldsNoForce(ProtonSameElement &node, const std::set<uint32_t> &fields) {
+        _builder.addOr(fields.size());
+        for ([[maybe_unused]] uint32_t field_id : fields) {
+            createSameElementReplica(node);
+            visitNodes(node.getChildren());
+        }
+    }


splitSameElementByFieldsNoForce creates fields.size() identical SameElement copies under an OR, each visiting the same children without any field filtering or forcing. Since field_id is [[maybe_unused]], each iteration produces an identical subtree. This is semantically incorrect for variant (c) — the intent is presumably to create per-field SameElements, but the children aren't constrained to the corresponding field. If the intent is indeed to produce identical copies, this is redundant and wasteful. If the intent is per-field splitting, the field_id should be used (e.g., passed to splitAndVisitChildrenForField). Consider either: (1) forcing children like variant (b), or (2) collapsing this to a single SameElement without OR if truly no per-field distinction is needed.

Copilot · 2026-03-10T09:21:53Z

+    ProtonSameElement& createSameElementReplica(ProtonSameElement &node) {
+        auto &replica = _builder.addSameElement(node.getChildren().size(), node.getView(),
+                                               node.getId(), node.getWeight());
+        replica.set_expensive(node.is_expensive());
+        return replica;
+    }


createSameElementReplica is missing two properties from the original SameElement node:

element_filter: The addSameElement builder method accepts an optional element_filter parameter (see querybuilder.h:305), but the original node's get_element_filter() is not passed. If the original SameElement has a non-empty element_filter, the replicas will lose this property.

expose_match_data_for_same_element: The SameElementModifier runs before the field splitter (see query.cpp:196-202) and may set this flag to false on the original node. The replica always gets the default value true, which could cause incorrect match data exposure.

Both should be copied from the original node to the replica.

Copilot · 2026-03-10T09:21:53Z

+    void visit(ProtonAndNot &node) override {
+        _builder.addAndNot(node.getChildren().size());
+        visitNodes(node.getChildren());
+    }


visit(ProtonAndNot) does not copy the elementwise flag from the original node to the replica. The SameElementModifier (which runs before the field splitter, see query.cpp:196-197) sets elementwise = true on ProtonAndNot nodes inside SameElement (see sameelementmodifier.cpp:76). The replica ProtonAndNot will always have elementwise = false (the default), leading to incorrect behavior for AndNot nodes inside SameElement after splitting.

Copilot · 2026-03-10T09:21:54Z

+        }
+    }
+
+    // Helper to build field-to-children mapping for Equiv nodes


The comment says "Helper to build field-to-children mapping for Equiv nodes" but buildFieldToChildrenMap is also used for Near and ONear nodes. The comment should be more general, e.g., "Helper to build field-to-children mapping for nodes that group children by field".

- Copy element_filter and expose_match_data_for_same_element in createSameElementReplica - Copy elementwise flag in visit(ProtonAndNot) - Generalize comment on buildFieldToChildrenMap

…ring For SameElement variant (c) - multiple fields with mixed-field children - create one SameElement replica per parent field and filter each child's fields to only those whose names match the parent field name (exact or dot-separated sub-field prefix). Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

…ld filtering" This reverts commit 0b69f90.

Two bugs fixed in FieldSplitter's SameElement handling: 1. splitSameElementByFields was creating per-field SameElement replicas without copying the field entry (ProtonTermData) from the original node. Added the missing copyProtonTermDataForField call so each replica gets its correct field. The method now derives the field list from node.numFields() directly instead of requiring a separate fields set parameter. 2. The variant-(c) path (splitSameElementByFieldsNoForce) was structurally wrong: it split the SameElement into per-field replicas but then visited children without forcing a field, so children would be split across all their own fields independently -- producing query trees where the per-field SameElements contained children bound to the wrong fields. Since this case (mixed child field sets inside a multi-field SameElement) should not arise in valid queries, it is now treated as an error with a safe pass-through fallback.

arnej27959 added 4 commits March 9, 2026 18:03

Handle SameElement in 3 different ways

a252336

When a SameElement has multiple fields but children have different field sets, split the SameElement into per-field copies without forcing the children, letting each child handle its own field splitting independently.

arnej27959 requested review from Copilot, havardpe and toregge March 10, 2026 09:10

arnej27959 assigned havardpe Mar 10, 2026

Copilot started reviewing on behalf of arnej27959 March 10, 2026 09:10 View session

Copilot AI reviewed Mar 10, 2026

View reviewed changes

arnej27959 and others added 5 commits March 11, 2026 10:55

Address PR review feedback in FieldSplitter

9bcd3e4

- Copy element_filter and expose_match_data_for_same_element in createSameElementReplica - Copy elementwise flag in visit(ProtonAndNot) - Generalize comment on buildFieldToChildrenMap

reformat

67296c1

Revert "Replace splitSameElementByFieldsNoForce with prefix-based chi…

43c1ca2

…ld filtering" This reverts commit 0b69f90.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Arnej/add field splitter 10#36143

Arnej/add field splitter 10#36143
arnej27959 wants to merge 9 commits into
masterfrom
arnej/add-field-splitter-10

arnej27959 commented Mar 10, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Mar 10, 2026

Uh oh!

Copilot AI Mar 10, 2026

Uh oh!

Copilot AI Mar 10, 2026

Uh oh!

Copilot AI Mar 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

arnej27959 commented Mar 10, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants