feat: [2.6] struct element hybrid search design and impl#50369
feat: [2.6] struct element hybrid search design and impl#50369zhengbuqian wants to merge 8 commits into
Conversation
issue: milvus-io#42148 design doc: docs/design-docs/design_docs/20260602-struct_hybrid_search.md (cherry picked from commit 2f3f19e) Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
|
[INFO] PR Label Summary by Default
[WARNING] Milestone not set
You can set milestone by commenting: Use /refresh-label to update related check and label manually |
|
[ci-v2-notice] To rerun ci-v2 checks, comment with:
If you have any questions or requests, please contact @zhikunyao. |
| Topks: append([]int64(nil), data.GetTopks()...), | ||
| FieldsData: data.GetFieldsData(), | ||
| Scores: append([]float32(nil), data.GetScores()...), | ||
| Ids: &schemapb.IDs{IdField: &schemapb.IDs_StrId{StrId: &schemapb.StringArray{Data: keys}}}, |
There was a problem hiding this comment.
prepareElementLevelHybridResult always emits IDs_StrId, but the reranker binds its generic T to the collection PK type and processOneSearchData does an unchecked col.ids.([]T) (rrf_function.go:77). On an int64-PK collection T is int64, so asserting the synthetic string-ID slice to []int64 is an invalid type assertion with no PK-type guard, panicking on every element-level hybrid search over an int64-PK collection (the common case). Existing tests miss this because they bypass the real reranker; fix by emitting IDs that match the PK type, or converting/guarding before the reranker.
| } | ||
|
|
||
| annsField := typeutil.GetField(t.schema.CollectionSchema, t.FieldId) | ||
| if annsField != nil && annsField.GetDataType() == schemapb.DataType_ArrayOfVector { |
There was a problem hiding this comment.
parseSearchInfo still contains a stale guard (search_util.go:303-317) that rejects any ArrayOfVector anns-field using radius/group_by/iterator with the old "embedding list" message before the placeholder type is known. Because it runs inside tryGeneratePlan->parseSearchInfo, it preempts the new placeholder-aware validation in both initSearchRequest (single search, line 806) and the per-sub-request loop in initAdvancedSearchRequest (hybrid, line 452), and since GetFieldByName resolves struct sub-fields it also fires for struct element/emb-list searches. The PR's own new tests fail as a result — 'element-level range search should succeed' and 'element-level iterator v2 should succeed' get rejected, and the hybrid range/iterator cases match the old message instead of the new '...in hybrid search' one — consistent with the red ci-v2/ut-go check. master deleted this guard and moved the checks into task_search.go; port that by removing or making the search_util.go guard placeholder-aware.
Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
|
[INFO] PR Label Summary by Default
[WARNING] Milestone not set
You can set milestone by commenting: Use /refresh-label to update related check and label manually |
Codecov Report✅ All modified and coverable lines are covered by tests. Please upload reports for the commit af59a67 to get more accurate results. ❌ Your project check has failed because the head coverage (42.57%) is below the target coverage (77.00%). You can increase the head coverage or adjust the target coverage.
Additional details and impacted files@@ Coverage Diff @@
## 2.6 #50369 +/- ##
===========================================
- Coverage 76.99% 42.57% -34.42%
===========================================
Files 1700 12 -1688
Lines 262533 1893 -260640
===========================================
- Hits 202142 806 -201336
+ Misses 53550 1035 -52515
+ Partials 6841 52 -6789
🚀 New features to boost your workflow:
|
| return merr.WrapErrParameterInvalid("", "", | ||
| "range search is not supported for vector array ("+searchKind+") fields in hybrid search, fieldName:"+annsField.GetName()) | ||
| } | ||
| if t.rankParams.GetGroupByFieldId() > 0 || len(t.rankParams.GetGroupByFieldIds()) > 0 { |
There was a problem hiding this comment.
Line 490 calls t.rankParams.GetGroupByFieldIds(), but the rankParams type only defines the singular groupByFieldId field and GetGroupByFieldId() accessor — no plural accessor exists anywhere in the repo. This is an undefined-method compile error that prevents internal/proxy (and the binary) from building. Drop the || len(t.rankParams.GetGroupByFieldIds()) > 0 clause and rely on the singular GetGroupByFieldId() > 0.
| "legacy search iterator is not supported for element-level search on embedding list fields; use search iterator v2") | ||
| } | ||
|
|
||
| groupByFieldIDs := queryInfo.GetGroupByFieldIds() |
There was a problem hiding this comment.
Line 861 calls queryInfo.GetGroupByFieldIds(), but planpb.QueryInfo only has the singular GroupByFieldId field (proto field 6) with accessor GetGroupByFieldId() — there is no plural field or method in this 2.6 proto. This is a second undefined-method compile error in internal/proxy. Use the singular GetGroupByFieldId(); the following two lines already fall back to it.
Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
|
[INFO] PR Label Summary by Default
[WARNING] Milestone not set
You can set milestone by commenting: Use /refresh-label to update related check and label manually |
| if t.rankParams.GetGroupByFieldId() > 0 { | ||
| return merr.WrapErrParameterInvalid("", "", | ||
| "group by search is not supported for vector array (embedding list) fields in hybrid search, fieldName:"+annsField.GetName()) | ||
| if err := validateElementCollapseMetricType(collapseConfig, queryInfo.GetMetricType()); err != nil { |
There was a problem hiding this comment.
In initAdvancedSearchRequest (task_search.go:467), validateElementCollapseMetricType is passed queryInfo.GetMetricType(), which is empty whenever the caller omits metric_type and relies on the index-resolved metric (parseSearchInfo leaves it "" at search_util.go:232). Because PositivelyRelated("") is false, a valid element_scope collapse with sum/topk_sum on an IP/COSINE field is rejected with 'only supported for positively related metrics'. The runtime collapse stage already re-validates against the metric resolved from results (search_pipeline.go: 'the metrictype in the request may be empty, it can only be obtained from the result'), so this proxy-side check is premature — validate against the resolved metric or drop it. Note the defect is in initAdvancedSearchRequest, not initSearchRequest, and metric_type is not schema-defaulted.
Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
|
[INFO] PR Label Summary by Default
[WARNING] Milestone not set
You can set milestone by commenting: Use /refresh-label to update related check and label manually |
| } | ||
| for _, field := range cloned.GetFields() { | ||
| if field.GetIsPrimaryKey() { | ||
| field.DataType = schemapb.DataType_VarChar |
There was a problem hiding this comment.
elementLevelHybridRerankSchema clones the schema and flips the primary key field's DataType to VarChar (task_search.go:651). newRerankBase derives both pkType and the rerank input-field types from that cloned schema, so when a decay reranker's input field is the primary key itself, a numeric PK is read as VarChar and newDecayFunction rejects it via its else branch at decay_function.go:96 ("only support numeric field"). The same decay-over-numeric-PK configuration is accepted on the normal row-level hybrid path, so element-level hybrid behaves inconsistently. This only triggers when the decay input field is the PK; if that config should be supported, classify the decay input against the PK's real type rather than the flipped VarChar.
Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: zhengbuqian The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
[INFO] PR Label Summary by Default
[WARNING] Milestone not set
You can set milestone by commenting: Use /refresh-label to update related check and label manually |
| }) | ||
|
|
||
| t.Run("regular vector advanced controls should succeed", func(t *testing.T) { | ||
| task := makeTask("regular_vec", commonpb.PlaceholderType_FloatVector, rangeParams, true, false, "scalar_field") |
There was a problem hiding this comment.
This case calls makeTask("regular_vec", …, rangeParams, /*withIterator=*/true, false, /*groupByField=*/"scalar_field") then asserts initSearchRequest succeeds, but it enables iterator and group-by at the same time. parseSearchInfo (reached via initSearchRequest → tryGeneratePlan) hits the pre-existing guard if isIterator && groupByFieldId > 0 { return ...WrapErrParameterInvalid(..., "Not allowed to do groupBy when doing iteration") } before any ArrayOfVector validation runs, so the call returns an error and assert.NoError fails deterministically on every CI run with the proxy test environment available. Split this into separate range / iterator / group-by regular-vector cases like TestSearchTask_ArrayOfVectorHybridSearch does, or drop one of the conflicting withIterator/groupByField controls.
| } | ||
| } else if t.isIterator && queryInfo.GetSearchIteratorV2Info() == nil { | ||
| return merr.WrapErrParameterInvalid("", "", | ||
| "legacy search iterator is not supported for element-level search on embedding list fields; use search iterator v2") |
There was a problem hiding this comment.
The error strings at lines 843 and 859 both read "...element-level search on embedding list fields", yet they live in the non-embedding-list (element-level) branch, so the wording contradicts the code path that emits them. A user who trips the legacy-iterator or group-by-non-pk error during a struct-array element search is pointed at the wrong concept (embedding list fields), making the diagnostic self-contradictory and misleading. Drop "on embedding list fields" from both strings.
Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
|
[INFO] PR Label Summary by Default
[WARNING] Milestone not set
You can set milestone by commenting: Use /refresh-label to update related check and label manually |
| func validateElementCollapseMetricType(config elementCollapseConfig, metricType string) error { | ||
| if config.Strategy == "" || | ||
| !isElementCollapseSumFamily(config.Strategy) || | ||
| strings.TrimSpace(metricType) == "" || |
There was a problem hiding this comment.
The sum/topk_sum metric check is skipped here and only enforced downstream at search_pipeline.go:390, so a sum-family collapse request that omits metric_type and resolves to a negative metric is not rejected up front. Because validation is deferred, the request fails only after the distributed search has already executed, wasting query work before the user gets the error. Confirm the late failure is acceptable, or validate the resolved metric before dispatching the search.
| } | ||
|
|
||
| annsField := typeutil.GetField(t.schema.CollectionSchema, t.FieldId) | ||
| if annsField != nil && annsField.GetDataType() == schemapb.DataType_ArrayOfVector { |
There was a problem hiding this comment.
This relocated validation relaxes element-level single (non-hybrid) search to allow range search, iterator-v2, and group-by-PK that were previously rejected for all ArrayOfVector fields. This is a behavior change to the non-hybrid path shipped inside a hybrid-search PR, and the proxy-level tests asserting acceptance don't prove QueryNode actually handles these operations correctly on ArrayOfVector. Confirm the relaxation is intended and is covered by integration tests, not just proxy unit tests.
Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
|
[INFO] PR Label Summary by Default
[WARNING] Milestone not set
You can set milestone by commenting: Use /refresh-label to update related check and label manually |
| if result == nil || result.GetResults() == nil || result.GetResults().GetElementIndices() == nil { | ||
| return result, nil | ||
| } | ||
| if isElementCollapseSumFamily(config.Strategy) && !largerScoreIsBetter { |
There was a problem hiding this comment.
The sum-family guard fires before the totalRows == 0 fast path at line 402. When an element-level sub-search returns zero hits, getMetricType resolves an empty metric, so metric.PositivelyRelated("") is false and largerScoreIsBetter is false. For sum/topk_sum this rejects the result with an "only supported for positively related metrics" error, even though the operator-level guard at lines 328-336 deliberately lets the empty-metric zero-rows state through and the identical state succeeds for max (per TestElementBestCollapseOpAllowsEmptyElementLevelResultWithoutMetric). The error is also misleading because the user's metric already passed upstream validation. Move the guard below the totalRows == 0 fast path so an empty result collapses to empty regardless of strategy.
Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
|
[INFO] PR Label Summary by Default
[WARNING] Milestone not set
You can set milestone by commenting: Use /refresh-label to update related check and label manually |
issue: #42148
pr: #50243
design doc: docs/design-docs/design_docs/20260602-struct_hybrid_search.md