feat: [2.6] struct element hybrid search design and impl #50369

Open
zhengbuqian wants to merge 8 commits into milvus-io:2.6 from zhengbuqian:cherry-pick-50243-2.6

Conversation

@zhengbuqian

Collaborator

issue: #42148
pr: #50243

design doc: docs/design-docs/design_docs/20260602-struct_hybrid_search.md

issue: milvus-io#42148

design doc: docs/design-docs/design_docs/20260602-struct_hybrid_search.md

(cherry picked from commit 2f3f19e)
Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
@sre-ci-robot sre-ci-robot added size/XXL Denotes a PR that changes 1000+ lines. approved labels Jun 8, 2026
@mergify mergify Bot added dco-passed DCO check passed. kind/feature Issues related to feature request from users labels Jun 8, 2026
@sre-ci-robot sre-ci-robot added the do-not-merge/need-milestone generate by v2-label-manager label Jun 8, 2026
@sre-ci-robot

Contributor

[INFO] PR Label Summary by Default
[SUCCESS] PR #50243 merged to master

[WARNING] Milestone not set

You can set milestone by commenting:
/set-milestone
Example:
/set-milestone 2.5.0

Use /refresh-label to update related check and label manually

@sre-ci-robot

Contributor

[ci-v2-notice]
Notice: New ci-v2 system is enabled for this PR.

To rerun ci-v2 checks, comment with:

  • /ci-rerun-code-check // for ci-v2/code-check
  • /ci-rerun-code-check-macos // for Code Checker MacOS (GitHub Actions)
  • /ci-rerun-build // for ci-v2/build
  • /ci-rerun-build-all // for ci-v2/build-all (multi-arch builds)
  • /ci-rerun-buildenv // for ci-v2/build-env (build milvus-env builder images)
  • /ci-rerun-ut-integration // for ci-v2/ut-integration, will rerun ci-v2/build
  • /ci-rerun-ut-go // for ci-v2/ut-go, will rerun ci-v2/build
  • /ci-rerun-ut-cpp // for ci-v2/ut-cpp
  • /ci-rerun-ut // for all ci-v2/ut-integration, ci-v2/ut-go, ci-v2/ut-cpp, will rerun ci-v2/build
  • /ci-rerun-e2e-default // for ci-v2/e2e-default
  • /ci-rerun-e2e-amd // for ci-v2/e2e-amd (e2e pool dispatcher)
  • /ci-rerun-build-ut-cov // for ci-v2/build-ut-cov (build + unit tests in one pipeline)
  • /ci-rerun-gosdk // for ci-v2/go-sdk (Go SDK E2E tests, ARM)

If you have any questions or requests, please contact @zhikunyao.

Topks: append([]int64(nil), data.GetTopks()...),
FieldsData: data.GetFieldsData(),
Scores: append([]float32(nil), data.GetScores()...),
Ids: &schemapb.IDs{IdField: &schemapb.IDs_StrId{StrId: &schemapb.StringArray{Data: keys}}},

Member

prepareElementLevelHybridResult always emits IDs_StrId, but the reranker binds its generic T to the collection PK type and processOneSearchData does an unchecked col.ids.([]T) (rrf_function.go:77). On an int64-PK collection T is int64, so asserting the synthetic string-ID slice to []int64 is an invalid type assertion with no PK-type guard, panicking on every element-level hybrid search over an int64-PK collection (the common case). Existing tests miss this because they bypass the real reranker; fix by emitting IDs that match the PK type, or converting/guarding before the reranker.
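
A minimal sketch of the "guard before the reranker" option, assuming the ID column arrives as a type-erased value. The helper name `idsAs` and the sample keys are hypothetical and are not taken from the PR; the point is only that the two-value form of a type assertion turns the panic into an ordinary error.

```go
package main

import "fmt"

// idsAs converts a type-erased ID column to []T. It returns an error instead
// of panicking when the dynamic type does not match (hypothetical helper).
func idsAs[T int64 | string](ids any) ([]T, error) {
	typed, ok := ids.([]T) // checked assertion: ok is false on a mismatch
	if !ok {
		return nil, fmt.Errorf("id column has type %T, want %T", ids, []T(nil))
	}
	return typed, nil
}

func main() {
	// Stand-in for the synthetic string element keys described above.
	synthetic := any([]string{"1#0", "1#1"})

	// An int64-PK collection would instantiate T as int64 and hit the mismatch.
	if _, err := idsAs[int64](synthetic); err != nil {
		fmt.Println("rejected:", err)
	}
	// A string instantiation matches the synthetic keys.
	if keys, err := idsAs[string](synthetic); err == nil {
		fmt.Println("accepted:", len(keys), "keys")
	}
}
```

A guard like this only makes the failure explicit. Emitting IDs that match the PK type, the other fix named above, is still needed for int64-PK collections to work.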

}

annsField := typeutil.GetField(t.schema.CollectionSchema, t.FieldId)
if annsField != nil && annsField.GetDataType() == schemapb.DataType_ArrayOfVector {

Member

parseSearchInfo still contains a stale guard (search_util.go:303-317) that rejects any ArrayOfVector anns-field using radius/group_by/iterator with the old "embedding list" message before the placeholder type is known. Because it runs inside tryGeneratePlan->parseSearchInfo, it preempts the new placeholder-aware validation in both initSearchRequest (single search, line 806) and the per-sub-request loop in initAdvancedSearchRequest (hybrid, line 452), and since GetFieldByName resolves struct sub-fields it also fires for struct element/emb-list searches. The PR's own new tests fail as a result — 'element-level range search should succeed' and 'element-level iterator v2 should succeed' get rejected, and the hybrid range/iterator cases match the old message instead of the new '...in hybrid search' one — consistent with the red ci-v2/ut-go check. master deleted this guard and moved the checks into task_search.go; port that by removing or making the search_util.go guard placeholder-aware.

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>

@codecov

codecov Bot commented Jun 8, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 42.57%. Comparing base (cb08db0) to head (3efcd34).
⚠️ Report is 941 commits behind head on 2.6.

⚠️ Current head 3efcd34 differs from pull request most recent head af59a67

Please upload reports for the commit af59a67 to get more accurate results.

❌ Your project check has failed because the head coverage (42.57%) is below the target coverage (77.00%). You can increase the head coverage or adjust the target coverage.

❗ There is a different number of reports uploaded between BASE (cb08db0) and HEAD (3efcd34): HEAD has 1 upload less than BASE.

| Flag | BASE (cb08db0) | HEAD (3efcd34) |
|------|----------------|----------------|
|      | 2              | 1              |


@@             Coverage Diff             @@
##              2.6   #50369       +/-   ##
===========================================
- Coverage   76.99%   42.57%   -34.42%     
===========================================
  Files        1700       12     -1688     
  Lines      262533     1893   -260640     
===========================================
- Hits       202142      806   -201336     
+ Misses      53550     1035    -52515     
+ Partials     6841       52     -6789     
| Components | Coverage Δ |
|------------|------------|
| Client     | ∅ <ø> (∅)  |
| Core       | ∅ <ø> (∅)  |
| Go         | ∅ <ø> (∅)  |

see 1697 files with indirect coverage changes

Comment thread internal/proxy/task_search.go Outdated
return merr.WrapErrParameterInvalid("", "",
"range search is not supported for vector array ("+searchKind+") fields in hybrid search, fieldName:"+annsField.GetName())
}
if t.rankParams.GetGroupByFieldId() > 0 || len(t.rankParams.GetGroupByFieldIds()) > 0 {

Member

Line 490 calls t.rankParams.GetGroupByFieldIds(), but the rankParams type only defines the singular groupByFieldId field and GetGroupByFieldId() accessor — no plural accessor exists anywhere in the repo. This is an undefined-method compile error that prevents internal/proxy (and the binary) from building. Drop the || len(t.rankParams.GetGroupByFieldIds()) > 0 clause and rely on the singular GetGroupByFieldId() > 0.

Comment thread internal/proxy/task_search.go Outdated
"legacy search iterator is not supported for element-level search on embedding list fields; use search iterator v2")
}

groupByFieldIDs := queryInfo.GetGroupByFieldIds()

Member

Line 861 calls queryInfo.GetGroupByFieldIds(), but planpb.QueryInfo only has the singular GroupByFieldId field (proto field 6) with accessor GetGroupByFieldId() — there is no plural field or method in this 2.6 proto. This is a second undefined-method compile error in internal/proxy. Use the singular GetGroupByFieldId(); the following two lines already fall back to it.

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>

Comment thread internal/proxy/task_search.go Outdated
if t.rankParams.GetGroupByFieldId() > 0 {
return merr.WrapErrParameterInvalid("", "",
"group by search is not supported for vector array (embedding list) fields in hybrid search, fieldName:"+annsField.GetName())
if err := validateElementCollapseMetricType(collapseConfig, queryInfo.GetMetricType()); err != nil {

Member

In initAdvancedSearchRequest (task_search.go:467), validateElementCollapseMetricType is passed queryInfo.GetMetricType(), which is empty whenever the caller omits metric_type and relies on the index-resolved metric (parseSearchInfo leaves it "" at search_util.go:232). Because PositivelyRelated("") is false, a valid element_scope collapse with sum/topk_sum on an IP/COSINE field is rejected with 'only supported for positively related metrics'. The runtime collapse stage already re-validates against the metric resolved from results (search_pipeline.go: 'the metrictype in the request may be empty, it can only be obtained from the result'), so this proxy-side check is premature — validate against the resolved metric or drop it. Note the defect is in initAdvancedSearchRequest, not initSearchRequest, and metric_type is not schema-defaulted.
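
A rough sketch of the "treat an empty metric as unresolved" behavior, under stated assumptions: `positivelyRelated` is a local stand-in for the helper named above and lists only IP and COSINE, and the function and error names are hypothetical. It shows only the ordering of the checks, not the actual proxy code.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// positivelyRelated is a local stand-in for the metric helper named in the
// review. The real helper lives in Milvus and may cover more metrics.
func positivelyRelated(metricType string) bool {
	switch strings.ToUpper(strings.TrimSpace(metricType)) {
	case "IP", "COSINE":
		return true
	}
	return false
}

// isSumFamily reports whether the collapse strategy adds scores together.
func isSumFamily(strategy string) bool {
	return strategy == "sum" || strategy == "topk_sum"
}

var errNegativeMetric = errors.New("sum-family collapse is only supported for positively related metrics")

// validateCollapseMetric treats an empty metric as "not resolved yet" and
// leaves the decision to a later stage instead of rejecting the request.
func validateCollapseMetric(strategy, metricType string) error {
	if !isSumFamily(strategy) || strings.TrimSpace(metricType) == "" {
		return nil
	}
	if !positivelyRelated(metricType) {
		return errNegativeMetric
	}
	return nil
}

func main() {
	fmt.Println(validateCollapseMetric("sum", ""))        // metric omitted by the caller
	fmt.Println(validateCollapseMetric("sum", "COSINE"))  // positively related metric
	fmt.Println(validateCollapseMetric("topk_sum", "L2")) // distance metric
}
```

The trade-off is that a request whose omitted metric later resolves to a distance metric is rejected only once the metric is known.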

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>

@sre-ci-robot sre-ci-robot added low-code-coverage add test-label from zhikun, diff coverage > 80% and removed low-code-coverage add test-label from zhikun, diff coverage > 80% labels Jun 8, 2026
Comment thread internal/proxy/task_search.go Outdated
}
for _, field := range cloned.GetFields() {
if field.GetIsPrimaryKey() {
field.DataType = schemapb.DataType_VarChar

Member

elementLevelHybridRerankSchema clones the schema and flips the primary key field's DataType to VarChar (task_search.go:651). newRerankBase derives both pkType and the rerank input-field types from that cloned schema, so when a decay reranker's input field is the primary key itself, a numeric PK is read as VarChar and newDecayFunction rejects it via its else branch at decay_function.go:96 ("only support numeric field"). The same decay-over-numeric-PK configuration is accepted on the normal row-level hybrid path, so element-level hybrid behaves inconsistently. This only triggers when the decay input field is the PK; if that config should be supported, classify the decay input against the PK's real type rather than the flipped VarChar.

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
@sre-ci-robot

Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: zhengbuqian

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


Comment thread internal/proxy/task_search_test.go Outdated
})

t.Run("regular vector advanced controls should succeed", func(t *testing.T) {
task := makeTask("regular_vec", commonpb.PlaceholderType_FloatVector, rangeParams, true, false, "scalar_field")

Member

This case calls makeTask("regular_vec", …, rangeParams, /*withIterator=*/true, false, /*groupByField=*/"scalar_field") and then asserts that initSearchRequest succeeds, but it enables iterator and group-by at the same time. parseSearchInfo (reached via initSearchRequest -> tryGeneratePlan) hits the pre-existing guard if isIterator && groupByFieldId > 0 { return ...WrapErrParameterInvalid(..., "Not allowed to do groupBy when doing iteration") } before any ArrayOfVector validation runs, so the call returns an error and assert.NoError fails deterministically on every CI run where the proxy test environment is available. Split this into separate range / iterator / group-by regular-vector cases, as TestSearchTask_ArrayOfVectorHybridSearch does, or drop one of the conflicting withIterator/groupByField controls.
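
A small sketch of the split, assuming a simplified model of the guard quoted above. `checkControls`, `controlCase`, and the field ID 101 are hypothetical; the real test would go through makeTask and initSearchRequest.

```go
package main

import (
	"errors"
	"fmt"
)

// Message text copied from the guard quoted in the review comment.
var errGroupByWithIterator = errors.New("Not allowed to do groupBy when doing iteration")

// checkControls models only the iterator/group-by guard (hypothetical helper).
func checkControls(isIterator bool, groupByFieldID int64) error {
	if isIterator && groupByFieldID > 0 {
		return errGroupByWithIterator
	}
	return nil
}

// controlCase enables at most one advanced control per case.
type controlCase struct {
	name           string
	isIterator     bool
	groupByFieldID int64
}

// regularVectorCases lists one case per control so none of them can trip the
// mutual-exclusion guard.
func regularVectorCases() []controlCase {
	return []controlCase{
		{name: "range only"},
		{name: "iterator only", isIterator: true},
		{name: "group-by only", groupByFieldID: 101},
	}
}

func main() {
	for _, c := range regularVectorCases() {
		fmt.Printf("%s: err=%v\n", c.name, checkControls(c.isIterator, c.groupByFieldID))
	}
	// The combination used by the current test case trips the guard.
	fmt.Println("iterator + group-by:", checkControls(true, 101))
}
```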

Comment thread internal/proxy/task_search.go Outdated
}
} else if t.isIterator && queryInfo.GetSearchIteratorV2Info() == nil {
return merr.WrapErrParameterInvalid("", "",
"legacy search iterator is not supported for element-level search on embedding list fields; use search iterator v2")

Member

The error strings at lines 843 and 859 both read "...element-level search on embedding list fields", yet they live in the non-embedding-list (element-level) branch, so the wording contradicts the code path that emits them. A user who trips the legacy-iterator or group-by-non-pk error during a struct-array element search is pointed at the wrong concept (embedding list fields), making the diagnostic self-contradictory and misleading. Drop "on embedding list fields" from both strings.

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>

@sre-ci-robot sre-ci-robot added low-code-coverage add test-label from zhikun, diff coverage > 80% and removed low-code-coverage add test-label from zhikun, diff coverage > 80% labels Jun 9, 2026
func validateElementCollapseMetricType(config elementCollapseConfig, metricType string) error {
if config.Strategy == "" ||
!isElementCollapseSumFamily(config.Strategy) ||
strings.TrimSpace(metricType) == "" ||

Member

The sum/topk_sum metric check is skipped here and only enforced downstream at search_pipeline.go:390, so a sum-family collapse request that omits metric_type and resolves to a negative metric is not rejected up front. Because validation is deferred, the request fails only after the distributed search has already executed, wasting query work before the user gets the error. Confirm the late failure is acceptable, or validate the resolved metric before dispatching the search.

}

annsField := typeutil.GetField(t.schema.CollectionSchema, t.FieldId)
if annsField != nil && annsField.GetDataType() == schemapb.DataType_ArrayOfVector {

Member

This relocated validation relaxes element-level single (non-hybrid) search to allow range search, iterator-v2, and group-by-PK that were previously rejected for all ArrayOfVector fields. This is a behavior change to the non-hybrid path shipped inside a hybrid-search PR, and the proxy-level tests asserting acceptance don't prove QueryNode actually handles these operations correctly on ArrayOfVector. Confirm the relaxation is intended and is covered by integration tests, not just proxy unit tests.

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>

@sre-ci-robot sre-ci-robot added the low-code-coverage add test-label from zhikun, diff coverage > 80% label Jun 9, 2026
Comment thread internal/proxy/search_pipeline.go Outdated
if result == nil || result.GetResults() == nil || result.GetResults().GetElementIndices() == nil {
return result, nil
}
if isElementCollapseSumFamily(config.Strategy) && !largerScoreIsBetter {

Member

The sum-family guard fires before the totalRows == 0 fast path at line 402. When an element-level sub-search returns zero hits, getMetricType resolves an empty metric, so metric.PositivelyRelated("") is false and largerScoreIsBetter is false. For sum/topk_sum this rejects the result with an "only supported for positively related metrics" error, even though the operator-level guard at lines 328-336 deliberately lets the empty-metric zero-rows state through and the identical state succeeds for max (per TestElementBestCollapseOpAllowsEmptyElementLevelResultWithoutMetric). The error is also misleading because the user's metric already passed upstream validation. Move the guard below the totalRows == 0 fast path so an empty result collapses to empty regardless of strategy.
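
The reordering can be sketched as follows, using a simplified stand-in for the result type. `searchResult`, `collapse`, and the error value are hypothetical names, and the real collapse logic is elided.

```go
package main

import (
	"errors"
	"fmt"
)

// searchResult is a simplified, hypothetical stand-in for the real result type.
type searchResult struct {
	totalRows int
}

var errSumNeedsPositiveMetric = errors.New("sum-family collapse is only supported for positively related metrics")

// isSumFamily reports whether the collapse strategy adds scores together.
func isSumFamily(strategy string) bool {
	return strategy == "sum" || strategy == "topk_sum"
}

// collapse checks the empty-result fast path before the metric guard, so a
// result with zero rows is returned unchanged for every strategy.
func collapse(res *searchResult, strategy string, largerScoreIsBetter bool) (*searchResult, error) {
	if res == nil {
		return res, nil
	}
	// Fast path first: with no rows there is nothing to sum, and the metric
	// may not have been resolvable from the result at all.
	if res.totalRows == 0 {
		return res, nil
	}
	if isSumFamily(strategy) && !largerScoreIsBetter {
		return nil, errSumNeedsPositiveMetric
	}
	// The actual per-entity collapse is omitted from this sketch.
	return res, nil
}

func main() {
	_, err := collapse(&searchResult{totalRows: 0}, "sum", false)
	fmt.Println("empty result, sum:", err)

	_, err = collapse(&searchResult{totalRows: 3}, "sum", false)
	fmt.Println("non-empty result, sum:", err)
}
```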

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>

@zhengbuqian zhengbuqian added this to the 2.6.19 milestone Jun 9, 2026
@zhengbuqian zhengbuqian removed the do-not-merge/need-milestone generate by v2-label-manager label Jun 9, 2026
Labels

approved dco-passed DCO check passed. kind/feature Issues related to feature request from users low-code-coverage add test-label from zhikun, diff coverage > 80% size/XXL Denotes a PR that changes 1000+ lines.

3 participants