Environment
- Milvus version: Zilliz Cloud Dedicated, REST v2 endpoint
- Deployment mode: Zilliz Cloud Dedicated
- MQ type: N/A
- SDK version: PyMilvus 3.0.0 for the control case
- OS: macOS client
- Collection type: regular partition collection, no partition key
- Vector fields: two dense FloatVector fields, vector_a and vector_b
- Partitions used in the test: part_a, part_b
Current Behavior
RESTful hybrid_search ignores the documented top-level partitionNames parameter.
In the same collection:
- RESTful single-vector search with top-level partitionNames: ["part_a"] returns only rows from part_a.
- PyMilvus hybrid_search(..., partition_names=["part_a"]) returns only rows from part_a.
- RESTful hybrid_search with top-level partitionNames: ["part_a"] returns rows from both part_a and part_b.
Observed RESTful hybrid_search response:
{
"code": 0,
"data": [
{"bucket": "part_a", "id": 1},
{"bucket": "part_a", "id": 2},
{"bucket": "part_b", "id": 4},
{"bucket": "part_b", "id": 3}
]
}
No error is returned. The search scope is silently widened to the whole collection.
Expected Behavior
RESTful hybrid_search should honor top-level partitionNames, consistent with RESTful search and PyMilvus hybrid_search.
For partitionNames: ["part_a"], only rows from part_a should be returned:
{
"code": 0,
"data": [
{"bucket": "part_a", "id": 1},
{"bucket": "part_a", "id": 2}
]
}
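A minimal sketch for checking a response against the requested partitions. It relies on the repro's convention that each row's bucket field stores the name of the partition it was inserted into. The helper name rows_outside_partitions and its marker_field parameter are hypothetical and are not part of any Milvus or Zilliz API.

```python
from typing import Any, Dict, Iterable, List


def rows_outside_partitions(
    response: Dict[str, Any],
    allowed: Iterable[str],
    marker_field: str = "bucket",
) -> List[Dict[str, Any]]:
    """Return rows whose marker field names a partition not in `allowed`.

    Assumption: `marker_field` mirrors the partition name, as in this repro.
    """
    allowed_set = set(allowed)
    return [
        row
        for row in response.get("data", [])
        if row.get(marker_field) not in allowed_set
    ]


# Observed RESTful hybrid_search response from this report.
observed = {
    "code": 0,
    "data": [
        {"bucket": "part_a", "id": 1},
        {"bucket": "part_a", "id": 2},
        {"bucket": "part_b", "id": 4},
        {"bucket": "part_b", "id": 3},
    ],
}

# Expected response when partitionNames: ["part_a"] is honored.
expected = {
    "code": 0,
    "data": [
        {"bucket": "part_a", "id": 1},
        {"bucket": "part_a", "id": 2},
    ],
}

leaked = rows_outside_partitions(observed, ["part_a"])
print([row["id"] for row in leaked])  # prints [4, 3]
print(rows_outside_partitions(expected, ["part_a"]))  # prints []
```

A non-empty result means the search returned rows from outside the requested partitions even though the API reported code: 0.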
Steps To Reproduce
Set the endpoint and token:
export ZILLIZ_URI="https://<cluster-endpoint>"
export ZILLIZ_TOKEN="<token>"
Create a collection with two vector fields and no partition key:
curl -sS --request POST "$ZILLIZ_URI/v2/vectordb/collections/create" \
-H "Authorization: Bearer $ZILLIZ_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"collectionName": "rest_manual_partition_repro",
"schema": {
"autoId": false,
"enabledDynamicField": false,
"fields": [
{"fieldName": "id", "dataType": "Int64", "isPrimary": true},
{"fieldName": "vector_a", "dataType": "FloatVector", "elementTypeParams": {"dim": "5"}},
{"fieldName": "vector_b", "dataType": "FloatVector", "elementTypeParams": {"dim": "5"}},
{"fieldName": "bucket", "dataType": "VarChar", "elementTypeParams": {"max_length": 64}}
]
},
"indexParams": [
{
"fieldName": "vector_a",
"indexName": "vector_a",
"metricType": "COSINE",
"params": {"index_type": "AUTOINDEX"}
},
{
"fieldName": "vector_b",
"indexName": "vector_b",
"metricType": "COSINE",
"params": {"index_type": "AUTOINDEX"}
}
]
}'
Create two partitions:
curl -sS --request POST "$ZILLIZ_URI/v2/vectordb/partitions/create" \
-H "Authorization: Bearer $ZILLIZ_TOKEN" \
-H "Content-Type: application/json" \
-d '{"collectionName": "rest_manual_partition_repro", "partitionName": "part_a"}'
curl -sS --request POST "$ZILLIZ_URI/v2/vectordb/partitions/create" \
-H "Authorization: Bearer $ZILLIZ_TOKEN" \
-H "Content-Type: application/json" \
-d '{"collectionName": "rest_manual_partition_repro", "partitionName": "part_b"}'
Insert rows into each partition:
curl -sS --request POST "$ZILLIZ_URI/v2/vectordb/entities/insert" \
-H "Authorization: Bearer $ZILLIZ_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"collectionName": "rest_manual_partition_repro",
"partitionName": "part_a",
"data": [
{"id": 1, "vector_a": [0.10, 0.20, 0.30, 0.40, 0.50], "vector_b": [0.50, 0.40, 0.30, 0.20, 0.10], "bucket": "part_a"},
{"id": 2, "vector_a": [0.11, 0.21, 0.31, 0.41, 0.51], "vector_b": [0.51, 0.41, 0.31, 0.21, 0.11], "bucket": "part_a"}
]
}'
curl -sS --request POST "$ZILLIZ_URI/v2/vectordb/entities/insert" \
-H "Authorization: Bearer $ZILLIZ_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"collectionName": "rest_manual_partition_repro",
"partitionName": "part_b",
"data": [
{"id": 3, "vector_a": [0.90, 0.80, 0.70, 0.60, 0.50], "vector_b": [0.50, 0.60, 0.70, 0.80, 0.90], "bucket": "part_b"},
{"id": 4, "vector_a": [0.91, 0.81, 0.71, 0.61, 0.51], "vector_b": [0.51, 0.61, 0.71, 0.81, 0.91], "bucket": "part_b"}
]
}'
Control case: RESTful single-vector search honors top-level partitionNames.
curl -sS --request POST "$ZILLIZ_URI/v2/vectordb/entities/search" \
-H "Authorization: Bearer $ZILLIZ_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"collectionName": "rest_manual_partition_repro",
"partitionNames": ["part_a"],
"data": [[0.10, 0.20, 0.30, 0.40, 0.50]],
"annsField": "vector_a",
"limit": 4,
"outputFields": ["id", "bucket"]
}'
Actual result:
{
"code": 0,
"data": [
{"bucket": "part_a", "id": 1},
{"bucket": "part_a", "id": 2}
]
}
Problem case: RESTful hybrid_search ignores top-level partitionNames.
curl -sS --request POST "$ZILLIZ_URI/v2/vectordb/entities/hybrid_search" \
-H "Authorization: Bearer $ZILLIZ_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"collectionName": "rest_manual_partition_repro",
"partitionNames": ["part_a"],
"search": [
{
"data": [[0.10, 0.20, 0.30, 0.40, 0.50]],
"annsField": "vector_a",
"limit": 4
},
{
"data": [[0.50, 0.40, 0.30, 0.20, 0.10]],
"annsField": "vector_b",
"limit": 4
}
],
"rerank": {
"strategy": "rrf",
"params": {"k": 60}
},
"limit": 4,
"outputFields": ["id", "bucket"]
}'
Actual result:
{
"code": 0,
"data": [
{"bucket": "part_a", "id": 1},
{"bucket": "part_a", "id": 2},
{"bucket": "part_b", "id": 4},
{"bucket": "part_b", "id": 3}
]
}
PyMilvus control case:
import os

from pymilvus import AnnSearchRequest, MilvusClient, RRFRanker

# Read the endpoint and token exported in the first step.
client = MilvusClient(uri=os.environ["ZILLIZ_URI"], token=os.environ["ZILLIZ_TOKEN"])

# One ANN request per vector field, mirroring the RESTful "search" array.
reqs = [
    AnnSearchRequest(
        data=[[0.10, 0.20, 0.30, 0.40, 0.50]],
        anns_field="vector_a",
        param={},
        limit=10,
    ),
    AnnSearchRequest(
        data=[[0.50, 0.40, 0.30, 0.20, 0.10]],
        anns_field="vector_b",
        param={},
        limit=10,
    ),
]

# Restrict the hybrid search to part_a via partition_names.
res = client.hybrid_search(
    collection_name="pymilvus_manual_partition_matrix_1780913727",
    reqs=reqs,
    ranker=RRFRanker(),
    limit=10,
    output_fields=["id", "bucket", "color"],
    partition_names=["part_a"],
)
Actual PyMilvus result, which included only part_a rows:
[
[
{"id": 1, "bucket": "part_a", "color": "part_a"},
{"id": 2, "bucket": "part_a", "color": "part_a"}
]
]
Milvus Log
No error is returned by RESTful hybrid_search; the API returns code: 0 with incorrect rows from partitions outside partitionNames.
Anything else?
The documented RESTful form appears to be top-level partitionNames, not partitionName, and not inside each item of the search array.
I also tried undocumented variants for cross-checking:
- top-level partitionName: "part_a"
- partitionNames inside each item of the search array
- partitionName inside each item of the search array
All of these also returned code: 0 and searched the whole collection.
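For anyone re-running the cross-check, here is a rough sketch that generates the four hybrid_search request bodies (the documented form plus the three undocumented variants) from the problem-case payload. The function name build_variants and the variant labels are hypothetical. The script only builds and prints JSON; it does not call the API.

```python
import copy
import json
from typing import Any, Dict

# Request body from the problem case, without any partition parameter.
BASE_BODY: Dict[str, Any] = {
    "collectionName": "rest_manual_partition_repro",
    "search": [
        {"data": [[0.10, 0.20, 0.30, 0.40, 0.50]], "annsField": "vector_a", "limit": 4},
        {"data": [[0.50, 0.40, 0.30, 0.20, 0.10]], "annsField": "vector_b", "limit": 4},
    ],
    "rerank": {"strategy": "rrf", "params": {"k": 60}},
    "limit": 4,
    "outputFields": ["id", "bucket"],
}


def build_variants(partition: str) -> Dict[str, Dict[str, Any]]:
    """Return one hybrid_search body per partition-parameter placement."""
    variants: Dict[str, Dict[str, Any]] = {}

    # Documented form: top-level partitionNames (a list).
    body = copy.deepcopy(BASE_BODY)
    body["partitionNames"] = [partition]
    variants["top_level_partitionNames"] = body

    # Undocumented: top-level partitionName (a string).
    body = copy.deepcopy(BASE_BODY)
    body["partitionName"] = partition
    variants["top_level_partitionName"] = body

    # Undocumented: partitionNames inside each item of the search array.
    body = copy.deepcopy(BASE_BODY)
    for item in body["search"]:
        item["partitionNames"] = [partition]
    variants["per_item_partitionNames"] = body

    # Undocumented: partitionName inside each item of the search array.
    body = copy.deepcopy(BASE_BODY)
    for item in body["search"]:
        item["partitionName"] = partition
    variants["per_item_partitionName"] = body

    return variants


# Print each variant as a JSON body suitable for curl's -d option.
for label, body in build_variants("part_a").items():
    print(label)
    print(json.dumps(body))
```

Each printed body can be posted to $ZILLIZ_URI/v2/vectordb/entities/hybrid_search with the same headers as the problem case.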