[Bug]: REST hybrid_search ignores top-level partitionNames #50396

@wyqzos

Environment

  • Milvus version: Zilliz Cloud Dedicated, REST v2 endpoint
  • Deployment mode: Zilliz Cloud Dedicated
  • MQ type: N/A
  • SDK version: PyMilvus 3.0.0 for the control case
  • OS: macOS client
  • Collection type: regular partition collection, no partition key
  • Vector fields: two dense FloatVector fields, vector_a and vector_b
  • Partitions used in the test: part_a, part_b

Current Behavior

RESTful hybrid_search ignores the documented top-level partitionNames parameter.

In the same collection:

  • RESTful single-vector search with top-level partitionNames: ["part_a"] returns only rows from part_a.
  • PyMilvus hybrid_search(..., partition_names=["part_a"]) returns only rows from part_a.
  • RESTful hybrid_search with top-level partitionNames: ["part_a"] returns rows from both part_a and part_b.

Observed RESTful hybrid_search response:

{
  "code": 0,
  "data": [
    {"bucket": "part_a", "id": 1},
    {"bucket": "part_a", "id": 2},
    {"bucket": "part_b", "id": 4},
    {"bucket": "part_b", "id": 3}
  ]
}

No error is returned. The search scope is silently widened to the whole collection.
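
Because the response carries no error, the widening is easy to miss when the reproduction is scripted. The following is a minimal sketch of a check, assuming every returned row includes the `bucket` field, which in this repro mirrors the partition the row was inserted into. `rows_outside_partitions` is a hypothetical helper name, not part of any Milvus SDK.

```python
def rows_outside_partitions(response, partition_names, label_field="bucket"):
    """Return rows whose label field is not one of the requested partitions.

    Assumption: each row stores its partition name in `label_field`,
    as the rows in this repro do with `bucket`.
    """
    allowed = set(partition_names)
    return [
        row
        for row in response.get("data", [])
        if row.get(label_field) not in allowed
    ]


# The RESTful hybrid_search response observed in this report.
observed = {
    "code": 0,
    "data": [
        {"bucket": "part_a", "id": 1},
        {"bucket": "part_a", "id": 2},
        {"bucket": "part_b", "id": 4},
        {"bucket": "part_b", "id": 3},
    ],
}

leaked = rows_outside_partitions(observed, ["part_a"])
print([row["id"] for row in leaked])  # → [4, 3]
```

An empty list means the response stayed inside the requested partitions. A non-empty list with `code: 0` is the silent widening described above.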

Expected Behavior

RESTful hybrid_search should honor top-level partitionNames, consistent with RESTful search and PyMilvus hybrid_search.

For partitionNames: ["part_a"], only rows from part_a should be returned:

{
  "code": 0,
  "data": [
    {"bucket": "part_a", "id": 1},
    {"bucket": "part_a", "id": 2}
  ]
}

Steps To Reproduce

Set the endpoint and token:

export ZILLIZ_URI="https://<cluster-endpoint>"
export ZILLIZ_TOKEN="<token>"

Create a collection with two vector fields and no partition key:

curl -sS --request POST "$ZILLIZ_URI/v2/vectordb/collections/create" \
  -H "Authorization: Bearer $ZILLIZ_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "collectionName": "rest_manual_partition_repro",
    "schema": {
      "autoId": false,
      "enabledDynamicField": false,
      "fields": [
        {"fieldName": "id", "dataType": "Int64", "isPrimary": true},
        {"fieldName": "vector_a", "dataType": "FloatVector", "elementTypeParams": {"dim": "5"}},
        {"fieldName": "vector_b", "dataType": "FloatVector", "elementTypeParams": {"dim": "5"}},
        {"fieldName": "bucket", "dataType": "VarChar", "elementTypeParams": {"max_length": 64}}
      ]
    },
    "indexParams": [
      {
        "fieldName": "vector_a",
        "indexName": "vector_a",
        "metricType": "COSINE",
        "params": {"index_type": "AUTOINDEX"}
      },
      {
        "fieldName": "vector_b",
        "indexName": "vector_b",
        "metricType": "COSINE",
        "params": {"index_type": "AUTOINDEX"}
      }
    ]
  }'

Create two partitions:

curl -sS --request POST "$ZILLIZ_URI/v2/vectordb/partitions/create" \
  -H "Authorization: Bearer $ZILLIZ_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"collectionName": "rest_manual_partition_repro", "partitionName": "part_a"}'

curl -sS --request POST "$ZILLIZ_URI/v2/vectordb/partitions/create" \
  -H "Authorization: Bearer $ZILLIZ_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"collectionName": "rest_manual_partition_repro", "partitionName": "part_b"}'

Insert rows into each partition:

curl -sS --request POST "$ZILLIZ_URI/v2/vectordb/entities/insert" \
  -H "Authorization: Bearer $ZILLIZ_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "collectionName": "rest_manual_partition_repro",
    "partitionName": "part_a",
    "data": [
      {"id": 1, "vector_a": [0.10, 0.20, 0.30, 0.40, 0.50], "vector_b": [0.50, 0.40, 0.30, 0.20, 0.10], "bucket": "part_a"},
      {"id": 2, "vector_a": [0.11, 0.21, 0.31, 0.41, 0.51], "vector_b": [0.51, 0.41, 0.31, 0.21, 0.11], "bucket": "part_a"}
    ]
  }'

curl -sS --request POST "$ZILLIZ_URI/v2/vectordb/entities/insert" \
  -H "Authorization: Bearer $ZILLIZ_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "collectionName": "rest_manual_partition_repro",
    "partitionName": "part_b",
    "data": [
      {"id": 3, "vector_a": [0.90, 0.80, 0.70, 0.60, 0.50], "vector_b": [0.50, 0.60, 0.70, 0.80, 0.90], "bucket": "part_b"},
      {"id": 4, "vector_a": [0.91, 0.81, 0.71, 0.61, 0.51], "vector_b": [0.51, 0.61, 0.71, 0.81, 0.91], "bucket": "part_b"}
    ]
  }'

Control case: RESTful single-vector search honors top-level partitionNames.

curl -sS --request POST "$ZILLIZ_URI/v2/vectordb/entities/search" \
  -H "Authorization: Bearer $ZILLIZ_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "collectionName": "rest_manual_partition_repro",
    "partitionNames": ["part_a"],
    "data": [[0.10, 0.20, 0.30, 0.40, 0.50]],
    "annsField": "vector_a",
    "limit": 4,
    "outputFields": ["id", "bucket"]
  }'

Actual result:

{
  "code": 0,
  "data": [
    {"bucket": "part_a", "id": 1},
    {"bucket": "part_a", "id": 2}
  ]
}

Problem case: RESTful hybrid_search ignores top-level partitionNames.

curl -sS --request POST "$ZILLIZ_URI/v2/vectordb/entities/hybrid_search" \
  -H "Authorization: Bearer $ZILLIZ_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "collectionName": "rest_manual_partition_repro",
    "partitionNames": ["part_a"],
    "search": [
      {
        "data": [[0.10, 0.20, 0.30, 0.40, 0.50]],
        "annsField": "vector_a",
        "limit": 4
      },
      {
        "data": [[0.50, 0.40, 0.30, 0.20, 0.10]],
        "annsField": "vector_b",
        "limit": 4
      }
    ],
    "rerank": {
      "strategy": "rrf",
      "params": {"k": 60}
    },
    "limit": 4,
    "outputFields": ["id", "bucket"]
  }'

Actual result:

{
  "code": 0,
  "data": [
    {"bucket": "part_a", "id": 1},
    {"bucket": "part_a", "id": 2},
    {"bucket": "part_b", "id": 4},
    {"bucket": "part_b", "id": 3}
  ]
}

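PyMilvus control case (run against a separate collection, pymilvus_manual_partition_matrix_1780913727, whose rows also carry a color field):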

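import os

from pymilvus import AnnSearchRequest, MilvusClient, RRFRanker

# Read the same endpoint and token exported for the curl commands above.
ZILLIZ_URI = os.environ["ZILLIZ_URI"]
ZILLIZ_TOKEN = os.environ["ZILLIZ_TOKEN"]

client = MilvusClient(uri=ZILLIZ_URI, token=ZILLIZ_TOKEN)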

reqs = [
    AnnSearchRequest(
        data=[[0.10, 0.20, 0.30, 0.40, 0.50]],
        anns_field="vector_a",
        param={},
        limit=10,
    ),
    AnnSearchRequest(
        data=[[0.50, 0.40, 0.30, 0.20, 0.10]],
        anns_field="vector_b",
        param={},
        limit=10,
    ),
]

res = client.hybrid_search(
    collection_name="pymilvus_manual_partition_matrix_1780913727",
    reqs=reqs,
    ranker=RRFRanker(),
    limit=10,
    output_fields=["id", "bucket", "color"],
    partition_names=["part_a"],
)

Actual PyMilvus result only included part_a rows:

[
  [
    {"id": 1, "bucket": "part_a", "color": "part_a"},
    {"id": 2, "bucket": "part_a", "color": "part_a"}
  ]
]

Milvus Log

No error is returned by RESTful hybrid_search; the API returns code: 0 with incorrect rows from partitions outside partitionNames.

Anything else?

The documented RESTful form appears to be top-level partitionNames, not partitionName, and not inside each item of the search array.

I also tried undocumented variants for cross-checking:

  • top-level partitionName: "part_a"
  • partitionNames inside each item of the search array
  • partitionName inside each item of the search array

All of these also returned code: 0 and searched the whole collection.
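
For anyone re-running the cross-check, the four request bodies (the documented form plus the three undocumented variants) can be generated instead of edited by hand. This is a rough sketch that only builds the JSON bodies and sends nothing. `BASE_REQUEST`, `build_variants`, and the variant labels are hypothetical names introduced here, and the base body is copied from the problem-case request above.

```python
import copy
import json

# Hybrid search body from the problem case, without any partition parameter.
BASE_REQUEST = {
    "collectionName": "rest_manual_partition_repro",
    "search": [
        {"data": [[0.10, 0.20, 0.30, 0.40, 0.50]], "annsField": "vector_a", "limit": 4},
        {"data": [[0.50, 0.40, 0.30, 0.20, 0.10]], "annsField": "vector_b", "limit": 4},
    ],
    "rerank": {"strategy": "rrf", "params": {"k": 60}},
    "limit": 4,
    "outputFields": ["id", "bucket"],
}


def build_variants(partition="part_a"):
    """Return {label: request body} for each partition placement that was tried."""
    variants = {}

    # Documented form: top-level partitionNames (a list).
    body = copy.deepcopy(BASE_REQUEST)
    body["partitionNames"] = [partition]
    variants["top-level partitionNames"] = body

    # Undocumented: top-level partitionName (a string).
    body = copy.deepcopy(BASE_REQUEST)
    body["partitionName"] = partition
    variants["top-level partitionName"] = body

    # Undocumented: partitionNames inside each item of the search array.
    body = copy.deepcopy(BASE_REQUEST)
    for item in body["search"]:
        item["partitionNames"] = [partition]
    variants["per-item partitionNames"] = body

    # Undocumented: partitionName inside each item of the search array.
    body = copy.deepcopy(BASE_REQUEST)
    for item in body["search"]:
        item["partitionName"] = partition
    variants["per-item partitionName"] = body

    return variants


variants = build_variants()
for label, body in variants.items():
    print(label)
    print(json.dumps(body))
```

Each printed body can be passed to the `-d` argument of the hybrid_search curl command shown in the problem case.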

Metadata

Labels

  • kind/bug: Issues or changes related to a bug.
  • triage/accepted: Indicates an issue or PR is ready to be actively worked on.
