
[Bug]: Query result discrepancy between single-row and multi-row collections due to execution plan differences #46973

@litt1e-c

Description

Is there an existing issue for this?

- I have searched the existing issues

Environment

- Milvus version: v2.6.8
- Deployment mode(standalone or cluster): Standalone (Docker)
- MQ type(rocksmq, pulsar or kafka): rocksmq
- SDK version(e.g. pymilvus v2.0.0rc2): pymilvus v2.6.5
- OS(Ubuntu or CentOS): Ubuntu

Current Behavior

Running the same boolean/JSON filter expression against two collections gives different results:
- The collection rebuilt from the 5,000-row JSON dump (full_data_from_json) returns 1,842 hits, including ID 83.
- A minimal collection containing only the ID 83 row (min_from_json) returns 0 hits.

The ID 83 row is identical in both collections after normalization, so the difference must come from query execution or data distribution rather than from the row content. Changing the vector index type (AUTOINDEX, FLAT, HNSW, etc.) does not change the result, which rules out vector index nondeterminism.
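The issue does not spell out what "identical after normalization" means. Below is a minimal sketch of one such comparison. It assumes the only legitimate differences between the two exports are JSON fields returned as encoded strings and float32 round-off in numeric values. `normalize`, `values_equal`, the tolerances, and the sample rows are all hypothetical and are not taken from rebuild_and_compare.py. Note that this `normalize` also decodes any VARCHAR value that happens to contain a JSON object or array.

```python
import json
import math
from typing import Any

# Hypothetical tolerances: float32 values read back from Milvus differ from the
# float64 values in a JSON dump by a relative error below about 1e-7.
REL_TOL = 1e-6
ABS_TOL = 1e-9


def normalize(value: Any) -> Any:
    """Recursively decode JSON-encoded strings so both exports share one shape."""
    if isinstance(value, dict):
        return {key: normalize(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [normalize(item) for item in value]
    if isinstance(value, str):
        try:
            decoded = json.loads(value)
        except ValueError:  # json.JSONDecodeError subclasses ValueError
            return value
        # Only objects and arrays are treated as encoded JSON; "42" stays a string.
        return normalize(decoded) if isinstance(decoded, (dict, list)) else value
    return value


def values_equal(a: Any, b: Any) -> bool:
    """Structural equality; floats compare within tolerance, ints compare exactly."""
    if isinstance(a, dict) and isinstance(b, dict):
        return a.keys() == b.keys() and all(values_equal(a[k], b[k]) for k in a)
    if isinstance(a, list) and isinstance(b, list):
        return len(a) == len(b) and all(values_equal(x, y) for x, y in zip(a, b))
    if isinstance(a, float) or isinstance(b, float):
        if isinstance(a, (int, float)) and isinstance(b, (int, float)):
            return math.isclose(a, b, rel_tol=REL_TOL, abs_tol=ABS_TOL)
        return False
    # Everything else (ints such as int64 IDs, strings, None) must match exactly.
    return a == b


# Illustrative rows (made up, not from the dump): same content, different encodings.
row_base = {"id": 83, "meta": '{"tag": "a", "score": 0.1}', "vec": [0.1, 0.2]}
row_min = {"id": 83, "meta": {"score": 0.1, "tag": "a"}, "vec": [0.10000000149011612, 0.2]}
print(values_equal(normalize(row_base), normalize(row_min)))  # True
```

Integers are compared exactly on purpose. A relative tolerance would treat large int64 primary keys that differ by one as equal.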

reproduce_pack.zip


Expected Behavior

With logically identical data, the same expression should return consistent results. In particular, the minimal collection should return ID 83, just as the base collection does.
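The expected behavior can be phrased as a check. A filter evaluated row by row should give a row the same verdict no matter which other rows share its collection. Every ID stored in the minimal collection should therefore match in both collections or in neither. The sketch below assumes the pymilvus `MilvusClient` API, a primary-key field named `id`, and a placeholder `FILTER_EXPR` that stands in for the expression shipped in reproduce_pack.zip. `hit_ids`, `inconsistent_ids`, and `run_against_server` are hypothetical helpers and are not part of the reproduction script.

```python
from typing import Iterable, Set

# Assumptions, not taken from the issue: the primary-key field is named "id", and
# FILTER_EXPR is a placeholder for the reporter's boolean/JSON expression, which is
# only available in reproduce_pack.zip.
PK_FIELD = "id"
FILTER_EXPR = "<expression from reproduce_pack.zip>"  # hypothetical placeholder
QUERY_LIMIT = 16384  # comfortably above the 1,842 hits reported for the base collection


def hit_ids(client, collection: str, expr: str) -> Set[int]:
    """Return the primary keys of all rows in `collection` that match `expr`."""
    rows = client.query(
        collection_name=collection,
        filter=expr,
        output_fields=[PK_FIELD],
        limit=QUERY_LIMIT,
    )
    return {row[PK_FIELD] for row in rows}


def inconsistent_ids(base_hits: Set[int], min_hits: Set[int], min_ids: Iterable[int]) -> Set[int]:
    """IDs stored in the minimal collection whose match verdict differs from the base.

    A filter evaluated row by row should accept or reject a row regardless of which
    other rows share its collection, so this set is expected to be empty.
    """
    return {pk for pk in min_ids if (pk in base_hits) != (pk in min_hits)}


def run_against_server(expr: str = FILTER_EXPR, uri: str = "http://127.0.0.1:19530") -> Set[int]:
    """Query both collections from the issue and return the inconsistent IDs."""
    # Imported lazily so the helpers above can be used without pymilvus or a server.
    from pymilvus import MilvusClient

    client = MilvusClient(uri=uri)
    base = hit_ids(client, "full_data_from_json", expr)
    minimal = hit_ids(client, "min_from_json", expr)
    # Per the issue, the minimal collection holds only the ID 83 row.
    return inconsistent_ids(base, minimal, min_ids=[83])
```

With the counts reported in this issue (ID 83 among the 1,842 base hits, no minimal hits), the check returns `{83}`. An empty set would mean the two collections agree.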

Steps To Reproduce

Download reproduce_pack.zip (attached under "Current Behavior") and extract it locally.
Environment: Milvus running at 127.0.0.1:19530, plus Python with pymilvus installed.
Place milvus_full_data.json (5,000 rows, including ID 83) in the working directory.
Run the rebuild/compare script:
python rebuild_and_compare.py \
  --json milvus_full_data.json \
  --index HNSW --index-params '{"M":16,"efConstruction":200}' \
  --min-source base

You can switch --index to AUTOINDEX / FLAT / IVF_FLAT / IVF_SQ8 / IVF_PQ; the discrepancy still reproduces.
Observe the output:
- base hits: 1,842; minimal hits: 0
- the diff list shows ID 83 only in the base results
- the row comparison shows that ID 83 is identical in both collections (🟢 Rows equal after normalization)

Milvus Log

No response

Anything else?

We tried to break the query down into simpler parts, but its complexity made that difficult, so a code-level analysis is probably needed to find the root cause. Similar issues may have been reported before, but this inconsistency clearly points to an underlying problem that needs attention.

Labels: kind/bug, triage/accepted
