Description
Is there an existing issue for this?
- I have searched the existing issues
Environment
- Milvus version: v2.6.8
- Deployment mode(standalone or cluster): Standalone (Docker)
- MQ type(rocksmq, pulsar or kafka): rocksmq
- SDK version(e.g. pymilvus v2.0.0rc2): pymilvus v2.6.5
- OS(Ubuntu or CentOS): Ubuntu
Current Behavior
Using the same boolean/JSON filter expression on two collections:
The rebuilt collection from the 5,000-row JSON dump (full_data_from_json) returns 1,842 hits and includes ID 83.
The minimal collection with only the ID 83 row (min_from_json) returns 0 hits.
The ID 83 row in both collections is identical after normalization, so the difference comes from query execution/data distribution, not from the row content.
Changing vector index types (AUTOINDEX, FLAT, HNSW, etc.) yields the same result, ruling out vector index nondeterminism.
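For reference, "identical after normalization" can be checked along the following lines. This is only a minimal sketch: the normalization actually used by rebuild_and_compare.py is not shown in this report, and the names normalize and rows_equal are hypothetical. The sketch canonicalizes key order and rounds floats but deliberately does not coerce types, so 1 and 1.0 compare as different. That is the safer choice when a JSON filter is involved, because a more lenient normalization could hide a type difference between the two collections.

```python
import json


def normalize(value, ndigits=6):
    """Return a canonical form of a row value (hypothetical helper).

    Dict keys become strings, tuples become lists, floats are rounded.
    Types are NOT coerced: the int 1 and the float 1.0 stay distinct.
    """
    if isinstance(value, dict):
        return {str(k): normalize(v, ndigits) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [normalize(v, ndigits) for v in value]
    if isinstance(value, bool):
        return value
    if isinstance(value, float):
        return round(value, ndigits)
    return value


def rows_equal(row_a, row_b, ndigits=6):
    """Compare two rows via their canonical JSON serialization."""
    dump_a = json.dumps(normalize(row_a, ndigits), sort_keys=True)
    dump_b = json.dumps(normalize(row_b, ndigits), sort_keys=True)
    return dump_a == dump_b
```

If the two ID 83 rows are equal under a strict comparison like this one, row content can be excluded with more confidence.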
Expected Behavior
With logically identical data, the same expression should return consistent results; at least the minimal collection should match the base collection for ID 83.
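The expected behavior can be written as a simple invariant: every row that exists in the minimal collection and is a hit in the base collection should also be a hit in the minimal collection. The sketch below is illustrative only; diff_hits is a hypothetical helper, and it assumes the minimal collection's rows are a subset of the base collection's rows.

```python
def diff_hits(base_hit_ids, minimal_hit_ids, minimal_row_ids):
    """Compare hit IDs of two collections for the rows they share.

    base_hit_ids:    IDs returned by the filter on the base collection
    minimal_hit_ids: IDs returned by the same filter on the minimal collection
    minimal_row_ids: all IDs stored in the minimal collection
    """
    base = set(base_hit_ids)
    minimal = set(minimal_hit_ids)
    shared = set(minimal_row_ids)

    # Hits we expect in the minimal collection if results are consistent.
    expected = base & shared
    return {
        "missing_in_minimal": sorted(expected - minimal),
        "unexpected_in_minimal": sorted(minimal - expected),
    }
```

For the case reported here (ID 83 is a base hit, the minimal collection holds only ID 83 and returns nothing), missing_in_minimal would be [83]. Both lists should be empty if execution is consistent.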
Steps To Reproduce
Download the archive reproduce_pack.zip attached under "Current Behavior" and extract it locally.
Environment: Milvus at 127.0.0.1:19530; Python + pymilvus.
Place milvus_full_data.json (5,000 rows, including ID 83) in the directory you run the script from.
Run the rebuild/compare script:
python rebuild_and_compare.py \
--json milvus_full_data.json \
--index HNSW --index-params '{"M":16,"efConstruction":200}' \
--min-source base
You can switch --index to AUTOINDEX / FLAT / IVF_FLAT / IVF_SQ8 / IVF_PQ; the discrepancy still reproduces.
Observe output: base hits 1,842; minimal hits 0; the diff list shows ID 83 only in the base results; the row comparison shows ID 83 is identical across both collections (🟢 Rows equal after normalization).
Milvus Log
No response
Anything else?
We tried to break the query down into simpler sub-expressions, but its complexity made it hard to isolate a smaller failing case. Code-level analysis is probably needed to find the root cause. Similar issues may have been reported before; either way, the same row matching in one collection and not in the other is a logical inconsistency that needs attention.
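One way to narrow this down is to split the filter at its top-level boolean operators and run each clause separately against both collections, looking for the first clause whose result for ID 83 differs. The helper below is a rough sketch, not part of the attached scripts. split_top_level is a hypothetical name, and it assumes the filter uses and / or (or && / ||) surrounded by whitespace as top-level connectives. It ignores operator precedence and does not treat not specially, so it is a bisection aid rather than a parser.

```python
import re

# "and"/"or" need whitespace on both sides; "&&"/"||" do not.
_BOUNDARY = re.compile(r"\s+(and|or|AND|OR)\s+|\s*(&&|\|\|)\s*")


def split_top_level(expr):
    """Split a filter string at boolean operators outside brackets and quotes.

    Returns (clauses, operators), where operators[i] joins clauses[i] and
    clauses[i + 1].
    """
    clauses, operators = [], []
    depth = 0      # nesting level of (), [] and {}
    quote = None   # active quote character, if inside a string literal
    start = 0
    i = 0
    while i < len(expr):
        ch = expr[i]
        if quote:
            if ch == "\\":
                i += 2  # skip the escaped character
                continue
            if ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
        elif ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth -= 1
        elif depth == 0:
            match = _BOUNDARY.match(expr, i)
            if match:
                clauses.append(expr[start:i].strip())
                operators.append((match.group(1) or match.group(2)).lower())
                i = start = match.end()
                continue
        i += 1
    clauses.append(expr[start:].strip())
    return clauses, operators
```

Each returned clause can then be used as a filter on both collections. A parenthesized clause can be split again after stripping its outer parentheses.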