[Bug]: round_decimal changes FLAT L2 Top-k ordering across segments #50347

@mmnhgo

Description

Environment

  • Milvus Version: 3.0
    Image: milvusdb/milvus:3.0-20260516-cf989061-amd64
  • Deployment Mode: standalone
  • MQ: rocksmq
  • SDK: pymilvus v2.6.12
  • OS: Container OS: Ubuntu 22.04.4 LTS
    Host OS: Windows 11 + Docker Desktop / WSL2

Reproduction

Option A: Script

import time
import uuid

from pymilvus import (
    connections,
    utility,
    Collection,
    CollectionSchema,
    FieldSchema,
    DataType,
)


HOST = "localhost"
PORT = "19530"


def unique_name(prefix: str) -> str:
    return f"{prefix}_{uuid.uuid4().hex[:8]}"


def drop_if_exists(name: str):
    if utility.has_collection(name):
        utility.drop_collection(name)


def create_collection(name: str):
    fields = [
        FieldSchema(
            name="id",
            dtype=DataType.INT64,
            is_primary=True,
            auto_id=False,
        ),
        FieldSchema(
            name="vec",
            dtype=DataType.FLOAT_VECTOR,
            dim=2,
        ),
    ]
    schema = CollectionSchema(fields=fields, description="round_decimal L2 FLAT repro")
    col = Collection(name=name, schema=schema)

    index_params = {
        "index_type": "FLAT",
        "metric_type": "L2",
        "params": {},
    }
    col.create_index(field_name="vec", index_params=index_params)
    return col


def insert_two_segments(col: Collection, vectors):
    # Insert and flush separately to force two sealed segments.
    col.insert([[1], [vectors[1]]])
    col.flush()

    col.insert([[2], [vectors[2]]])
    col.flush()

    col.load()
    time.sleep(1)


def search(col: Collection, query, round_decimal):
    result = col.search(
        data=[query],
        anns_field="vec",
        param={"metric_type": "L2", "params": {}},
        limit=2,
        output_fields=["id"],
        round_decimal=round_decimal,
    )

    hits = result[0]
    return [(hit.id, hit.distance) for hit in hits]


def run_case(name: str, vectors, query):
    drop_if_exists(name)
    col = create_collection(name)

    try:
        insert_two_segments(col, vectors)

        exact = search(col, query, round_decimal=-1)
        rounded = search(col, query, round_decimal=0)

        print(f"{name}_exact:", exact)
        print(f"{name}_round_decimal_0:", rounded)

        return exact, rounded
    finally:
        col.release()
        drop_if_exists(name)


def main():
    connections.connect(alias="default", host=HOST, port=PORT)

    base_vectors = {
        1: [0.7, 0.0],  # squared L2 = 0.49
        2: [0.6, 0.0],  # squared L2 = 0.36
    }
    base_query = [0.0, 0.0]

    scaled_vectors = {
        1: [7.0, 0.0],  # squared L2 = 49
        2: [6.0, 0.0],  # squared L2 = 36
    }
    scaled_query = [0.0, 0.0]

    base_name = unique_name("round_decimal_base")
    scaled_name = unique_name("round_decimal_scaled")

    base_exact, base_rounded = run_case(base_name, base_vectors, base_query)
    scaled_exact, scaled_rounded = run_case(scaled_name, scaled_vectors, scaled_query)

    print("base_exact:", base_exact)
    print("scaled_exact:", scaled_exact)
    print("base_round_decimal_0:", base_rounded)
    print("scaled_round_decimal_0:", scaled_rounded)

    base_ids = [x[0] for x in base_rounded]
    scaled_ids = [x[0] for x in scaled_rounded]

    expected_ids = [2, 1]

    if base_ids != expected_ids and scaled_ids == expected_ids:
        print("BUG REPRODUCED")
        print({
            "metric": "L2",
            "index": "FLAT",
            "round_decimal": 0,
            "base_ids": base_ids,
            "scaled_ids": scaled_ids,
            "expected_ids": expected_ids,
        })
    else:
        raise AssertionError(
            f"Bug did not reproduce: base_ids={base_ids}, scaled_ids={scaled_ids}, "
            f"expected={expected_ids}"
        )


if __name__ == "__main__":
    main()

Trigger Conditions

  • Frequency: always

  • First observed after: immediately after running the script

  • Does NOT happen when:

    • round_decimal=-1 is used
    • the rounded distances do not collapse into the same value
    • the two candidates are not reduced across multiple sealed segments

Expected Behavior

round_decimal should only affect the formatting or precision of returned distances. It should not affect Top-k ranking or cross-segment result reduction.

For the original vectors:

  • query: [0, 0]
  • id=1: [0.7, 0], squared L2 distance = 0.49
  • id=2: [0.6, 0], squared L2 distance = 0.36

The expected Top-k IDs are:

[2, 1]

After multiplying every vector and the query by the positive scalar 10:

  • query: [0, 0]
  • id=1: [7, 0], squared L2 distance = 49
  • id=2: [6, 0], squared L2 distance = 36

The expected Top-k IDs should still be:

[2, 1]

Scaling every vector and the query by a positive scalar c multiplies every squared L2 distance by the same factor c² (here 100), so the relative ordering must be preserved.
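
The scaling argument can be checked with plain arithmetic, independent of Milvus. The sketch below is illustrative only: `squared_l2`, `rank_ids`, and `scale` are helper names introduced here, not part of pymilvus or the repro script.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rank_ids(vectors, query):
    """Return primary keys sorted by ascending squared L2 distance to the query."""
    return sorted(vectors, key=lambda pk: squared_l2(vectors[pk], query))


def scale(vec, c):
    """Multiply every component of a vector by the scalar c."""
    return [c * x for x in vec]


base_vectors = {1: [0.7, 0.0], 2: [0.6, 0.0]}
base_query = [0.0, 0.0]

c = 10.0
scaled_vectors = {pk: scale(v, c) for pk, v in base_vectors.items()}
scaled_query = scale(base_query, c)

# Each squared distance grows by c**2, so the ranking is unchanged.
base_ranking = rank_ids(base_vectors, base_query)
scaled_ranking = rank_ids(scaled_vectors, scaled_query)
print(base_ranking, scaled_ranking)
```

Both rankings put id=2 ahead of id=1, which is the ordering the search is expected to return in both collections.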

Actual Behavior

With round_decimal=0, the original collection returns the wrong order:

base_exact: [(2, 0.36000001430511475), (1, 0.4899999797344208)]
scaled_exact: [(2, 36.0), (1, 49.0)]
base_round_decimal_0: [(1, 0.0), (2, 0.0)]
scaled_round_decimal_0: [(2, 36.0), (1, 49.0)]
BUG REPRODUCED

The original true distances are strictly ordered:

id=2: 0.36
id=1: 0.49

However, with round_decimal=0, both distances are rounded to 0.0 before global reduction. The reducer then treats them as a tie and returns the smaller primary key first, producing:

[1, 2]

After positive scaling, the distances become 36.0 and 49.0, so the rounded distances no longer collapse into the same value, and the result becomes:

[2, 1]

Therefore, Top-k IDs change under positive scaling:

Original: [1, 2]
Scaled:   [2, 1]
Expected: [2, 1] for both
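
The observed output is consistent with rounding happening inside each segment before the global merge. The following is a minimal simulation of that hypothesis in plain Python, not Milvus code: `round_distance` and `round_then_merge` are hypothetical names, Python's built-in `round` stands in for whatever rounding the server performs, and the tie-break by smaller primary key is taken from the description above.

```python
def round_distance(distance, round_decimal):
    """Round a distance to `round_decimal` places; -1 means no rounding."""
    if round_decimal == -1:
        return distance
    return round(distance, round_decimal)


def round_then_merge(segments, round_decimal, limit):
    """Hypothesized behaviour: round per segment, then reduce globally.

    Ties on the rounded distance are broken by the smaller primary key
    (an assumption taken from this report, not from the Milvus source).
    """
    candidates = []
    for segment in segments:
        for pk, distance in segment:
            candidates.append((pk, round_distance(distance, round_decimal)))
    candidates.sort(key=lambda hit: (hit[1], hit[0]))
    return candidates[:limit]


# One hit per sealed segment, as in the repro: (primary key, squared L2).
base_segments = [[(1, 0.49)], [(2, 0.36)]]
scaled_segments = [[(1, 49.0)], [(2, 36.0)]]

base_hits = round_then_merge(base_segments, round_decimal=0, limit=2)
scaled_hits = round_then_merge(scaled_segments, round_decimal=0, limit=2)
print([pk for pk, _ in base_hits])
print([pk for pk, _ in scaled_hits])
```

Under these assumptions the base case collapses both distances to 0.0 and yields id=1 first, while the scaled case keeps 36.0 and 49.0 apart and yields id=2 first, matching the output reported above.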

Error Logs

No errors appear in the logs and no exception is raised. This is a silent wrong-result issue.

Non-default Configuration

No relevant non-default Milvus server configuration.

Analysis Hints (Optional)

This looks like round_decimal is applied too early, before cross-segment result reduction. As a result, presentation rounding affects ranking.

Suspicious code locations:

internal/core/src/query/SearchOnSealed.cpp
Function: SearchOnSealedIndex
Reason: per-segment distances appear to be rounded immediately after index search.

internal/core/src/query/SearchBruteForce.cpp
Function: BruteForceSearch
Reason: sub_result.round_values() is called before chunk/segment merge.

internal/core/src/query/SubSearchResult.cpp
Function: SubSearchResult::merge_impl
Reason: merge compares distance values after they may already have been rounded.

internal/core/src/segcore/ReduceStructure.h
Struct: SearchResultPair
Reason: when rounded distances become equal, tie-breaking by primary key can change the final Top-k order.

Why this is a bug:

round_decimal should only control the precision of returned distances. It should not change ranking semantics. In this repro, true squared L2 distances 0.36 and 0.49 are strictly ordered, but both are rounded to 0.0 before reduction. This creates an artificial tie and causes Milvus to return id=1 before the true nearest neighbor id=2.

This also violates the expected positive-scaling invariance of L2 search: scaling all vectors and the query by the same positive scalar should preserve Top-k ordering.
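
The ordering this report argues for can be sketched as follows: reduce across segments on the raw distances, then round only the values that are returned. This is a plain-Python illustration with a hypothetical `merge_then_round` helper; it does not claim to show where the rounding step should live in the Milvus C++ code.

```python
def merge_then_round(segments, round_decimal, limit):
    """Reduce on raw distances first, then round only the returned values."""
    candidates = [hit for segment in segments for hit in segment]
    # Rank by the unrounded distance; the primary key only breaks true ties.
    candidates.sort(key=lambda hit: (hit[1], hit[0]))
    top = candidates[:limit]
    if round_decimal == -1:
        return top
    return [(pk, round(distance, round_decimal)) for pk, distance in top]


# One hit per sealed segment: (primary key, squared L2 distance).
base_segments = [[(1, 0.49)], [(2, 0.36)]]
scaled_segments = [[(1, 49.0)], [(2, 36.0)]]

for decimals in (-1, 0, 1):
    base_ids = [pk for pk, _ in merge_then_round(base_segments, decimals, 2)]
    scaled_ids = [pk for pk, _ in merge_then_round(scaled_segments, decimals, 2)]
    print(decimals, base_ids, scaled_ids)
```

With rounding deferred until after the merge, the returned distances may still display as equal (both 0.0 at `round_decimal=0`), but the ID order stays the same for every `round_decimal` value and for both the base and scaled inputs.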

Metadata

Labels: kind/bug, triage/accepted
