Skip to content

IndexFlatPanorama L2 search returns wrong Top-1 after translating vectors and query #5329

@mmnhgo

Description

@mmnhgo

Current Behavior

When using IndexFlatPanorama with METRIC_L2, adding the same offset vector to all stored vectors and the query can change the Top-1 result from the correct nearest neighbor to a wrong one.

This violates the translation invariance of L2 distance.

For squared L2 distance, if all database vectors and query vectors are translated by the same offset vector, all pairwise differences should remain unchanged:

d(x + o, q + o) = d(x, q)

Therefore, the squared L2 distances and the Top-k ordering should remain unchanged.

However, IndexFlatPanorama returns a different Top-1 after the translation. In the reproduction below, the original Top-1 is id 1, but after adding the same offset to both database vectors and the query, IndexFlatPanorama returns id 0.

This is not caused by approximate search. The reproduction uses IndexFlatPanorama as a flat-search index with a tiny deterministic dataset. It is also not caused by ties: the ground-truth distances are clearly different. In the same transformed float32 input, ordinary IndexFlatL2 still returns the correct Top-1.

Actual output on the affected path:

IndexFlatL2 original: id 1, distance 0.0001000000
IndexFlatL2 translated: id 1, distance 0.0001000507

IndexFlatPanorama original: id 1, distance 0.0001000017
IndexFlatPanorama translated: id 0, distance 0.0

Expected translated Top-1: id 1
Actual Panorama translated Top-1: id 0

For example, before translation:

q = [0.12573022, -0.13210486]

id 0 = [0.14393789, -0.11259123]
id 1 = [0.12316237, -0.14176954]

id 1 is the nearest vector.

After adding the same offset to the query and all database vectors:

offset = [91.70051, 111.15123]

the relative differences should remain unchanged. Therefore, id 1 should still be the nearest vector. Instead, IndexFlatPanorama returns id 0.

Steps to Reproduce

The following script creates two indexes:

one IndexFlatL2 index, used as a reference;
one IndexFlatPanorama index, which shows the wrong Top-1 after translation.

The script searches before and after adding the same offset vector to both the database vectors and the query.

import faiss
import numpy as np

np.set_printoptions(precision=10, suppress=True)

q = np.array([[0.12573022, -0.13210486]], dtype=np.float32)

# id 0 is farther, id 1 is nearest.
xb = np.array(
    [
        [0.14393789, -0.11259123],
        [0.12316237, -0.14176954],
    ],
    dtype=np.float32,
)

offset = np.array([91.70051, 111.15123], dtype=np.float32)


def run_indexflatl2():
    index = faiss.IndexFlatL2(2)
    index.add(xb)
    d0, i0 = index.search(q, 1)

    index2 = faiss.IndexFlatL2(2)
    index2.add(xb + offset)
    d1, i1 = index2.search(q + offset, 1)

    return d0, i0, d1, i1


def run_panorama():
    index = faiss.IndexFlatPanorama(2, faiss.METRIC_L2, 1, 1)
    index.add(xb)
    d0, i0 = index.search(q, 1)

    index2 = faiss.IndexFlatPanorama(2, faiss.METRIC_L2, 1, 1)
    index2.add(xb + offset)
    d1, i1 = index2.search(q + offset, 1)

    return d0, i0, d1, i1


def l2sqr(qv, xbv):
    diff = xbv.astype(np.float64) - qv.astype(np.float64)
    return np.sum(diff * diff, axis=1)


fd0, fi0, fd1, fi1 = run_indexflatl2()
pd0, pi0, pd1, pi1 = run_panorama()

gt_original = l2sqr(q[0], xb)
gt_translated = l2sqr((q + offset)[0], xb + offset)

print("faiss version:", getattr(faiss, "__version__", "unknown"))
print()
print("ground truth original distances:", gt_original)
print("ground truth original order:", np.argsort(gt_original))
print("ground truth translated distances:", gt_translated)
print("ground truth translated order:", np.argsort(gt_translated))

print()
print("IndexFlatL2 original:")
print("labels    =", fi0)
print("distances =", fd0)

print()
print("IndexFlatL2 translated:")
print("labels    =", fi1)
print("distances =", fd1)

print()
print("IndexFlatPanorama original:")
print("labels    =", pi0)
print("distances =", pd0)

print()
print("IndexFlatPanorama translated:")
print("labels    =", pi1)
print("distances =", pd1)

expected = int(fi0[0, 0])
actual = int(pi1[0, 0])

print()
print("Expected translated Top-1:", expected)
print("Actual Panorama translated Top-1:", actual)

if expected != actual:
    print("BUG REPRODUCED")
    print(
        {
            "mr": "VectorTranslationMutator",
            "metric": "L2",
            "index": "IndexFlatPanorama",
            "nlevels": 1,
            "batch_size": 1,
            "offset": offset.tolist(),
        }
    )
else:
    print("No bug reproduced")

Expected Behavior

For squared L2 distance, translating all stored vectors and the query by the same offset should preserve all pairwise differences.

Example:

q = [0.12573022, -0.13210486]

r0 = [0.14393789, -0.11259123]
r1 = [0.12316237, -0.14176954]

offset = [91.70051, 111.15123]

After translation:

q'  = q  + offset
r0' = r0 + offset
r1' = r1 + offset

The differences remain unchanged:

r0' - q' = r0 - q
r1' - q' = r1 - q

Therefore, the squared L2 distances and the Top-1 result should remain unchanged.

Before translation, id 1 is the nearest vector. After translation, id 1 should still be the nearest vector.

Expected translated IndexFlatPanorama Top-1:

id 1

Actual Behavior

IndexFlatPanorama returns id 1 before translation, but id 0 after translation:

IndexFlatPanorama original:
labels    = [[1]]
distances = [[0.0001000017]]

IndexFlatPanorama translated:

labels    = [[0]]
distances = [[0.0]]

This is incorrect because the translation does not change the relative geometry of the vectors. The expected translated Top-1 is still id 1.

In the same transformed float32 input, IndexFlatL2 still returns the correct Top-1:

IndexFlatL2 translated:

labels    = [[1]]
distances = [[0.0001000507]]

This shows that the wrong result is specific to the Panorama flat-search path.

Why this is a bug

IndexFlatPanorama is a flat-search variant, not an approximate or quantized index. For L2 distance, adding the same offset to every database vector and the query should not change the Top-k ordering.

The observed behavior violates this invariant. The final reported L2 distance in IndexFlatPanorama appears to come from a norm-expansion computation of the form:

||x||^2 + ||y||^2 - 2 * <x, y>

After translation, the norm and dot-product terms become large and close to each other. In float32, this can suffer numerical cancellation. In the reproduction above, the translated Panorama distances collapse to 0.0, and the exact Top-1 changes from id 1 to id 0.

Environment

Faiss Python API.
Reproduced with faiss==1.14.2.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions