IndexScalarQuantizer(QT_fp16, METRIC_L2) silently overflows finite scaled inputs to infinity and returns invalid Top-k results

## Current Behavior

When using `IndexScalarQuantizer` with `QT_fp16` and `METRIC_L2`, uniformly multiplying all stored vectors and all query vectors by the same positive scalar can change the Top-k result from a valid order to invalid labels and distances.

This violates the positive-scaling invariance of L2 ranking.

For L2 distance, if all database vectors and query vectors are multiplied by the same positive scalar `s > 0`, all squared L2 distances are multiplied by the same positive factor `s^2`:

```text
For squared L2 distance, d(sx, sq) = s^2 * d(x, q).
```

Therefore, the Top-k ordering should remain unchanged.

However, `IndexScalarQuantizer(QT_fp16, METRIC_L2)` silently converts finite float32 input values to half-precision infinity when the values exceed the fp16 representable range. As a result, `search()` returns invalid results: all labels become `-1` and all distances become `FLT_MAX`.

This is not caused by approximate search. The reproduction uses `IndexScalarQuantizer` on CPU with a tiny deterministic dataset. It is also not caused by ties: the ground-truth distances are clearly different before and after scaling.

## Actual output on the affected path:

```text
original gt: [0.01, 0.81, 3.61]
original gt order: [0, 1, 2]

scaled gt: [1.0e10, 8.1e11, 3.61e12]
scaled gt order: [0, 1, 2]

original search:
labels    = [0, 1, 2]
distances = [0.01, 0.81, 3.61]

scaled search:
labels    = [-1, -1, -1]
distances = [3.4028235e+38, 3.4028235e+38, 3.4028235e+38]

scaled reconstruct(0):
[inf, 0]
```

For example, after scaling by `1e6`, the input vectors are still finite float32 values:

```text
id 0 -> [1e6, 0]
id 1 -> [2e6, 0]
id 2 -> [3e6, 0]
query -> [1.1e6, 0]
```

The expected nearest-neighbor order is still `[0, 1, 2]`. Instead, FAISS stores the first vector as `[inf, 0]` and returns no valid neighbors.

## Steps to Reproduce

The following script creates two `IndexScalarQuantizer(QT_fp16, METRIC_L2)` indexes:

1. one with the original vectors;
2. one with all database vectors and query vectors multiplied by the same positive scalar.

The script also computes the ground-truth squared L2 distances over the actual float32 values used by FAISS.

```python
import faiss
import numpy as np


def l2sqr(q, xb):
    diff = xb - q
    return np.sum(diff * diff, axis=1)


def run():
    xb = np.array(
        [
            [1.0, 0.0],
            [2.0, 0.0],
            [3.0, 0.0],
        ],
        dtype=np.float32,
    )

    xq = np.array([[1.1, 0.0]], dtype=np.float32)

    scale = np.float32(1e6)
    xb2 = xb * scale
    xq2 = xq * scale

    assert np.isfinite(xb2).all()
    assert np.isfinite(xq2).all()

    gt1 = l2sqr(xq[0], xb)
    gt2 = l2sqr(xq2[0], xb2)

    idx1 = faiss.IndexScalarQuantizer(
        2, faiss.ScalarQuantizer.QT_fp16, faiss.METRIC_L2
    )
    idx1.add(xb)
    D1, I1 = idx1.search(xq, 3)

    idx2 = faiss.IndexScalarQuantizer(
        2, faiss.ScalarQuantizer.QT_fp16, faiss.METRIC_L2
    )
    idx2.add(xb2)
    D2, I2 = idx2.search(xq2, 3)

    rec0 = np.zeros(2, dtype=np.float32)
    idx2.reconstruct(0, rec0)

    print("faiss version:", faiss.__version__)
    print("original gt:", gt1, "order:", np.argsort(gt1))
    print("scaled gt:  ", gt2, "order:", np.argsort(gt2))

    print()
    print("original search:")
    print("labels    =", I1[0])
    print("distances =", D1[0])

    print()
    print("scaled search:")
    print("labels    =", I2[0])
    print("distances =", D2[0])

    print()
    print("scaled reconstruct(0):", rec0)

    expected_ids = list(I1[0])
    actual_ids = list(I2[0])

    print()
    print("expected search labels after positive scaling =", expected_ids)
    print("actual search labels after positive scaling   =", actual_ids)

    if expected_ids != actual_ids:
        print("BUG REPRODUCED")
    else:
        print("No bug reproduced")


if __name__ == "__main__":
    run()
```

## Expected Behavior

For squared L2 distance, multiplying all stored vectors and all query vectors by the same positive scalar should preserve the Top-k order.

Example:

```text
q = [1.1, 0]
r0 = [1, 0]
r1 = [2, 0]
r2 = [3, 0]

scale = 1e6

q'  = [1.1e6, 0]
r0' = [1e6, 0]
r1' = [2e6, 0]
r2' = [3e6, 0]
```

Before scaling:

```text
||r0 - q||^2 = 0.01
||r1 - q||^2 = 0.81
||r2 - q||^2 = 3.61
```

After scaling:

```text
||r0' - q'||^2 = 1.0e10
||r1' - q'||^2 = 8.1e11
||r2' - q'||^2 = 3.61e12
```

The distances are multiplied by `scale^2`, but the order remains unchanged.

Therefore, the expected Top-k labels should remain:

```text
[0, 1, 2]
```

The stored vectors are finite float32 values, so `reconstruct(0)` should not return infinity.

## Actual Behavior

After multiplying all database vectors and query vectors by `1e6`, `IndexScalarQuantizer(QT_fp16, METRIC_L2)` returns:

```text
scaled search:
labels    = [-1, -1, -1]
distances = [3.4028235e+38, 3.4028235e+38, 3.4028235e+38]
```

This is incorrect because all three database vectors are valid finite float32 inputs, and the expected Top-k order is `[0, 1, 2]`.

In addition, reconstruction shows that finite float32 input was silently encoded as infinity:

```text
scaled reconstruct(0): [inf, 0]
```

This indicates that `QT_fp16` silently overflows finite float32 values to half-precision `inf`, and the later distance computation cannot produce valid neighbors.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

IndexScalarQuantizer(QT_fp16, METRIC_L2) silently overflows finite scaled inputs to infinity and returns invalid Top-k results #5286

Current Behavior

Actual output on the affected path:

Steps to Reproduce

Expected Behavior

Actual Behavior

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

IndexScalarQuantizer(QT_fp16, METRIC_L2) silently overflows finite scaled inputs to infinity and returns invalid Top-k results #5286

Description

Current Behavior

Actual output on the affected path:

Steps to Reproduce

Expected Behavior

Actual Behavior

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions