Commit 60a7088

Fix cuvs_bench pytest pareto assert (#2027)
The `throughput.csv` and `latency.csv` files can have fewer rows than `raw.csv`. When we create the Pareto CSVs, we drop any row whose latency and recall values are "dominated" by another point. For example, if a point has both worse latency and lower recall than another point, it is not included in the Pareto frontier. The assert should therefore be relaxed.

```
    for rel_path, expectations in expected_files.items():
        file_path = temp_datasets_dir / rel_path
        assert file_path.exists(), f"Expected file {file_path} does not exist."
        assert file_path.stat().st_size > 0, (
            f"Expected file {file_path} is empty."
        )
        df = pd.read_csv(file_path)
        actual_header = list(df.columns)
        actual_rows = len(df)
        # breakpoint()
        assert actual_header == expectations["header"], (
            f"Wrong header produced in file f{rel_path}"
        )
>       assert actual_rows == expectations["rows"]
E       assert 1 == 2

tests/test_cli.py:442: AssertionError
```

This scenario can occur with certain hardware (GPU, CUDA version) and configs. In the run below, the first data point is strictly worse than the second, so it is dropped from the Pareto CSV, which triggers the assertion error.

```
nprobe=1: Recall=0.1788, items_per_second=1.29669M/s, Latency=77.1218u
nprobe=5: Recall=0.3722, items_per_second=1.3073M/s, Latency=76.4963u
```

Authors:
  - Anupam (https://github.com/aamijar)

Approvers:
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: #2027
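The dominance rule described in the commit message can be illustrated with a minimal sketch. This is not the cuvs_bench implementation: the `Point` type and the `dominates` and `pareto_frontier` helpers are hypothetical names introduced here, and the sketch assumes a point is dropped when another point is at least as good on both recall (higher is better) and latency (lower is better) and strictly better on at least one. The two data points are the ones quoted in the commit message.

```python
from typing import NamedTuple


class Point(NamedTuple):
    """Hypothetical record for one benchmark result (illustration only)."""
    label: str
    recall: float      # higher is better
    latency_us: float  # lower is better


def dominates(a: Point, b: Point) -> bool:
    """Return True if `a` is no worse than `b` on both metrics and strictly better on one."""
    no_worse = a.recall >= b.recall and a.latency_us <= b.latency_us
    strictly_better = a.recall > b.recall or a.latency_us < b.latency_us
    return no_worse and strictly_better


def pareto_frontier(points: list[Point]) -> list[Point]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# The two measurements quoted in the commit message.
points = [
    Point("nprobe=1", recall=0.1788, latency_us=77.1218),
    Point("nprobe=5", recall=0.3722, latency_us=76.4963),
]

frontier = pareto_frontier(points)
print([p.label for p in frontier])
```

Under these assumptions `nprobe=5` has both higher recall and lower latency, so `nprobe=1` is dominated and only one of the two raw rows survives. That matches the `assert 1 == 2` failure shown in the traceback.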
1 parent 7a08cc7 commit 60a7088

1 file changed

Lines changed: 20 additions & 2 deletions

python/cuvs_bench/cuvs_bench/tests/test_cli.py

@@ -1,5 +1,5 @@
 #
-# SPDX-FileCopyrightText: Copyright (c) 2025, NVIDIA CORPORATION.
+# SPDX-FileCopyrightText: Copyright (c) 2025-2026, NVIDIA CORPORATION.
 # SPDX-License-Identifier: Apache-2.0
 #
 
@@ -439,7 +439,25 @@ def test_run_command_creates_results(temp_datasets_dir: Path):
         assert actual_header == expectations["header"], (
             f"Wrong header produced in file f{rel_path}"
         )
-        assert actual_rows == expectations["rows"]
+        is_frontier = rel_path.endswith(("latency.csv", "throughput.csv"))
+        if is_frontier:
+            # Frontier files may have fewer rows than the raw results
+            # because the Pareto frontier drops dominated points.
+            assert 1 <= actual_rows <= expectations["rows"], (
+                f"Frontier file {rel_path} has {actual_rows} row(s), "
+                f"expected between 1 and {expectations['rows']}"
+            )
+            if actual_rows < expectations["rows"]:
+                print(
+                    f"Note: {rel_path} has {actual_rows} row(s), "
+                    f"expected {expectations['rows']} "
+                    f"(Pareto frontier dropped dominated points)"
+                )
+        else:
+            assert actual_rows == expectations["rows"], (
+                f"Expected {expectations['rows']} rows in {rel_path}, "
+                f"got {actual_rows}"
+            )
 
 
 def test_plot_command_creates_png_files(temp_datasets_dir: Path):
