Benchmarking Filtered Approximate Nearest Neighbor Search Algorithms on Transformer-based Embedding Vectors
This repository accompanies the paper "Benchmarking Filtered Approximate Nearest Neighbor Search Algorithms on Transformer-based Embedding Vectors", presenting a comprehensive evaluation of modern FANNS methods under EM, R, and EMIS filtering.
```
git clone https://github.com/spcl/fanns-benchmark.git
cd fanns-benchmark
```

Open `experiment_arxiv_dataset.py` and edit line 17 to select the desired dataset size:

- `small` (1k database items, 10k queries)
- `medium` (100k database items, 10k queries)
- `large` (~2.7M database items, 10k queries)
```
sudo systemctl start docker
sudo docker build -t <name_of_image> .
sudo docker run -v $(pwd):/workspace/fanns_benchmark -it <name_of_image>
```
- Run all experiments and all algorithms:

  ```
  python3 experiment_arxiv_dataset.py
  ```

- Run a specific experiment:

  ```
  python3 experiment_arxiv_dataset.py <experiment>
  ```

  Possible experiments: `arxiv_em`, `arxiv_r`, and `arxiv_emis`.

- Run a specific experiment with a specific algorithm:

  ```
  python3 experiment_arxiv_dataset.py <experiment> <algorithm>
  ```

  Possible experiments: `arxiv_em`, `arxiv_r`, and `arxiv_emis`.
  - `arxiv_em` supports the algorithms `ACORN`, `CAPS-kmeans`, `FDANN-stitched`, `NHQ-kgraph`, `NHQ-nsw`, and `UNG`.
  - `arxiv_r` supports the algorithms `ACORN` and `SeRF`.
  - `arxiv_emis` supports the algorithms `ACORN`, `FDANN-stitched`, and `UNG`.
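To run every valid experiment–algorithm pair back to back, the combinations above can be enumerated in a small shell loop. This is only a sketch: it prints each command rather than executing it (remove the `echo` to actually run the benchmarks sequentially).

```shell
# All valid experiment–algorithm pairs from the list above.
pairs="arxiv_em ACORN
arxiv_em CAPS-kmeans
arxiv_em FDANN-stitched
arxiv_em NHQ-kgraph
arxiv_em NHQ-nsw
arxiv_em UNG
arxiv_r ACORN
arxiv_r SeRF
arxiv_emis ACORN
arxiv_emis FDANN-stitched
arxiv_emis UNG"

# Print one benchmark command per pair; drop "echo" to execute them.
echo "$pairs" | while read -r experiment algorithm; do
  echo "python3 experiment_arxiv_dataset.py $experiment $algorithm"
done
```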
Each run:

- Performs a parameter search (unless cached) and logs the search to `parameters/parameter_search_log_<experiment>_<algorithm>.jsonl`.
- Caches the best parameters in `parameters/parameter_search_cache_<experiment>_<algorithm>.jsonl`.
- Stores benchmark results in `results/<experiment>_<algorithm>.json`.
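The `.jsonl` files follow the JSON Lines convention (one JSON object per line). The exact field names in the logs are not documented here, so the snippet below only shows the generic reading pattern on a synthetic, hypothetical record:

```python
import io
import json

def read_jsonl(stream):
    """Parse a JSON Lines stream into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in stream if line.strip()]

# Hypothetical example content; the real log fields will differ.
sample = io.StringIO(
    '{"params": {"ef": 64}, "recall": 0.95}\n'
    '{"params": {"ef": 128}, "recall": 0.97}\n'
)
entries = read_jsonl(sample)
print(len(entries))  # 2
```

To inspect a real log, replace `sample` with `open("parameters/parameter_search_log_<experiment>_<algorithm>.jsonl")`.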
```
python3 plots.py
```

This script reads all available result files in the `results/` folder and saves visualizations in the `plots/` folder.
To run experiments on a compute cluster, the setup must be adapted to the specific environment. In general, the Docker image needs to be integrated with the cluster’s container system (e.g., SARUS), and the experiments are executed the same way as in the local setup.
We provide an example job script: `job_experiment_arxiv_dataset.sh`.
This script is tailored for a SLURM-based cluster using the SARUS container runtime.
Make sure to adjust the script to your cluster, including but not limited to:
- Partition name (`#SBATCH --partition=...`, line 23)
- Node exclusions (`#SBATCH --exclude=...`, line 24)
- Code directory (`CODEDIR=...`, line 31)
- Cache directory (`CACHEDIR=...`, line 34)
- Dataset directory (`DATASETDIR=...`, line 36)
Each experiment–algorithm pair can be launched individually using:
```
EXPERIMENT=<experiment> ALGORITHM=<algorithm> ./job_experiment_arxiv_dataset.sh
```

Refer to Section 6 from the local setup for valid `<experiment>` and `<algorithm>` values.
Logs, parameter caches, and results are stored in the same folders (parameters/, results/) as in the local setup. Result plots can be generated in the same way using:
```
python3 plots.py
```

- Medium-scale dataset: On a consumer-grade laptop with 4 physical cores (8 threads), running all experiments with all algorithms takes approximately 48 hours sequentially.
- Large-scale dataset: On a compute node with 36 physical cores (72 threads), running the full benchmark sequentially takes roughly 2000 hours.
This can be parallelized across experiments and algorithms if multiple nodes are available.
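One way to parallelize is to submit each experiment–algorithm pair as its own cluster job. The sketch below assumes a SLURM scheduler and only prints the submissions; whether plain `sbatch` works, and how environment variables are forwarded, depends on your cluster configuration.

```shell
# Hedged sketch: print one job submission per experiment–algorithm pair.
# Replace "echo" with the real submission once adapted to your cluster.
submit() {
  echo "EXPERIMENT=$1 ALGORITHM=$2 sbatch job_experiment_arxiv_dataset.sh"
}

# Example: the two arxiv_r pairs; extend with the remaining valid pairs.
submit arxiv_r ACORN
submit arxiv_r SeRF
```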
We provide the results of our experiments on the medium-scale and large-scale datasets, including:

- Best parameters found: `parameters_medium/`, `parameters_large/`
- Benchmark results: `results_medium/`, `results_large/`
- Generated plots: `plots_medium/`, `plots_large/`
If you find this repository useful, please consider citing our FANNS benchmarking paper:
```bibtex
@misc{iff2025fannsbenchmark,
  title={Benchmarking Filtered Approximate Nearest Neighbor Search Algorithms on Transformer-based Embedding Vectors},
  author={Patrick Iff and Paul Bruegger and Marcin Chrapek and Maciej Besta and Torsten Hoefler},
  year={2025},
  eprint={2507.21989},
  archivePrefix={arXiv},
  primaryClass={cs.DB},
  url={https://arxiv.org/abs/2507.21989},
}
```