Feature Request: Support for TurboQuant vector quantization
Description
Google Research recently introduced TurboQuant (to be presented at ICLR 2026), a new data-oblivious online vector quantization algorithm.
It would be great if we could add TurboQuant as a new quantizer option in FAISS.
TurboQuant stands out because:
- It achieves near-optimal distortion rates for both MSE and inner-product similarity (within a small constant factor of the theoretical lower bound).
- It is completely data-oblivious and requires almost zero preprocessing/indexing time (no codebook training needed, unlike PQ).
- It delivers higher recall than traditional Product Quantization (PQ) or RaBitQ while keeping indexing overhead close to zero.
This makes it particularly attractive for large-scale vector search and ANN systems.
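To make the "data-oblivious, no training" property concrete, here is a minimal illustrative sketch (not the paper's exact algorithm, and not FAISS code — all names here are hypothetical): a shared random rotation (random sign flips plus a fast Walsh-Hadamard transform) followed by 1-bit sign quantization, which needs no pass over the data before encoding:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Illustrative 1-bit data-oblivious quantizer in the spirit of TurboQuant.
// A fixed random rotation spreads mass evenly across coordinates; afterwards
// only the sign of each coordinate is kept. No training over the data.
struct OneBitObliviousQuantizer {
    size_t d;                  // dimension; must be a power of two for the WHT
    std::vector<float> signs;  // random +-1 diagonal, fixed once by the seed

    explicit OneBitObliviousQuantizer(size_t d, uint64_t seed = 42)
            : d(d), signs(d) {
        std::mt19937_64 rng(seed);
        for (auto& s : signs) s = (rng() & 1) ? 1.0f : -1.0f;
    }

    // Orthonormal random rotation: sign flips + normalized Walsh-Hadamard.
    void rotate(std::vector<float>& v) const {
        for (size_t i = 0; i < d; ++i) v[i] *= signs[i];
        for (size_t len = 1; len < d; len <<= 1)
            for (size_t i = 0; i < d; i += 2 * len)
                for (size_t j = i; j < i + len; ++j) {
                    float a = v[j], b = v[j + len];
                    v[j] = a + b;
                    v[j + len] = a - b;
                }
        const float norm = 1.0f / std::sqrt(static_cast<float>(d));
        for (auto& x : v) x *= norm;
    }

    // Encode: rotate, then pack one sign bit per coordinate.
    std::vector<uint8_t> encode(std::vector<float> v) const {
        rotate(v);
        std::vector<uint8_t> code((d + 7) / 8, 0);
        for (size_t i = 0; i < d; ++i)
            if (v[i] >= 0) code[i / 8] |= uint8_t(1u << (i % 8));
        return code;
    }

    // Crude similarity proxy in [-1, 1]: +1 if all sign bits agree,
    // -1 if they all disagree.
    float agreement(const std::vector<uint8_t>& a,
                    const std::vector<uint8_t>& b) const {
        size_t mismatch = 0;
        for (size_t i = 0; i < d; ++i) {
            bool ba = (a[i / 8] >> (i % 8)) & 1;
            bool bb = (b[i / 8] >> (i % 8)) & 1;
            mismatch += (ba != bb);
        }
        return 1.0f - 2.0f * mismatch / static_cast<float>(d);
    }
};
```

The point of the sketch is the shape of the pipeline, not its quality: indexing is a single rotate-and-round pass per vector, with no codebook to learn.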
Why it fits FAISS perfectly
FAISS already supports a rich set of quantizers (ScalarQuantizer, ProductQuantizer, ResidualQuantizer, etc.).
TurboQuant complements them well by offering excellent accuracy with near-zero build-time overhead, which is a major pain point in billion-scale production vector stores.
References
Community Discussion
https://www.reddit.com/r/LocalLLaMA/comments/1s2su28/google_research_turboquant_redefining_ai/
(Initial skepticism about end-to-end speed in naive implementations exists, but recent fused-kernel experiments and llama.cpp integration efforts show promising improvements.)
Key Advantages (from the paper and blog)
- Significantly better 1-recall@k than PQ/RaBitQ on standard benchmarks (e.g., GloVe)
- Indexing time ≈ 0 (vs. expensive codebook training in PQ)
- Designed for both KV-cache compression and vector search / ANN use cases
Proposed integration
We could add a new quantizer class like faiss::TurboQuant (or IndexIVFTurboQuant).
The paper includes pseudocode, and there are already early community discussions around implementing it (e.g., in llama.cpp).
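As a sketch of what the integration could look like, here is a hypothetical skeleton that mirrors the train / compute_codes / decode virtual interface of faiss::Quantizer (faiss/impl/Quantizer.h). The base struct is reproduced locally so the sketch compiles on its own; a real PR would inherit from the FAISS header. The per-coordinate rounding used for encoding is only a placeholder, not the algorithm from the paper:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in mirroring faiss::Quantizer (see faiss/impl/Quantizer.h).
struct Quantizer {
    size_t d = 0;          // input vector dimension
    size_t code_size = 0;  // bytes per encoded vector
    virtual void train(size_t n, const float* x) = 0;
    virtual void compute_codes(const float* x, uint8_t* codes, size_t n) const = 0;
    virtual void decode(const uint8_t* code, float* x, size_t n) const = 0;
    virtual ~Quantizer() = default;
};

// Hypothetical TurboQuant skeleton. train() is a no-op because the method is
// data-oblivious; the uniform rounding below is a placeholder where the real
// implementation would apply the paper's rotation + per-coordinate quantizer.
struct TurboQuant : Quantizer {
    explicit TurboQuant(size_t dim) {
        d = dim;
        code_size = dim;  // placeholder: 1 byte per coordinate
    }

    void train(size_t /*n*/, const float* /*x*/) override {
        // Intentionally empty: no codebook to learn, unlike PQ.
    }

    void compute_codes(const float* x, uint8_t* codes, size_t n) const override {
        for (size_t i = 0; i < n * d; ++i) {
            float c = std::fmin(1.0f, std::fmax(-1.0f, x[i]));  // clamp to [-1, 1]
            codes[i] = static_cast<uint8_t>(std::lround((c + 1.0f) * 127.5f));
        }
    }

    void decode(const uint8_t* code, float* x, size_t n) const override {
        for (size_t i = 0; i < n * d; ++i)
            x[i] = code[i] / 127.5f - 1.0f;
    }
};
```

Slotting in at this level would let TurboQuant reuse the existing IVF machinery the same way ScalarQuantizer does, which is why an IndexIVFTurboQuant wrapper seems like a natural follow-up.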
Additional context
We are currently using FAISS in production for large-scale vector search. Adding TurboQuant would dramatically reduce indexing latency while maintaining or improving search quality.
Happy to discuss further or even help with the implementation if the maintainers are interested!
Thanks in advance! 🙏