Feature Request: Support for TurboQuant vector quantization
Description
Google Research recently introduced TurboQuant (to be presented at ICLR 2026), a new data-oblivious online vector quantization algorithm.
It would be great if we could add TurboQuant as a new quantizer option in FAISS.
TurboQuant stands out because:
- It achieves near-optimal distortion rates for both MSE and inner-product similarity (within a small constant factor of the theoretical lower bound).
- It is completely data-oblivious and requires almost zero preprocessing/indexing time (no codebook training needed, unlike PQ).
- It delivers higher recall than traditional Product Quantization (PQ) or RaBitQ while keeping indexing overhead close to zero.
This makes it particularly attractive for large-scale vector search and ANN systems.
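To make the "data-oblivious, no training" property concrete, here is a minimal illustrative sketch (not the paper's exact algorithm, and not FAISS code — all names here are hypothetical): a shared random rotation (random sign flips plus a fast Walsh-Hadamard transform) followed by 1-bit sign quantization, which needs no pass over the data before encoding:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Illustrative 1-bit data-oblivious quantizer in the spirit of TurboQuant.
// A fixed random rotation spreads mass evenly across coordinates; afterwards
// only the sign of each coordinate is kept. No training over the data.
struct OneBitObliviousQuantizer {
    size_t d;                  // dimension; must be a power of two for the WHT
    std::vector<float> signs;  // random +-1 diagonal, fixed once by the seed

    explicit OneBitObliviousQuantizer(size_t d, uint64_t seed = 42)
            : d(d), signs(d) {
        std::mt19937_64 rng(seed);
        for (auto& s : signs) s = (rng() & 1) ? 1.0f : -1.0f;
    }

    // Orthonormal random rotation: sign flips + normalized Walsh-Hadamard.
    void rotate(std::vector<float>& v) const {
        for (size_t i = 0; i < d; ++i) v[i] *= signs[i];
        for (size_t len = 1; len < d; len <<= 1)
            for (size_t i = 0; i < d; i += 2 * len)
                for (size_t j = i; j < i + len; ++j) {
                    float a = v[j], b = v[j + len];
                    v[j] = a + b;
                    v[j + len] = a - b;
                }
        const float norm = 1.0f / std::sqrt(static_cast<float>(d));
        for (auto& x : v) x *= norm;
    }

    // Encode: rotate, then pack one sign bit per coordinate.
    std::vector<uint8_t> encode(std::vector<float> v) const {
        rotate(v);
        std::vector<uint8_t> code((d + 7) / 8, 0);
        for (size_t i = 0; i < d; ++i)
            if (v[i] >= 0) code[i / 8] |= uint8_t(1u << (i % 8));
        return code;
    }

    // Crude similarity proxy in [-1, 1]: +1 if all sign bits agree,
    // -1 if they all disagree.
    float agreement(const std::vector<uint8_t>& a,
                    const std::vector<uint8_t>& b) const {
        size_t mismatch = 0;
        for (size_t i = 0; i < d; ++i) {
            bool ba = (a[i / 8] >> (i % 8)) & 1;
            bool bb = (b[i / 8] >> (i % 8)) & 1;
            mismatch += (ba != bb);
        }
        return 1.0f - 2.0f * mismatch / static_cast<float>(d);
    }
};
```

The point of the sketch is the shape of the pipeline, not its quality: indexing is a single rotate-and-round pass per vector, with no codebook to learn.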
Why it fits FAISS perfectly
FAISS already supports a rich set of quantizers (ScalarQuantizer, ProductQuantizer, ResidualQuantizer, etc.).
TurboQuant complements them well by offering excellent accuracy with near-zero build-time overhead, which is a major pain point in billion-scale production vector stores.
References
Community Discussion
https://www.reddit.com/r/LocalLLaMA/comments/1s2su28/google_research_turboquant_redefining_ai/
(Initial skepticism about end-to-end speed in naive implementations exists, but recent fused-kernel experiments and llama.cpp integration efforts show promising improvements.)
Key Advantages (from the paper and blog)
- Significantly better 1-recall@k than PQ/RaBitQ on standard benchmarks (e.g., GloVe)
- Indexing time ≈ 0 (vs. expensive codebook training in PQ)
- Designed for both KV-cache compression and vector search / ANN use cases
Proposed integration
We could add a new quantizer class like faiss::TurboQuant (or IndexIVFTurboQuant).
The paper includes pseudocode, and there are already early community discussions around implementing it (e.g., in llama.cpp).
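As a sketch of what the integration could look like, here is a hypothetical skeleton that mirrors the train / compute_codes / decode virtual interface of faiss::Quantizer (faiss/impl/Quantizer.h). The base struct is reproduced locally so the sketch compiles on its own; a real PR would inherit from the FAISS header. The per-coordinate rounding used for encoding is only a placeholder, not the algorithm from the paper:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in mirroring faiss::Quantizer (see faiss/impl/Quantizer.h).
struct Quantizer {
    size_t d = 0;          // input vector dimension
    size_t code_size = 0;  // bytes per encoded vector
    virtual void train(size_t n, const float* x) = 0;
    virtual void compute_codes(const float* x, uint8_t* codes, size_t n) const = 0;
    virtual void decode(const uint8_t* code, float* x, size_t n) const = 0;
    virtual ~Quantizer() = default;
};

// Hypothetical TurboQuant skeleton. train() is a no-op because the method is
// data-oblivious; the uniform rounding below is a placeholder where the real
// implementation would apply the paper's rotation + per-coordinate quantizer.
struct TurboQuant : Quantizer {
    explicit TurboQuant(size_t dim) {
        d = dim;
        code_size = dim;  // placeholder: 1 byte per coordinate
    }

    void train(size_t /*n*/, const float* /*x*/) override {
        // Intentionally empty: no codebook to learn, unlike PQ.
    }

    void compute_codes(const float* x, uint8_t* codes, size_t n) const override {
        for (size_t i = 0; i < n * d; ++i) {
            float c = std::fmin(1.0f, std::fmax(-1.0f, x[i]));  // clamp to [-1, 1]
            codes[i] = static_cast<uint8_t>(std::lround((c + 1.0f) * 127.5f));
        }
    }

    void decode(const uint8_t* code, float* x, size_t n) const override {
        for (size_t i = 0; i < n * d; ++i)
            x[i] = code[i] / 127.5f - 1.0f;
    }
};
```

Slotting in at this level would let TurboQuant reuse the existing IVF machinery the same way ScalarQuantizer does, which is why an IndexIVFTurboQuant wrapper seems like a natural follow-up.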
Additional context
We are currently using FAISS in production for large-scale vector search. Adding TurboQuant would dramatically reduce indexing latency while maintaining or improving search quality.
Happy to discuss further or even help with the implementation if the maintainers are interested!
Thanks in advance! 🙏