Commit e7c13a5

Update introducing-aisaq-in-milvus-billion-scale-vector-search-got-3200-cheaper-on-memory.md
1 parent 26fc0e9 commit e7c13a5


blog/en/introducing-aisaq-in-milvus-billion-scale-vector-search-got-3200-cheaper-on-memory.md

Lines changed: 10 additions & 10 deletions
@@ -20,7 +20,7 @@ Vector databases have become core infrastructure for mission-critical AI systems
 
 To deliver fast search, most vector databases keep key indexing structures in DRAM (Dynamic Random Access Memory), the fastest and most expensive tier of memory. This design is effective for performance, but it scales poorly. DRAM usage scales with data size rather than query traffic, and even with compression or partial SSD offloading, large portions of the index must remain in memory. As datasets grow, memory costs quickly become a limiting factor.
 
-Milvus already supports **DISKANN**, a disk-based ANN approach that reduces memory pressure by moving much of the index onto SSD. However, DISKANN still relies on DRAM for compressed representations used during search. [Milvus 2.6](https://milvus.io/docs/release_notes.md#v264) takes this further with [AISAQ](https://milvus.io/docs/aisaq.md), a disk-based vector index inspired by [DISKANN](https://milvus.io/docs/diskann.md) that stores all search-critical data on disk. In a billion-vector workload, this reduces memory usage from **32 GB to about 10 MB**—a **3,200× reduction**—while maintaining practical performance.
+Milvus already supports **DISKANN**, a disk-based ANN approach that reduces memory pressure by moving much of the index onto SSD. However, DISKANN still relies on DRAM for compressed representations used during search. [Milvus 2.6](https://milvus.io/docs/release_notes.md#v264) takes this further with [AISAQ](https://milvus.io/docs/aisaq.md), a disk-based vector index inspired by [DISKANN](https://milvus.io/docs/diskann.md). Developed by KIOXIA, AISAQ uses a zero-DRAM-footprint architecture that stores all search-critical data on disk and optimizes data placement to minimize I/O operations. In a billion-vector workload, this reduces memory usage from **32 GB to about 10 MB**—a **3,200× reduction**—while maintaining practical performance.
 
 In the sections that follow, we explain how graph-based vector search works, where memory costs come from, and how AISAQ reshapes the cost curve for billion-scale vector search.
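For orientation, a minimal pymilvus sketch of building a collection with this index. The `index_type` string `"AISAQ"` and its parameters are assumptions drawn from this post and the linked docs (https://milvus.io/docs/aisaq.md), not a verified recipe.

```python
from pymilvus import MilvusClient, DataType

# Hypothetical sketch: create a collection indexed with AISAQ.
# Assumes Milvus 2.6+ with AISAQ available; verify names against
# https://milvus.io/docs/aisaq.md before relying on this.
client = MilvusClient(uri="http://localhost:19530")

schema = client.create_schema(auto_id=True)
schema.add_field("id", DataType.INT64, is_primary=True)
schema.add_field("embedding", DataType.FLOAT_VECTOR, dim=768)

index_params = client.prepare_index_params()
index_params.add_index(
    field_name="embedding",
    index_type="AISAQ",   # assumed name of the disk-based index described above
    metric_type="L2",
)

client.create_collection(
    collection_name="docs",
    schema=schema,
    index_params=index_params,
)
```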

@@ -141,7 +141,9 @@ To make this work, AISAQ reorganizes node storage so that data needed during gra
 
 ![](https://assets.zilliz.com/AISAQ_244e661794.png)
 
-To balance performance and storage efficiency under different workloads, AISAQ provides two disk-based storage modes. These modes differ primarily in how PQ-compressed data is stored and accessed during search.
+
+To address different application requirements, AISAQ provides two disk-based storage modes: Performance and Scale. Technically, the two modes differ primarily in how PQ-compressed data is stored and accessed during search. From an application standpoint, they target two distinct needs: the low latency typical of online semantic search and recommendation systems, and the ultra-high scale typical of RAG.
+
 
 ![](https://assets.zilliz.com/aisaq_vs_diskann_35ebee3c64.png)
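To show how the mode choice might be expressed, here is a hedged configuration sketch built around the `INLINE_PQ` knob this post mentions later. The boolean form and exact placement of the parameter are assumptions to check against the AISAQ docs.

```python
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")
index_params = client.prepare_index_params()

# AISAQ-Performance: PQ codes stored inline with each node's adjacency
# data, trading extra disk space for fewer reads per graph hop.
# AISAQ-Scale would flip the flag to keep one shared PQ region on disk.
index_params.add_index(
    field_name="embedding",
    index_type="AISAQ",
    metric_type="L2",
    params={"INLINE_PQ": True},  # assumption: boolean form of the knob
)
```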

@@ -159,8 +161,7 @@ From the perspective of the search algorithm, this closely mirrors DISKANN’s a
 
 The trade-off is storage overhead. Because a neighbor’s PQ data may appear in multiple nodes’ disk pages, this layout introduces redundancy and significantly increases the overall index size.
 
-**Therefore, the AISAQ-Performance mode prioritizes low I/O latency over disk efficiency.**
-
+Therefore, the AISAQ-Performance mode prioritizes low I/O latency over disk efficiency. From an application perspective, AISAQ-Performance can deliver latency in the ~10 ms range, as required for online semantic search.
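To make the redundancy concrete, an illustrative layout of one node's disk page in Performance mode (a sketch, not the actual Milvus on-disk format):

```python
from dataclasses import dataclass

# Illustrative sketch only; not the actual Milvus on-disk format.
# In Performance mode, each node's page carries the PQ codes of all
# its neighbors, so a single page read supplies everything a hop needs.
@dataclass
class PerformanceModePage:
    vector: list[float]        # full-precision vector of this node
    neighbor_ids: list[int]    # graph adjacency list
    neighbor_pq: list[bytes]   # one PQ code per neighbor, duplicated in
                               # every other page that lists that neighbor

# A node with in-degree k therefore has its PQ code stored k times,
# which is exactly the storage overhead described above.
```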

 ### AISAQ-Scale: Optimized for Storage Efficiency

@@ -178,10 +179,9 @@ To control this overhead, the AISAQ-Scale mode introduces two additional optimiz
 
 - **PQ data rearrangement**, which orders PQ vectors by access priority to improve locality and reduce random reads.
 
-- A **PQ cache in DRAM** (`pq_cache_size`), which stores frequently accessed PQ data and avoids repeated disk reads for hot entries.
-
-With these optimizations, the AISAQ-Scale mode achieves much better storage efficiency than AISAQ-Performance, while maintaining practical search performance. That performance remains lower than DISKANN or AISAQ-Performance, but the memory footprint is dramatically smaller.
+- A **PQ cache in DRAM** (`pq_read_page_cache_size`), which stores frequently accessed PQ data and avoids repeated disk reads for hot entries.
 
+With these optimizations, the AISAQ-Scale mode achieves much better storage efficiency than AISAQ-Performance, while maintaining practical search performance. That performance remains lower than DISKANN’s, but there is no storage overhead (the index size is similar to DISKANN’s) and the memory footprint is dramatically smaller. From an application perspective, AISAQ-Scale provides the means to meet RAG requirements at ultra-high scale.
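A hedged sketch of applying the cache knob at query time follows; whether `pq_read_page_cache_size` is accepted as a search-time parameter and what value format it takes are assumptions to verify against the AISAQ docs.

```python
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")

# Hypothetical: search an AISAQ-Scale collection while sizing the DRAM
# cache for hot PQ pages. `pq_read_page_cache_size` is named in this
# post; its placement and value format here are assumptions.
results = client.search(
    collection_name="docs",
    data=[[0.1] * 768],  # one example query vector
    limit=10,
    search_params={"params": {"pq_read_page_cache_size": "64MB"}},
)
print(results[0])
```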

 ### Key Advantages of AISAQ

@@ -279,7 +279,7 @@ These datasets reflect two distinct real-world scenarios: compact vision feature
 
 **Sift128D1M (Full Vector ~488MB)**
 
-![](https://assets.zilliz.com/Sift128_D1_M_706a5b4e23.png)
+![](https://assets.zilliz.com/aisaq_53da7b566a.png)
 
 **Cohere768D1M (Full Vector ~2930MB)**

@@ -328,7 +328,7 @@ In practice, users can tune this trade-off by adjusting `INLINE_PQ` and PQ compr
 
 The economics of modern hardware are changing. DRAM prices remain high, while SSD performance has advanced rapidly—PCIe 5.0 drives now deliver bandwidth exceeding **14 GB/s**. As a result, architectures that shift search-critical data from expensive DRAM to far more affordable SSD storage are becoming increasingly compelling. With SSD capacity costing **less than one-thirtieth as much per gigabyte as** DRAM, these differences are no longer marginal—they meaningfully influence system design.
 
-AISAQ reflects this shift. By eliminating the need for large, always-on memory allocations, it enables vector search systems to scale based on data size and workload requirements rather than DRAM limits. This approach aligns with a broader trend toward **“all-in-storage” architectures**, where fast SSDs play a central role not just in persistence, but in active computation and search.
+AISAQ reflects this shift. By eliminating the need for large, always-on memory allocations, it enables vector search systems to scale based on data size and workload requirements rather than DRAM limits. This approach aligns with a broader trend toward “all-in-storage” architectures, where fast SSDs play a central role not just in persistence, but in active computation and search. By offering two operating modes, Performance and Scale, AISAQ meets the requirements of both semantic search (which demands the lowest latency) and RAG (which requires very high scale but tolerates moderate latency).
 
 This shift is unlikely to be confined to vector databases. Similar design patterns are already emerging in graph processing, time-series analytics, and even parts of traditional relational systems, as developers rethink long-standing assumptions about where data must reside to achieve acceptable performance. As hardware economics continue to evolve, system architectures will follow.

@@ -359,4 +359,4 @@ Have questions or want a deep dive on any feature of the latest Milvus? Join our
 
 - [We Replaced Kafka/Pulsar with a Woodpecker for Milvus ](https://milvus.io/blog/we-replaced-kafka-pulsar-with-a-woodpecker-for-milvus.md)
 
-- [Vector Search in the Real World: How to Filter Efficiently Without Killing Recall](https://milvus.io/blog/how-to-filter-efficiently-without-killing-recall.md)
+- [Vector Search in the Real World: How to Filter Efficiently Without Killing Recall](https://milvus.io/blog/how-to-filter-efficiently-without-killing-recall.md)
