Add Augmented Core Extraction Algorithm #1404
Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

/ok to test 38c03a4
- Adds the out-of-tree ACE method of @anaruse. This assumes graphs smaller than host memory.
- Adds `disk_enabled` and `graph_build_dir` parameters to select the ACE method.
- Uses partitions instead of clusters in ACE to distinguish ACE partitions from regular KNN graph-building clusters.
- Introduced dynamic configuration of nprobes and nlists for IVF-PQ based on partition size to improve KNN graph construction.
- Added logging of both IVF-PQ and NN-Descent parameters to provide better insight during graph building.
- Ensured default parameters are set when no specific graph build parameters are provided.
- Added logic to identify and merge small partitions that do not meet the minimum size requirement for stable KNN graph construction.
- Replaced `disk_enabled` and `graph_build_dir` with `ace_npartitions` and `ace_build_dir` in the parameter parsing logic.
- Updated function signatures and documentation to clarify the new partitioning approach for ACE builds.
- Introduced new functions for reordering and storing datasets on disk, optimized for NVMe performance.
- Clarified naming.
Thanks Julian for the updates, it is great to have the tests! I have a few comments on the new code.
- Fixed an assertion validating the size of data per element in the HNSW serialization function, and moved it out of the main loop.
- Enhanced memory allocation logic in the CAGRA build process, ensuring proper initialization and reducing potential errors.
- Optimized and parallelized some parts.
- Implements cuvsCagraIndexIsOnDisk and cuvsCagraIndexGetFileDirectory.
- Add NumPy headers.
- Introduced methods to update the dataset and graph from disk files using the NumPy header.
- Use file I/O instead of mmap in serialize_to_hnswlib_from_disk.
- Move buffered_ofstream to file_io.hpp.
- Renamed to cagra_hnsw_ace_example to clarify that this uses HNSW for searching.
- Use HNSW for search on the memory path as well.
```cpp
 * @param[in] params cuvsAceParams_t to allocate
 * @return cuvsError_t
 */
cuvsError_t cuvsAceParamsCreate(cuvsAceParams_t* params);
```
@julianmi It looks like you forgot to push/add the C test
- Replaced ASSERT with RAFT_EXPECTS for better error handling in cuvs_cagra_hnswlib_wrapper.h.
- Added a warning log for small dataset sizes in cagra_build.cuh.
- Adjusted the min_recall value in tests to improve test accuracy.
Hi Julian, a few nitpicks on my side
cpp/include/cuvs/neighbors/cagra.hpp (Outdated)
```cpp
template <typename T, typename IdxT>
auto hnsw_to_cagra_params(raft::matrix_extent<int64_t> dataset,
```
Please remove the unused template parameters.
The function will then become a non-template and you'll need to decide in which compilation unit to place it. If #1448 doesn't get merged before this PR, I'd suggest just copying the relevant bits here (creating a separate cagra.cpp unit).
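For concreteness, a sketch of the de-templated declaration; the return type and the trailing parameters are my assumptions, abbreviated from the diff above:

```cpp
// Hypothetical non-template form, e.g. in a new cagra.cpp unit; return
// type and remaining parameters are assumed for illustration only.
auto hnsw_to_cagra_params(raft::matrix_extent<int64_t> dataset /*, ... */)
  -> cuvs::neighbors::cagra::index_params;
```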
```cpp
if (last_slash != std::string::npos) {
  file_directory_ = file_path.substr(0, last_slash);
} else {
  file_directory_ = ".";
```
This would yield the current working directory, right?
I'm wondering, would it make sense to default to something like std::tmpfile for the file path instead? From past experience, it's sometimes a little annoying when cuVS creates random index files in the root of the git directory tree (project folder) that one then has to clean up. An environment where the working directory is the executable's directory and is not writable could also be a problem.
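A minimal sketch of that suggestion, assuming C++17 `<filesystem>` is available (note that `std::tmpfile` returns a `FILE*`, so `std::filesystem::temp_directory_path()` is the closer fit for a directory default):

```cpp
#include <filesystem>
#include <string>

// Fall back to the system temp directory instead of the current working
// directory; illustrative only, not the PR's actual default.
std::string default_index_dir()
{
  return std::filesystem::temp_directory_path().string();
}
```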
Also, do we really need to keep the file_directory_ / file_directory() members? I searched through the code for where they are really needed and couldn't find anything. Do we even need file paths, or could we perhaps get away with cuvs::util::file_descriptor handles?
This is the user-provided build_dir parameter that is part of the ACE build method. It enables users to pick a fast disk to speed up the build; /tmp might not be the fastest disk available and/or might not have the capacity to hold the temporary files and the HNSW index.
file_directory() is used during HNSW index creation: the disk path was added in from_cagra(), and the directory is used in the new serialize_to_hnswlib_from_disk() routine.
cpp/src/util/file_io.hpp (Outdated)
```cpp
file_descriptor(const file_descriptor&) = delete;
file_descriptor& operator=(const file_descriptor&) = delete;

file_descriptor(file_descriptor&& other) noexcept : fd_(other.fd_) { other.fd_ = -1; }
```
A nitpick:

```diff
-file_descriptor(file_descriptor&& other) noexcept : fd_(other.fd_) { other.fd_ = -1; }
+file_descriptor(file_descriptor&& other) noexcept : fd_{std::exchange(other.fd_, -1)} {}
```
Applied.
```cpp
{
  if (this != &other) {
    close();
    fd_ = other.fd_;
    other.fd_ = -1;
  }
  return *this;
}
```
A nitpick:

```diff
-{
-  if (this != &other) {
-    close();
-    fd_ = other.fd_;
-    other.fd_ = -1;
-  }
-  return *this;
-}
+{
+  std::swap(this->fd_, other.fd_);
+  return *this;
+}
```
Note you don't need to manually close the handle here; the destructor will be called on the moved-from object anyway.
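Put together, a self-contained sketch of the swap-based move idiom under discussion, using plain POSIX open/close (illustrative; not the actual cuvs::util::file_descriptor):

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <utility>

class file_descriptor {
 public:
  explicit file_descriptor(const char* path, int flags = O_RDONLY)
    : fd_{::open(path, flags)} {}
  ~file_descriptor() { if (fd_ >= 0) ::close(fd_); }

  file_descriptor(const file_descriptor&)            = delete;
  file_descriptor& operator=(const file_descriptor&) = delete;

  // Move construction: steal the handle, leave -1 behind.
  file_descriptor(file_descriptor&& other) noexcept
    : fd_{std::exchange(other.fd_, -1)} {}

  // Move assignment: swapping hands our old fd to `other`, whose
  // destructor will close it; no manual close() needed here.
  file_descriptor& operator=(file_descriptor&& other) noexcept
  {
    std::swap(fd_, other.fd_);
    return *this;
  }

  int get() const noexcept { return fd_; }

 private:
  int fd_ = -1;
};
```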
Applied.
I have finished reviewing the PR. I have created Issue #1486 to keep track of additional issues that I feel are out of scope of this PR.
cpp/src/neighbors/detail/hnsw.hpp (Outdated)
```cpp
os.write(reinterpret_cast<const char*>(graph_row), sizeof(IdxT) * graph_degree_int);

if (odd_graph_degree) {
  assert(odd_graph_degree == appr_algo->maxM0_ - graph_degree_int);
```
This assert is only active in debug mode and fails to compile due to -Werror (comparison of integer expressions of different signedness: 'int' and 'size_t').
Please replace it with RAFT_EXPECTS.
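A sketch of what the replacement could look like, with a cast to align the signedness (the condition is copied from the assert above; the message text is mine):

```cpp
RAFT_EXPECTS(static_cast<size_t>(odd_graph_degree) ==
               appr_algo->maxM0_ - static_cast<size_t>(graph_degree_int),
             "unexpected padding between graph_degree and maxM0_");
```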
Replaced with RAFT_EXPECTS and added a type cast.
cpp/src/neighbors/detail/hnsw.hpp (Outdated)
```cpp
int64_t next_report_offset = d_report_offset;
auto start_clock = std::chrono::system_clock::now();

assert(appr_algo->size_data_per_element_ ==
```
This would only be active in debug mode; let's use RAFT_EXPECTS.
Done.
cpp/src/neighbors/detail/hnsw.hpp (Outdated)
```cpp
size_t bytes_written = 0;
float GiB = 1 << 30;
IdxT zero = 0;
assert(appr_algo->size_data_per_element_ ==
```
use RAFT_EXPECTS
Done.
```diff
   offset(j) = offset(j - 1) + 1;
 }
-IdxT ofst = initial_graph_size * pow(base, (double)j - small_graph_degree - 1);
+IdxT ofst = pow((double)(initial_graph_size - 1) / 2, (double)(j + 1) / small_graph_degree);
```
What is the reason for this change?
@anaruse Could you comment on this please? This change was taken from your proposal.
- We will need to add the final APIs and align tests in the future.
Thanks again for the work, @julianmi! I left a small comment and a question.
```cpp
 * in KNN graph construction. 100k - 5M vectors per partition is recommended
 * depending on the available host and GPU memory.
 */
size_t npartitions;
```
So if we have 1M rows in the dataset and, say, we choose each partition to have 200K vectors (from your range of recommended values), does this mean n_partitions = 1M / 200K = 5? It would be nice if you could add a simple formula to help the user choose npartitions!
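Something like the following ceiling division would do (my suggestion, not from the PR):

```cpp
#include <cstddef>

// npartitions = ceil(n_rows / target_partition_size); e.g. with
// n_rows = 1'000'000 and a target of 200'000 this yields 5 partitions.
std::size_t choose_npartitions(std::size_t n_rows, std::size_t target_partition_size)
{
  return (n_rows + target_partition_size - 1) / target_partition_size;
}
```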
```cpp
if (use_disk_mode) {
  // Load partition dataset from disk files
  ace_load_partition_dataset_from_disk<T, IdxT>(res,
                                                build_dir,
                                                partition_id,
                                                dataset_dim,
                                                partition_histogram.view(),
                                                core_partition_offsets.view(),
                                                augmented_partition_offsets.view(),
                                                sub_dataset.view());
```
One question: if we assume that the dataset fits in memory anyway, why do we have to partition the data, store it to disk, and then read it back per partition? Can't we gather the corresponding vectors from memory on the fly per partition if we have the necessary information (partition_histogram, core_partition_offsets, augmented_partition_offsets)?
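To illustrate the idea, a rough sketch of such an on-the-fly gather (all names and the layout are assumptions for illustration, not the actual cuVS code):

```cpp
#include <algorithm>
#include <cstdint>

// Gather one partition's rows directly from the in-memory dataset using
// the precomputed offsets, instead of a disk round-trip. `backward_list`
// maps reordered row i back to its original dataset row.
void gather_partition(const float* dataset, int64_t dim,
                      const int64_t* backward_list,
                      int64_t partition_begin, int64_t partition_size,
                      float* sub_dataset)
{
  for (int64_t i = 0; i < partition_size; ++i) {
    const int64_t src = backward_list[partition_begin + i];
    std::copy_n(dataset + src * dim, dim, sub_dataset + i * dim);
  }
}
```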
This PR introduces Augmented Core Extraction (ACE), an approach proposed by @anaruse for building CAGRA indices on very large datasets that exceed GPU memory capacity. ACE enables users to build high-quality approximate nearest neighbor search indices on datasets that would otherwise be impossible to process on a single GPU. The approach uses host memory if it is large enough and falls back to disk if required.
This work is a collaboration: @anaruse, @tfeher, @achirkin, @mfoerste4
Algorithm Description
Each partition is processed independently through the regular KNN graph build (the build_knn_graph() flow) with its primary vectors plus augmented vectors from neighboring partitions. In disk mode, ACE stores the dataset mapping (dataset_mapping.bin), the reordered dataset (reordered_dataset.bin), and the optimized CAGRA graph (cagra_graph.bin) on disk. The index is then incomplete, as shown by cuvs::neighbors::index::on_disk(). The files are stored in cuvs::neighbors::index::file_directory(). The HNSW index serialization was provided by @mfoerste4 in [WIP] Add disk2disk serialization for ACE Algorithm #1410, which was merged here. This adds the serialize_to_hnsw() serialization routine, which allows combining the dataset, graph, and mapping. The data is combined on the fly while being streamed from disk to disk, trying to minimize the required host memory; the host still needs enough memory to hold the index, though.

Core Components
- `ace_build()`: Main routine which users should call.
- `ace_get_partition_labels()`: Performs balanced k-means clustering to assign each vector to its two closest partitions while handling small-partition merging.
- `ace_create_forward_and_backward_lists()`: Creates bidirectional ID mappings between original dataset indices and reordered partition-local indices.
- `ace_set_index_params()`: Sets the index parameters based on the partition and augmented dataset to ensure efficient KNN graph building.
- `ace_gather_partition_dataset()`: In-memory only: gathers the partition and augmented dataset.
- `ace_adjust_sub_graph_ids()`: In-memory only: adjusts IDs in the sub search graph and stores them into the main search graph.
- `ace_adjust_final_graph_ids()`: In-memory only: maps graph neighbor IDs from reordered space back to original vector IDs.
- `ace_reorder_and_store_dataset()`: Disk only: reorders the dataset based on partitions and stores it to disk. Uses write buffers to improve performance.
- `ace_load_partition_dataset_from_disk()`: Disk only: loads the partition and augmented datasets from disk.
- `file_descriptor` and `ace_read_large_file()`/`ace_write_large_file()`: RAII file handle and chunked file I/O operations.
- Adds the `on_disk_` flag and `file_directory_` to the CAGRA index structure to support disk-backed indices.
- Adds `ace_npartitions` and `ace_build_dir` to the CAGRA parameters so users can specify that ACE should be used and which directory to use if required.

Usage
C++ API
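A minimal sketch of how the C++ API might be used, assuming the parameter names introduced in this PR (`ace_npartitions`, `ace_build_dir`); the exact field placement may differ from the final code:

```cpp
#include <cuvs/neighbors/cagra.hpp>
#include <raft/core/device_mdspan.hpp>
#include <raft/core/resources.hpp>

void build_with_ace(raft::resources const& res,
                    raft::device_matrix_view<const float, int64_t> dataset)
{
  cuvs::neighbors::cagra::index_params params;
  params.graph_degree              = 64;
  params.intermediate_graph_degree = 128;
  params.ace_npartitions = 10;           // hypothetical field from this PR: selects ACE
  params.ace_build_dir   = "/mnt/nvme";  // hypothetical field: fast disk for temp files

  auto index = cuvs::neighbors::cagra::build(res, params, dataset);

  // In disk mode the returned index may be incomplete in memory; per the
  // PR description, index.on_disk() reports this and index.file_directory()
  // points at the on-disk artifacts.
}
```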
Storage Requirements
- `cagra_graph.bin`: `n_rows * graph_degree * sizeof(IdxT)`
- `dataset_mapping.bin`: `n_rows * sizeof(IdxT)`
- `reordered_dataset.bin`: size of the input dataset
- `augmented_dataset.bin`: size of the input dataset
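A worked example of these footprints (my own arithmetic, not from the PR), assuming 100M vectors of 96 float32 dimensions, graph_degree 64, and IdxT = uint32_t:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>

int main()
{
  const std::size_t n_rows = 100'000'000, dim = 96, graph_degree = 64;
  const double GiB = static_cast<double>(1ull << 30);

  const double graph   = n_rows * graph_degree * sizeof(uint32_t) / GiB;  // cagra_graph.bin
  const double mapping = n_rows * sizeof(uint32_t) / GiB;                 // dataset_mapping.bin
  const double data    = n_rows * dim * sizeof(float) / GiB;              // reordered/augmented, each

  std::printf("graph %.1f GiB, mapping %.1f GiB, dataset %.1f GiB each\n",
              graph, mapping, data);  // ~23.8, ~0.4, ~35.8
}
```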