
Add DSP (Dynamic Superblock Pruning) sparse index with AVX-512 SIMD optimizations #1471

Open
lyang24 wants to merge 9 commits into zilliztech:sparse_dsp_dev from lyang24:dsp2


lyang24 (Contributor) commented Feb 23, 2026

  • DSP index with u8/u16 integer pruning and a two-level block hierarchy (shrinks the forward index to about 1/4 of its size at the cost of slightly lower accuracy)
  • AVX-512 SIMD kernels for gather/scatter inner-product (IP) accumulation, block upper-bound (UB) scan, and seek (the original paper uses AVX2)
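
To make the integer-pruning trade-off concrete: a `uint8_t` takes a quarter of the space of a 32-bit `float`, and pruning can stay conservative if upper bounds are rounded up while thresholds are rounded down. The sketch below assumes a single scale of `255 / max_score` (echoing the `std::floor(kth_f * inv_max_score)` pattern in the review excerpt further down); the helper names are hypothetical and are not the PR's API.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helpers: map a non-negative score onto [0, 255] using
// scale = 255 / max_score (an assumption for this sketch).

// Upper bounds round UP, so the quantized bound never underestimates.
inline uint8_t QuantizeUpperBound(float ub, float scale) {
    assert(ub >= 0.0f && scale > 0.0f);
    return static_cast<uint8_t>(std::min(255.0f, std::ceil(ub * scale)));
}

// Thresholds round DOWN, so the quantized threshold never overestimates.
inline uint8_t QuantizeThreshold(float th, float scale) {
    assert(th >= 0.0f && scale > 0.0f);
    return static_cast<uint8_t>(std::min(255.0f, std::floor(th * scale)));
}

// Prune only when the quantized bound is strictly below the quantized
// threshold; with the rounding directions above, this implies ub < th.
inline bool CanPrune(float ub, float th, float scale) {
    return QuantizeUpperBound(ub, scale) < QuantizeThreshold(th, scale);
}

// u8 values occupy 1/4 of the space of 32-bit floats.
static_assert(sizeof(uint8_t) * 4 == sizeof(float), "expects a 32-bit float");
```

Rounding in opposite directions costs a little pruning power near the threshold (equal scores are never pruned) in exchange for never discarding a block whose true bound could still beat the threshold.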

sre-ci-robot (Collaborator)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: lyang24
To complete the pull request process, please assign alexanderguzhva after the PR has been reviewed.
You can assign the PR to them by writing /assign @alexanderguzhva in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

sparknack (Collaborator)

Since the index format has changed, I think we need a new index type called SPARSE_DSP rather than an algorithm within SPARSE_INVERTED_INDEX.

lyang24 force-pushed the dsp2 branch 4 times, most recently from 3a2d516 to edc6413, on March 1, 2026 06:28
lyang24 requested a review from sparknack on March 1, 2026 06:51
Signed-off-by: lyang24 <lanqingy93@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
lyang24 (Contributor, Author) commented Mar 1, 2026

> Since the index format has changed, I think we need a new index type called SPARSE_DSP rather than an algorithm within SPARSE_INVERTED_INDEX.

Fixed.

lyang24 marked this pull request as ready for review on March 1, 2026 07:04
for (int h = 0; h < 4; ++h) {
    if (!kth_heaps[h].empty()) {
        float kth_f = kth_heaps[h].top();
        bm.kth[h] = static_cast<uint8_t>(std::min(255.0f, std::floor(kth_f * inv_max_score)));

Collaborator

When a filter is used, the filtered-out documents may well be the top-scoring ones, so the initial threshold can end up higher than the actual k-th largest score after filtering.
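
A minimal sketch of one possible remedy, under a simplifying assumption: here the seed is computed from a list of exact candidate scores (the excerpt above instead derives it from per-heap k-th values). Applying the same filter before taking the k-th largest score, and falling back to 0 when fewer than k candidates pass, keeps the seed from exceeding the true post-filter k-th score. `SeedKthThreshold` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical helper: k-th largest score among candidates, optionally
// restricted to those passing the filter. Returns 0 (no pruning) when fewer
// than k candidates qualify.
inline float SeedKthThreshold(const std::vector<float>& scores,
                              const std::vector<bool>& passes_filter,
                              std::size_t k, bool apply_filter) {
    assert(scores.size() == passes_filter.size() && k > 0);
    // Min-heap holding the k largest qualifying scores seen so far.
    std::priority_queue<float, std::vector<float>, std::greater<float>> heap;
    for (std::size_t i = 0; i < scores.size(); ++i) {
        if (apply_filter && !passes_filter[i]) continue;
        if (heap.size() < k) {
            heap.push(scores[i]);
        } else if (scores[i] > heap.top()) {
            heap.pop();
            heap.push(scores[i]);
        }
    }
    return heap.size() < k ? 0.0f : heap.top();
}
```

With scores `{9, 8, 3, 2, 1}` where only the last three pass the filter and k = 2, the unfiltered seed is 8 while the filtered seed is 2, which is the gap described in the comment above.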


// ========================================================================
// Forward index (flat layout for cache-friendly scoring)
// ========================================================================

sparknack (Collaborator), Mar 2, 2026

These vectors should support mmap; more specifically, their contents could be mapped directly from the serialized index file.
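
One way to make the flat forward index mmap-friendly is to replace owning `std::vector` members with a non-owning view over a byte range. The sketch below assumes a hypothetical layout (row count, CSR-style offsets, u32 ids, u8 values in host byte order); it is not the PR's actual format. Because the view only holds pointers, the same accessors work on a heap buffer or on bytes returned by `mmap()`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// Hypothetical flat layout (illustrative only, host byte order):
//   [u64 n_rows][u64 offsets[n_rows + 1]][u32 ids[nnz]][u8 vals[nnz]]
class FlatForwardIndexView {
 public:
    explicit FlatForwardIndexView(const uint8_t* base) {
        assert(base != nullptr);
        n_rows_ = Load<uint64_t>(base);
        offsets_ = base + sizeof(uint64_t);
        const uint64_t nnz = Load<uint64_t>(offsets_ + n_rows_ * sizeof(uint64_t));
        ids_ = offsets_ + (n_rows_ + 1) * sizeof(uint64_t);
        vals_ = ids_ + nnz * sizeof(uint32_t);
    }
    uint64_t rows() const { return n_rows_; }
    uint64_t RowBegin(uint64_t r) const { return Load<uint64_t>(offsets_ + r * sizeof(uint64_t)); }
    uint64_t RowEnd(uint64_t r) const { return RowBegin(r + 1); }
    uint32_t Id(uint64_t j) const { return Load<uint32_t>(ids_ + j * sizeof(uint32_t)); }
    uint8_t Value(uint64_t j) const { return vals_[j]; }

 private:
    // memcpy sidesteps alignment and strict-aliasing issues on raw bytes.
    template <typename T>
    static T Load(const uint8_t* p) {
        T v;
        std::memcpy(&v, p, sizeof(T));
        return v;
    }
    uint64_t n_rows_ = 0;
    const uint8_t* offsets_ = nullptr;
    const uint8_t* ids_ = nullptr;
    const uint8_t* vals_ = nullptr;
};

using SparseRow = std::vector<std::pair<uint32_t, uint8_t>>;

// Produces the layout above. In practice these bytes would be written to the
// index file at serialization time and mapped back read-only at load time.
inline std::vector<uint8_t> SerializeFlat(const std::vector<SparseRow>& rows) {
    std::vector<uint64_t> offsets{0};
    for (const auto& row : rows) offsets.push_back(offsets.back() + row.size());
    std::vector<uint8_t> out;
    auto append = [&out](const void* p, std::size_t n) {
        const auto* b = static_cast<const uint8_t*>(p);
        out.insert(out.end(), b, b + n);
    };
    const uint64_t n_rows = rows.size();
    append(&n_rows, sizeof(n_rows));
    append(offsets.data(), offsets.size() * sizeof(uint64_t));
    for (const auto& row : rows)
        for (const auto& e : row) append(&e.first, sizeof(e.first));
    for (const auto& row : rows)
        for (const auto& e : row) append(&e.second, sizeof(e.second));
    return out;
}
```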

}
default:
    // skip unknown sections
    RETURN_IF_ERROR(ReadCustomSection(reader, section_header));

Collaborator

It would be better to add new, dedicated sections for DSP instead of storing its data in a generic custom section.
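
A rough sketch of what dedicated sections could look like: each DSP structure gets its own typed section ID, so the loader can check for each one explicitly while still skipping IDs it does not recognize. The enum values, struct names, and the split into forward-index / block-max / superblock-max sections are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical section IDs; values and names are illustrative only.
enum class SectionType : uint32_t {
    kDspForwardIndex = 100,
    kDspBlockMax = 101,
    kDspSuperblockMax = 102,
};

struct Section {
    uint32_t type;
    std::vector<uint8_t> payload;
};

struct DspLoadState {
    bool has_forward = false;
    bool has_block_max = false;
    bool has_superblock_max = false;
    int skipped = 0;  // sections this loader does not handle
};

// Dispatch on the typed ID; unknown IDs are skipped rather than rejected.
inline DspLoadState LoadSections(const std::vector<Section>& sections) {
    DspLoadState st;
    for (const auto& s : sections) {
        switch (static_cast<SectionType>(s.type)) {
            case SectionType::kDspForwardIndex: st.has_forward = true; break;
            case SectionType::kDspBlockMax: st.has_block_max = true; break;
            case SectionType::kDspSuperblockMax: st.has_superblock_max = true; break;
            default: ++st.skipped; break;
        }
    }
    return st;
}
```

A loader built this way can report a precise error (for example, a missing block-max section) instead of silently treating DSP metadata as an opaque blob.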

#endif

// Write custom section data (e.g., DSP metadata)
WriteCustomSections(writer);

sparknack (Collaborator), Mar 2, 2026

Should SPARSE_DSP depend on SPARSE_INVERTED_INDEX?
Maybe we can skip serializing the inverted index section?

KNOWHERE_SIMPLE_REGISTER_SPARSE_FLOAT_GLOBAL(SPARSE_WAND_CC_DEPRECATED, SparseInvertedIndexNodeCC,
                                             knowhere::feature::MMAP,
                                             /*use_wand=*/true)
KNOWHERE_SIMPLE_REGISTER_SPARSE_FLOAT_GLOBAL(SPARSE_DSP, SparseDspIndexNode, knowhere::feature::MMAP)

sparknack (Collaborator), Mar 2, 2026

Cardinal does not support this index type; to avoid compatibility issues, remove it for now.

                                             /*use_wand=*/false)
KNOWHERE_SIMPLE_REGISTER_SPARSE_FLOAT_GLOBAL(SPARSE_WAND_CC, SparseInvertedIndexNodeCC, knowhere::feature::MMAP,
                                             /*use_wand=*/true)
KNOWHERE_SIMPLE_REGISTER_SPARSE_FLOAT_GLOBAL(SPARSE_DSP, SparseDspIndexNode, knowhere::feature::MMAP)

sparknack (Collaborator), Mar 2, 2026

A SPARSE_DSP_CC variant should be supported for concurrent scenarios.
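
One common shape for a concurrent ("CC") variant is a reader-writer lock around the index: searches share a reader lock, while inserts take an exclusive writer lock. This is only a sketch; `ConcurrentIndex` and `ToyIndex` are hypothetical, and it does not describe how knowhere's existing `*CC` index nodes synchronize.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <shared_mutex>
#include <utility>
#include <vector>

// Hypothetical wrapper. `Index` is assumed to expose Add(row) and a const
// Search(query); neither name is taken from knowhere.
template <typename Index>
class ConcurrentIndex {
 public:
    template <typename Row>
    void Add(Row&& row) {
        std::unique_lock<std::shared_mutex> lock(mu_);  // exclusive: mutates
        index_.Add(std::forward<Row>(row));
    }

    template <typename Query>
    auto Search(const Query& q) const {
        std::shared_lock<std::shared_mutex> lock(mu_);  // shared: read-only
        return index_.Search(q);  // result is copied out while the lock is held
    }

 private:
    mutable std::shared_mutex mu_;
    Index index_;
};

// Toy index used only to exercise the wrapper: counts rows >= a threshold.
struct ToyIndex {
    std::vector<float> rows;
    void Add(float x) { rows.push_back(x); }
    std::size_t Search(float threshold) const {
        std::size_t n = 0;
        for (float r : rows) n += (r >= threshold) ? 1 : 0;
        return n;
    }
};
```

A single writer lock is the simplest correct option; finer-grained schemes (for example, appending to a growing segment that readers see via an atomic size) trade simplicity for less contention.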

}
if (block_buf.empty())
    continue;
std::sort(block_buf.begin(), block_buf.end(), [](const BlockEntry& a, const BlockEntry& b) {

Collaborator

Is this sort a duplicate of the one in Pass 1?

    }
}
// Add top-2 non-surviving superblocks as safety net
for (int i = 0; i < 2; ++i) {

Collaborator

Avoid the magic number 2; use a named constant instead.

lyang24 and others added 8 commits March 14, 2026 13:16
- Remove unused dim_max_score_ratio from DSP config
- Apply dsp_eta at subblock BoundSum level (paper Section 3)
- Add dsp_gamma configurable top-γ superblock safety net
- Legacy top-2 fallback preserved when gamma=0
- Add benchmark_sparse_dsp with param sweep, latency percentiles,
  coverage metrics (failed queries, avg fill)
- Strengthen DSP unit tests with comparative recall assertions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
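
The `dsp_gamma` safety net described in this commit can be sketched as follows: after threshold pruning, the `gamma` non-surviving superblocks with the largest upper bounds are re-admitted, and `gamma == 0` keeps the legacy top-2 behaviour behind a named constant. The strict `ub > 0` eligibility check follows a later commit message in this PR. The function and constant names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Named constant for the legacy backstop size instead of a literal 2.
constexpr std::size_t kLegacyBackstopCount = 2;

// Hypothetical sketch: re-admit the top-gamma non-surviving superblocks by
// upper bound. Only superblocks with a strictly positive bound are eligible.
inline void AddBackstop(const std::vector<float>& superblock_ub,
                        std::vector<bool>& survives, std::size_t gamma) {
    assert(superblock_ub.size() == survives.size());
    const std::size_t want = gamma == 0 ? kLegacyBackstopCount : gamma;
    std::vector<std::size_t> pruned;
    for (std::size_t i = 0; i < survives.size(); ++i) {
        if (!survives[i] && superblock_ub[i] > 0.0f) pruned.push_back(i);
    }
    const std::size_t n = std::min(want, pruned.size());
    // Only the n largest bounds are needed, so partial_sort beats a full sort.
    std::partial_sort(pruned.begin(), pruned.begin() + static_cast<std::ptrdiff_t>(n), pruned.end(),
                      [&](std::size_t a, std::size_t b) { return superblock_ub[a] > superblock_ub[b]; });
    for (std::size_t i = 0; i < n; ++i) survives[pruned[i]] = true;
}
```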
- Wire print_failed_diag after baseline and DSP safe runs
- Print diagnostics for any config with failed queries
- Finer eta sweep: 0.98, 0.95, 0.92, 0.90, 0.88, 0.85, 0.82, 0.80
- Broader gamma sweep under mu=0.3: 50, 100, 250, 500, 1000

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The 11 persistent DSP failures (even at mu=1, eta=1) are likely caused
by kth-score initialization setting a nonzero threshold before any
documents are scored, pruning superblocks/blocks prematurely.

Add dsp_kth_init config (default true) to allow disabling this
heuristic for diagnosis. When false, threshold starts at 0 and only
rises after the heap fills — truly safe mode.

Also clean up unused test lambdas from previous refactor.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
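
The "truly safe" behaviour this commit describes can be illustrated with a small top-k collector: the pruning threshold is the maximum of an optional seed and the heap minimum, and the heap contributes only once it holds k scores. With seeding disabled (seed = 0, the `dsp_kth_init = false` case), nothing is pruned until k real scores exist; a seed above the true k-th score would prune too early. `TopKThreshold` is a hypothetical name, not the PR's class.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical top-k collector: threshold = max(seed, k-th best score so far),
// where the heap term is 0 until k scores have been pushed.
class TopKThreshold {
 public:
    TopKThreshold(std::size_t k, float seed) : k_(k), seed_(seed) { assert(k > 0); }

    void Push(float score) {
        if (heap_.size() < k_) {
            heap_.push(score);
        } else if (score > heap_.top()) {
            heap_.pop();
            heap_.push(score);
        }
    }

    float Threshold() const {
        const float from_heap = heap_.size() < k_ ? 0.0f : heap_.top();
        return from_heap > seed_ ? from_heap : seed_;
    }

 private:
    std::size_t k_;
    float seed_;
    // Min-heap of the k best scores seen so far.
    std::priority_queue<float, std::vector<float>, std::greater<float>> heap_;
};
```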
After Fix 2 moved subblock pruning to u16_block_threshold, u16_threshold
was only written but never read. Superblock pruning uses float_threshold
directly via mu_threshold/eta_threshold.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- dsp_kth_alpha: scale factor for kth threshold seed (0.0-1.0)
  Allows separating DSP-T (threshold seeding) from DSP-H (hierarchy)
- Alpha sweep: 0.25, 0.50, 0.75 to find calibration sweet spot
- Rename "DSP safe" -> "DSP default" (kth-init ON is not truly safe)
- "DSP-H exact" for hierarchy-only mode (kth-init OFF)
- Trimmed param sweep for faster turnaround

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
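
A minimal sketch of how a `dsp_kth_alpha` knob could act on the seed, assuming it simply scales the k-th score estimate: alpha = 0 is equivalent to turning seeding off (hierarchy-only pruning), and values in between trade early pruning for safety. The helper name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: scale the k-th score estimate by alpha in [0, 1].
// alpha = 0 disables seeding; alpha = 1 uses the full estimate.
inline float SeedFromAlpha(float kth_estimate, float alpha) {
    assert(kth_estimate >= 0.0f);
    alpha = std::clamp(alpha, 0.0f, 1.0f);  // out-of-range values are clamped
    return alpha * kth_estimate;
}
```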
Mode-driven superblock selection replaces the previous mixed logic:
- legacy (0): dual-threshold + top-2/gamma backstop (unchanged)
- dsp (1): ub>theta/mu || asc>theta/eta, no backstop
- lsp0 (2): top-gamma from ub>=theta, no mu/asc gate
- lsp1 (3): lsp0 safe set + mu gate (ub>theta/mu)
- lsp2 (4): lsp1 + asc gate (ub>theta/mu || asc>theta/eta)

LSP modes with gamma<=0 fall back to legacy (documented in config).
Legacy gamma backstop preserves strict ub>0 inequality.
kth-init and kth-alpha remain orthogonal to mode selection.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
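
The mode table above can be read as a keep-mask computation per superblock. The sketch below is one interpretation, not the PR's code: "+" is taken to mean union (the LSP safe set is always kept, and gated superblocks are added to it), `asc` is treated as an opaque second bound compared against `theta / eta`, and the legacy backstop keeps the strict `ub > 0` check. All names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

enum class SbMode { kLegacy = 0, kDsp = 1, kLsp0 = 2, kLsp1 = 3, kLsp2 = 4 };

struct Superblock {
    float ub;   // superblock score upper bound
    float asc;  // second bound used by the eta gate (name from the commit message)
};

constexpr std::size_t kLegacyBackstop = 2;

// Indices of at most `count` superblocks with the largest ub among those for
// which pred(i) holds.
template <typename Pred>
std::vector<std::size_t> TopByUb(const std::vector<Superblock>& sb, std::size_t count, Pred pred) {
    std::vector<std::size_t> idx;
    for (std::size_t i = 0; i < sb.size(); ++i)
        if (pred(i)) idx.push_back(i);
    const std::size_t n = std::min(count, idx.size());
    std::partial_sort(idx.begin(), idx.begin() + static_cast<std::ptrdiff_t>(n), idx.end(),
                      [&](std::size_t a, std::size_t b) { return sb[a].ub > sb[b].ub; });
    idx.resize(n);
    return idx;
}

// Hypothetical reading of the mode table: returns which superblocks to visit.
inline std::vector<bool> SelectSuperblocks(const std::vector<Superblock>& sb, SbMode mode,
                                           float theta, float mu, float eta, int gamma) {
    assert(mu > 0.0f && eta > 0.0f);
    const bool lsp = mode == SbMode::kLsp0 || mode == SbMode::kLsp1 || mode == SbMode::kLsp2;
    if (lsp && gamma <= 0) mode = SbMode::kLegacy;  // documented LSP fallback
    auto mu_gate = [&](std::size_t i) { return sb[i].ub > theta / mu; };
    auto asc_gate = [&](std::size_t i) { return sb[i].asc > theta / eta; };

    std::vector<bool> keep(sb.size(), false);
    for (std::size_t i = 0; i < sb.size(); ++i) {
        switch (mode) {
            case SbMode::kLegacy:
            case SbMode::kDsp:
            case SbMode::kLsp2: keep[i] = mu_gate(i) || asc_gate(i); break;
            case SbMode::kLsp1: keep[i] = mu_gate(i); break;
            case SbMode::kLsp0: break;  // no mu/asc gate
        }
    }
    if (mode == SbMode::kLegacy) {
        // Backstop among non-survivors, keeping the strict ub > 0 check.
        const std::size_t g = gamma <= 0 ? kLegacyBackstop : static_cast<std::size_t>(gamma);
        for (std::size_t i : TopByUb(sb, g, [&](std::size_t j) { return !keep[j] && sb[j].ub > 0.0f; }))
            keep[i] = true;
    } else if (mode != SbMode::kDsp) {
        // LSP safe set: top-gamma among superblocks with ub >= theta.
        for (std::size_t i : TopByUb(sb, static_cast<std::size_t>(gamma),
                                     [&](std::size_t j) { return sb[j].ub >= theta; }))
            keep[i] = true;
    }
    return keep;
}
```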
- Replace packed-u4 block-max with chunk-compressed variable bit-width (0..4)
  encoding: 256-entry chunks with per-chunk minimal bit width selection
- Fix AppendCustomSections size overcount for empty dimensions
- Add stride-specific AVX512 kernels for n=32/64 (fully unrolled, no loop counter)
- Add focused unit tests for bit-packing round-trips and chunk compression

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
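
The chunk-compressed encoding in this commit can be sketched as: split the block-max codes into 256-entry chunks, pick the smallest width in 0..4 bits that holds each chunk's maximum, and bit-pack each chunk at that width (a chunk of all zeros stores no bytes). The container layout, LSB-first bit order, and byte-aligned chunks are assumptions of this sketch, not the PR's format.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kChunkSize = 256;  // entries per chunk, per the commit message

// Smallest width in [0, 4] bits that can represent every value in v[0..n).
inline uint8_t MinBitWidth(const uint8_t* v, std::size_t n) {
    uint8_t max_v = 0;
    for (std::size_t i = 0; i < n; ++i) {
        assert(v[i] <= 15);  // assumes block-max codes already fit in 4 bits
        max_v = std::max(max_v, v[i]);
    }
    uint8_t bits = 0;
    while ((1u << bits) <= static_cast<unsigned>(max_v)) ++bits;  // ceil(log2(max_v + 1))
    return bits;
}

// Hypothetical container: one width per chunk, chunks byte-aligned in `data`.
struct ChunkedBits {
    std::vector<uint8_t> widths;
    std::vector<std::size_t> chunk_offset;  // byte offset of each chunk in data
    std::vector<uint8_t> data;              // LSB-first bit-packed values
    std::size_t size = 0;
};

inline ChunkedBits Encode(const std::vector<uint8_t>& values) {
    ChunkedBits out;
    out.size = values.size();
    for (std::size_t start = 0; start < values.size(); start += kChunkSize) {
        const std::size_t n = std::min(kChunkSize, values.size() - start);
        const uint8_t w = MinBitWidth(values.data() + start, n);
        const std::size_t base = out.data.size();
        out.widths.push_back(w);
        out.chunk_offset.push_back(base);
        out.data.resize(base + (n * w + 7) / 8, 0);  // w == 0 stores no bytes
        for (std::size_t i = 0; i < n; ++i) {
            // Bit-by-bit copy handles w == 3, where a value can straddle two bytes.
            for (uint8_t b = 0; b < w; ++b) {
                const std::size_t bit = i * w + b;
                if ((values[start + i] >> b) & 1u) out.data[base + bit / 8] |= static_cast<uint8_t>(1u << (bit % 8));
            }
        }
    }
    return out;
}

inline uint8_t Get(const ChunkedBits& c, std::size_t idx) {
    assert(idx < c.size);
    const std::size_t chunk = idx / kChunkSize;
    const std::size_t i = idx % kChunkSize;
    const uint8_t w = c.widths[chunk];
    uint8_t v = 0;
    for (uint8_t b = 0; b < w; ++b) {
        const std::size_t bit = i * w + b;
        const unsigned bitval = (c.data[c.chunk_offset[chunk] + bit / 8] >> (bit % 8)) & 1u;
        v = static_cast<uint8_t>(v | (bitval << b));
    }
    return v;
}
```

For 600 values where the first chunk is all zeros, the second peaks at 5, and the third peaks at 15, the chunk widths come out as 0, 3, and 4 bits, giving 140 bytes of packed data (plus 3 width bytes), compared with 300 bytes for fixed 4-bit packing.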


3 participants