research(nightly): hybrid sparse-dense search — BM25 + ANN with RRF and RSF (ADR-256) by ruvnet · Pull Request #576 · ruvnet/RuVector

ruvnet · 2026-06-17T07:38:01Z

Summary

Nightly research spike (2026-06-17) implementing hybrid sparse-dense retrieval in a new
standalone crate ruvector-hybrid, with three fusion strategies benchmarked on a 10K-document
128-D corpus.

ADR-256: proposes adding RRF and RSF to ruvector-core::advanced_features::hybrid_search
crates/ruvector-hybrid: working Rust PoC — zero unsafe, WASM-compilable, 19 unit tests
Benchmark (10K docs × 128-D, 500 queries, k=10, Intel Xeon 2.80 GHz, rustc 1.94.1 --release):

Variant	Recall@10	QPS	Memory
Dense flat (exact cosine)	7.5%	371	5,000 KB
BM25 (sparse only)	77.3%	57,174	637 KB
ScoreFusion α=0.7	68.8%	357	5,637 KB
RRF k=60	50.5%	360	5,637 KB
RSF α=0.5	76.6%	360	5,637 KB
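The "Dense flat" row is an exhaustive cosine scan over every stored vector. The following is a minimal sketch of such a scan. It assumes vectors are L2-normalised at insert time and stored row-major in a single `Vec<f32>`. The `FlatCosineIndex` name and its methods are hypothetical and do not reflect the crate's actual `FlatDenseIndex` API.

```rust
/// Hypothetical flat cosine index (illustrative; not the crate's FlatDenseIndex API).
/// Vectors are L2-normalised on insert, so scoring a query is one dot product per row.
struct FlatCosineIndex {
    dim: usize,
    data: Vec<f32>, // row-major: one normalised vector per document
}

impl FlatCosineIndex {
    fn new(dim: usize) -> Self {
        assert!(dim > 0, "dimension must be positive");
        Self { dim, data: Vec::new() }
    }

    fn insert(&mut self, v: &[f32]) {
        assert_eq!(v.len(), self.dim, "dimension mismatch");
        let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
        // A zero vector is stored as zeros and scores 0 against every query.
        let scale = if norm > 0.0 { 1.0 / norm } else { 0.0 };
        self.data.extend(v.iter().map(|x| x * scale));
    }

    /// Exhaustive scan: cosine against every stored row, then keep the top `k`.
    fn search(&self, q: &[f32], k: usize) -> Vec<(usize, f32)> {
        assert_eq!(q.len(), self.dim, "dimension mismatch");
        let q_norm = q.iter().map(|x| x * x).sum::<f32>().sqrt();
        let mut scored: Vec<(usize, f32)> = self
            .data
            .chunks_exact(self.dim)
            .enumerate()
            .map(|(id, row)| {
                let dot: f32 = row.iter().zip(q).map(|(a, b)| a * b).sum();
                (id, if q_norm > 0.0 { dot / q_norm } else { 0.0 })
            })
            .collect();
        // Highest cosine first; ties broken by ascending document id.
        scored.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
        scored.truncate(k);
        scored
    }
}

fn main() {
    let mut index = FlatCosineIndex::new(2);
    index.insert(&[1.0, 0.0]);
    index.insert(&[0.0, 1.0]);
    index.insert(&[1.0, 1.0]);
    println!("{:?}", index.search(&[2.0, 0.1], 2));
}
```

At the benchmark's scale, this layout holds 10,000 × 128 f32 values, i.e. 10,000 × 128 × 4 = 5,120,000 bytes. That is 5,000 KiB, which matches the 5,000 KB reported for the dense index if KB means 1,024 bytes.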

Key Findings

RSF (Weaviate-style per-list min-max normalisation) nearly matches pure BM25 (76.6% vs 77.3%) on this keyword-dominated benchmark, while still adding the dense signal's semantic coverage.
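RRF (rank-only, score-agnostic) scores lowest among the hybrids on this benchmark (50.5%). It is still the safer default when the balance between signals is unknown, because it needs no score calibration, which makes it appropriate for agentic RAG pipelines.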
ScoreFusion with hard-coded α=0.7 (the current ruvector-core default) reaches 68.8% on keyword-heavy workloads, 7.8 points below RSF. It beats RRF but trails both RSF and pure BM25.
The existing ruvector-core BM25 re-tokenises document texts at query time (O(N×|d|) per query). This crate pre-computes TF at index time instead (O(|q|×P), where P is the posting-list length per query term), eliminating that query-time cost.
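The two fusion rules compared in these findings can be sketched on `(id, score)` result lists. RRF follows Cormack et al.'s 1 / (k + rank) formula with 1-based ranks and k = 60. The RSF part rests on three assumptions:

- Weaviate-style semantics, in which α weights the normalised dense list and (1 − α) the normalised sparse list.
- A document absent from a list gets 0 from that list.
- A list whose scores are all equal normalises to 1.0.

All function names are hypothetical and are not the ruvector-hybrid API.

```rust
use std::collections::HashMap;

/// Sort fused scores descending (ties broken by ascending id) and keep the top `k`.
fn top_k(fused: HashMap<u32, f64>, k: usize) -> Vec<(u32, f64)> {
    let mut ranked: Vec<(u32, f64)> = fused.into_iter().collect();
    ranked.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    ranked.truncate(k);
    ranked
}

/// RRF: every list adds 1 / (k_rrf + rank) per document; raw scores are ignored.
fn rrf(lists: &[&[(u32, f64)]], k_rrf: f64, k: usize) -> Vec<(u32, f64)> {
    let mut fused: HashMap<u32, f64> = HashMap::new();
    for list in lists {
        // Only the position matters: rank = index + 1.
        for (i, &(id, _)) in list.iter().enumerate() {
            *fused.entry(id).or_insert(0.0) += 1.0 / (k_rrf + (i + 1) as f64);
        }
    }
    top_k(fused, k)
}

/// Per-list min-max normalisation into [0, 1].
/// Assumption: a list whose scores are all equal maps every entry to 1.0.
fn min_max(list: &[(u32, f64)]) -> impl Iterator<Item = (u32, f64)> + '_ {
    let min = list.iter().map(|p| p.1).fold(f64::INFINITY, f64::min);
    let max = list.iter().map(|p| p.1).fold(f64::NEG_INFINITY, f64::max);
    let range = max - min;
    list.iter()
        .map(move |&(id, s)| (id, if range > 0.0 { (s - min) / range } else { 1.0 }))
}

/// RSF (assumed Weaviate-style): alpha weights the dense list, (1 - alpha) the sparse list.
/// A document absent from a list contributes 0 from that list.
fn rsf(dense: &[(u32, f64)], sparse: &[(u32, f64)], alpha: f64, k: usize) -> Vec<(u32, f64)> {
    let mut fused: HashMap<u32, f64> = HashMap::new();
    for (id, s) in min_max(dense) {
        *fused.entry(id).or_insert(0.0) += alpha * s;
    }
    for (id, s) in min_max(sparse) {
        *fused.entry(id).or_insert(0.0) += (1.0 - alpha) * s;
    }
    top_k(fused, k)
}

fn main() {
    // Illustrative lists for one query: cosine scores (dense) and BM25 scores (sparse).
    let dense: [(u32, f64); 3] = [(1, 0.9), (2, 0.5), (3, 0.1)];
    let sparse: [(u32, f64); 3] = [(2, 12.0), (4, 8.0), (3, 4.0)];
    println!("RRF: {:?}", rrf(&[&dense[..], &sparse[..]], 60.0, 3));
    println!("RSF: {:?}", rsf(&dense, &sparse, 0.5, 3));
}
```

The example shows how the two rules diverge. Doc 3 is last in both lists, yet RRF ranks it second because it appears in both. Under RSF, min-max normalisation maps each list's minimum to 0, so doc 3 contributes nothing and drops out of the top 3 (RRF order: 2, 3, 1; RSF order: 2, 1, 4).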

Files Changed

crates/ruvector-hybrid/
  Cargo.toml          — crate manifest (rand dep only)
  src/lib.rs          — SparseSearch, DenseSearch, HybridSearch traits + recall_at_k
  src/bm25.rs         — Robertson BM25, k1=1.2 b=0.75, pre-computed TF postings
  src/dense.rs        — FlatDenseIndex, cosine flat-scan
  src/fusion.rs       — ScoreFusionIndex, RrfHybridIndex, RsfHybridIndex
  src/main.rs         — benchmark binary + 7 acceptance tests

docs/adr/ADR-256-hybrid-sparse-dense-search.md
docs/research/nightly/2026-06-17-hybrid-sparse-dense/README.md
docs/research/nightly/2026-06-17-hybrid-sparse-dense/gist.md
Cargo.toml            — workspace member added
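Pre-computing term frequencies at index time means a query only walks the postings of its own terms, rather than re-tokenising every document. The following is a minimal sketch of that layout. It uses k1 = 1.2 and b = 0.75 as listed for src/bm25.rs, and makes two assumptions: lowercase whitespace tokenisation, and the Lucene-style IDF ln(1 + (N − df + 0.5) / (df + 0.5)). The `Bm25Index` type and its methods are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical BM25 index with TF stored in postings at index time.
/// Assumptions: lowercase whitespace tokenisation, Lucene-style IDF.
struct Bm25Index {
    postings: HashMap<String, Vec<(usize, u32)>>, // term -> [(doc id, term frequency)]
    doc_len: Vec<u32>,
    avg_len: f64,
    k1: f64,
    b: f64,
}

fn tokenize(text: &str) -> impl Iterator<Item = String> + '_ {
    text.split_whitespace().map(|t| t.to_lowercase())
}

impl Bm25Index {
    fn build(docs: &[&str]) -> Self {
        let mut postings: HashMap<String, Vec<(usize, u32)>> = HashMap::new();
        let mut doc_len = Vec::with_capacity(docs.len());
        for (id, text) in docs.iter().enumerate() {
            // Count term frequencies once, here, instead of at query time.
            let mut tf: HashMap<String, u32> = HashMap::new();
            let mut len = 0u32;
            for tok in tokenize(text) {
                *tf.entry(tok).or_insert(0) += 1;
                len += 1;
            }
            for (term, count) in tf {
                postings.entry(term).or_default().push((id, count));
            }
            doc_len.push(len);
        }
        let total: u64 = doc_len.iter().map(|&l| l as u64).sum();
        let avg_len = if docs.is_empty() { 0.0 } else { total as f64 / docs.len() as f64 };
        Self { postings, doc_len, avg_len, k1: 1.2, b: 0.75 }
    }

    /// Scores only documents in the postings of query terms: O(|q| x P).
    fn search(&self, query: &str, k: usize) -> Vec<(usize, f64)> {
        let n = self.doc_len.len() as f64;
        let mut scores: HashMap<usize, f64> = HashMap::new();
        for term in tokenize(query) {
            let Some(list) = self.postings.get(&term) else { continue };
            let df = list.len() as f64;
            let idf = (1.0 + (n - df + 0.5) / (df + 0.5)).ln();
            for &(id, tf) in list {
                let tf = tf as f64;
                let norm = 1.0 - self.b + self.b * self.doc_len[id] as f64 / self.avg_len;
                *scores.entry(id).or_insert(0.0) +=
                    idf * tf * (self.k1 + 1.0) / (tf + self.k1 * norm);
            }
        }
        let mut ranked: Vec<(usize, f64)> = scores.into_iter().collect();
        ranked.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
        ranked.truncate(k);
        ranked
    }
}

fn main() {
    let docs = ["rust vector search", "hybrid sparse dense search", "bm25 keyword ranking"];
    let index = Bm25Index::build(&docs);
    for (id, score) in index.search("hybrid search", 2) {
        println!("doc {id}: {score:.3}");
    }
}
```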

Test Plan

cargo test -p ruvector-hybrid — 19 unit tests pass
cargo run --release -p ruvector-hybrid — 7 acceptance tests pass, real benchmark numbers
cargo build --release -p ruvector-hybrid — clean build, zero warnings

Phase 2 Work (not in this PR)

Add FusionStrategy enum to ruvector-core::advanced_features::hybrid_search
Add search_rrf() and search_rsf() to HybridSearch struct
Fix BM25 re-tokenisation bug in ruvector-core
Add incremental IDF update for streaming inserts
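For the incremental-IDF item, one possible approach is to update only N, total document length and per-term document frequencies on each insert. IDF is then derived at query time, so it never goes stale. This is a hypothetical sketch, not a committed Phase 2 design. It reuses the Lucene-style IDF ln(1 + (N − df + 0.5) / (df + 0.5)) as an assumption, and `CorpusStats` is an illustrative name.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical corpus statistics that stay correct under streaming inserts:
/// each insert updates counts only; IDF is computed on demand, never cached.
#[derive(Default)]
struct CorpusStats {
    doc_count: u64,
    total_len: u64,
    df: HashMap<String, u64>,
}

impl CorpusStats {
    fn insert(&mut self, text: &str) {
        let terms: Vec<String> = text.split_whitespace().map(str::to_lowercase).collect();
        self.doc_count += 1;
        self.total_len += terms.len() as u64;
        // Each distinct term counts once per document towards its df.
        let distinct: HashSet<&String> = terms.iter().collect();
        for term in distinct {
            *self.df.entry(term.clone()).or_insert(0) += 1;
        }
    }

    /// Assumed Lucene-style IDF, evaluated against the current statistics.
    fn idf(&self, term: &str) -> f64 {
        let n = self.doc_count as f64;
        let df = self.df.get(term).copied().unwrap_or(0) as f64;
        (1.0 + (n - df + 0.5) / (df + 0.5)).ln()
    }

    fn avg_len(&self) -> f64 {
        if self.doc_count == 0 { 0.0 } else { self.total_len as f64 / self.doc_count as f64 }
    }
}

fn main() {
    let mut stats = CorpusStats::default();
    stats.insert("hybrid search");
    let before = stats.idf("search");
    stats.insert("dense search");
    stats.insert("sparse retrieval");
    println!("idf(search): {before:.3} -> {:.3}", stats.idf("search"));
}
```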

References

ADR-256: docs/adr/ADR-256-hybrid-sparse-dense-search.md
Research: docs/research/nightly/2026-06-17-hybrid-sparse-dense/README.md
Gist: docs/research/nightly/2026-06-17-hybrid-sparse-dense/gist.md
Cormack et al. "Reciprocal rank fusion outperforms Condorcet and individual rank learning methods." SIGIR 2009.
Robertson & Zaragoza "The Probabilistic Relevance Framework: BM25 and Beyond." FnTIR 2009.

🤖 Generated with claude-flow

https://claude.ai/code/session_01NFp4fjSarGCp2xpqJtqP2Z

Generated by Claude Code

Three-pass research survey selecting hybrid sparse-dense search (BM25 + ANN + RRF/RSF) as the nightly topic. Covers SOTA, gap analysis vs. ruvector-core, industry comparison (Qdrant, Weaviate, Milvus, Vespa, LanceDB), practical and exotic applications, deep research notes, benchmark methodology, and a full reference list.

Co-Authored-By: claude-flow <ruv@ruv.net>
Claude-Session: https://claude.ai/code/session_01NFp4fjSarGCp2xpqJtqP2Z

New standalone crate implementing three hybrid sparse-dense search strategies: ScoreFusion (backward-compatible with ruvector-core), RRF (Cormack 2009, k=60, rank-only), and RSF (Weaviate-style per-list min-max + configurable α). BM25 pre-computes TF at index time (O(|q|×P) query), fixing the re-tokenisation-at-query-time bug in ruvector-core (O(N×|d|)).

Benchmark: 10K docs × 128-D, 20 topics, 500 queries, k=10.
BM25: 77.3% recall@10, 57,174 QPS
RSF: 76.6% recall@10, 360 QPS
RRF: 50.5% recall@10, 360 QPS
Score: 68.8% recall@10, 357 QPS
Dense: 7.5% recall@10, 371 QPS

No unsafe code. Compiles to WASM. 19 unit tests.

Co-Authored-By: claude-flow <ruv@ruv.net>
Claude-Session: https://claude.ai/code/session_01NFp4fjSarGCp2xpqJtqP2Z
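The recall@10 figures compare each query's top 10 results against a ground-truth set. The definition used by `recall_at_k` in src/lib.rs is not given here. The following sketch assumes the common definition |top-k ∩ relevant| / |relevant| and de-duplicates retrieved ids, returning 0 for an empty ground truth. It is a hypothetical helper, not the crate's function.

```rust
use std::collections::HashSet;

/// Hypothetical recall@k: fraction of ground-truth ids found in the first `k` results.
/// Duplicate retrieved ids are counted once; an empty ground truth yields 0.0.
fn recall_at_k(retrieved: &[usize], relevant: &[usize], k: usize) -> f64 {
    let truth: HashSet<usize> = relevant.iter().copied().collect();
    if truth.is_empty() {
        return 0.0;
    }
    let top: HashSet<usize> = retrieved.iter().take(k).copied().collect();
    top.intersection(&truth).count() as f64 / truth.len() as f64
}

fn main() {
    // Two of the four relevant ids (3 and 1) appear in the top 4.
    println!("{}", recall_at_k(&[3, 7, 1, 9], &[1, 2, 3, 4], 4));
}
```

Averaging this value over the 500 queries would give a single recall@10 percentage per variant, as reported above.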

Architecture Decision Record for adding Reciprocal Rank Fusion and Relative Score Fusion to RuVector's hybrid search infrastructure. Documents: gap in ruvector-core (global normalisation + re-tokenisation bug), industry comparison, benchmark evidence, three-phase implementation plan, failure modes, security considerations, and migration path. Status: proposed. PoC in crates/ruvector-hybrid.

Co-Authored-By: claude-flow <ruv@ruv.net>
Claude-Session: https://claude.ai/code/session_01NFp4fjSarGCp2xpqJtqP2Z

Public technical article covering RRF and RSF hybrid search fusion in Rust. Includes feature comparison table, Mermaid architecture diagram, real benchmark results, comparison with 9 vector databases, 8 practical + 8 exotic applications, deep research notes on BM25 dominance and normalisation theory, usage guide, optimization guide, and roadmap.

Co-Authored-By: claude-flow <ruv@ruv.net>
Claude-Session: https://claude.ai/code/session_01NFp4fjSarGCp2xpqJtqP2Z

- centres[t] loop index → iter().enumerate()
- percentile cast: drop .max(0) (usize is never negative; clippy::unnecessary_min_or_max)
- percentile cast: #[allow] remaining cast lints (intentional saturating cast)
- print_row: &mut Vec → &mut [_]
- fusion.rs: 3.14 → 3.0 (clippy::approx_constant)
- cargo fmt on entire crate

Co-Authored-By: claude-flow <ruv@ruv.net>


claude and others added 4 commits June 17, 2026 07:37

ruvnet marked this pull request as ready for review June 19, 2026 03:23

ruvnet and others added 2 commits June 18, 2026 23:27

Merge remote-tracking branch 'origin/main' into research/nightly/2026…

34a85e8

…-06-17-hybrid-sparse-dense

# Conflicts:
#   Cargo.lock

ruvnet merged commit e188a61 into main Jun 19, 2026
46 of 52 checks passed

ruvnet deleted the research/nightly/2026-06-17-hybrid-sparse-dense branch June 19, 2026 03:28

ruvnet merged 6 commits into main from research/nightly/2026-06-17-hybrid-sparse-dense

ruvnet commented Jun 17, 2026


2 participants
