Proposal: Use a bounded top-K heap in nns_by_leaf instead of full sort #156

@Lsh0x

Description

Why

nns_by_leaf currently collects all candidate neighbors, computes their distances, and performs a full sort_unstable_by() before truncating to the requested count.
This is O(N log N), even though we only need the top-K results.
For large searches (e.g. 100K+ candidates, K ≈ 100), this adds unnecessary latency and allocations.
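
For reference, the current flow has roughly this shape (an illustrative reconstruction reusing the helper names from the sketch further below, not code copied from the repo):

// Current shape: collect everything, sort everything, keep `count`.
let mut results: Vec<(f32, usize)> = self
    .collect_candidates(query)
    .into_iter()
    .map(|id| (self.distance(query, &self.database.get_vector(id)), id))
    .collect();
// Full O(N log N) sort, even though only `count` entries survive.
results.sort_unstable_by(|a, b| a.0.partial_cmp(&b.0).unwrap());
results.truncate(count);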

What

Replace the final “collect + sort + truncate” logic with a bounded max-heap that only keeps the best K elements as we iterate.
This reduces the complexity to O(N log K) and caps the auxiliary allocation at count elements instead of N.

Expected impact

  • ~20–40% faster query times on large datasets (estimated; see the micro-benchmark sketch after this list)
  • Lower peak memory usage
  • Smaller CPU and cache footprint
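
These numbers are estimates, not measurements. A standalone micro-benchmark over synthetic distances (no repo code involved) is one way to sanity-check the sort-vs-heap gap before wiring the change in:

use std::collections::BinaryHeap;
use std::time::Instant;

fn main() {
    // Illustrative comparison only; real gains depend on N, K, hardware,
    // and the cost of the distance function.
    let (n, k) = (100_000usize, 100usize);
    // Deterministic pseudo-random, non-negative distances.
    let dists: Vec<f32> = (0..n)
        .map(|i| (i.wrapping_mul(2654435761) % 1_000_003) as f32)
        .collect();

    // Baseline: full sort + truncate.
    let t = Instant::now();
    let mut full = dists.clone();
    full.sort_unstable_by(|a, b| a.partial_cmp(b).unwrap());
    full.truncate(k);
    println!("full sort:    {:?}", t.elapsed());

    // Bounded max-heap of size k. For non-negative f32, the bit pattern
    // orders the same as the value, which sidesteps f32's lack of Ord here.
    let t = Instant::now();
    let mut heap: BinaryHeap<u32> = BinaryHeap::with_capacity(k);
    for &d in &dists {
        let bits = d.to_bits();
        if heap.len() < k {
            heap.push(bits);
        } else if bits < *heap.peek().unwrap() {
            heap.pop();
            heap.push(bits);
        }
    }
    println!("bounded heap: {:?}", t.elapsed());
}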

How

Example sketch (a small Ord wrapper is needed because f32 alone can't be stored in a BinaryHeap):

use std::cmp::Ordering;
use std::collections::BinaryHeap;

// f32 has no total order, so wrap heap entries and define one (NaN treated as equal).
#[derive(PartialEq)]
struct Entry(f32, usize);

impl Eq for Entry {}

impl PartialOrd for Entry {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl Ord for Entry {
    fn cmp(&self, other: &Self) -> Ordering {
        self.0.partial_cmp(&other.0).unwrap_or(Ordering::Equal)
    }
}

fn nns_by_leaf(&self, query: &[f32], count: usize) -> Vec<(f32, usize)> {
    if count == 0 { return Vec::new(); } // peek() below needs a non-empty heap
    // Max-heap: the root is the worst of the best `count` seen so far.
    let mut heap = BinaryHeap::with_capacity(count);

    for candidate_id in self.collect_candidates(query) {
        let vector = self.database.get_vector(candidate_id);
        let dist = self.distance(query, &vector);

        if heap.len() < count {
            heap.push(Entry(dist, candidate_id));
        } else if dist < heap.peek().unwrap().0 {
            heap.pop(); // evict the current worst
            heap.push(Entry(dist, candidate_id));
        }
    }

    // into_sorted_vec() is ascending for a max-heap: nearest first.
    heap.into_sorted_vec().into_iter().map(|Entry(d, id)| (d, id)).collect()
}
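
The max-heap direction is deliberate: its root is the worst of the best count distances seen so far, so most candidates are rejected with a single O(1) peek, and the O(log K) push/pop cost is paid only when a candidate actually improves the result set.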

Notes

  • Drop-in replacement; preserves existing behavior.
  • No dependency changes.
  • Optional future improvement: use select_nth_unstable_by() for large K.
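
For reference, that variant could look like the following (the helper name top_k_by_select is hypothetical; it assumes the flat (distance, id) vector the current implementation already builds, and NaN-free distances):

// O(N) average-time partition around the count-th element, then sort
// only the `count` survivors instead of the whole vector.
fn top_k_by_select(mut results: Vec<(f32, usize)>, count: usize) -> Vec<(f32, usize)> {
    if count > 0 && results.len() > count {
        results.select_nth_unstable_by(count - 1, |a, b| a.0.partial_cmp(&b.0).unwrap());
        results.truncate(count);
    }
    results.sort_unstable_by(|a, b| a.0.partial_cmp(&b.0).unwrap());
    results
}

This keeps the single flat allocation and drops the log-factor on the partition step; unlike the heap, though, it still materializes all N candidates first.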
