Implementation of the algorithm in multiple languages to compare performance in a standard GitHub Actions environment.
Languages currently wired in this repo: Rust, C, C++, Nim, Crystal, Kotlin (JAR), Elixir (.exs), Erlang (.erl).
Environment:
- Host: Linux/6.11.0-1018-azure (x86_64)
- CPU cores: 4
- Git commit: 460fdc9
| impl | lang | graph | n | m | k | B | threads | time_ns | popped | edges_scanned | heap_pushes | B_prime | mem_bytes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| c-bmssp | C | ba | 200000 | 999995 | 16 | 400 | 1 | 1779913 | 8010 | 40045 | 8995 | 400 | 17599920 |
| cpp-bmssp | C++ | ba | 200000 | 999995 | 16 | 400 | 1 | 2254705 | 8374 | 41865 | 9473 | 400 | 17599920 |
| kotlin-bmssp | Kotlin | ba | 200000 | 1999958 | 16 | 400 | 1 | 282058140 | 200000 | 1999958 | 402594 | 400 | 35199328 |
| nim-bmssp | Nim | ba | 200000 | 999995 | 16 | 400 | 1 | 2143144 | 7267 | 36330 | 8252 | 400 | 17599920 |
| rust-bmssp | Rust | ba | 200000 | 999995 | 16 | 400 | 1 | 2461474 | 8331 | 41650 | 9331 | 400 | 22799944 |
| c-bmssp | C | grid | 102400 | 408320 | 16 | 400 | 1 | 732563 | 4878 | 19511 | 5957 | 400 | 7352320 |
| cpp-bmssp | C++ | grid | 102400 | 408320 | 16 | 400 | 1 | 897503 | 4871 | 19455 | 5984 | 400 | 7352320 |
| crystal-bmssp | Crystal | grid | 102400 | 408320 | 16 | 400 | 1 | 66888845 | 4741 | 18945 | 5797 | 400 | 7152652 |
| elixir-bmssp | Elixir | grid | 102400 | 408320 | 16 | 400 | 1 | 431964997 | 4956 | 19775 | 6125 | 400 | 8171520 |
| erlang-bmssp | Erlang | grid | 102400 | 408320 | 16 | 400 | 1 | 11623659 | 4699 | 18772 | 5748 | 400 | 8171520 |
| kotlin-bmssp | Kotlin | grid | 102400 | 408320 | 16 | 400 | 1 | 34700586 | 5306 | 21202 | 6492 | 400 | 8171520 |
| nim-bmssp | Nim | grid | 102400 | 408320 | 16 | 400 | 1 | 1932144 | 5051 | 20204 | 6228 | 400 | 7352320 |
| rust-bmssp | Rust | grid | 102400 | 408320 | 16 | 400 | 1 | 1045159 | 5091 | 20326 | 6280 | 400 | 10014744 |
| impl | lang | graph | n | m | k | B | seed | threads | time_ns | popped | edges_scanned | heap_pushes | B_prime | mem_bytes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| rust-bmssp | Rust | grid | 2500 | 9800 | 4 | 50 | 1 | 1 | 741251 | 868 | 3423 | 1047 | 50 | 241824 |
| c-bmssp | C | grid | 2500 | 9800 | 4 | 50 | 1 | 1 | 99065 | 1289 | 5119 | 1565 | 50 | 176800 |
| cpp-bmssp | C++ | grid | 2500 | 9800 | 4 | 50 | 1 | 1 | 117480 | 1064 | 4224 | 1261 | 50 | 176800 |
| kotlin-bmssp | Kotlin | grid | 2500 | 9800 | 4 | 50 | 1 | 1 | 5308820 | 1102 | 4386 | 1309 | 50 | 196800 |
| elixir-bmssp | Elixir | grid | 2500 | 9800 | 4 | 50 | 1 | 1 | 5410039 | 870 | 3447 | 1047 | 50 | 196800 |
| erlang-bmssp | Erlang | grid | 2500 | 9800 | 4 | 50 | 1 | 1 | 1155739 | 691 | 2701 | 818 | 50 | 196800 |
Rust implementation of bounded multi-source shortest paths (multi-source Dijkstra cut off at B).
cargo test
cargo bench -p bmssp
python3 bench/runner.py --release --out results
# fast iteration
python3 bench/runner.py --quick --out results-dev --timeout-seconds 20 --jobs 2
# 1000x larger graphs (heavy):
python3 bench/runner.py --params bench/params_1000x.yaml --release --jobs 4 --timeout-seconds 600 --out results- Linux/macOS:
- Run
scripts/install_deps.sh --yesto auto-install Rust, Python deps, build tools, Crystal/shards, Nim. - Use
--check-onlyto only report missing items.
- Run
- Windows:
- Open PowerShell (as Administrator recommended) and run
scripts/Install-Dependencies.ps1 -Yes. - Add
-CheckOnlyto only report.
- Open PowerShell (as Administrator recommended) and run
Let U = { v | d(v) < B } and E(U) be edges scanned from U.
- Time:
O((|E(U)| + |U|) log |U|)with binary heap; worst-caseO((m+n) log n). - Space:
Θ(n + m)graph +Θ(n)distances/flags + heap bounded by frontier size.
Use Graph::memory_estimate_bytes() to get a byte estimate at runtime.
Here’s the no-BS, self-contained write-up you asked for. It includes the theory, proofs at the right granularity, complexity, comparisons against the usual suspects, and charts (model-based, not empirical—use them to reason about trends, not absolutes).
Given a directed graph
- For every vertex
$v$ with true shortest-path distance$d(v) < B$ , the exact distance$d(v)$ . - The explored set
$U={v\in V\mid d(v)<B}$ . - The tight boundary
$B' = \min{ \hat d(x)\mid x \notin U}$ , i.e., the smallest tentative label never popped (next frontier).
This is Dijkstra from multiple sources that halts when the next tentative key would be
Same invariant as Dijkstra: a node is popped exactly once with its final shortest distance. The only change is the early-exit cut.
Initialize
$\forall v\in V:\ \text{dist}[v] \leftarrow +\infty$ - For each source
$s\in S$ : if$d_0(s) < B$ set$\text{dist}[s]\leftarrow d_0(s)$ and push$(d_0(s),s)$ into a min-PQ. $B' \leftarrow B$
Main loop
-
Pop
$(d,u)$ . If$d \ne \text{dist}[u]$ continue (stale). -
If
$d \ge B$ , set$B'\gets d$ and stop. -
For each edge
$(u,v,w)$ :$\text{nd}=d+w$ .- If
$\text{nd} < \text{dist}[v]$ and$\text{nd} < B$ : relax and push. - Else if
$\text{nd} \ge B$ : update$B' \leftarrow \min(B', \text{nd})$ (tracks the smallest over-bound key observed).
- If
Return
Multi-source is trivial: seed multiple entries. All standard Dijkstra proofs still apply.
Invariant (Dijkstra): When a node
Boundary lemma. Let
Let
-
Time (binary heap):
$$ T = \mathcal{O}\big((|E(U)| + |U|)\log |U|\big) $$
In the worst case (
$B$ large),$|U|=n$ ,$|E(U)|=m$ → standard$\mathcal{O}((m+n)\log n)$ . -
Space: Graph storage
$\Theta(n+m)$ + working arrays$\Theta(n)$ + heap up to$\Theta(|U|)$ . Asymptotically$\Theta(n+m)$ . -
Multi-source impact: Doesn’t change big-O. Practically lowers the radius needed to reach a given
$f = |U|/n$ ; you explore more shallow balls from more centers instead of a deep ball from one center.
Wins
- You only care about a radius (range search, k-hop neighborhoods, geo/proximity queries, limited-distance labeling).
-
Repeated queries that increase
$B$ gradually; reuse$B'$ to chunk work in phases. - Graphs where
$f=|U|/n \ll 1$ for your typical$B$ . Cost drops roughly like$f \log (fn)$ .
Loses
-
$B$ approaches graph diameter ⇒ you’re doing full SSSP; there’s no magic: you pay Dijkstra. - Integer weights with small range: specialized queues (Dial/radix) beat binary heaps.
- Heavy parallel iron: Δ-stepping or GPU SSSP can dominate on large, sparse graphs with good partitioning.
| Method | Weights | Time (typical) | Memory | Pros | Cons | ||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| BMSSP (this) + binary heap | non-negative | Exact within bound; simple; great when |
If |
||||||||
| Dijkstra (binary heap) | non-negative | Baseline, robust. | Slower on integer weights; no early stop unless you add a bound. | ||||||||
| Dijkstra (Fibonacci) | non-negative |
|
big constants | Better asymptotics. | Not worth it in practice. | ||||||
| Dial / bucket queues | integer weights |
Blazing if |
Explodes with large |
||||||||
| Radix / 0-1 BFS | small integer | near-linear | Ideal for tiny ranges (0-1 BFS). | Narrow use-case. | |||||||
| Δ-stepping (parallel) | non-negative |
|
buckets + frontier | Parallel-friendly; fast in practice. | Needs tuning (δ); irregular work. | ||||||
| Bellman-Ford | any | Negative edges (no cycles). | Too slow; not comparable for your case. | ||||||||
| A* | non-negative + heuristic | like Dijkstra on set actually explored | like Dijkstra | If heuristic is good, dominates. | Needs problem-specific heuristics. |
Bottom line: If you need exact distances up to a bound and you’re not on tiny integer weights, BMSSP (bounded Dijkstra) is the simplest, most dependable hammer. If weights are small integers, buckets win. If you’re on many-core/GPU, consider Δ-stepping or GPU SSSP.
These plots use simple cost models (they are not runtime measurements). They illustrate how the math scales with explored fraction cargo bench to pin real numbers.
-
Relative time vs explored fraction
Bounded Dijkstra cost shrinks roughly like
$f \log (fn)$ vs full Dijkstra at$f=1$ .
-
Bounded Dijkstra vs Dial (integer weights)
When the weight range
$C$ is small, Dial can beat binary heaps; as$B$ grows or$C$ is large, buckets become a tax.
-
Extra memory (excluding adjacency)
All Dijkstra variants pay
$O(n)$ . Bucketed queues add$O(B)$ (or $O(C)$) worth of buckets.
Want empirical plots? Run the provided Rust bench across varying
$B$ ,$k=|S|$ ,$n,m$ , and (optionally) a bucketed queue variant. I can scriptcargo bench+ CSV + gnuplot for you.
Crystal port lives in impls/crystal and follows the same CLI and JSON contract.
Build locally (if Crystal is installed):
cd impls/crystal
shards build --release
./bin/bmssp_cr --graph grid --rows 20 --cols 20 --k 8 --B 50 --seed 1 --trials 2 --jsonThe bench runner will auto-detect Crystal (crystal + shards in PATH) and include its results.
- Keep
distasu64. Usesaturating_addto avoid overflow during relaxations. - Don’t attempt “decrease-key”; just push duplicates and skip stale pops. It’s faster with Rust’s
BinaryHeap. - Track
B'while relaxing edges that would cross the bound—this is your next-phase threshold. - Multi-source is just seeding many
(node, offset)entries. If you repeat with larger$B$ , reusedistas a warm-start and continue; it preserves optimality because labels are monotone. - Memory: adjacency dominates. The rest is a few
Vecs of size$n$ and the heap/frontier.
-
Time:
$\boxed{ \mathcal{O}\big((|E(U)|+|U|)\log |U|\big) }$ Worst case:$\mathcal{O}((m+n)\log n)$ . -
Space:
$\boxed{ \Theta(n+m) }$ total; working state$\Theta(n)$ + heap$O(|U|)$ .
If you want byte-accurate numbers, the Rust crate includes a memory_estimate_bytes() helper; for real allocations, measure with heaptrack/massif or Linux cgroups.
-
Small integer weights (e.g., 0–255): Dial/radix will beat binary heaps. BMSSP still applies—just stop at
$B$ —but use buckets. -
Parallel hardware: Δ-stepping buckets plus bound
$B$ works well; you get level-synchronous waves clipped at$B$ . -
Heuristic-rich domains: A* with an admissible heuristic can explore far less than
$U$ for the same$B$ .
-
Use the Rust crate I gave you (
cargo bench -p bmssp) and vary:-
$n,m$ (grid, ER random, power-law) - number of sources
$k$ - bound
$B$
-
-
Implement a second queue:
- Dial (if your weights are bounded ints)
- or a bucketed Δ-stepping variant (single-thread first)
-
Capture: push counts, relax counts, popped vertices, wall-clock (release), and peak RSS.
-
Plot time vs |U|, time vs B, pushes per edge; the trends will match the charts above, with real constants.
- If
$B$ is small relative to typical distances, BMSSP saves you real work, exactly as the$f\log(fn)$ model predicts. - If weights are tiny integers, use buckets (Dial) and still bound by
$B$ . - Otherwise, bounded Dijkstra with a binary heap is the clean default—simple, exact, and fast enough.
If you want this rewritten around the recursive/pivoted BMSSP from your screenshot (the one with FindPivots, batching, etc.), send the full section of the paper and I’ll extend the Rust crate to that variant and give you measured charts next.