A fixed, reproducible protocol for porting Python packages to Rust crates (rs-<pkg>, exposed to Python via PyO3/maturin), with pre-registered, measured numerical parity against the original Python reference.
This is the Python→Rust sibling of omicverse-rebuildr (which ports R→Python). Same engineering loop, same parity philosophy — the source language is Python and the target language is Rust.
Single-cell genomics, statistical genetics, and adjacent numerical fields have hundreds of canonical algorithms whose only reference implementation is pure Python (often NumPy/SciPy/Numba/Cython): leidenalg, scrublet, harmonypy, fa2, palantir, MAGIC, scanpy kernels, …
When those algorithms become a bottleneck, the options today are bad:
- Numba / Cython the hot loop — helps, but ships a fragile build, doesn't give true multi-core scaling without the GIL dance, and the optimised path silently diverges from the readable reference.
- Rewrite in Rust by hand — fast and safe, but the rewrite usually diverges from the Python reference and the divergence is never measured.
- Use an "approximate" Rust crate — silently a different algorithm with different numerical behaviour.
rebuildpy is the engineering recipe that takes a port from "I want this in Rust" to "the wheel is on PyPI, the crate is on crates.io, and it provably matches the Python reference on the canonical fixture" — in a small number of agent-driven iterations, with the proof of parity shipped alongside the wheel.
Three core ideas:
- The Python source is the executable spec. No reverse-engineering from papers. The agent runs the Python reference on a fixed input and compares its own Rust draft to that output, every iteration.
- Parity is class-aware. "Same output" means different things for an embedding (rotation-invariant), a clustering (label-permutation-invariant), or a pseudotime (correlation-invariant). The protocol pre-registers which numerical metric applies to which output and locks the threshold before any agent code is written.
- Reconstruction is not metric optimization. We never tune the algorithm to "look better" — we tune it to be identical to the Python reference, then search for speed under provably-equivalent rewrites. In Rust the dominant subtlety is that f64 addition is non-associative, so any reordered parallel/SIMD reduction must carry a derived error bound.
What ships at the end of every port:
- A pip-installable wheel on PyPI (maturin-built Rust extension) + optionally the crate on crates.io.
- A RECONSTRUCTION_REPORT.md with full Python-API coverage audit, per-output parity values, two-panel time-vs-accuracy plot, and ecosystem-reuse accounting.
- Four pre-executed notebooks: pipeline parity, Python tutorial, Python⇄Rust function dictionary, per-iteration evolution.
- A reproducible parity gate as a pytest test.
# 1. Clone the kit
git clone <your-repo-url> rebuildpy
cd rebuildpy
# 2. Provision the Python reference env (see SETUP.md for full instructions)
conda create -n rebuild-pyref python=3.10 -y
conda activate rebuild-pyref
pip install -r requirements.txt
pip install <the-original-python-package> # the executable spec
# 3. Install the Rust toolchain (NOT a conda package)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
cargo --version && rustc --version
# 4. Provision the Rust target env (maturin builds the extension into it)
conda create -n rebuild-rust python=3.10 -y
conda activate rebuild-rust
pip install -r requirements.txt # includes maturin
# 5. Export the two paths the kit needs
export PYTHON_REF_ENV=$(conda info --envs | awk '/rebuild-pyref/ {print $NF}')
export RUST_TEST_ENV=$(conda info --envs | awk '/rebuild-rust/ {print $NF}')
# 6. Authenticate GitHub CLI (needed for Discovery step)
gh auth login
# 7. Verify the kit installs cleanly (30 seconds)
python -m engine.smoke_test
# Expected: [smoke] OK -- 5/5 checks passed.
# 8. Check if your target Python package is already ported
python -m engine.discover_rust_deps --check <YourPyPackage>

If the smoke test passes and discovery says "no existing port", you're ready to start a port: follow PROTOCOL.md.
📖 Full setup walkthrough: SETUP.md (~30 minutes including conda + rustup provisioning).
Point an agent (Claude Code, Cursor, etc.) at this folder and say:
Port Python package X to Rust. Follow rebuildpy/README.md.
The agent will execute the 6-step protocol end-to-end and produce, at the end:
- an rs-X repository (under your $REBUILDPY_ORG) with an installable maturin wheel,
- the pre-registered numerical parity gate clearing on the canonical fixture,
- a structured RECONSTRUCTION_REPORT.md,
- four mandatory pre-executed notebooks,
- a PyPI release (and optional crates.io release).
┌─ 0.5 Discovery ─────┐
│ • Is target already │ ← if YES: stop, reuse existing rs- repo
│ ported to Rust? │
│ • Which py deps │ ← matches added as pyproject deps;
│ have rs-mirrors │ others mapped to ndarray/polars/petgraph/linfa
│ or a rust crate? │
└─────────────────────┘
↓
┌─ 1 Shape template ──┐
│ Copy layout from a │
│ prior port matching │
│ the algorithm class │
└─────────────────────┘
↓
┌─ 2 Dual envs ───────┐
│ Python reference env│
│ Rust target env │ (maturin develop --release)
│ Both see same data │
└─────────────────────┘
↓
┌─ 3 Two-agent inner loop ─────────────────────────────────────────────┐
│ │
│ ┌─ Equivalence Agent ────┐ ┌─ Acceleration Agent ──────────────┐ │
│ │ Translate Python → Rust│ → │ Search rewrites for speed; each │ │
│ │ Iterate until parity │ │ requires admissibility proof: │ │
│ │ gate clears (Pearson, │ │ exact / bounded-ε (reduction │ │
│ │ ARI, Procrustes, etc.) │ │ order!) / class-containment. │ │
│ │ │ │ Reject if it breaks parity. │ │
│ └────────────────────────┘ └───────────────────────────────────┘ │
│ │
└──────────────────────────────────────────────────────────────────────┘
↓
┌─ 4 Validate ────────┐
│ Re-confirm gate. │
│ Threshold is read- │
│ only; never widened │
└─────────────────────┘
↓
┌─ 5 Release ─────────┐
│ Publish to PyPI + │
│ crates.io + GitHub. │
│ Become a seed │
│ template. │
└─────────────────────┘
Each step is documented in detail:
| Step | What happens | Document |
|---|---|---|
| 0.5 Discovery | Check whether the target is already a Rust port; check whether each Python dep already has an rs- mirror or a standard Rust crate. STOP if duplicate; reuse deps if found. | DISCOVERY.md |
| 1 Shape template | Copy directory layout + test scaffold from a prior port. Do NOT copy algorithmic code. | TEMPLATE.md |
| 2 Dual environments | Provision a Python reference env (original package) and a Rust target env (maturin + built extension; cargo from rustup). Both see the same fixture files. | SETUP.md |
| 3 Two-agent inner loop | (a) Equivalence Agent: translate Python → Rust, iterate until the pre-registered class-aware parity gate clears. (b) Acceleration Agent: verifier-guided search over Rust rewrites for speed, each requiring one of three admissibility proofs. | PROTOCOL.md, PARITY_TAXONOMY.md, ACCELERATION_PLAYBOOK.md |
| 4 Validate | Re-confirm the gate. The threshold is committed before agent work begins — never tightened or loosened. | PARITY_TAXONOMY.md |
| 5 Release | Ship the maturin wheel to PyPI + crate to crates.io, publish <org>/rs-X, complete the RECONSTRUCTION_REPORT.md + four mandatory notebooks. | NOTEBOOKS.md |
Different algorithms have different invariance structures, so "same output" needs different metrics. The protocol pre-registers one class per port output:
| # | Class | Parity criterion | Default threshold | Example Python packages |
|---|---|---|---|---|
| 1 | Deterministic numerical (3 sub-tiers; see PARITY_TAXONOMY.md) | element-wise max_abs_err < tol, optional rtol-scaled | standard 1e-8 / strict 1e-13 / bounded 1e-6; hard ceiling 1e-6 | BBKNN distances, MAGIC operator |
| 2 | Stochastic numerical | Kolmogorov–Smirnov ≤ τ or Wasserstein-1 ≤ τ | KS-p ≥ 0.05 | dropout simulations, MCMC draws |
| 3 | Combinatorial clustering | label-invariant: ARI / NMI / Fowlkes–Mallows | ARI ≥ 0.95 | leidenalg, louvain |
| 4 | Continuous embedding | rotation-invariant: Procrustes similarity | Procrustes ≥ 0.95 | UMAP, t-SNE, PCA, harmonypy |
| 5 | Ranked output | top-K Jaccard / Spearman correlation | top-50 Jaccard ≥ 0.8 | HVG selection, DE rankings |
| 6 | Ordinal output (pseudotime) | Pearson / Spearman correlation | Pearson ≥ 0.99 (≥ 1 − 1e-12 treated as exact) | DPT, palantir |
| 7 | Classification | label agreement / F1 | F1 ≥ 0.95 | scrublet doublet calls |
| 8 | Statistical inference | rank corr on −log10 p + top-K Jaccard | Spearman ≥ 0.90 | diffxpy, scanpy DE |
If the Python function returns multiple outputs of different classes, the manifest declares one gate per output and ALL must pass.
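As a rough illustration of "one gate per output, all must pass", the sketch below evaluates three outputs of different classes with standard-library Python only. The `GATES` layout, the `evaluate` helper and the fixture values are hypothetical and are not the kit's manifest schema or the API of engine/parity_metrics.py; the thresholds are the defaults from the table above.

```python
from collections import Counter
from math import comb, sqrt


def adjusted_rand_index(a, b):
    """Label-permutation-invariant agreement between two clusterings (n >= 2)."""
    pairs = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    rows = sum(comb(c, 2) for c in Counter(a).values())
    cols = sum(comb(c, 2) for c in Counter(b).values())
    expected = rows * cols / comb(len(a), 2)
    max_index = 0.5 * (rows + cols)
    if max_index == expected:  # degenerate case: both partitions are trivial
        return 1.0
    return (pairs - expected) / (max_index - expected)


def pearson(x, y):
    """Correlation; unchanged by a positive affine rescaling of either input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y))
    vx = sum((p - mx) ** 2 for p in x)
    vy = sum((q - my) ** 2 for q in y)
    return cov / sqrt(vx * vy)


def max_abs_err(x, y):
    """Element-wise worst-case deviation for deterministic numerical outputs."""
    return max(abs(p - q) for p, q in zip(x, y))


# Hypothetical gate spec: output name -> (metric, direction, threshold).
GATES = {
    "clusters": (adjusted_rand_index, ">=", 0.95),
    "pseudotime": (pearson, ">=", 0.99),
    "distances": (max_abs_err, "<", 1e-8),
}


def evaluate(reference, candidate):
    """Return {output: (metric value, passed)} for every declared gate."""
    results = {}
    for name, (metric, direction, threshold) in GATES.items():
        value = metric(reference[name], candidate[name])
        passed = value >= threshold if direction == ">=" else value < threshold
        results[name] = (value, passed)
    return results


# Made-up fixture values, for illustration only.
reference = {
    "clusters": [0, 0, 1, 1, 2, 2],
    "pseudotime": [0.0, 0.1, 0.4, 0.5, 0.9, 1.0],
    "distances": [1.0, 2.5, 3.25],
}
candidate = {
    "clusters": [2, 2, 0, 0, 1, 1],  # same partition, labels permuted
    "pseudotime": [2 * t + 1 for t in reference["pseudotime"]],  # rescaled
    "distances": [1.0, 2.5, 3.25 + 1e-12],
}

results = evaluate(reference, candidate)
all_pass = all(passed for _, passed in results.values())
print(results, all_pass)
```

The permuted labels and the rescaled pseudotime would fail an element-wise comparison, yet they clear their class-appropriate gates, which is the point of pre-registering the class per output.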
The 8 metric implementations live in engine/parity_metrics.py — import from there rather than redefining. (They are language-agnostic; this is the same module shape omicverse-rebuildr uses.)
📖 Full taxonomy: PARITY_TAXONOMY.md — includes the Python→Rust "when the gate fails: ordered suspicion list" (row-major-vs-transpose, f32-vs-f64, integer wrapping, reduction order, NaN handling, …).
Every rewrite the Acceleration Agent commits must carry one of these proofs:
| Proof class | Meaning | Examples (Rust) |
|---|---|---|
| (E) Exact identity | Bit-equivalent output by mathematical identity or by not touching the arithmetic. | Xᵀ X hoisted out of a loop; Woodbury; zero-copy ArrayView; buffer reuse; fixed-order reduction; LTO + codegen-units=1. |
| (B) Bounded ε-approximation | Error bounded by a closed-form expression; derived in MATH.md, not handwaved. | Reordered rayon parallel sum / SIMD horizontal-add (`‖Δ‖ ≤ n·eps·max\|x\|`). |
| (C) Class-containment theorem | A known theorem guarantees the same output for the relevant input class. | Euclidean MST ⊆ Delaunay (Preparata–Shamos 1985), via spade + petgraph. |
📖 Full catalog: ACCELERATION_PLAYBOOK.md. The headline rule: f64 + is non-associative — reordering any float reduction turns an (E) rewrite into a (B) one that needs a bound. This is the single most common way a Rust port silently breaks deterministic-strict parity.
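The reduction-order rule can be seen in a few lines of plain Python, whose floats are IEEE-754 doubles just like Rust's f64. This is a minimal sketch: `chunked_sum` only imitates the shape of a parallel reduction, and `reorder_bound` uses the conservative textbook worst case 2·(n−1)·eps·Σ|xᵢ| for the gap between two summation orders, which is an assumption of this example rather than the n·eps·max|x| shorthand used in this README. Which form a port commits to belongs in its MATH.md.

```python
import math
import random
import sys


def serial_sum(xs):
    """Left-to-right accumulation: one fixed reduction order."""
    total = 0.0
    for x in xs:
        total += x
    return total


def chunked_sum(xs, n_chunks):
    """Sum each chunk, then combine the partials (parallel-reduction shape)."""
    size = math.ceil(len(xs) / n_chunks)
    partials = [serial_sum(xs[i:i + size]) for i in range(0, len(xs), size)]
    return serial_sum(partials)


def reorder_bound(xs):
    """Conservative worst-case gap between two summation orders of xs."""
    eps = sys.float_info.epsilon
    return 2 * (len(xs) - 1) * eps * sum(abs(x) for x in xs)


# Non-associativity on three numbers.
print((0.1 + 0.2) + 0.3 != 0.1 + (0.2 + 0.3))  # True

# Seeded synthetic data spanning many magnitudes, for illustration only.
rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) * 10 ** rng.randint(-6, 6) for _ in range(10_000)]

fixed_a = serial_sum(xs)
fixed_b = serial_sum(xs)               # same order again: an (E)-style rewrite
reordered = chunked_sum(xs, n_chunks=8)  # different order: needs a (B) bound

gap = abs(fixed_a - reordered)
print("fixed order bit-identical:", fixed_a == fixed_b)
print("reordered gap within bound:", gap <= reorder_bound(xs))
```

A fixed order reproduces the same bits every time; a reordered reduction may not, so the useful question is whether the observed gap sits under a bound that was derived beforehand.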
Traditional evolutionary search plots iteration vs metric because the policy searches for better metric. That's the wrong model here — reconstruction's goal is identical output to the Python reference, not "better" output.
So every port produces two plots against the same iteration axis:
wall-clock (s)
│
│ ●─┐ ← Iteration 0 (straight Rust translation) already
│ │ ●─┐ a big drop vs the Python reference
│ │ ●──●
│ python-ref → iter 0 → iter 1 → iter 2 → iter 3
│
└────────────────────────────────────────────────→ iteration
parity metric (e.g. Procrustes)
│ ●──●──●──●─┐
│ \
│ ●──● ← annotated: "rayon parallel reduction, n·eps·max|x|"
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ threshold (red dashed line)
│
└────────────────────────────────────────────────→ iteration
- Plot 1 (top, log scale): wall-clock should monotonically decrease as rewrites land. Error bars = stddev over 3 warmup-excluded runs. Always a --release build.
- Plot 2 (bottom): parity metric should stay flat at the ceiling. Every dip must be annotated with the math approximation that caused it (almost always a reordered reduction).
Wall-clock measurement rules:
- Warmup: discard the first run (BLAS thread spin-up, extension load, page cache).
- 3 measured runs; report mean ± stddev.
- CV > 10% → auto-extend to 5 runs, report median + IQR.
- Fix BLAS + rayon threads via OMP_NUM_THREADS=8 / RAYON_NUM_THREADS=8 before any imports.
- Never time a debug build; it is an invalid measurement.
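The measurement rules above can be sketched as a small timer. This is an illustrative stand-in, not the kit's engine/benchmark.py: the function name `time_callable_sketch`, its `clock` parameter and the returned dictionary keys are assumptions made for the example.

```python
import os

# Pin thread pools before importing any numerical library (values from the rules above).
os.environ.setdefault("OMP_NUM_THREADS", "8")
os.environ.setdefault("RAYON_NUM_THREADS", "8")

import statistics
import time


def _timed(fn, clock):
    """Wall-clock duration of a single call."""
    start = clock()
    fn()
    return clock() - start


def time_callable_sketch(fn, clock=time.perf_counter, cv_limit=0.10):
    """Warmup + 3 measured runs; extend to 5 runs and median/IQR if CV > 10%."""
    fn()  # warmup run, discarded
    samples = [_timed(fn, clock) for _ in range(3)]
    mean = statistics.mean(samples)
    stddev = statistics.stdev(samples)
    if mean == 0 or stddev / mean <= cv_limit:
        return {"runs": 3, "stat": "mean/stddev", "center": mean, "spread": stddev}
    # Too noisy: add two more runs and switch to robust statistics.
    samples += [_timed(fn, clock) for _ in range(2)]
    q1, _, q3 = statistics.quantiles(samples, n=4)
    return {
        "runs": 5,
        "stat": "median/IQR",
        "center": statistics.median(samples),
        "spread": q3 - q1,
    }


result = time_callable_sketch(lambda: sum(i * i for i in range(100_000)))
print(result["runs"], result["stat"])
```

Passing the clock in as a parameter is a design choice for testability: a fake clock makes the extension rule checkable without depending on machine noise.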
📖 Full spec + iteration-log schema: EVALUATION.md.
A finished port serves five audiences, each with a different need:
| Audience | What they need | Where they look |
|---|---|---|
| Reviewer / scientist evaluating whether to trust the port | Pipeline-level proof Rust ≡ Python numerically | compare_Python_vs_Rust.ipynb |
| End user of the package | A copy-pastable Python tour of every public function (now Rust-backed) | tutorial_<dataset>.ipynb |
| Python user porting their existing code | A function-level dictionary — every Python parameter ↔ Rust parameter, side-by-side calls on identical input | function_by_function_Python_parity.ipynb |
| Auditor of the engineering process asking "did the agent really iterate?" | A per-iteration narrative log with one named subplot per iteration | evolution.ipynb |
| CI / automation | The pre-registered parity gate as a pytest assertion | tests/test_exact_match.py |
All four notebooks ship pre-executed so GitHub renders them without re-running. Phase 4 blocks the port from being released if any one is missing.
evolution.ipynb is a forcing function. It is structured as ## Iteration N — <title> headers, one per iteration, each with a markdown narrative of what changed AND a code cell that produces a subplot for that iteration. If the agent skipped the acceleration loop, the notebook still has the baseline block (## Iteration 0 — Baseline translation) — but the protocol then audits whether obvious acceleration opportunities were missed. The summary 2-panel examples/evolution.png (auto-generated by engine.plot_evolution) supplements but does not replace this notebook.
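A structural check of this kind might look like the sketch below. It assumes the standard Jupyter notebook JSON layout (a `cells` list with `cell_type`, `source` and `outputs`) and matches only the `## Iteration N` prefix of each header. The helper name `audit_evolution` and the exact rules it enforces are hypothetical, not the kit's implementation.

```python
import re

ITERATION_HEADER = re.compile(r"^##\s+Iteration\s+(\d+)\b", re.MULTILINE)


def _text(cell):
    """Notebook sources are stored either as a string or a list of lines."""
    source = cell.get("source", "")
    return "".join(source) if isinstance(source, list) else source


def audit_evolution(notebook):
    """Return a list of problems; an empty list means the structure looks right."""
    blocks = []  # each entry: [iteration number, has an executed code cell]
    for cell in notebook.get("cells", []):
        kind = cell.get("cell_type")
        if kind == "markdown":
            match = ITERATION_HEADER.search(_text(cell))
            if match:
                blocks.append([int(match.group(1)), False])
        elif kind == "code" and blocks and cell.get("outputs"):
            blocks[-1][1] = True  # pre-executed code cell under this header

    problems = []
    if 0 not in [number for number, _ in blocks]:
        problems.append("missing '## Iteration 0' baseline block")
    for number, has_code in blocks:
        if not has_code:
            problems.append(f"iteration {number} has no executed code cell")
    return problems


# Tiny in-memory notebook, for illustration only.
notebook = {
    "cells": [
        {"cell_type": "markdown", "source": ["## Iteration 0 - Baseline translation\n", "Straight port."]},
        {"cell_type": "code", "source": "plot(0)", "outputs": [{"output_type": "display_data"}]},
        {"cell_type": "markdown", "source": "## Iteration 1 - Second attempt"},
        {"cell_type": "code", "source": "plot(1)", "outputs": []},
    ]
}
print(audit_evolution(notebook))  # ['iteration 1 has no executed code cell']
```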
📖 Schemas + section-by-section requirements: NOTEBOOKS.md.
| File | What it does |
|---|---|
| SETUP.md | First-time install — prerequisites, dual-env provisioning, rustup, env vars, gh auth, smoke test, troubleshooting. |
| PROTOCOL.md | The 6-step protocol + the two-agent inner loop. Read this in a session before starting a port. |
| DISCOVERY.md | Phase 0.5 — reuse before rebuild. Find existing rs- ports for the target and its Python deps. |
| PARITY_TAXONOMY.md | 8-class algorithm taxonomy → which numerical-parity metric applies (+ the reduction-order rule). |
| ACCELERATION_PLAYBOOK.md | Catalog of Rust rewrites with the 3 admissibility proof types. |
| EVALUATION.md | Two-plot evaluation (time vs iter + accuracy vs iter), warmup excluded, accuracy dips annotated. |
| NOTEBOOKS.md | Four mandatory pre-executed notebooks per release. Non-skippable in Phase 4. |
| TEMPLATE.md | Standard rs-<pkg> repo layout + naming conventions + license decision matrix. |
| CHECKLIST.md | Per-port checklist to tick through, Phase 0–5. |
| File | What it does | Typical invocation |
|---|---|---|
| smoke_test.py | 30-second sanity check: verifies the kit installs and all 8 parity metrics + audit / plot / benchmark / loop helpers work. | python -m engine.smoke_test |
| discover_rust_deps.py | Lists existing org repos via gh repo list <org> (default omicverse, override with REBUILDPY_ORG); parses the package's pyproject.toml; reports which deps already have rs- mirrors and the Rust-ecosystem crate for the rest. Cached 24h. | python -m engine.discover_rust_deps --check <PyPkg> |
| parity_metrics.py | The 8 parity-class metric functions (Pearson, ARI, Procrustes, KS, top-K Jaccard, …) + class dispatcher. | from parity_metrics import compute_parity, is_pass |
| benchmark.py | Wall-clock timer with warmup-exclusion + 3-run averaging; pins BLAS + rayon threads; auto-extends to 5 runs + median when CV > 10%. | from benchmark import time_callable |
| py_function_audit.py | Parses the Python package's public API (__all__ / def / class) via ast, audits Rust-crate coverage (#[pyfunction] / #[pymethods] / pub fn), produces AUDIT.md. | python -m engine.py_function_audit --py-source <pkg>-ref --rust-crate src/ |
| plot_evolution.py | Renders the two-panel evolution PNG from ITERATION_LOG.md, annotates accuracy dips with their math reason. | python -m engine.plot_evolution --port-dir <path> |
| loop.py | The rebuildpy loop as runnable code: equivalence + acceleration phases as Python callables. | python -m engine.loop --port-dir <path> --phase equivalence |
| manifest.template.yaml | Pre-registered parity gate spec; copy into each new port's data/manifest.yaml. | (file template) |
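To make the coverage audit concrete, here is a minimal sketch of the idea behind py_function_audit.py: collect the public names of a Python module with `ast`, collect `#[pyfunction]` names from Rust source with a regular expression, and report what is missing. The helper names and the regex are assumptions for illustration, the real script's internals may differ, and this version ignores `#[pymethods]` and PyO3 renames via `#[pyo3(name = "...")]`.

```python
import ast
import re

# Matches `#[pyfunction]`, any further attributes, then `fn <name>`.
PYFUNCTION = re.compile(
    r"#\[pyfunction[^\]]*\]\s*(?:#\[[^\]]*\]\s*)*(?:pub\s+)?fn\s+(\w+)"
)


def public_python_names(source):
    """Use __all__ when present, else top-level def/class names without '_'."""
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, ast.Assign) and any(
            isinstance(t, ast.Name) and t.id == "__all__" for t in node.targets
        ):
            return set(ast.literal_eval(node.value))
    return {
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
        and not node.name.startswith("_")
    }


def exposed_rust_names(rust_source):
    """Names of functions annotated with #[pyfunction]."""
    return set(PYFUNCTION.findall(rust_source))


# Made-up sources, for illustration only.
PY_SOURCE = '''
__all__ = ["neighbors", "embed", "cluster"]

def neighbors(x): ...
def embed(x, k=15): ...
def cluster(x): ...
def _private(x): ...
'''

RUST_SOURCE = '''
#[pyfunction]
fn neighbors(x: f64) -> f64 { x }

#[pyfunction]
#[pyo3(signature = (x, k=15))]
pub fn embed(x: f64, k: usize) -> f64 { x }

fn helper() {}
'''

python_api = public_python_names(PY_SOURCE)
rust_api = exposed_rust_names(RUST_SOURCE)
missing = sorted(python_api - rust_api)
print("ported:", sorted(python_api & rust_api))
print("missing:", missing)
```

In a report, every name in `missing` would need either a port or an explicit "skipped with reason" entry.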
Every new port copies these as starting scaffolding; nothing is generated from scratch.
| Template | Becomes |
|---|---|
| pyproject.template.toml | The port's pyproject.toml (maturin build-backend + metadata) |
| Cargo.template.toml | The port's Cargo.toml (crate deps + release profile) |
| lib.template.rs | The port's src/lib.rs (PyO3 module skeleton) |
| README.template.md | The port's user-facing README.md |
| py_reference_driver.template.py | tests/py_reference_driver.py: invokes the Python reference, dumps JSON |
| _run_candidate.template.py | tests/_run_candidate.py: invokes the Rust extension, dumps JSON |
| test_exact_match.template.py | tests/test_exact_match.py: pytest test that asserts the gate |
| DISCOVERY.template.md | The port's DISCOVERY.md artefact (Phase 0.5) |
| ITERATION_LOG.template.md | The port's ITERATION_LOG.md (Phase 3 acceleration log) |
| RECONSTRUCTION_REPORT.template.md | The port's RECONSTRUCTION_REPORT.md (8-section final report) |
| MATH.template.md | The port's MATH.md (perturbation bounds for (B) rewrites) |
| compare_Python_vs_Rust.template.ipynb | Notebook 1: pipeline parity |
| tutorial.template.ipynb | Notebook 2: Python tutorial |
| function_by_function_Python_parity.template.ipynb | Notebook 3: Python⇄Rust function dictionary |
| evolution.template.ipynb | Notebook 4: per-iteration evolution |
| py_per_function_dump.template.py | Python driver feeding Notebook 3 |
| File | What it does |
|---|---|
| ROADMAP.md | Ranked Python packages awaiting Rust ports, with the rust-ecosystem crates each would lean on. |
| EXAMPLE_WALKTHROUGH.md | End-to-end Phase 0 → Phase 5 narrative for one port, with concrete commands and intermediate outputs. |
A typical agent session opens with:
Port Python package X to Rust. Follow rebuildpy/README.md.
The agent then executes:
- (Phase 0.5: Discovery) Run engine/discover_rust_deps.py to check:
  - Is <org>/rs-X already published? → if yes, STOP, report the existing repo.
  - Which of X's Python deps already have rs- mirrors or a standard Rust crate? → record in DISCOVERY.md.
- (Phase 0) Look up X's algorithm class in PARITY_TAXONOMY.md. Write and commit data/manifest.yaml with the algorithm class, threshold, canonical fixture path, seed, and per-output gate blocks. The gate is read-only after this.
- (Phase 1) Copy the layout from TEMPLATE.md (seed shape chosen by algorithm class).
- (Phase 2: Equivalence Agent) Translate each Python function in dependency order into Rust; maturin develop --release; run the per-function parity diff. Iterate until the gate clears at the pre-registered threshold.
- (Phase 3: Acceleration Agent) For each candidate rewrite from ACCELERATION_PLAYBOOK.md:
  - Check precondition + produce admissibility proof (E / B / C). For any reordered reduction, derive the n·eps·max|x| bound in MATH.md.
  - Apply on a working branch; rebuild; re-run parity test (gate still clearing?); re-benchmark.
  - Accept if speedup > 1.05× and gate clears; else roll back.
  - Append one YAML block to ITERATION_LOG.md per attempt.
- (Phase 4: release artefacts) Tick CHECKLIST.md end-to-end; produce all mandatory deliverables:
  - RECONSTRUCTION_REPORT.md (8 sections)
  - MATH.md (perturbation bounds for any (B) rewrites)
  - AUDIT.md (Python-API coverage, auto-generated by engine.py_function_audit)
  - examples/evolution.png (two-panel plot, auto-generated by engine.plot_evolution)
  - examples/compare_Python_vs_Rust.ipynb: pipeline parity
  - examples/tutorial_<dataset>.ipynb: Python tutorial
  - examples/function_by_function_Python_parity.ipynb: Python⇄Rust dictionary
  - examples/evolution.ipynb: per-iteration narrative + subplot
- (Phase 5: release) maturin build --release → wheel to PyPI; cargo publish → crates.io; create GitHub repo + release; add the port as a seed template for future ports.
Always-first invariant: Phase 0.5 (Discovery) is non-skippable. If discovery is skipped, the protocol fails — we risk re-implementing a crate that already exists.
No deferred items in Phase 4: every artefact above is mandatory.
Use this kit when:
- ✅ The target is a Python package with a clear numerical output (vector, matrix, table, cluster IDs, p-values) that you want faster and memory-safe.
- ✅ You can construct a canonical input fixture small enough for fast iteration (< 1 minute end-to-end for the Python reference).
- ✅ The upstream Python package is open-source under a license you can match.
- ✅ You're prepared to commit time on the order of 1–5 working days for a clean port.
Don't use it when:
- ❌ The "Python package" you want is closed-source or only described in a paper without runnable code — no executable spec means no parity oracle.
- ❌ The algorithm is dominated by calls into another compiled library (the Python is a thin wrapper) — there's little to gain from a Rust rewrite.
- ❌ You want a Rust algorithm that's better than the Python one, not identical. This kit refuses to widen the gate; fork after the port lands.
- ❌ The hot path is already a well-tuned C/Fortran extension (e.g. pure BLAS) — Rust won't beat it and the parity oracle's ceiling is that extension.
The acceleration loop is verifier-guided test-time search, not weight-update RL — and importantly not metric optimization:
| Component | Mapping |
|---|---|
| Policy | The LLM in-context (no fine-tune, no weight updates). |
| Action | One rewrite drawn from ACCELERATION_PLAYBOOK.md (rayon outer-axis map, zero-copy view, Woodbury, target-cpu=native, MST ⊆ Delaunay, …). |
| Environment | The parity test + a 3-run-mean stopwatch on a --release build over the canonical fixture (see EVALUATION.md). |
| Reward | r_t = φ(a_t) · speedup(a_t) — gate must still clear (φ = 1), then wall-clock speedup ranks admissible candidates. |
| Best-so-far register | The last commit on the in-progress port. Roll back if a later rewrite breaks parity. |
What we don't do: improve the algorithm's metric. Reconstruction's goal is identical outputs to the Python reference, not "better" ones. Two evaluation plots come out of every port: time vs iteration (monotonically decreasing) and accuracy vs iteration (flat at the maximum; every dip annotated with the math approximation, almost always a reordered reduction).
No model weights change. Search occurs inside one coding-agent session, with the parity test as oracle and the wall-clock as cost function.
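The mapping above can be sketched as a tiny accept/roll-back loop. This is an illustration under stated assumptions: speedup is measured against the best-so-far commit, φ is 1 only when the gate clears and the rewrite carries an (E), (B) or (C) proof, the 1.05× acceptance floor is the one quoted in the workflow, and the `Attempt` record plus all timings below are invented for the example.

```python
from dataclasses import dataclass

ADMISSIBLE_PROOFS = {"E", "B", "C"}
MIN_SPEEDUP = 1.05


@dataclass
class Attempt:
    name: str
    proof: str         # admissibility proof class claimed for the rewrite
    gate_clears: bool  # did the parity test still pass after the rewrite?
    seconds: float     # mean wall-clock on the canonical fixture


def reward(attempt, best_seconds):
    """r_t = phi(a_t) * speedup(a_t), with phi in {0, 1}."""
    admissible = attempt.gate_clears and attempt.proof in ADMISSIBLE_PROOFS
    phi = 1.0 if admissible else 0.0
    return phi * (best_seconds / attempt.seconds)


def run_search(baseline_seconds, attempts):
    """Keep a best-so-far register; roll back anything that does not pay off."""
    best = baseline_seconds
    log = []
    for attempt in attempts:
        r = reward(attempt, best)
        accepted = r > MIN_SPEEDUP
        if accepted:
            best = attempt.seconds
        log.append((attempt.name, round(r, 3), "accept" if accepted else "roll back"))
    return best, log


# Invented numbers, for illustration only.
attempts = [
    Attempt("candidate A", "E", True, 4.0),   # 2.5x faster, gate clears
    Attempt("candidate B", "B", False, 2.0),  # fast, but breaks parity
    Attempt("candidate C", "E", True, 3.9),   # gate clears, gain below 1.05x
    Attempt("candidate D", "C", True, 3.2),   # 1.25x over the current best
]
best, log = run_search(10.0, attempts)
for entry in log:
    print(entry)
print("best wall-clock:", best)
```

Because φ multiplies the speedup, a rewrite that breaks parity scores zero no matter how fast it is, so the search can only move along parity-preserving commits.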
After the parity gate clears and the Acceleration loop terminates, the agent fills out RECONSTRUCTION_REPORT.md. The 8 sections:
- Identity: package, upstream version, algorithm class, threshold, final parity value, audit class A/B/C, LOC, speedup vs Python.
- Python API coverage audit: every public name from the package is in the table (ported / skipped with reason). Auto-populated by engine.py_function_audit. Also lists dependencies reused (ecosystem audit: which Rust crates / rs- mirrors were reused vs re-implemented).
- Parity evidence: per-output metric values, per-fixture wall-clock + parity, reproducible reference command.
- Acceleration evidence: two-panel evolution figure embedded, accepted-vs-rejected rewrites with admissibility proofs.
- Code quality audit: maturin build --release + pip install + pytest green + four mandatory notebooks executed + license compatible + version pinned. All non-skippable.
- Known limitations: honest list of what the port doesn't do; never used as an excuse to widen the gate.
- Integration: crate/wheel location, public-API exposure, tutorial slot.
- Sign-off: author, date, active time spent, final audit class.
This is what we present as "the port is done".
The protocol is a faithful adaptation of omicverse-rebuildr (R→Python), re-pointed at the Python→Rust direction. The changes that the new direction forces:
| Area | What changed vs the R→Python kit | Why |
|---|---|---|
| Reference / target | Python is the reference (executable spec); Rust is the target. Envs become PYTHON_REF_ENV / RUST_TEST_ENV. | The fast language is now the target, not the source. |
| Deterministic error sources | "cross-BLAS rounding" is joined by parallel/SIMD float-reduction reordering as the dominant (B) source. | f64 + is non-associative; rayon/SIMD reorder sums. This is the new central admissibility concern. |
| Acceleration playbook | R→Python algebraic rewrites are kept; added Rust-specific §2 memory/ownership, §3 parallelism/SIMD, §4 compiler flags, §5 interpreter-overhead removal. | Rust's speed comes from ownership + parallelism + codegen, not just algebra. |
| Coverage audit | NAMESPACE parsing → Python __all__/ast parsing; coverage checked against #[pyfunction]/pub fn. | The source is Python; the target is a Rust crate. |
| Discovery | R DESCRIPTION deps → Python pyproject.toml deps; deps mapped to rs- ports and standard crates (ndarray/polars/petgraph/linfa). | Reuse the Rust ecosystem, not just org mirrors. |
| Build / release | wheel-on-PyPI → maturin wheel on PyPI + crate on crates.io; debug-build timings declared invalid. | Rust has two distribution channels and a release/debug split. |
See examples/ROADMAP.md for the full ranked list.
| Status | Port | Date | Audit | Speedup | Notes |
|---|---|---|---|---|---|
| ⬜ next | rs-leidenalg | — | TBD | TBD | Community detection; petgraph + ARI gate; highest reuse density |
| ⬜ | rs-scrublet | — | TBD | TBD | Doublet detection; classification/F1 gate; rayon per-cell map |
| ⬜ | rs-harmonypy | — | TBD | TBD | Batch integration; embedding/Procrustes gate; ndarray-linalg |
| ⬜ | rs-fa2 | — | TBD | TBD | ForceAtlas2 layout; deterministic-bounded; SIMD axpy |
| ⬜ | rs-palantir | — | TBD | TBD | Pseudotime; ordinal/Pearson gate; Woodbury |
Q: How long does a typical port take?
A: Translation-only (class A): 1–3 days. With minor acceleration (class B): 2–5 days. Heavy acceleration with proofs (class C): 1–2 weeks. The Rust baseline translation usually already delivers most of the speedup; acceleration is about the last 2–5×.
Q: Why is parity so much more fragile than R→Python?
A: Because Rust makes parallelism and SIMD easy to reach for, and f64 addition is non-associative. A rayon parallel sum is generally not bit-identical to a serial sum, and because rayon does not specify the order in which partial sums are combined, it is not guaranteed to be bit-reproducible from run to run either. The protocol handles this with the reduction-order rule: fixed-order reduction = (E) exact; reordered = (B) bounded by n·eps·max|x|, declared in MATH.md. See PARITY_TAXONOMY.md.
Q: maturin/PyO3 or a standalone Rust binary for the candidate?
A: Default to maturin/PyO3: the deliverable is a pip-installable Rust-backed Python package, and the candidate runner just imports rs_<pkg>. A cargo run binary that dumps JSON is an acceptable fallback when the package is CLI-shaped, but you lose the "drop-in faster replacement" property.
Q: What if my target's deps have no Rust crate (e.g. statsmodels)?
A: Either (a) port the specific routine you need into the crate, or (b) keep that step in Python and call back across the boundary, documenting the seam in MATH.md and RECONSTRUCTION_REPORT.md §6. Don't pretend a different routine is equivalent.
Q: My port gets a 1.2× speedup from a rayon reduction but Procrustes drops from 1.0000 to 0.9990 (still above threshold). Accept?
A: Only if it's a (B) rewrite with the perturbation bound derived in MATH.md. A "small" empirical drop with no closed-form bound is a bug, not an optimisation. Reject and either fix the reduction order ((E)) or derive the bound.
Q: Can I publish to a different GitHub org?
A: Yes. Export REBUILDPY_ORG=<your-org> before running engine.discover_rust_deps. The kit pushes nothing automatically — Phase 5's gh repo create, maturin publish, and cargo publish are explicit and you control them.
Q: Does this work on Windows?
A: Tested on Linux; macOS should work (set the Accelerate BLAS backend for ndarray-linalg). Windows requires WSL2 because the kit shells out to bash for some pipe operations.
The kit itself is MIT. Each individual port matches its upstream Python package's license (GPL-3 if upstream is GPL ≥ 2; MIT/Apache-2.0 dual — the Rust convention — if upstream is permissive). See TEMPLATE.md §License decision matrix.
This protocol is a direct adaptation of the omicverse-rebuildr recipe (reference-driven cross-language library synthesis via LLM agents), re-pointed from R→Python to Python→Rust. The reference-driven parity-gate methodology, the 8-class taxonomy, the two-plot evaluation, and the verifier-guided acceleration search are inherited wholesale; the Python→Rust direction adds the reduction-order admissibility rule and the Rust-specific acceleration playbook. Case-study ports live under github.com/<org>/rs-*.