rebuildpy

A fixed, reproducible protocol for porting Python packages to Rust crates (rs-<pkg>, exposed to Python via PyO3/maturin), with pre-registered, measured numerical parity against the original Python reference.

This is the Python→Rust sibling of omicverse-rebuildr (which ports R→Python). Same engineering loop, same parity philosophy — the source language is Python and the target language is Rust.

What this is

Single-cell genomics, statistical genetics, and adjacent numerical fields have hundreds of canonical algorithms whose only reference implementation is pure Python (often NumPy/SciPy/Numba/Cython): leidenalg, scrublet, harmonypy, fa2, palantir, MAGIC, scanpy kernels, …

When those algorithms become a bottleneck, the options today are bad:

Numba / Cython the hot loop — helps, but ships a fragile build, doesn't give true multi-core scaling without the GIL dance, and the optimised path silently diverges from the readable reference.
Rewrite in Rust by hand — fast and safe, but the rewrite usually diverges from the Python reference and the divergence is never measured.
Use an "approximate" Rust crate — silently a different algorithm with different numerical behaviour.

rebuildpy is the engineering recipe that takes a port from "I want this in Rust" to "the wheel is on PyPI, the crate is on crates.io, and it provably matches the Python reference on the canonical fixture" — in a small number of agent-driven iterations, with the proof of parity shipped alongside the wheel.

Three core ideas:

The Python source is the executable spec. No reverse-engineering from papers. The agent runs the Python reference on a fixed input and compares its own Rust draft to that output, every iteration.
Parity is class-aware. "Same output" means different things for an embedding (rotation-invariant), a clustering (label-permutation-invariant), or a pseudotime (correlation-invariant). The protocol pre-registers which numerical metric applies to which output and locks the threshold before any agent code is written.
Reconstruction is not metric optimization. We never tune the algorithm to "look better" — we tune it to be identical to the Python reference, then search for speed under provably-equivalent rewrites. In Rust the dominant subtlety is that f64 addition is non-associative, so any reordered parallel/SIMD reduction must carry a derived error bound.
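
The non-associativity behind the third idea is easy to see in isolation. The following is a minimal sketch in plain Python, whose `float` is an IEEE-754 double on mainstream platforms (the same format as Rust's `f64`); the three values are illustrative and not taken from any port:

```python
# Python floats are IEEE-754 doubles, the same format as Rust's f64.
left_to_right = (0.1 + 0.2) + 0.3   # the grouping a serial loop would use
regrouped = 0.1 + (0.2 + 0.3)       # one grouping a parallel/SIMD reduction might use

print(left_to_right == regrouped)   # False
print(abs(left_to_right - regrouped))
```

Both results are correctly rounded answers to a different sequence of additions, which is why a reordered reduction cannot be claimed bit-exact without further argument.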

What ships at the end of every port:

A pip-installable wheel on PyPI (maturin-built Rust extension) + optionally the crate on crates.io.
A RECONSTRUCTION_REPORT.md with full Python-API coverage audit, per-output parity values, two-panel time-vs-accuracy plot, and ecosystem-reuse accounting.
Four pre-executed notebooks: pipeline parity, Python tutorial, Python⇄Rust function dictionary, per-iteration evolution.
A reproducible parity gate as a pytest test.

Quick start

# 1. Clone the kit
git clone <your-repo-url> rebuildpy
cd rebuildpy

# 2. Provision the Python reference env (see SETUP.md for full instructions)
conda create -n rebuild-pyref python=3.10 -y
conda activate rebuild-pyref
pip install -r requirements.txt
pip install <the-original-python-package>      # the executable spec

# 3. Install the Rust toolchain (NOT a conda package)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
cargo --version && rustc --version

# 4. Provision the Rust target env (maturin builds the extension into it)
conda create -n rebuild-rust python=3.10 -y
conda activate rebuild-rust
pip install -r requirements.txt           # includes maturin

# 5. Export the two paths the kit needs
export PYTHON_REF_ENV=$(conda info --envs | awk '/rebuild-pyref/ {print $NF}')
export RUST_TEST_ENV=$(conda info --envs | awk '/rebuild-rust/ {print $NF}')

# 6. Authenticate GitHub CLI (needed for Discovery step)
gh auth login

# 7. Verify the kit installs cleanly (30 seconds)
python -m engine.smoke_test
# Expected: [smoke] OK -- 5/5 checks passed.

# 8. Check if your target Python package is already ported
python -m engine.discover_rust_deps --check <YourPyPackage>

If the smoke test passes and discovery says "no existing port", you're ready to start a port — follow PROTOCOL.md.

📖 Full setup walkthrough: SETUP.md (~30 minutes including conda + rustup provisioning).

How to invoke the protocol in a session

Point an agent (Claude Code, Cursor, etc.) at this folder and say:

Port Python package X to Rust. Follow rebuildpy/README.md.

The agent will execute the 6-step protocol end-to-end and produce, at the end:

an rs-X repository (under your $REBUILDPY_ORG) with an installable maturin wheel,
the pre-registered numerical parity gate clearing on the canonical fixture,
a structured RECONSTRUCTION_REPORT.md,
four mandatory pre-executed notebooks,
a PyPI release (and optional crates.io release).

The protocol — 6 steps

┌─ 0.5 Discovery ─────┐
│ • Is target already │ ← if YES: stop, reuse existing rs- repo
│   ported to Rust?   │
│ • Which py deps     │ ← matches added as pyproject deps;
│   have rs-mirrors   │   others mapped to ndarray/polars/petgraph/linfa
│   or a rust crate?  │
└─────────────────────┘
         ↓
┌─ 1 Shape template ──┐
│ Copy layout from a  │
│ prior port matching │
│ the algorithm class │
└─────────────────────┘
         ↓
┌─ 2 Dual envs ───────┐
│ Python reference env│
│ Rust target env     │  (maturin develop --release)
│ Both see same data  │
└─────────────────────┘
         ↓
┌─ 3 Two-agent inner loop ─────────────────────────────────────────────┐
│                                                                       │
│  ┌─ Equivalence Agent ────┐    ┌─ Acceleration Agent ──────────────┐ │
│  │ Translate Python → Rust│ →  │ Search rewrites for speed; each   │ │
│  │ Iterate until parity   │    │ requires admissibility proof:     │ │
│  │ gate clears (Pearson,  │    │ exact / bounded-ε (reduction      │ │
│  │ ARI, Procrustes, etc.) │    │ order!) / class-containment.      │ │
│  │                        │    │ Reject if it breaks parity.       │ │
│  └────────────────────────┘    └───────────────────────────────────┘ │
│                                                                       │
└──────────────────────────────────────────────────────────────────────┘
         ↓
┌─ 4 Validate ────────┐
│ Re-confirm gate.    │
│ Threshold is read-  │
│ only; never widened │
└─────────────────────┘
         ↓
┌─ 5 Release ─────────┐
│ Publish to PyPI +   │
│ crates.io + GitHub. │
│ Become a seed       │
│ template.           │
└─────────────────────┘

Each step is documented in detail:

Step	What happens	Document
0.5 Discovery	Check whether the target is already a Rust port; check whether each Python dep already has an `rs-` mirror or a standard Rust crate. STOP if duplicate; reuse deps if found.	DISCOVERY.md
1 Shape template	Copy directory layout + test scaffold from a prior port. Do NOT copy algorithmic code.	TEMPLATE.md
2 Dual environments	Provision a Python reference env (original package) and a Rust target env (maturin + built extension; cargo from rustup). Both see the same fixture files.	SETUP.md
3 Two-agent inner loop	(a) Equivalence Agent: translate Python → Rust, iterate until the pre-registered class-aware parity gate clears. (b) Acceleration Agent: verifier-guided search over Rust rewrites for speed, each requiring one of three admissibility proofs.	PROTOCOL.md, PARITY_TAXONOMY.md, ACCELERATION_PLAYBOOK.md
4 Validate	Re-confirm the gate. The threshold is committed before agent work begins — never tightened or loosened.	PARITY_TAXONOMY.md
5 Release	Ship the maturin wheel to PyPI + crate to crates.io, publish `<org>/rs-X`, complete the `RECONSTRUCTION_REPORT.md` + four mandatory notebooks.	NOTEBOOKS.md

The 8 algorithm classes (parity taxonomy)

Different algorithms have different invariance structures, so "same output" needs different metrics. The protocol pre-registers one class per port output:

#	Class	Parity criterion	Default threshold	Example Python packages
1	Deterministic numerical (3 sub-tiers — see PARITY_TAXONOMY.md)	element-wise `max_abs_err < tol`, optional `rtol`-scaled	standard `1e-8` / strict `1e-13` / bounded `1e-6`; hard ceiling `1e-6`	BBKNN distances, MAGIC operator
2	Stochastic numerical	Kolmogorov–Smirnov ≤ τ or Wasserstein-1 ≤ τ	KS-p ≥ 0.05	dropout simulations, MCMC draws
3	Combinatorial clustering	label-invariant: ARI / NMI / Fowlkes–Mallows	ARI ≥ 0.95	leidenalg, louvain
4	Continuous embedding	rotation-invariant: Procrustes similarity	Procrustes ≥ 0.95	UMAP, t-SNE, PCA, harmonypy
5	Ranked output	top-K Jaccard / Spearman correlation	top-50 Jaccard ≥ 0.8	HVG selection, DE rankings
6	Ordinal output (pseudotime)	Pearson / Spearman correlation	Pearson ≥ 0.99 (≥ `1 − 1e-12` treated as exact)	DPT, palantir
7	Classification	label agreement / F1	F1 ≥ 0.95	scrublet doublet calls
8	Statistical inference	rank corr on −log10 p + top-K Jaccard	Spearman ≥ 0.90	diffxpy, scanpy DE

If the Python function returns multiple outputs of different classes, the manifest declares one gate per output and ALL must pass.
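
To make "one gate per output, all must pass" concrete, here is a minimal sketch using only the standard library. The `GATES` table, the function names, and the toy data are hypothetical stand-ins; the kit's real metrics live in engine/parity_metrics.py and its gate spec in the manifest, neither of which this sketch reproduces.

```python
from collections import Counter
from math import comb, sqrt


def adjusted_rand_index(a, b):
    """ARI between two labelings; invariant to renaming the cluster labels."""
    n = len(a)
    if n < 2:
        return 1.0
    sum_cells = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. one cluster on both sides
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def pearson(x, y):
    """Pearson correlation; invariant to positive affine rescaling."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y))
    var_x = sum((p - mx) ** 2 for p in x)
    var_y = sum((q - my) ** 2 for q in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical gate table: output name -> (metric, pre-registered threshold).
GATES = {
    "labels": (adjusted_rand_index, 0.95),   # class 3, combinatorial clustering
    "pseudotime": (pearson, 0.99),           # class 6, ordinal output
}


def all_gates_pass(reference, candidate):
    results = {
        name: metric(reference[name], candidate[name]) >= threshold
        for name, (metric, threshold) in GATES.items()
    }
    return all(results.values()), results


reference = {"labels": [0, 0, 1, 1, 2, 2],
             "pseudotime": [0.0, 0.1, 0.4, 0.5, 0.9, 1.0]}
# Same partition with renamed clusters; pseudotime shifted and rescaled.
candidate = {"labels": [2, 2, 0, 0, 1, 1],
             "pseudotime": [1.0, 1.2, 1.8, 2.0, 2.8, 3.0]}

ok, per_output = all_gates_pass(reference, candidate)
print(ok, per_output)
```

An element-wise comparison would reject this candidate on both outputs even though the partition and the ordering are unchanged, which is the reason the metric is chosen per class.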

The 8 metric implementations live in engine/parity_metrics.py — import from there rather than redefining. (They are language-agnostic; this is the same module shape omicverse-rebuildr uses.)

📖 Full taxonomy: PARITY_TAXONOMY.md — includes the Python→Rust "when the gate fails: ordered suspicion list" (row-major-vs-transpose, f32-vs-f64, integer wrapping, reduction order, NaN handling, …).

Acceleration: 3 admissibility proof classes

Every rewrite the Acceleration Agent commits must carry one of these proofs:

Proof class	Meaning	Examples (Rust)
(E) Exact identity	Bit-equivalent output by mathematical identity or by not touching the arithmetic.	`Xᵀ X` hoisted out of a loop; Woodbury; zero-copy `ArrayView`; buffer reuse; fixed-order reduction; LTO + `codegen-units=1`.
(B) Bounded ε-approximation	Error bounded by a closed-form expression; derived in `MATH.md`, not handwaved.	Reordered `rayon` parallel sum / SIMD horizontal-add (`‖Δ‖ ≤ n·eps·max|x|`).
(C) Class-containment theorem	A known theorem guarantees the same output for the relevant input class.	Euclidean MST ⊆ Delaunay (Preparata–Shamos 1985), via `spade` + `petgraph`.

📖 Full catalog: ACCELERATION_PLAYBOOK.md. The headline rule: f64 + is non-associative — reordering any float reduction turns an (E) rewrite into a (B) one that needs a bound. This is the single most common way a Rust port silently breaks deterministic-strict parity.
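
The headline rule can be checked numerically. The sketch below compares a serial sum with a chunked sum (a stand-in for a parallel reduction, not rayon itself) and tests the observed difference against the textbook worst-case bound for floating-point summation, under which any summation order is within `γ_{n-1} · Σ|xᵢ|` of the exact sum. That choice of bound is an assumption of this sketch and is not necessarily the form a given port's `MATH.md` would declare; all function names are hypothetical.

```python
import random
import sys


def serial_sum(xs):
    total = 0.0
    for x in xs:
        total += x
    return total


def chunked_sum(xs, n_chunks):
    """Sum each chunk serially, then combine the partial sums in chunk order."""
    size = -(-len(xs) // n_chunks)  # ceiling division
    partials = [serial_sum(xs[i:i + size]) for i in range(0, len(xs), size)]
    return serial_sum(partials)


def reorder_bound(xs):
    """Worst-case gap between two summation orders of the same values.

    Each order is within gamma_{n-1} * sum|x| of the exact sum, so two
    orders differ by at most twice that.
    """
    n = len(xs)
    u = sys.float_info.epsilon / 2              # unit roundoff for f64
    gamma = (n - 1) * u / (1 - (n - 1) * u)
    return 2 * gamma * sum(abs(x) for x in xs)


random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(10_000)]
diff = abs(serial_sum(xs) - chunked_sum(xs, n_chunks=8))
assert diff <= reorder_bound(xs)
print(f"observed {diff:.3e} <= bound {reorder_bound(xs):.3e}")
```

The observed gap is typically far below the worst-case bound; the point of a (B) proof is that the bound is derived before the rewrite is accepted, not read off one run.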

Evaluation: two plots, not one

Traditional evolutionary search plots iteration vs metric because the policy searches for a better metric. That's the wrong model here: reconstruction's goal is output identical to the Python reference, not "better" output.

So every port produces two plots against the same iteration axis:

 wall-clock (s)
  │
  │  ●─┐                ← Iteration 0 (straight Rust translation) already
  │    │  ●─┐             a big drop vs the Python reference
  │       │    ●──●
  │ python-ref → iter 0 → iter 1 → iter 2 → iter 3
  │
  └────────────────────────────────────────────────→ iteration

 parity metric (e.g. Procrustes)
  │ ●──●──●──●─┐
  │              \
  │               ●──●   ← annotated: "rayon parallel reduction, n·eps·max|x|"
  │  ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ threshold (red dashed line)
  │
  └────────────────────────────────────────────────→ iteration

Plot 1 (top, log scale): wall-clock should monotonically decrease as rewrites land. Error bars = stddev over 3 warmup-excluded runs. Always a --release build.
Plot 2 (bottom): parity metric should stay flat at the ceiling. Every dip must be annotated with the math approximation that caused it (almost always a reordered reduction).

Wall-clock measurement rules:

Warmup: discard the first run (BLAS thread spin-up, extension load, page cache).
3 measured runs; report mean ± stddev.
CV > 10% → auto-extend to 5 runs, report median + IQR.
Fix BLAS + rayon threads via OMP_NUM_THREADS=8 / RAYON_NUM_THREADS=8 before any imports.
Never time a debug build — it is an invalid measurement.
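
The measurement rules above can be sketched in a few lines of standard-library Python. `time_callable_sketch` and its return format are hypothetical and do not describe the actual `engine/benchmark.py` API; the injectable `clock` parameter exists only so the logic can be exercised deterministically.

```python
import os

# Pin thread counts before importing anything that might start a thread pool.
os.environ.setdefault("OMP_NUM_THREADS", "8")
os.environ.setdefault("RAYON_NUM_THREADS", "8")

import statistics
import time


def time_callable_sketch(fn, clock=time.perf_counter):
    """Warmup-excluded timing: 3 runs, extended to 5 when CV > 10%."""

    def one_run():
        start = clock()
        fn()
        return clock() - start

    one_run()                                   # warmup run, discarded
    runs = [one_run() for _ in range(3)]        # 3 measured runs
    mean = statistics.mean(runs)
    stddev = statistics.stdev(runs)
    if mean > 0 and stddev / mean > 0.10:       # coefficient of variation > 10%
        runs += [one_run() for _ in range(2)]   # extend to 5 runs
        q1, _, q3 = statistics.quantiles(runs, n=4)
        return {"n": 5, "center": statistics.median(runs),
                "spread": q3 - q1, "summary": "median+IQR"}
    return {"n": 3, "center": mean, "spread": stddev, "summary": "mean+stddev"}


result = time_callable_sketch(lambda: sum(range(100_000)))
print(result["summary"], result["n"])
```

Which branch a real workload takes depends on machine noise, so the example prints only the summary type and run count rather than a timing.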

📖 Full spec + iteration-log schema: EVALUATION.md.

Four mandatory notebooks per release

A finished port serves five audiences, each with a different need:

Audience	What they need	Where they look
Reviewer / scientist evaluating whether to trust the port	Pipeline-level proof Rust ≡ Python numerically	`compare_Python_vs_Rust.ipynb`
End user of the package	A copy-pastable Python tour of every public function (now Rust-backed)	`tutorial_<dataset>.ipynb`
Python user porting their existing code	A function-level dictionary — every Python parameter ↔ Rust parameter, side-by-side calls on identical input	`function_by_function_Python_parity.ipynb`
Auditor of the engineering process asking "did the agent really iterate?"	A per-iteration narrative log with one named subplot per iteration	`evolution.ipynb`
CI / automation	The pre-registered parity gate as a pytest assertion	`tests/test_exact_match.py`

All four notebooks ship pre-executed so GitHub renders them without re-running. Phase 4 blocks the port from being released if any one is missing.

evolution.ipynb is a forcing function. It is structured as ## Iteration N — <title> headers, one per iteration, each with a markdown narrative of what changed AND a code cell that produces a subplot for that iteration. If the agent skipped the acceleration loop, the notebook still has the baseline block (## Iteration 0 — Baseline translation) — but the protocol then audits whether obvious acceleration opportunities were missed. The summary 2-panel examples/evolution.png (auto-generated by engine.plot_evolution) supplements but does not replace this notebook.

📖 Schemas + section-by-section requirements: NOTEBOOKS.md.

Kit contents

Top-level documents

File	What it does
SETUP.md	First-time install — prerequisites, dual-env provisioning, rustup, env vars, gh auth, smoke test, troubleshooting.
PROTOCOL.md	The 6-step protocol + the two-agent inner loop. Read this in a session before starting a port.
DISCOVERY.md	Phase 0.5 — reuse before rebuild. Find existing `rs-` ports for the target and its Python deps.
PARITY_TAXONOMY.md	8-class algorithm taxonomy → which numerical-parity metric applies (+ the reduction-order rule).
ACCELERATION_PLAYBOOK.md	Catalog of Rust rewrites with the 3 admissibility proof types.
EVALUATION.md	Two-plot evaluation (`time vs iter` + `accuracy vs iter`), warmup excluded, accuracy dips annotated.
NOTEBOOKS.md	Four mandatory pre-executed notebooks per release. Non-skippable in Phase 4.
TEMPLATE.md	Standard `rs-<pkg>` repo layout + naming conventions + license decision matrix.
CHECKLIST.md	Per-port checklist to tick through, Phase 0–5.

Engine (runnable code) — `engine/`

File	What it does	Typical invocation
`smoke_test.py`	30-second sanity check — verifies the kit installs and all 8 parity metrics + audit / plot / benchmark / loop helpers work.	`python -m engine.smoke_test`
`discover_rust_deps.py`	Lists existing org repos via `gh repo list <org>` (default `omicverse`, override with `REBUILDPY_ORG`); parses the package's `pyproject.toml`; reports which deps already have `rs-` mirrors and the Rust-ecosystem crate for the rest. Cached 24h.	`python -m engine.discover_rust_deps --check <PyPkg>`
`parity_metrics.py`	The 8 parity-class metric functions (Pearson, ARI, Procrustes, KS, top-K Jaccard, …) + class dispatcher.	`from parity_metrics import compute_parity, is_pass`
`benchmark.py`	Wall-clock timer with warmup-exclusion + 3-run averaging; pins BLAS + rayon threads; auto-extends to 5 runs + median when CV > 10%.	`from benchmark import time_callable`
`py_function_audit.py`	Parses the Python package's public API (`__all__` / `def` / `class`) via `ast`, audits Rust-crate coverage (`#[pyfunction]` / `#[pymethods]` / `pub fn`), produces `AUDIT.md`.	`python -m engine.py_function_audit --py-source <pkg>-ref --rust-crate src/`
`plot_evolution.py`	Renders the two-panel evolution PNG from `ITERATION_LOG.md`, annotates accuracy dips with their math reason.	`python -m engine.plot_evolution --port-dir <path>`
`loop.py`	The rebuildpy loop as runnable code — equivalence + acceleration phases as Python callables.	`python -m engine.loop --port-dir <path> --phase equivalence`
`manifest.template.yaml`	Pre-registered parity gate spec — copy into each new port's `data/manifest.yaml`.	(file template)
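
As an illustration of the kind of check `py_function_audit.py` is described as performing, the sketch below extracts a module's public names with `ast` and compares them to `#[pyfunction]` exports found in Rust source. It is a simplified assumption of how such an audit could work, not the kit's implementation: the function names are hypothetical, and the Rust side is a regex heuristic that misses functions with additional attributes between `#[pyfunction]` and `fn`.

```python
import ast
import re


def public_python_names(source: str) -> set[str]:
    """Public API of one module: a literal __all__ if present, else top-level defs."""
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, ast.Assign) and any(
            isinstance(t, ast.Name) and t.id == "__all__" for t in node.targets
        ):
            try:
                return set(ast.literal_eval(node.value))
            except ValueError:
                break  # __all__ is built dynamically; fall back to defs
    return {
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
        and not node.name.startswith("_")
    }


def exported_rust_names(rust_source: str) -> set[str]:
    """Names of fns directly preceded by #[pyfunction]; a heuristic, not a parser."""
    pattern = re.compile(r"#\[pyfunction[^\]]*\]\s*(?:pub\s+)?fn\s+(\w+)")
    return set(pattern.findall(rust_source))


py_src = '''
__all__ = ["run", "score"]
def run(x): ...
def score(x): ...
def _helper(x): ...
'''
rs_src = '''
#[pyfunction]
fn run(x: f64) -> f64 { x }
'''
missing = public_python_names(py_src) - exported_rust_names(rs_src)
print(sorted(missing))  # ['score']
```

A name in `missing` would need to be either ported or listed as skipped with a reason in the coverage table.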

File-level templates — `templates/`

Every new port copies these as starting scaffolding; nothing is generated from scratch.

Template	Becomes
`pyproject.template.toml`	The port's `pyproject.toml` (maturin build-backend + metadata)
`Cargo.template.toml`	The port's `Cargo.toml` (crate deps + release profile)
`lib.template.rs`	The port's `src/lib.rs` (PyO3 module skeleton)
`README.template.md`	The port's user-facing `README.md`
`py_reference_driver.template.py`	`tests/py_reference_driver.py` — invokes the Python reference, dumps JSON
`_run_candidate.template.py`	`tests/_run_candidate.py` — invokes the Rust extension, dumps JSON
`test_exact_match.template.py`	`tests/test_exact_match.py` — pytest test that asserts the gate
`DISCOVERY.template.md`	The port's `DISCOVERY.md` artefact (Phase 0.5)
`ITERATION_LOG.template.md`	The port's `ITERATION_LOG.md` (Phase 3 acceleration log)
`RECONSTRUCTION_REPORT.template.md`	The port's `RECONSTRUCTION_REPORT.md` (8-section final report)
`MATH.template.md`	The port's `MATH.md` (perturbation bounds for (B) rewrites)
`compare_Python_vs_Rust.template.ipynb`	Notebook 1 — pipeline parity
`tutorial.template.ipynb`	Notebook 2 — Python tutorial
`function_by_function_Python_parity.template.ipynb`	Notebook 3 — Python⇄Rust function dictionary
`evolution.template.ipynb`	Notebook 4 — per-iteration evolution
`py_per_function_dump.template.py`	Python driver feeding Notebook 3
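
The three test templates describe a simple flow: the reference driver dumps JSON, the candidate runner dumps JSON, and a pytest test asserts the gate. A minimal sketch of that flow for a class 1 (deterministic) output follows. The file names, the `"output"` key, and the helper names are hypothetical, and hard-coded lists stand in for the real Python reference and Rust extension calls.

```python
import json
import tempfile
from pathlib import Path

TOL = 1e-8  # the "standard" deterministic tier from the taxonomy table


def dump_output(path: Path, values: list[float]) -> None:
    """What a driver would do after running its implementation."""
    path.write_text(json.dumps({"output": values}))


def max_abs_err(ref_path: Path, cand_path: Path) -> float:
    ref = json.loads(ref_path.read_text())["output"]
    cand = json.loads(cand_path.read_text())["output"]
    # A length mismatch is a parity failure, not a tolerance question.
    assert len(ref) == len(cand)
    return max((abs(r - c) for r, c in zip(ref, cand)), default=0.0)


def test_exact_match_sketch():
    with tempfile.TemporaryDirectory() as tmp:
        ref_path = Path(tmp, "reference.json")
        cand_path = Path(tmp, "candidate.json")
        dump_output(ref_path, [0.25, 0.5, 1.0])            # stand-in: Python reference
        dump_output(cand_path, [0.25, 0.5, 1.0 + 1e-12])   # stand-in: Rust extension
        assert max_abs_err(ref_path, cand_path) < TOL


test_exact_match_sketch()
```

Going through JSON files keeps the two environments decoupled: the reference and the candidate never need to be importable in the same interpreter.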

Examples & roadmaps — `examples/`

File	What it does
ROADMAP.md	Ranked Python packages awaiting Rust ports, with the rust-ecosystem crates each would lean on.
EXAMPLE_WALKTHROUGH.md	End-to-end Phase 0 → Phase 5 narrative for one port, with concrete commands and intermediate outputs.

What the agent does in a session

A typical agent session opens with:

Port Python package X to Rust. Follow rebuildpy/README.md.

The agent then executes:

(Phase 0.5 — Discovery) Run engine/discover_rust_deps.py to check:
- Is <org>/rs-X already published? → if yes, STOP, report the existing repo.
- Which of X's Python deps already have rs- mirrors or a standard Rust crate? → record in DISCOVERY.md.
(Phase 0) Look up X's algorithm class in PARITY_TAXONOMY.md. Write and commit data/manifest.yaml with the algorithm class, threshold, canonical fixture path, seed, and per-output gate blocks. The gate is read-only after this.
(Phase 1) Copy the layout from TEMPLATE.md (seed shape chosen by algorithm class).
(Phase 2 — Equivalence Agent) Translate each Python function in dependency order into Rust; maturin develop --release; run the per-function parity diff. Iterate until the gate clears at the pre-registered threshold.
(Phase 3 — Acceleration Agent) For each candidate rewrite from ACCELERATION_PLAYBOOK.md:
- Check precondition + produce admissibility proof (E / B / C). For any reordered reduction, derive the n·eps·max|x| bound in MATH.md.
- Apply on a working branch; rebuild; re-run parity test (gate still clearing?); re-benchmark.
- Accept if speedup > 1.05× and gate clears; else roll back.
- Append one YAML block to ITERATION_LOG.md per attempt.
(Phase 4 — release artefacts) Tick CHECKLIST.md end-to-end; produce all mandatory deliverables:
- RECONSTRUCTION_REPORT.md (8 sections)
- MATH.md (perturbation bounds for any (B) rewrites)
- AUDIT.md (Python-API coverage, auto-generated by engine.py_function_audit)
- examples/evolution.png (two-panel plot, auto-generated by engine.plot_evolution)
- examples/compare_Python_vs_Rust.ipynb — pipeline parity
- examples/tutorial_<dataset>.ipynb — Python tutorial
- examples/function_by_function_Python_parity.ipynb — Python⇄Rust dictionary
- examples/evolution.ipynb — per-iteration narrative + subplot
(Phase 5 — release) maturin build --release → wheel to PyPI; cargo publish → crates.io; create GitHub repo + release; add the port as a seed template for future ports.

Always-first invariant: Phase 0.5 (Discovery) is non-skippable. If discovery is skipped, the protocol fails — we risk re-implementing a crate that already exists.

No deferred items in Phase 4: every artefact above is mandatory.

When to use this kit (and when not to)

Use this kit when:

✅ The target is a Python package with a clear numerical output (vector, matrix, table, cluster IDs, p-values) that you want faster and memory-safe.
✅ You can construct a canonical input fixture small enough for fast iteration (< 1 minute end-to-end for the Python reference).
✅ The upstream Python package is open-source under a license you can match.
✅ You're prepared to commit time on the order of 1–5 working days for a clean port.

Don't use it when:

❌ The "Python package" you want is closed-source or only described in a paper without runnable code — no executable spec means no parity oracle.
❌ The algorithm is dominated by calls into another compiled library (the Python is a thin wrapper) — there's little to gain from a Rust rewrite.
❌ You want a Rust algorithm that's better than the Python one, not identical. This kit refuses to widen the gate; fork after the port lands.
❌ The hot path is already a well-tuned C/Fortran extension (e.g. pure BLAS) — Rust won't beat it and the parity oracle's ceiling is that extension.

The evolutionary-RL analogy (in one paragraph)

The acceleration loop is verifier-guided test-time search, not weight-update RL — and importantly not metric optimization:

Component	Mapping
Policy	The LLM in-context (no fine-tune, no weight updates).
Action	One rewrite drawn from `ACCELERATION_PLAYBOOK.md` (rayon outer-axis map, zero-copy view, Woodbury, `target-cpu=native`, MST ⊆ Delaunay, …).
Environment	The parity test + a 3-run-mean stopwatch on a `--release` build over the canonical fixture (see EVALUATION.md).
Reward	`r_t = φ(a_t) · speedup(a_t)` — gate must still clear (`φ = 1`), then wall-clock speedup ranks admissible candidates.
Best-so-far register	The last commit on the in-progress port. Roll back if a later rewrite breaks parity.

What we don't do: improve the algorithm's metric. Reconstruction's goal is identical outputs to the Python reference, not "better" ones. Two evaluation plots come out of every port: time vs iteration (monotonically decreasing) and accuracy vs iteration (flat at the maximum; every dip annotated with the math approximation — almost always a reordered reduction).

No model weights change. Search occurs inside one coding-agent session, with the parity test as oracle and the wall-clock as cost function.
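
The reward and the best-so-far register from the table can be written out as a short sketch. The 1.05× acceptance threshold comes from the session description earlier in this README; measuring speedup against the current best-so-far (rather than the original baseline) is an assumption of this sketch, and the attempt names and timings are invented for illustration.

```python
def reward(gate_clears: bool, best_s: float, candidate_s: float) -> float:
    """r_t = phi(a_t) * speedup(a_t), with phi = 1 only if the gate still clears."""
    phi = 1.0 if gate_clears else 0.0
    return phi * (best_s / candidate_s)


def run_search(baseline_s, attempts, min_speedup=1.05):
    """Keep a rewrite only if parity holds and it beats the best-so-far by > 5%."""
    best_s, accepted = baseline_s, []
    for name, gate_clears, candidate_s in attempts:
        if reward(gate_clears, best_s, candidate_s) > min_speedup:
            best_s = candidate_s
            accepted.append(name)
        # otherwise roll back: best_s stays at the last accepted commit
    return best_s, accepted


attempts = [
    ("zero-copy view", True, 8.0),            # 1.25x, gate clears: accept
    ("reordered sum, no bound", False, 4.0),  # fast, but parity broke: reject
    ("buffer reuse", True, 7.9),              # about 1.01x: below threshold, reject
]
best_s, accepted = run_search(10.0, attempts)
print(best_s, accepted)  # 8.0 ['zero-copy view']
```

Because `φ` multiplies the speedup, a rewrite that breaks parity scores zero no matter how fast it is, which is what keeps the search from drifting into metric optimization.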

Final artefact — reconstruction report

After the parity gate clears and the Acceleration loop terminates, the agent fills out RECONSTRUCTION_REPORT.md. The 8 sections:

Identity — package, upstream version, algorithm class, threshold, final parity value, audit class A/B/C, LOC, speedup vs Python.
Python API coverage audit — every public name from the package is in the table (ported / skipped with reason). Auto-populated by engine.py_function_audit. Also lists dependencies reused (ecosystem audit — which Rust crates / rs- mirrors were reused vs re-implemented).
Parity evidence — per-output metric values, per-fixture wall-clock + parity, reproducible reference command.
Acceleration evidence — two-panel evolution figure embedded, accepted-vs-rejected rewrites with admissibility proofs.
Code quality audit — maturin build --release + pip install + pytest green + four mandatory notebooks executed + license compatible + version pinned. All non-skippable.
Known limitations — honest list of what the port doesn't do; never used as an excuse to widen the gate.
Integration — crate/wheel location, public-API exposure, tutorial slot.
Sign-off — author, date, active time spent, final audit class.

This is what we present as "the port is done".

Evolution — how the protocol got here

The protocol is a faithful adaptation of omicverse-rebuildr (R→Python), re-pointed at the Python→Rust direction. The changes that the new direction forces:

Area	What changed vs the R→Python kit	Why
Reference / target	Python is the reference (executable spec); Rust is the target. Envs become `PYTHON_REF_ENV` / `RUST_TEST_ENV`.	The fast language is now the target, not the source.
Deterministic error sources	"cross-BLAS rounding" is joined by parallel/SIMD float-reduction reordering as the dominant (B) source.	f64 `+` is non-associative; `rayon`/SIMD reorder sums. This is the new central admissibility concern.
Acceleration playbook	R→Python algebraic rewrites are kept; added Rust-specific §2 memory/ownership, §3 parallelism/SIMD, §4 compiler flags, §5 interpreter-overhead removal.	Rust's speed comes from ownership + parallelism + codegen, not just algebra.
Coverage audit	`NAMESPACE` parsing → Python `__all__`/`ast` parsing; coverage checked against `#[pyfunction]`/`pub fn`.	The source is Python; the target is a Rust crate.
Discovery	R `DESCRIPTION` deps → Python `pyproject.toml` deps; deps mapped to `rs-` ports and standard crates (ndarray/polars/petgraph/linfa).	Reuse the Rust ecosystem, not just org mirrors.
Build / release	wheel-on-PyPI → maturin wheel on PyPI + crate on crates.io; debug-build timings declared invalid.	Rust has two distribution channels and a release/debug split.

Ports shipped under this protocol

See examples/ROADMAP.md for the full ranked list.

Status	Port	Date	Audit	Speedup	Notes
⬜ next	rs-leidenalg	—	TBD	TBD	Community detection; petgraph + ARI gate; highest reuse density
⬜	rs-scrublet	—	TBD	TBD	Doublet detection; classification/F1 gate; rayon per-cell map
⬜	rs-harmonypy	—	TBD	TBD	Batch integration; embedding/Procrustes gate; ndarray-linalg
⬜	rs-fa2	—	TBD	TBD	ForceAtlas2 layout; deterministic-bounded; SIMD axpy
⬜	rs-palantir	—	TBD	TBD	Pseudotime; ordinal/Pearson gate; Woodbury

FAQ

Q: How long does a typical port take? A: Translation-only (class A): 1–3 days. With minor acceleration (class B): 2–5 days. Heavy acceleration with proofs (class C): 1–2 weeks. The Rust baseline translation usually already delivers most of the speedup; acceleration is about the last 2–5×.

Q: Why is parity so much more fragile than R→Python? A: Because Rust makes parallelism and SIMD cheap to reach for, and f64 addition is non-associative. A rayon parallel sum is generally not bit-identical to a serial sum, and because rayon does not fully specify the order in which partial results are combined, it is not guaranteed to be bit-identical from run to run either. The protocol handles this with the reduction-order rule: fixed-order reduction = (E) exact; reordered = (B) bounded by n·eps·max|x|, declared in MATH.md. See PARITY_TAXONOMY.md.

Q: maturin/PyO3 or a standalone Rust binary for the candidate? A: Default to maturin/PyO3: the deliverable is a pip-installable Rust-backed Python package, and the candidate runner just imports rs_<pkg>. A cargo run binary that dumps JSON is an acceptable fallback when the package is CLI-shaped, but you lose the "drop-in faster replacement" property.

Q: What if my target's deps have no Rust crate (e.g. statsmodels)? A: Either (a) port the specific routine you need into the crate, or (b) keep that step in Python and call back across the boundary, documenting the seam in MATH.md and RECONSTRUCTION_REPORT.md §6. Don't pretend a different routine is equivalent.

Q: My port gets a 1.2× speedup from a rayon reduction but Procrustes drops from 1.0000 to 0.9990 (still above threshold). Accept? A: Only if it's a (B) rewrite with the perturbation bound derived in MATH.md. A "small" empirical drop with no closed-form bound is a bug, not an optimisation. Reject and either fix the reduction order ((E)) or derive the bound.

Q: Can I publish to a different GitHub org? A: Yes. Export REBUILDPY_ORG=<your-org> before running engine.discover_rust_deps. The kit pushes nothing automatically — Phase 5's gh repo create, maturin publish, and cargo publish are explicit and you control them.

Q: Does this work on Windows? A: Tested on Linux; macOS should work (set the Accelerate BLAS backend for ndarray-linalg). Windows requires WSL2 because the kit shells out to bash for some pipe operations.

License

The kit itself is MIT. Each individual port matches its upstream Python package's license (GPL-3 if upstream is GPL ≥ 2; MIT/Apache-2.0 dual — the Rust convention — if upstream is permissive). See TEMPLATE.md §License decision matrix.

Provenance

This protocol is a direct adaptation of the omicverse-rebuildr recipe (reference-driven cross-language library synthesis via LLM agents), re-pointed from R→Python to Python→Rust. The reference-driven parity-gate methodology, the 8-class taxonomy, the two-plot evaluation, and the verifier-guided acceleration search are inherited wholesale; the Python→Rust direction adds the reduction-order admissibility rule and the Rust-specific acceleration playbook. Case-study ports live under github.com/<org>/rs-*.
