Pick the class before writing any agent code. The class fixes the parity metric, which fixes the pass/fail threshold in
manifest.yaml. The threshold is committed before agent work begins and is read-only during the agent loop.
The metric only ever sees numbers, so this taxonomy is language-agnostic — it is the same taxonomy omicverse-rebuildr uses for R→Python. What changes for Python→Rust is which error sources the deterministic tolerance must absorb (see below).
| # | Algorithm class | Parity criterion | Default threshold | Example Python packages |
|---|---|---|---|---|
| 1 | Deterministic numerical (3 sub-tiers — see below) | element-wise: max abs err < tol (optionally rtol-scaled) |
standard: 1e-8; strict: 1e-13; bounded: 1e-6 |
numpy/scipy kernels, BBKNN distances, MAGIC operator |
| 2 | Stochastic numerical | distributional: Kolmogorov–Smirnov ≤ τ or Wasserstein-1 ≤ τ | KS-p ≥ 0.05 or W₁ ≤ 1% of scale | MCMC draws, dropout-simulation posteriors |
| 3 | Combinatorial clustering | label-invariant: ARI / NMI / Fowlkes–Mallows | ARI ≥ 0.95 | leidenalg, louvain, KMeans labels |
| 4 | Continuous embedding | rotation-invariant: Procrustes similarity (1 − min‖.‖² after best rotation/translation/scaling) | Procrustes ≥ 0.95 | UMAP, t-SNE, PCA, harmonypy |
| 5 | Ranked output | top-K Jaccard / Spearman correlation of the ranking | top-50 Jaccard ≥ 0.8; Spearman ≥ 0.9 | rank_genes_groups, HVG selection |
| 6 | Ordinal output (pseudotime) | Pearson / Spearman correlation of per-element values | Pearson ≥ 0.99 (treats ≥ 1 − 1e-12 as exact — pearsonr is itself f64-noisy) |
DPT, palantir pseudotime |
| 7 | Classification (binary / multi) | label agreement / F1 | Agreement ≥ 0.95 | scrublet doublet calls, cell-type labels |
| 8 | Statistical inference | rank correlation on −log10 p + top-k overlap | Spearman ≥ 0.90 on −log10 p; top-50 Jaccard ≥ 0.7 | diffxpy, scanpy DE, GSEA |
The class-1 threshold needs to absorb two independent error sources. This is where Python→Rust differs from R→Python — the error sources are different:
| Source | Typical magnitude |
|---|---|
f64 rounding + BLAS divergence (numpy links one BLAS; Rust ndarray-linalg links another — OpenBLAS / MKL / Accelerate) |
eps · √N ≈ 1e-14 to 1e-10 |
Parallel / SIMD float-reduction reordering — rayon par_iter().sum() and autovectorised SIMD reduce in a different order than a serial loop, and f64 + is non-associative |
`≈ n · eps · max |
| Any (B) bounded ε-approximation introduced by an Acceleration rewrite | per-rewrite, typically 1e-9 to 1e-6 (sum of admitted rewrites, derived in MATH.md) |
So a single fixed threshold is the wrong abstraction. Pick a sub-tier per port:
Sub-tier (manifest.yaml::algorithm_class) |
Default atol |
Default rtol |
Hard ceiling | When to use |
|---|---|---|---|---|
deterministic-strict |
1e-13 |
1e-15 |
1e-13 |
element-wise / single-pass / fixed-order (serial) reduction. The Rust port must not reorder any sum. |
deterministic (alias) / deterministic-standard |
1e-8 |
— | 1e-8 |
Default. One or two matmul / PCA; numpy-BLAS ↔ Rust-BLAS divergence is fine. |
deterministic-bounded |
1e-6 |
— | 1e-6 |
Contains (B) ε-approximation rewrites including reordered parallel/SIMD reductions. MATH.md must derive Σ bound ≤ atol. |
Hard ceiling rule: any deterministic-* threshold above 1e-6 is rejected by is_pass() — at that scale "deterministic" has lost meaning and the port should switch to ordinal (Pearson) or embedding (Procrustes) instead. This is non-negotiable; widening the gate is the cardinal sin the protocol forbids.
The reduction-order rule (Rust-specific): a serial, fixed-order reduction in Rust is (E) exact with respect to the Python reference if Python also reduced serially in that order. The moment you switch to rayon parallel reduction or rely on SIMD autovectorisation, the summation order changes and the result is no longer bit-identical — it becomes a (B) bounded rewrite with bound ≈ n · eps · max|xᵢ|, which must be derived in MATH.md and which forces the deterministic-bounded tier. Do not silently parallelise a deterministic-strict port.
Relative-tolerance mode: when output values span many orders of magnitude (p-values, abundance counts, eigenvalues), set parity_rtol in the manifest:
algorithm_class: deterministic-standard
parity_threshold: 1.0 # required when rtol > 0; pass iff returned-value < 1
parity_rtol: 1e-8 # error scaled by rtol·|reference|
parity_atol: 1e-10 # small absolute floor for values near zeroparity_deterministic(..., rtol=...) then returns max(|ref - cand| / (rtol·|ref| + tiny)), and parity_threshold caps that normalised quantity (typically 1.0).
Why ordinal treats 1.0 as ≥ 1 − 1e-12: scipy.stats.pearsonr accumulates f64 rounding internally. Even on bit-identical inputs the returned r is typically 0.9999999999999999, not literal 1.0. The is_pass helper subtracts 1e-12 from the ordinal threshold before the comparison so a perfect port doesn't fail by one ulp.
Ask one question: what does the Python function return?
- A vector / matrix of floats meant to be compared bit-by-bit → (1) Deterministic.
- Samples from a posterior or any RNG-driven simulation → (2) Stochastic.
- A vector of cluster IDs (integer labels with no canonical ordering) → (3) Clustering.
- A coordinate matrix where the absolute basis is meaningless (PCA, UMAP, t-SNE) → (4) Embedding.
- A ranked list of features (gene names, top-K) → (5) Ranked.
- A continuous monotonic per-element ordering (pseudotime, latent time, z) → (6) Ordinal.
- A binary or multi-class label per element (where the labels HAVE canonical meaning, e.g., "Doublet"/"Singlet") → (7) Classification.
- A vector of p-values or effect sizes from a statistical test → (8) Inference.
If the Python function returns multiple outputs with different classes (e.g., embedding + clustering + scores), give each its own gate in manifest.yaml::outputs: and require all to pass.
A naive "element-wise equality with tolerance" works only for class 1. For clustering, [0,1,1,0] and [1,0,0,1] encode the same partition — needs ARI. For embeddings, two coordinate matrices differing by an orthogonal rotation are the same up to basis choice — needs Procrustes. Worse in the Rust setting: any algorithm with internal parallelism produces a reproducible-but-different float result, so a uniform 1e-13 gate would false-fail every rayon-accelerated port. The class-aware gate (plus the reduction-order rule) keeps the gate meaningful.
| Class | Use this | Notes |
|---|---|---|
| 1 Deterministic | np.allclose(a, b, rtol=0, atol=tol) + scipy.stats.pearsonr(a.ravel(), b.ravel()) |
Demand both: element-wise + Pearson |
| 2 Stochastic | scipy.stats.ks_2samp(a, b) or scipy.stats.wasserstein_distance(a, b) |
Seed-locked Python reference + seed-locked Rust RNG |
| 3 Clustering | sklearn.metrics.adjusted_rand_score(labels_py, labels_rs) |
Always ARI; NMI as secondary |
| 4 Embedding | scipy.spatial.procrustes(M_py, M_rs) returns (_, _, disparity); similarity = 1 - disparity |
Sign-flip ambiguity is absorbed |
| 5 Ranked | len(set(top_k_py) & set(top_k_rs)) / len(set(top_k_py) | set(top_k_rs)) + scipy.stats.spearmanr |
top-50 is the conventional K |
| 6 Ordinal | scipy.stats.pearsonr(pt_py, pt_rs) + spearmanr |
Pearson is primary; demand ≥0.99 |
| 7 Classification | sklearn.metrics.f1_score(y_py, y_rs, average='binary' or 'macro') + (y_py == y_rs).mean() |
If labels are "Doublet" / "Singlet", keep that — don't relabel to 0/1 |
| 8 Inference | scipy.stats.spearmanr(-np.log10(p_py), -np.log10(p_rs)) + top-K Jaccard |
Use −log10 p to weight small p-values more |
All of these are implemented in engine/parity_metrics.py — import from there to avoid re-deriving thresholds across ports.
Don't. Each port has one algorithm-class gate at one threshold, fixed before agent work. No multiple-comparison correction; no per-fixture adjustment. The gate either clears or it doesn't.
For class 2 / 3 / 7 / 8 algorithms with internal RNG:
- Pin the seed in
manifest.yaml::seed:(default 42). - In both
py_reference_driver.pyand_run_candidate.py, set the seed immediately before calling the algorithm. Python usesnp.random.seed(42)andrandom.seed(42); the Rust side seeds itsrandgenerator (StdRng::seed_from_u64(42)) with the same value. - The Python RNG (NumPy PCG64 / Mersenne Twister) and the Rust
randRNG produce different streams — they will diverge. The comparison then degrades from element-wise to distributional (class 2) or label-invariant (class 3) — that's the point of the taxonomy. For a faithful element-wise port of an RNG-heavy algorithm, reimplement the exact same generator in Rust (e.g. mirror NumPy's PCG64) and document it inMATH.md.
Order of suspicion (Python→Rust edition, in decreasing frequency on real ports):
- Row-major vs the wrong axis: NumPy is row-major (C order);
ndarraydefaults to row-major too, but a.t()view or aArray::from_shape_vecwith the wrong order silently transposes. Check shapes AND strides. - f32 vs f64: a Rust literal
0.5isf64, but1.0f32or anArray<f32>quietly halves your precision. Match the reference dtype exactly. - Integer overflow / wrapping: Rust
u32/i32arithmetic wraps in release builds (panics in debug). Python ints are arbitrary precision. Useu64/i64or checked arithmetic where the reference can exceed 2³¹. - Reduction order: a
rayonparallel sum or SIMD reduction reorders f64 addition → not bit-identical. Either fix the order (serial / deterministic fold) for(E), or declare(B)with then·eps·max|x|bound. - 0-based both, but off-by-one in ranges: Python
range(1, 9)is[1..8]; Rust1..9is also[1..8]— but1..=9is inclusive. Watch the inclusive-range operator. - BLAS backend default tolerance:
ndarray-linalg'seig/svd/qrmay order eigenvalues or sign eigenvectors differently from LAPACK-via-scipy. Sort + sign-fix before comparing (or use theembeddingclass). - NaN / Inf propagation: NumPy silently produces
nan; Rust f64 also propagatesnanbut comparisons differ (nan != nan), andpartial_cmpreturnsNone. Decide NaN handling explicitly. - Sparse format: scipy CSR vs a hand-rolled Rust sparse representation — verify index ordering (sorted indices), duplicate handling, and implicit-zero semantics.
Ablate each in order. Do not loosen the gate.