26 Jun 05:41

github-actions

11005ba

Nightly Build Pre-release

Pre-release

Automated nightly build from main.

Date: 2026-06-25T23:03:49Z
Commit: 11005ba

This is a prerelease. For stable releases, see the latest tagged version.

Assets 10

24 Jun 15:24

noahgift

v0.55.0

6f7c864

v0.55.0 — runnable models + reconciled GPU parity + autograd training proven Latest

Latest

v0.55.0 — runnable models, reconciled GPU parity, and the autograd training story proven

A correctness + portability wave. The headline pair: (1) apr convert/apr export now produce runnable models for tied-embedding architectures; (2) an end-to-end training proof caught that the transformer FFN was still severing the autograd graph after the v0.53/v0.54 "complete" sweep — per-layer gradchecks never saw it; a real train-to-loss test did. Plus the Blackwell GPU/CPU parity gate reconciled against ground truth, an Ollama HTTP drop-in, and a real cross-silicon portability crash fix. Every fix ships a named proof-obligation + a mutation-verified RED-on-bug/GREEN-on-fix falsifier + a pv-validated contract.

Correctness

apr convert --quantize q4k produced a non-runnable .apr for tied-embedding models (PMAT-918, #2209) — the Q4K save path never synthesized the tied lm_head (the f32 path did), so the model failed to load. Tie-synthesis hoisted before quant dispatch; verified runnable end-to-end on Blackwell.
apr export --format gguf silently mis-inferred num_heads on metadata-light .apr (PMAT-920, #2212) — a first-divisor guess would stamp e.g. Qwen2-1.5B as 24 heads (true 12). Now uses the explicit head_dim for exact num_heads, and hard-fails with an actionable error (no GGUF written) when genuinely absent — never a silently-wrong model.
GPU/CPU parity gate falsely rejected the correct Blackwell kernel (PMAT-919, #2210) — reconciled against ground truth (llama.cpp + CPU-Q4K, per-position, on 1.5B/7B/8B): fp32-Mwv is the correct Blackwell default; HwDp4a is genuinely degraded (INT8-activation quant). The F2 gate now checks per-position argmax-match + min-cosine over positions ≥1, replacing the last-token-only check that let a degraded kernel pass. On-device verified on Ada (4090) + Blackwell; 7B serves coherently on GPU.
Autograd: the transformer FFN gelu severed the graph (PMAT-921, #2213) — TransformerEncoderLayer's FFN built output via Tensor::from_vec (no grad_fn), freezing ffn.linear1 + norm2 γ/β in every real training run while isolated gradchecks stayed green. Caught by a new end-to-end train-to-loss test (loss 3.565 → 1.4e-5, every param group updates). The proof per-layer gradchecks can't give.

Performance / GPU

cuda-oxide RoPE kernel (#2215) — adjacent-pair RoPE ported to a pure-Rust #[kernel]; on-device A/B on GB10 Blackwell (sm_121): bit-exact (cos=1.0) and a clean tie with hand-PTX at the DRAM-bandwidth roofline ⟹ migrate-for-free off the hand-PTX + Blackwell-JIT path. Fourth GO kernel (attention, RMSNorm, SwiGLU, RoPE).

Compatibility / Portability

Ollama /api/chat + /api/generate (PMAT-923, #2216) — wired into every apr serve router (APR/GGUF/SafeTensors/WGPU), making apr serve a drop-in Ollama HTTP target for non-streaming clients (NDJSON streaming is a documented follow-up).
wgpu adapter-enumeration SIGABRT on Linux/AMD-RADV (PMAT-925, #2217) — enumerate_adapters(Backends::all()) instantiated a GLES/EGL adapter whose Drop panics → process abort. Constrained enumeration to Backends::PRIMARY (never GLES); cross-silicon verified on AMD-Vulkan (RADV) + Apple-Metal, no regression. Found by the 4-corner silicon-matrix verification.

Build / CI

Gated the duckdb competitive bench (#2208) and the coop_gemm_bench wgpu-27 example (#2211) behind features — eliminates the merge_group cold-build flake and the --all-targets break.

Crates.io publish is handled separately.

Assets 10

24 Jun 07:26

noahgift

v0.54.0

2335207

v0.54.0 — autograd graph complete (transformers end-to-end trainable) + numeric/quant correctness

Correctness-beat wave (PMAT-913..917) — the headline completes the autograd severed-graph sweep:
following the v0.53.0 norm-backward fixes, the Embedding / pooling / attention layers were also
building their forward output via Tensor::from_vec / Tensor::new (a leaf with no grad_fn), so
their parameters + input received zero gradient. With these fixes the full transformer (and CNN)
autograd graph is intact end-to-end — transformers are now genuinely fine-tunable. Plus numerical
(f32→f64), loss, and quantization correctness. Each fix ships a named proof-obligation + an
adversarially-mutation-verified RED-on-bug / GREEN-on-fix falsifier + a pv-validated contract.

Fixed

Autograd: attention backward was severed — Q/K/V got no gradient (PMAT-914, Pillar-2,
OBLIG-ATTENTION-BACKWARD-GRAD-FLOW) — the scaled-dot-product attention core (batched-matmul-4D,
transpose_last_two, softmax_last_dim, head reshape) built its intermediates via Tensor::new,
severing the chain so q_proj.weight.grad == None. This was the last link: despite the v0.53.0
norm fixes + the Embedding/pool fixes below, transformers were still NOT end-to-end trainable. Added
5 grad_fns (softmax Jacobian, batched-matmul, transpose/reshape) — attention now flows gradient,
finite-diff gradcheck-verified.
Autograd: Embedding / Flatten / MaxPool / AvgPool backward were severed (PMAT-913, Pillar-2,
OBLIG-{EMBEDDING,FLATTEN,MAXPOOL1D,MAXPOOL2D,AVGPOOL2D,GLOBALAVGPOOL2D}-BACKWARD-GRAD-FLOW) — all six
built output via Tensor::new; a severed Embedding meant token embeddings were non-trainable.
Added each backward (Embedding scatter-ADD, pool argmax/area routing, Flatten reshape); 8 gradchecks.
BCEWithLogitsLoss(pos_weight) weighted the whole loss instead of the positive term (PMAT-915,
Pillar-2, OBLIG-BCE-POSWEIGHT-PYTORCH-PARITY) — coincided with PyTorch only for hard 0/1 targets
(which every existing test used), so on soft targets the loss diverged (1.096 vs torch 1.038). Now
matches torch.nn.BCEWithLogitsLoss.
StandardScaler + PCA accumulated mean/variance/covariance in f32 (PMAT-916, Pillar-1,
OBLIG-{SCALER,PCA}-F64-ACCUM) — catastrophic cancellation on large-magnitude data: StandardScaler
std was ~75× wrong and PCA explained-variance ~10000× wrong vs the numpy/sklearn f64 reference. Now
reduces in f64 (stored as f32; public API unchanged).

Added

Quantization round-trip fidelity gate (Q4_K / Q5_K / Q6_K) (PMAT-917, Pillar-4,
OBLIG-QUANT-ROUNDTRIP-FIDELITY) — a standing contract + falsifier pinning that quantize→dequantize
reconstruction stays within the per-scheme affine half-step error bound (err/bound ratios 0.46–0.69;
mutation-verified — halving the scale or dropping the block offset trips it). Supports the
"provably-correct dequant" pillar: a future quant regression is now caught a-priori.

Assets 10

24 Jun 01:30

noahgift

v0.53.0

e0e244f

v0.53.0 — PMAT-904..911 correctness-beat wave (4 pillars)

Correctness-beat wave (PMAT-904..911) across all four pillars — the headline is the autograd
norm-backward family (LayerNorm / RMSNorm / BatchNorm1d / GroupNorm), which makes every
normalization-using transformer and CNN fine-tunable (their affine γ/β had been receiving zero
gradient). Each fix ships a named proof-obligation + a RED-on-bug / GREEN-on-fix falsifier
(adversarially mutation-verified) + a pv-validated contract.

Fixed

Autograd: LayerNorm + RMSNorm backward severed the affine gradient (PMAT-907, Pillar-2,
OBLIG-{LAYERNORM,RMSNORM}-BACKWARD-GRAD-FLOW) — nn::functional::layer_norm/rms_norm built
their output via Tensor::from_vec (a leaf with no grad_fn), so after backward() the scale γ,
shift β, and input x all received zero gradient. Every transformer using these norms was
non-fine-tunable. Added LayerNormBackward/RmsNormBackward with correct dγ/dβ/dx; gradients now
match a finite-difference gradcheck.
Autograd: BatchNorm1d + GroupNorm backward severed the affine gradient (PMAT-911, Pillar-2,
OBLIG-{BATCHNORM1D,GROUPNORM}-BACKWARD-GRAD-FLOW) — same severed-graph bug as PMAT-907 for the
remaining norms (train-mode BatchNorm batch-stat backward + per-group GroupNorm backward). Completes
the norm family: all four norms now flow gradient to γ/β.
CrossEntropyLoss label smoothing distributed the off-target mass as (1-eps)/C (PMAT-910,
Pillar-2, OBLIG-CE-LABEL-SMOOTHING-UNIFORM-MASS) — should be eps/C per non-target class
(so q_target = 1-eps+eps/C). On eps=0.1/C=5 the smoothed loss was 3.29× too large (2.384 vs the
0.7244 PyTorch/analytic value). Now matches torch.nn.CrossEntropyLoss(label_smoothing=...).
t-test / ANOVA / chi-square returned NaN p-values for df ≳ 72 (PMAT-904, Pillar-1,
OBLIG-{CHISQUARE,HYPOTHESIS}-PVALUE-FINITE) — a raw-space Lanczos gamma() overflowed f32 at
z ≥ ~36, so the incomplete-gamma/beta prefactors went Inf/Inf = NaN. Rebuilt in log-space
(ln_gamma + single bounded exp); p-values now finite + match scipy within 1e-5.
f16 export truncated instead of round-to-nearest-even (PMAT-905, Pillar-4,
OBLIG-SAFETENSORS-F16-EXPORT-RNE) — f32_slice_to_f16_bytes dropped the low 13 mantissa bits and
flushed the entire subnormal range to ±0, diverging from half::f16::from_f32 (e.g. 65520.0 stayed
finite instead of rounding up to +Inf; 2^-24 became 0 instead of the smallest subnormal). Now true
RNE across normal + subnormal grids (the F16 sibling of the PMAT-859 BF16 fix).
Weighted KNN capped a zero-distance neighbor at weight 1.0 (PMAT-909, Pillar-1,
OBLIG-KNN-WEIGHTED-ZERO-DISTANCE) — sklearn weights="distance" gives an exact-duplicate neighbor
infinite weight (only the zero-distance neighbors vote); apr let farther neighbors outvote the exact
match, flipping predictions. Now matches sklearn.

Added

Fail-closed: reject special-token id ≥ vocab_size at load (PMAT-908, Pillar-4,
OBLIG-SPECIAL-TOKEN-WITHIN-VOCAB) — a config whose eos/bos id is ≥ vocab_size loaded silently; the
stop token is then an unreachable logit so generation never stops. Now rejected with an actionable
error (and the arch-default EOS fallback no longer injects an out-of-vocab id into a small-vocab
model). llama.cpp/Ollama load these silently.
Fail-closed: reject APR config↔tensor shape mismatch at load (PMAT-906, Pillar-4,
OBLIG-APR-{VOCAB-EMBED-CONSISTENT,WEIGHT-SHAPE-MATCHES-CONFIG}) — AprV2Model::from_model_data
accepted a model whose declared vocab_size/hidden_size disagreed with the embedding/lm_head
tensor shape (garbage / OOB at inference). Now rejected fail-closed.

Assets 10

23 Jun 19:58

noahgift

v0.52.0

be2bf89

v0.52.0 — PMAT-889..898 correctness-beat wave (4 pillars + cuda-oxide)

Correctness-beat wave (PMAT-889..898) across all four pillars + the cuda-oxide marquee. Each fix ships a named proof-obligation + a RED-on-bug / GREEN-on-fix falsifier + a pv-validated contract.

Fixed

GaussianNB var_smoothing diverged from scikit-learn (PMAT-890, Pillar-1, F-GAUSSIANNB-EPSILON-003) — added a raw 1e-9 instead of var_smoothing · max(var across features); on mixed-scale data the smoothed variance was thousands of times too small (able to flip predict). Now matches sklearn's epsilon = var_smoothing · X.var(axis=0).max().
CrossEntropyLoss backward ignored the reduction mode (PMAT-891, Pillar-2, F-AUTOGRAD-CE-REDUCTION-001) — always divided the gradient by batch, so Sum-reduction grads were batch× too small (learns at 1/batch the intended rate) and None mis-broadcast. Now: Mean /batch, Sum no /batch, None per-sample.
L1Loss backward was severed (PMAT-896, Pillar-2, F-L1LOSS-BACKWARD-GRAD-001) — loss.backward() produced no gradient (silent zero-learning); abs() built its result without a grad_fn. Added AbsBackward (d|x|/dx = sign(x)).
apr merge --method lora-adapter mis-merged PEFT/Unsloth adapters (PMAT-897, Pillar-3, F-LORA-MERGE-RSLORA-001 + F-LORA-MERGE-ADAPTER-DTYPE-001) — ignored use_rslora (applied scale 1.0 instead of alpha/√rank) and decoded BF16/FP16 adapter tensors as hardcoded f32 (garbage). Now honors use_rslora and threads per-tensor dtype.
SGD-with-momentum diverged from PyTorch under an LR schedule (PMAT-898, Pillar-2, F-SGD-MOMENTUM-LRSCHED-001) — baked the learning rate into the velocity buffer, so a mid-training set_lr used a stale lr (~40% off). Now stores the unscaled buffer and applies lr fresh each step (scalar + SIMD paths).

Added

Fail-closed: reject a dead output row (PMAT-889, Pillar-4, F-DATA-QUALITY-007) — apr validate now rejects a model with a fully-zero lm_head/embed output row (a structurally-unreachable logit) that llama.cpp/Ollama silently load+run.
Fail-closed: reject NaN/Inf quantized weights at load (PMAT-895, Pillar-4, OBLIG-GGUF-LOAD-NANINF) — OwnedQuantizedModel::from_mapped now rejects a Q4_0/Q4_K block whose f16 scale is NaN/+Inf (poisons every dequantized element); llama.cpp loads it by default (check_tensors=false).
LinearDiscriminantAnalysis + QuadraticDiscriminantAnalysis (PMAT-892, Pillar-1, F-QDA-PARITY-001 / F-LDA-PARITY-004) — new estimators with scikit-learn predict-parity via a LAPACK-free per-class / pooled-covariance Cholesky fit.
cuda-oxide pure-Rust #[kernel] ports — RMSNorm + SwiGLU (PMAT-893/894, GB10 Blackwell sm_121) — bit-parity (cos=1.0) vs hand-PTX; RMSNorm beats hand-PTX 1.4–8.9×, SwiGLU a parity tie (migrate-free). Experiment harnesses; production promotion gated behind a 3-way parity gate.

🤖 Tagged autonomously per the beat-campaign release cadence. Follow-up: the v0.54 beat batch (gamma p-value finiteness #2194, f16-export RNE #2193) is already in the merge queue.

Assets 10

21 Jun 17:19

noahgift

v0.51.0

dcd131f

v0.51.0 — P0 .apr hotfix + PMAT-877..888 wave

[0.51.0] - 2026-06-21

Hotfix-driven release (brought forward from the Friday cadence by a P0). Each fix ships a named
proof-obligation + a RED-on-bug / GREEN-on-fix falsifier + a pv-validated contract.

Fixed

P0 — non-Gemma2 .apr inference produced garbage (PMAT-888, regressed in 0.50.0 via PMAT-810b)
— every non-Gemma2 .apr (qwen2/llama/mistral/phi/deepseek/qwen3 — the majority of models) generated
garbage on inference (CPU and GPU) while the same model as GGUF was coherent. PMAT-810b added a
Gemma2 post-attention-norm load keyed on the HF name post_attention_layernorm.weight — which is the
FFN norm for all those architectures — un-gated by architecture, so a spurious extra RMSNorm
was applied. Now gated on config.is_gemma2(), mirroring the GGUF loader. GGUF was never affected.
BatchNorm1d never updated running_mean/running_var (PMAT-877, Pillar-2) — they stayed at
init (0/1) forever, so eval-mode normalization was wrong vs PyTorch. Now EMA-updated each training
forward (running = (1-momentum)·running + momentum·batch).
Linear bias initialized to zeros (PMAT-878, Pillar-2) — PyTorch uses U(±1/√fan_in); now matches
(seed-deterministic).
LoRA dropout never applied (PMAT-879, Pillar-3) — LoRALayer::forward ignored the configured
dropout, so fine-tuning trained with zero regularization. Now applies dropout to the input
(y = Wx + s·B(A(dropout(x))), train-only), matching HF PEFT.
Batched-GPU GQA fail-closed (PMAT-880, Pillar-4) — attention_with_cache_gqa did not validate
kv_dim == num_kv_heads·head_dim/cache consistency, silently reading wrong memory on a corrupt config;
now returns a clear error (zero false-positives on valid models), where llama.cpp/Ollama run garbage.

Performance — GPU (Blackwell / GB10)

First pure-Rust cuda-oxide #[kernel] to BEAT hand-PTX (PMAT-882) — the incremental KV-cache
attention kernel: bit-exact (cos = 1.0) and 1.7–2.9× faster than the production hand-PTX kernel on
GB10 (true on-device A/B). FMA/softmax kernels are not DP4A-bound, so pure-Rust competes and wins.
Blackwell CUDA-graph replay fixed + re-enabled (PMAT-886a) — the default sm_121 Q4K GEMV variant
was not recorded into the manual graph, so graph replay dropped ~6 GEMVs/layer → stale buffers →
garbage (cosine 0.53). Now recorded; parity 0.53→0.9934 (== eager, token-for-token), graph decode
re-defaulted ON for Blackwell, +16% decode (96→112 tok/s).
Blackwell decode throughput-floor guard (PMAT-885) — a stale-binary / F2-false-fallback that
silently drops the GPU path to ~10 tok/s CPU is now a falsifiable invariant (≥100 tok/s on GB10).

Infrastructure

Pre-release Gate 11 (cargo publish -p aprender --dry-run) — catches the two classes that broke
the 0.50.0 cascade mid-publish (sibling path-deps missing a version; version-pinned sibling dev-deps
forming publish cycles) which cargo metadata does not detect.
Dogfood Gate 18 (fresh-convert .apr inference parity vs GGUF, CPU+GPU) — catches the PMAT-888
class that inspect/validate/tensors and a stale pre-existing .apr all pass through.

Assets 10

21 Jun 01:11

noahgift

v0.50.0

e3a28d6

v0.50.0 — 50 correctness beats (PMAT-827..876)

[0.50.0] - 2026-06-21

Fixed

Provable-correctness wave — fifty shipped-green correctness defects (PMAT-827..876),
each fixed with a named proof-obligation + a RED-on-bug / GREEN-on-fix falsifier + a
pv-validated contract. Spans all four pillars (replace+beat scikit-learn / PyTorch /
Unsloth / Ollama) plus eval/format/export and CI determinism. The first fifteen:

stats::incomplete_beta extra /a (PMAT-827, Pillar-1) — the regularized
incomplete beta was wrong for a != 1, so every t-test (df ≤ 30) and ANOVA F-test
p-value was too small (falsely significant). e.g. a one-sample t-test reported p=0.115
when scipy gives 0.230. Now matches scipy.special.betainc.
rsLoRA adapter scale dropped on load (PMAT-828, Pillar-3) — LoRAAdapter::to_layer
recomputed Standard alpha/rank and discarded the serialized rsLoRA alpha/sqrt(rank)
scale, silently re-scaling a saved adapter by sqrt(rank) (e.g. 4× at rank 16).
--grad-clip silent no-op on the CPU trainer (PMAT-829, Pillar-2) — clip_and_step
computed the clip coefficient then discarded it (let _ = scale); the optimizer stepped
on raw, unclipped gradients (divergence risk), while the WGPU path clipped correctly.
apr prune --sparsity over-pruned (PMAT-830) — sparsity.max(target_ratio) raised any
--sparsity below the 0.5 --target-ratio default, so --sparsity 0.3 zeroed 50% of
weights (not 30%) and the output metadata misreported the sparsity actually applied.
GradientBoostingClassifier::predict_proba saturated (PMAT-831, Pillar-1) — the weak
learner fit a classification tree to sign(residual) and added a fixed ±1 step instead of a
regression tree to the continuous residuals, so probabilities saturated to 0/1 (50/164 →
P=0.99998 vs the correct 0.75). Now uses a DecisionTreeRegressor (Friedman gradient step).
Q3_K GGUF dequant corrupted weights on import (PMAT-832) — the 6-bit super-block scales
were unpacked as 4-bit (offset −8 instead of −32) with the wrong quant/high-bit layout, so
~252/256 elements were wrong on any Q3_K_S/Q3_K_M model. Ported the correct GGML algorithm.
MoE / head_dim dropped on SafeTensors import (PMAT-833) — load_model_config_from_json
hardcoded num_experts/num_experts_per_tok/moe_intermediate_size/head_dim to None,
so a MoE model (Mixtral/Qwen3-MoE/DeepSeek) silently converted to a DENSE .apr, and an
explicit head_dim was lost (wrong RoPE/attention dims for Qwen3/Gemma2/Phi3).
ARIMA forecast wrong for d >= 2 (PMAT-834, Pillar-1) — reverse-differencing re-seeded
every un-differencing pass with y[n] instead of the matching intermediate difference, so
every forecast with two or more differencing orders overshot (e.g. 165 vs the correct 110).
apr eval pass@k inflated under single greedy sampling (PMAT-835) — the Chen et al.
estimator was fed the problem-count/solved-count in its per-sample (n, c) slots, so a model
solving 50/164 HumanEval reported pass@10=98% / pass@100=100% (correct: 30% for every k under
one deterministic sample) in the CI-consumed JSON. Now collapses to pass@1.
User __metadata__ dropped on every apr export (PMAT-836) — extract_user_metadata
read a fabricated APR v2 header layout (length @ byte 8, JSON @ 16) instead of the real
64-byte header (metadata_offset @ 12, JSON @ metadata_offset), always returning empty —
so the user's SafeTensors __metadata__ was silently lost on re-export.
GPT-2 byte-level BPE decode produced mojibake (PMAT-837, Pillar-4) — gpt2_char_to_byte
used a linear code − 0x100 offset instead of the GPT-2 byte_encoder staircase, so 129/256
bytes failed round-trip and all non-ASCII serve output was garbled (中 → ä¸Ń). Now delegates
to the correct unicode→byte map.
GLM IRLS swapped the link / inverse-link derivative (PMAT-838, Pillar-1) — the IRLS working
response and weights used Link::derivative (the inverse-link derivative dμ/dη) where the
link derivative dη/dμ is required, so coefficients were wrong for every non-identity link
(logistic slope 1.033 vs the correct 1.127). Now inverts it.
Gradient accumulation stepped on the SUM not the MEAN (PMAT-839, Pillar-2) — backward ops
accumulate into shared grad cells, but the trainer stepped without dividing by the accumulation
window, inflating the effective learning rate ×window (K-fold LR inflation / divergence). Now
scales grads by 1/window at the accumulation boundary.
cargo install aprender broke on macOS (PMAT-840) — configure_parent_death_signal used
libc::prctl(PR_SET_PDEATHSIG) under #[cfg(unix)], but that prctl form is Linux-only, so
aprender-orchestrate (a dependency of apr-cli) failed to compile on *-apple-darwin,
breaking the published binary for every macOS user. Now gated to #[cfg(target_os = "linux")].
Batched-GPU serving crashed on every GQA model (PMAT-841, Pillar-4) — batch_generate_gpu
dispatched ≥32-prompt batches into an MHA-only path that assumes QKV = 3 × hidden_dim, so
every grouped-query-attention model (Qwen2 / Llama-3 / Mistral) crashed with a CUDA GEMM size
mismatch (B expected 3·hidden·hidden). Now routes GQA through the per-prompt path.

The remaining thirty-five (PMAT-842..876), each with a falsifier + pv-validated contract:

Pillar-1 — scikit-learn parity: macro precision/recall/f1/jaccard/fbeta averaged over
max(label)+1 instead of present labels (844); silhouette_score scored singleton clusters
+1.0 instead of 0 (845); FastICA whitening matrix transposed → Cov(X_white) ≠ I (847);
Lasso/ElasticNet alpha ignored the 1/(2·n) loss normalization (848); Ward linkage used the
wrong Lance-Williams coefficient (849); tree/RandomForest feature_importances used raw sample
counts not impurity decrease/MDI (851); train_test_split used round not ceil for float
test_size (852); two-tailed t-test used a normal approximation for df>30 (853); Brandes
betweenness counted the source's own dependency (860); TfidfVectorizer omitted L2 row
normalization (861); ARIMA AR coefficients estimated on uncentered data (862); Bayesian-logistic
MAP converged to precision n·λ not λ (864); KNN tie-break used randomized HashMap order not
smallest-label (865); StratifiedKFold dumped every class remainder into the low folds (866);
isotonic regression interpolated inside pooled PAV blocks (870); Calinski-Harabasz/Davies-Bouldin
counted phantom empty clusters / not relabel-invariant (871).

Assets 10

13 Jun 10:09

noahgift

v0.49.1

e9a99ed

v0.49.1 — dependency refresh

v0.49.1 — dependency-refresh maintenance release

A pure dependency refresh with no source changes. All registry/transitive
dependencies brought to their latest semver-compatible versions; in-tree
trueno/realizar/sibling path crates untouched (APR-MONO self-contained DAG
preserved — no crates.io duplicates reintroduced).

Dependency bumps (26 packages)

Crate	From	To
wasm-bindgen (+ futures/test/macro/shared)	0.2.123	0.2.125
web-sys / js-sys	0.3.100	0.3.102
openssl / openssl-sys	0.10.80 / 0.9.116	0.10.81 / 0.9.117
zeroize / zeroize_derive	1.8.2 / 1.4.3	1.9.0 / 1.5.0
aws-sdk-s3 (+ sso/ssooidc/sts/runtime)	1.135.0	1.136.0
cc	1.2.63	1.2.64
fastembed	5.16.0	5.16.1
wasmparser / wasm-encoder / wast / wat	251	252
wasip2	1.0.3	1.0.4

Workspace version bumped 0.49.0 → 0.49.1 across all 145 crates.

Validation

✅ cargo fmt clean · cargo deny check advisories → no new advisories
✅ Contract lib tests 1406/0 · monorepo-invariants 8/8 (cross-crate version consistency) · readme-contract 10/10 · cli-commands 8/8
✅ cargo install --path . --features cuda → apr 0.49.1 (CUDA) installed & dogfooded
✅ Dogfood GO — 17-gate sweep, no panics / no silent fallbacks; bad-input gates fail loud; export round-trip 3/3 no-panic
✅ 7B Q4_K on RTX 4090: ALL GATES PASSED — Golden Output ✓, 152.9 tok/s, 1.1× Ollama (Grade B), GPU 15.7× CPU, GGUF↔SafeTensors format parity, PTX parity 6/6

Full changelog: https://github.com/paiml/aprender/blob/main/CHANGELOG.md

🤖 Generated with Claude Code

Assets 10

12 Jun 08:30

noahgift

v0.49.0

85b64ee

v0.49.0

Added

ContractKind::BeatBenchmark (PMAT-741) — the measurement backbone for the
four-pillar "replace AND beat" mission: a contract kind for committed, CI-wired
head-to-head BEAT claims (apr ≥ incumbent on a canonical task, pinned baseline,
fails CI on regression). Ships the pilot contracts/beat-sklearn-iris-v1.yaml.
Marks the campaign's pivot from sklearn-parity breadth to BEATS-as-CI-artifacts.

Assets 10

12 Jun 07:02

noahgift

v0.48.6

a4754d6

v0.48.6

Added

metrics::explained_variance_score + metrics::adjusted_rand_score (Pillar 1):
explained variance regression score (differs from R² under biased residuals) and
the chance-corrected Adjusted Rand Index for comparing clusterings — both matching
sklearn.metrics within 1e-4.

Assets 10

Uh oh!

Releases: paiml/aprender

Nightly Build

Uh oh!

v0.55.0 — runnable models + reconciled GPU parity + autograd training proven

v0.55.0 — runnable models, reconciled GPU parity, and the autograd training story proven

Correctness

Performance / GPU

Compatibility / Portability

Build / CI

Uh oh!

v0.54.0 — autograd graph complete (transformers end-to-end trainable) + numeric/quant correctness

Fixed

Added

Uh oh!

v0.53.0 — PMAT-904..911 correctness-beat wave (4 pillars)

Fixed

Added

Uh oh!

v0.52.0 — PMAT-889..898 correctness-beat wave (4 pillars + cuda-oxide)

Fixed

Added

Uh oh!

v0.51.0 — P0 .apr hotfix + PMAT-877..888 wave

[0.51.0] - 2026-06-21

Fixed

Performance — GPU (Blackwell / GB10)

Infrastructure

Uh oh!

v0.50.0 — 50 correctness beats (PMAT-827..876)

[0.50.0] - 2026-06-21

Fixed

Uh oh!

v0.49.1 — dependency refresh

v0.49.1 — dependency-refresh maintenance release

Dependency bumps (26 packages)

Validation

Uh oh!

v0.49.0

Added

Uh oh!

v0.48.6

Added

Uh oh!