Releases: paiml/aprender
Nightly Build
Automated nightly build from main.
Date: 2026-06-25T23:03:49Z
Commit: 11005ba
This is a prerelease. For stable releases, see the latest tagged version.
v0.55.0 — runnable models + reconciled GPU parity + autograd training proven
v0.55.0 — runnable models, reconciled GPU parity, and the autograd training story proven
A correctness + portability wave. The headline pair: (1) `apr convert`/`apr export` now produce runnable models for tied-embedding architectures; (2) an end-to-end training proof caught that the transformer FFN was still severing the autograd graph after the v0.53/v0.54 "complete" sweep. Per-layer gradchecks never saw it; a real train-to-loss test did. Also in this release: the Blackwell GPU/CPU parity gate reconciled against ground truth, an Ollama HTTP drop-in, and a real cross-silicon portability crash fix. Every fix ships a named proof-obligation, a mutation-verified RED-on-bug/GREEN-on-fix falsifier, and a pv-validated contract.
Correctness
- `apr convert --quantize q4k` produced a non-runnable `.apr` for tied-embedding models (PMAT-918, #2209): the Q4K save path never synthesized the tied `lm_head` (the f32 path did), so the model failed to load. Tie synthesis is now hoisted before quant dispatch; verified runnable end-to-end on Blackwell.
- `apr export --format gguf` silently mis-inferred `num_heads` on metadata-light `.apr` files (PMAT-920, #2212): a first-divisor guess would stamp e.g. Qwen2-1.5B as 24 heads (true: 12). It now uses the explicit `head_dim` to get `num_heads` exactly, and hard-fails with an actionable error (no GGUF written) when `head_dim` is genuinely absent, so it never emits a silently wrong model.
- The GPU/CPU parity gate falsely rejected the correct Blackwell kernel (PMAT-919, #2210). Reconciled against ground truth (llama.cpp + CPU-Q4K, per-position, on 1.5B/7B/8B): fp32 `Mwv` is the correct Blackwell default, while `HwDp4a` is genuinely degraded (INT8-activation quantization). The F2 gate now checks per-position argmax match plus minimum cosine over positions ≥ 1, replacing the last-token-only check that let a degraded kernel pass. Verified on-device on Ada (4090) and Blackwell; 7B serves coherently on GPU.
- Autograd: the transformer FFN `gelu` severed the graph (PMAT-921, #2213). `TransformerEncoderLayer`'s FFN built its output via `Tensor::from_vec` (no `grad_fn`), freezing `ffn.linear1` and the `norm2` γ/β in every real training run while isolated gradchecks stayed green. Caught by a new end-to-end train-to-loss test (loss 3.565 → 1.4e-5, every parameter group updates), which is the proof per-layer gradchecks cannot give.
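The new F2 gate logic can be illustrated with a small CPU-side sketch. It assumes each backend's logits are available as one `f32` row per position; `parity_gate`, `argmax`, `cosine` and the `min_cos` threshold are hypothetical names, not apr's API, and applying the "positions ≥ 1" restriction to both criteria is an assumption.

```rust
/// Cosine similarity between two logit rows, accumulated in f64.
fn cosine(a: &[f32], b: &[f32]) -> f64 {
    let (mut dot, mut na, mut nb) = (0.0f64, 0.0f64, 0.0f64);
    for (&x, &y) in a.iter().zip(b) {
        let (x, y) = (x as f64, y as f64);
        dot += x * y;
        na += x * x;
        nb += y * y;
    }
    dot / (na.sqrt() * nb.sqrt())
}

/// Index of the largest logit (first one wins on ties).
fn argmax(v: &[f32]) -> usize {
    let mut best = 0;
    for (i, &x) in v.iter().enumerate() {
        if x > v[best] {
            best = i;
        }
    }
    best
}

/// Hypothetical per-position parity gate: `gpu` and `cpu` hold one logit row per
/// position. Position 0 is skipped (assumption). Passes only if every remaining
/// position has the same argmax AND the worst cosine is at least `min_cos`.
fn parity_gate(gpu: &[Vec<f32>], cpu: &[Vec<f32>], min_cos: f64) -> bool {
    assert_eq!(gpu.len(), cpu.len(), "position count mismatch");
    let argmax_ok = gpu
        .iter()
        .zip(cpu)
        .skip(1)
        .all(|(g, c)| argmax(g) == argmax(c));
    let worst_cos = gpu
        .iter()
        .zip(cpu)
        .skip(1)
        .map(|(g, c)| cosine(g, c))
        .fold(f64::INFINITY, f64::min);
    argmax_ok && worst_cos >= min_cos
}
```

A last-token-only comparison would accept a kernel that diverges only at an earlier position; the per-position form rejects it.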
Performance / GPU
- cuda-oxide RoPE kernel (#2215): adjacent-pair RoPE ported to a pure-Rust `#[kernel]`. On-device A/B on GB10 Blackwell (sm_121): bit-exact (cos = 1.0) and a clean tie with hand-PTX at the DRAM-bandwidth roofline, so it migrates for free off the hand-PTX + Blackwell-JIT path. This is the fourth GO kernel (attention, RMSNorm, SwiGLU, RoPE).
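For reference, "adjacent-pair" means each consecutive pair (x[2i], x[2i+1]) of a head vector is rotated by pos·θᵢ. The sketch below is a scalar CPU reference in plain Rust, not the cuda-oxide kernel; the schedule θᵢ = base^(−2i/d) and a base of 10000.0 are the common convention, assumed here.

```rust
/// Adjacent-pair RoPE on one head vector, in place (scalar CPU reference).
/// Pair i = (x[2i], x[2i+1]) is rotated by angle pos * theta_i,
/// with theta_i = base^(-2i/d) (assumed conventional schedule).
fn rope_adjacent_pair(x: &mut [f32], pos: usize, base: f32) {
    let d = x.len();
    assert!(d % 2 == 0, "head_dim must be even");
    for i in 0..d / 2 {
        let theta = base.powf(-2.0 * i as f32 / d as f32);
        let (sin, cos) = (pos as f32 * theta).sin_cos();
        let (a, b) = (x[2 * i], x[2 * i + 1]);
        x[2 * i] = a * cos - b * sin;
        x[2 * i + 1] = a * sin + b * cos;
    }
}
```

Because every pair is rotated, the dot product of a query rotated at position m with a key rotated at position n depends only on m − n, a convenient property for a bit-exactness or parity test.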
Compatibility / Portability
- Ollama `/api/chat` + `/api/generate` (PMAT-923, #2216): wired into every `apr serve` router (APR/GGUF/SafeTensors/WGPU), making `apr serve` a drop-in Ollama HTTP target for non-streaming clients (NDJSON streaming is a documented follow-up).
- wgpu adapter-enumeration SIGABRT on Linux/AMD-RADV (PMAT-925, #2217): `enumerate_adapters(Backends::all())` instantiated a GLES/EGL adapter whose `Drop` panics, aborting the process. Enumeration is now constrained to `Backends::PRIMARY` (never GLES); cross-silicon verified on AMD-Vulkan (RADV) + Apple-Metal, no regression. Found by the 4-corner silicon-matrix verification.
Build / CI
- Gated the duckdb competitive bench (#2208) and the `coop_gemm_bench` wgpu-27 example (#2211) behind features, eliminating the merge_group cold-build flake and the `--all-targets` break.
Crates.io publish is handled separately.
v0.54.0 — autograd graph complete (transformers end-to-end trainable) + numeric/quant correctness
Correctness-beat wave (PMAT-913..917) — the headline completes the autograd severed-graph sweep:
following the v0.53.0 norm-backward fixes, the Embedding / pooling / attention layers were also
building their forward output via Tensor::from_vec / Tensor::new (a leaf with no grad_fn), so
their parameters + input received zero gradient. With these fixes the full transformer (and CNN)
autograd graph is intact end-to-end — transformers are now genuinely fine-tunable. Plus numerical
(f32→f64), loss, and quantization correctness. Each fix ships a named proof-obligation + an
adversarially-mutation-verified RED-on-bug / GREEN-on-fix falsifier + a pv-validated contract.
Fixed
- Autograd: attention backward was severed, so Q/K/V got no gradient (PMAT-914, Pillar-2, `OBLIG-ATTENTION-BACKWARD-GRAD-FLOW`). The scaled-dot-product attention core (batched-matmul-4D, `transpose_last_two`, `softmax_last_dim`, head reshape) built its intermediates via `Tensor::new`, severing the chain so that `q_proj.weight.grad == None`. This was the last link: despite the v0.53.0 norm fixes and the Embedding/pool fixes below, transformers were still NOT end-to-end trainable. Added 5 `grad_fn`s (softmax Jacobian, batched-matmul, transpose/reshape); attention now flows gradient, verified by finite-difference gradcheck.
- Autograd: Embedding / Flatten / MaxPool / AvgPool backward were severed (PMAT-913, Pillar-2, `OBLIG-{EMBEDDING,FLATTEN,MAXPOOL1D,MAXPOOL2D,AVGPOOL2D,GLOBALAVGPOOL2D}-BACKWARD-GRAD-FLOW`). All six built their output via `Tensor::new`; a severed Embedding meant token embeddings were non-trainable. Added each backward (Embedding scatter-ADD, pool argmax/area routing, Flatten reshape); 8 gradchecks.
- `BCEWithLogitsLoss(pos_weight)` weighted the whole loss instead of the positive term (PMAT-915, Pillar-2, `OBLIG-BCE-POSWEIGHT-PYTORCH-PARITY`). It coincided with PyTorch only for hard 0/1 targets (which every existing test used), so on soft targets the loss diverged (1.096 vs torch 1.038). Now matches `torch.nn.BCEWithLogitsLoss`.
- StandardScaler + PCA accumulated mean/variance/covariance in f32 (PMAT-916, Pillar-1, `OBLIG-{SCALER,PCA}-F64-ACCUM`), causing catastrophic cancellation on large-magnitude data: StandardScaler std was ~75× wrong and PCA explained variance ~10000× wrong vs the numpy/sklearn f64 reference. Now reduces in f64 (stored as f32; public API unchanged).
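The softmax Jacobian from the attention fix reduces to a cheap vector-Jacobian product: for y = softmax(x) and upstream gradient dy, dx_i = y_i·(dy_i − Σ_j dy_j·y_j). The sketch below works on plain `f64` slices with hypothetical function names (not apr's `grad_fn` types); it implements that rule and checks it against central finite differences, in the spirit of the gradchecks cited above.

```rust
/// Numerically stable softmax over one row.
fn softmax(x: &[f64]) -> Vec<f64> {
    let max = x.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = x.iter().map(|&v| (v - max).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}

/// Vector-Jacobian product of softmax: dx_i = y_i * (dy_i - <dy, y>).
fn softmax_backward(y: &[f64], dy: &[f64]) -> Vec<f64> {
    let dot: f64 = y.iter().zip(dy).map(|(a, b)| a * b).sum();
    y.iter().zip(dy).map(|(&yi, &gi)| yi * (gi - dot)).collect()
}

/// Central-difference gradient of L(x) = <dy, softmax(x)>; returns the largest
/// absolute deviation from the analytic softmax_backward result.
fn gradcheck_softmax(x: &[f64], dy: &[f64], h: f64) -> f64 {
    let loss = |x: &[f64]| -> f64 { softmax(x).iter().zip(dy).map(|(a, b)| a * b).sum() };
    let analytic = softmax_backward(&softmax(x), dy);
    let mut worst = 0.0f64;
    for i in 0..x.len() {
        let (mut xp, mut xm) = (x.to_vec(), x.to_vec());
        xp[i] += h;
        xm[i] -= h;
        let numeric = (loss(xp.as_slice()) - loss(xm.as_slice())) / (2.0 * h);
        worst = worst.max((numeric - analytic[i]).abs());
    }
    worst
}
```

Since softmax outputs sum to one, the entries of dx always sum to (approximately) zero, which is a cheap extra sanity check.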
Added
- Quantization round-trip fidelity gate (Q4_K / Q5_K / Q6_K) (PMAT-917, Pillar-4, `OBLIG-QUANT-ROUNDTRIP-FIDELITY`): a standing contract + falsifier pinning that `quantize→dequantize` reconstruction stays within the per-scheme affine half-step error bound (err/bound ratios 0.46–0.69; mutation-verified: halving the scale or dropping the block offset trips it). Supports the "provably-correct dequant" pillar: a future quant regression is now caught a priori.
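The half-step bound can be seen on a deliberately simplified quantizer: a single block, an affine min/scale mapping, and round-to-nearest. This is not the Q4_K/Q5_K/Q6_K super-block layout (which packs sub-block scales), and the names below are hypothetical. With x̂ = min + scale·q and q = round((x − min)/scale), each element's error is at most scale/2, so the err/bound ratio must stay ≤ 1.

```rust
/// Simplified single-block affine quantizer (hypothetical; NOT the GGUF K-quant layout).
#[derive(Clone)]
struct QBlock {
    scale: f32,
    min: f32,
    q: Vec<u8>,
}

fn quantize(x: &[f32], bits: u32) -> QBlock {
    assert!((1..=8).contains(&bits));
    let levels = ((1u32 << bits) - 1) as f32;
    let min = x.iter().cloned().fold(f32::INFINITY, f32::min);
    let max = x.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    // A constant block gets scale 1.0 so the division below stays finite.
    let scale = if max > min { (max - min) / levels } else { 1.0 };
    let q = x
        .iter()
        .map(|&v| ((v - min) / scale).round().clamp(0.0, levels) as u8)
        .collect();
    QBlock { scale, min, q }
}

fn dequantize(b: &QBlock) -> Vec<f32> {
    b.q.iter().map(|&q| b.min + b.scale * q as f32).collect()
}

/// Worst |x - x_hat| divided by the half-step bound scale/2 (<= 1 when correct).
fn max_err_ratio(x: &[f32], b: &QBlock) -> f32 {
    let bound = b.scale / 2.0;
    x.iter()
        .zip(dequantize(b))
        .map(|(&v, r)| (v - r).abs() / bound)
        .fold(0.0, f32::max)
}
```

Halving `scale` after quantization (one of the mutations named above) pushes the ratio far past 1, which is what makes the bound usable as a falsifier.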
v0.53.0 — PMAT-904..911 correctness-beat wave (4 pillars)
Correctness-beat wave (PMAT-904..911) across all four pillars — the headline is the autograd
norm-backward family (LayerNorm / RMSNorm / BatchNorm1d / GroupNorm), which makes every
normalization-using transformer and CNN fine-tunable (their affine γ/β had been receiving zero
gradient). Each fix ships a named proof-obligation + a RED-on-bug / GREEN-on-fix falsifier
(adversarially mutation-verified) + a pv-validated contract.
Fixed
- Autograd: LayerNorm + RMSNorm backward severed the affine gradient (PMAT-907, Pillar-2, `OBLIG-{LAYERNORM,RMSNORM}-BACKWARD-GRAD-FLOW`). `nn::functional::layer_norm`/`rms_norm` built their output via `Tensor::from_vec` (a leaf with no `grad_fn`), so after `backward()` the scale γ, shift β, and input x all received zero gradient. Every transformer using these norms was non-fine-tunable. Added `LayerNormBackward`/`RmsNormBackward` with correct dγ/dβ/dx; gradients now match a finite-difference gradcheck.
- Autograd: BatchNorm1d + GroupNorm backward severed the affine gradient (PMAT-911, Pillar-2, `OBLIG-{BATCHNORM1D,GROUPNORM}-BACKWARD-GRAD-FLOW`): the same severed-graph bug as PMAT-907 for the remaining norms (train-mode BatchNorm batch-stat backward + per-group GroupNorm backward). This completes the norm family: all four norms now flow gradient to γ/β.
- `CrossEntropyLoss` label smoothing distributed the off-target mass as `(1-eps)/C` (PMAT-910, Pillar-2, `OBLIG-CE-LABEL-SMOOTHING-UNIFORM-MASS`); it should be `eps/C` per non-target class (so q_target = 1-eps+eps/C). With eps=0.1, C=5 the smoothed loss was 3.29× too large (2.384 vs the 0.7244 PyTorch/analytic value). Now matches `torch.nn.CrossEntropyLoss(label_smoothing=...)`.
- t-test / ANOVA / chi-square returned NaN p-values for df ≳ 72 (PMAT-904, Pillar-1, `OBLIG-{CHISQUARE,HYPOTHESIS}-PVALUE-FINITE`): a raw-space Lanczos `gamma()` overflowed f32 at z ≥ ~36, so the incomplete-gamma/beta prefactors went Inf/Inf = NaN. Rebuilt in log space (`ln_gamma` + a single bounded `exp`); p-values are now finite and match scipy within 1e-5.
- f16 export truncated instead of rounding to nearest even (PMAT-905, Pillar-4, `OBLIG-SAFETENSORS-F16-EXPORT-RNE`): `f32_slice_to_f16_bytes` dropped the low 13 mantissa bits and flushed the entire subnormal range to ±0, diverging from `half::f16::from_f32` (e.g. 65520.0 stayed finite instead of rounding up to +Inf; 2^-24 became 0 instead of the smallest subnormal). Now true RNE across the normal + subnormal grids (the F16 sibling of the PMAT-859 BF16 fix).
- Weighted KNN capped a zero-distance neighbor at weight 1.0 (PMAT-909, Pillar-1, `OBLIG-KNN-WEIGHTED-ZERO-DISTANCE`): sklearn `weights="distance"` gives an exact-duplicate neighbor infinite weight (only the zero-distance neighbors vote); apr let farther neighbors outvote the exact match, flipping predictions. Now matches sklearn.
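The f16 fix comes down to rounding the 13 discarded mantissa bits to nearest-even instead of truncating them, and producing subnormals instead of flushing them to zero. A minimal std-only sketch of such a conversion follows; `f32_to_f16_bits` is a hypothetical name (not the function named above) and returns raw IEEE 754 binary16 bits.

```rust
/// f32 -> IEEE 754 binary16 bits with round-to-nearest-even, including subnormals.
fn f32_to_f16_bits(x: f32) -> u16 {
    let bits = x.to_bits();
    let sign = ((bits >> 16) & 0x8000) as u16;
    let exp = ((bits >> 23) & 0xFF) as i32;
    let man = bits & 0x007F_FFFF;

    if exp == 0xFF {
        // Inf stays Inf; NaN keeps a quiet-NaN payload bit.
        return sign | 0x7C00 | (if man != 0 { 0x0200 } else { 0 });
    }
    let e = exp - 127; // unbiased exponent; f32 zeros/subnormals land far below f16 range
    if e > 15 {
        return sign | 0x7C00; // overflow -> +/-Inf
    }
    if e >= -14 {
        // Normal f16: keep 10 mantissa bits, round on the 13 dropped bits.
        let mut h = (((e + 15) as u32) << 10) | (man >> 13);
        let rem = man & 0x1FFF;
        if rem > 0x1000 || (rem == 0x1000 && (h & 1) == 1) {
            h += 1; // a carry may bump the exponent; 0x7BFF + 1 == 0x7C00 (Inf)
        }
        return sign | h as u16;
    }
    // Subnormal f16: value = m * 2^-24 with m = round(sig * 2^(e + 1)).
    let shift = (-e - 1) as u32; // >= 14 here
    if shift > 24 {
        return sign; // below half the smallest subnormal: rounds to +/-0
    }
    let sig = man | 0x0080_0000; // restore the implicit leading 1
    let mut m = sig >> shift;
    let rem = sig & ((1u32 << shift) - 1);
    let half = 1u32 << (shift - 1);
    if rem > half || (rem == half && (m & 1) == 1) {
        m += 1; // m == 0x400 correctly becomes the smallest normal
    }
    sign | m as u16
}
```

Plain truncation would map 65520.0 to 0x7BFF and 2^-24 to 0, the two divergences listed in the entry above; this version yields 0x7C00 (+Inf) and 0x0001.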
Added
- Fail-closed: reject special-token id ≥ vocab_size at load (PMAT-908, Pillar-4, `OBLIG-SPECIAL-TOKEN-WITHIN-VOCAB`). A config whose eos/bos id is ≥ vocab_size loaded silently; the stop token is then an unreachable logit, so generation never stops. Now rejected with an actionable error (and the arch-default EOS fallback no longer injects an out-of-vocab id into a small-vocab model). llama.cpp/Ollama load these silently.
- Fail-closed: reject APR config↔tensor shape mismatch at load (PMAT-906, Pillar-4, `OBLIG-APR-{VOCAB-EMBED-CONSISTENT,WEIGHT-SHAPE-MATCHES-CONFIG}`). `AprV2Model::from_model_data` accepted a model whose declared `vocab_size`/`hidden_size` disagreed with the embedding/`lm_head` tensor shape (garbage / OOB at inference). Now rejected fail-closed.
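A fail-closed special-token check is essentially a range test with a descriptive error. The sketch below assumes special tokens arrive as `(name, Option<u32>)` pairs; `check_special_tokens` and `TokenOutOfVocab` are hypothetical names, not apr's loader types.

```rust
use std::fmt;

/// Hypothetical load-time error for a special-token id outside the vocabulary.
#[derive(Debug, PartialEq)]
struct TokenOutOfVocab {
    name: &'static str,
    id: u32,
    vocab_size: u32,
}

impl fmt::Display for TokenOutOfVocab {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "{} token id {} is out of range for vocab_size {} (valid ids: 0..{}); fix the config",
            self.name, self.id, self.vocab_size, self.vocab_size
        )
    }
}

/// Fail closed: every configured special-token id must satisfy id < vocab_size.
fn check_special_tokens(
    vocab_size: u32,
    tokens: &[(&'static str, Option<u32>)],
) -> Result<(), TokenOutOfVocab> {
    for &(name, id) in tokens {
        if let Some(id) = id {
            if id >= vocab_size {
                return Err(TokenOutOfVocab { name, id, vocab_size });
            }
        }
    }
    Ok(())
}
```

Unset tokens (`None`) are skipped rather than rejected, since a model may legitimately lack e.g. a pad token.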
v0.52.0 — PMAT-889..898 correctness-beat wave (4 pillars + cuda-oxide)
Correctness-beat wave (PMAT-889..898) across all four pillars + the cuda-oxide marquee. Each fix ships a named proof-obligation + a RED-on-bug / GREEN-on-fix falsifier + a pv-validated contract.
Fixed
- `GaussianNB` `var_smoothing` diverged from scikit-learn (PMAT-890, Pillar-1, `F-GAUSSIANNB-EPSILON-003`): it added a raw `1e-9` instead of `var_smoothing · max(var across features)`; on mixed-scale data the smoothed variance was thousands of times too small (able to flip `predict`). Now matches sklearn's `epsilon = var_smoothing · X.var(axis=0).max()`.
- `CrossEntropyLoss` backward ignored the reduction mode (PMAT-891, Pillar-2, `F-AUTOGRAD-CE-REDUCTION-001`): it always divided the gradient by batch, so `Sum`-reduction grads were batch× too small (learning at 1/batch the intended rate) and `None` was mis-broadcast. Now `Mean` divides by batch, `Sum` does not, and `None` is per-sample.
- `L1Loss` backward was severed (PMAT-896, Pillar-2, `F-L1LOSS-BACKWARD-GRAD-001`): `loss.backward()` produced no gradient (silent zero-learning) because `abs()` built its result without a `grad_fn`. Added `AbsBackward` (d|x|/dx = sign(x)).
- `apr merge --method lora-adapter` mis-merged PEFT/Unsloth adapters (PMAT-897, Pillar-3, `F-LORA-MERGE-RSLORA-001` + `F-LORA-MERGE-ADAPTER-DTYPE-001`): it ignored `use_rslora` (applying scale `1.0` instead of `alpha/√rank`) and decoded BF16/FP16 adapter tensors as hardcoded f32 (garbage). Now honors `use_rslora` and threads per-tensor dtype.
- SGD-with-momentum diverged from PyTorch under an LR schedule (PMAT-898, Pillar-2, `F-SGD-MOMENTUM-LRSCHED-001`): it baked the learning rate into the velocity buffer, so a mid-training `set_lr` used a stale lr (~40% off). Now stores the unscaled buffer and applies lr fresh each step (scalar + SIMD paths).
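The SGD fix is easiest to see side by side. Below, `Sgd` follows the PyTorch update with dampening 0 and no Nesterov (v ← μ·v + g, then p ← p − lr·v), while `SgdBakedLr` is our reconstruction of the defect pattern described above (v ← μ·v + lr·g, then p ← p − v). Both struct names are hypothetical. The two agree under a constant lr and diverge as soon as lr changes mid-training.

```rust
/// PyTorch-style SGD with momentum (dampening 0, no Nesterov).
/// The velocity buffer never contains lr.
struct Sgd {
    lr: f64,
    mu: f64,
    v: Vec<f64>,
}

impl Sgd {
    fn new(lr: f64, mu: f64, n: usize) -> Self {
        Sgd { lr, mu, v: vec![0.0; n] }
    }

    fn set_lr(&mut self, lr: f64) {
        self.lr = lr;
    }

    fn step(&mut self, p: &mut [f64], g: &[f64]) {
        for ((pi, vi), &gi) in p.iter_mut().zip(self.v.iter_mut()).zip(g) {
            *vi = self.mu * *vi + gi;
            *pi -= self.lr * *vi; // lr applied fresh every step
        }
    }
}

/// Reconstructed defect pattern (hypothetical): lr is baked into the buffer, so
/// after an lr change the accumulated velocity still carries the old lr.
struct SgdBakedLr {
    lr: f64,
    mu: f64,
    v: Vec<f64>,
}

impl SgdBakedLr {
    fn step(&mut self, p: &mut [f64], g: &[f64]) {
        for ((pi, vi), &gi) in p.iter_mut().zip(self.v.iter_mut()).zip(g) {
            *vi = self.mu * *vi + self.lr * gi;
            *pi -= *vi;
        }
    }
}
```

With lr held constant, the baked buffer is exactly lr times the PyTorch buffer, which is why the bug only shows up under a schedule.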
Added
- Fail-closed: reject a dead output row (PMAT-889, Pillar-4, `F-DATA-QUALITY-007`): `apr validate` now rejects a model with a fully-zero `lm_head`/embed output row (a structurally-unreachable logit) that llama.cpp/Ollama silently load and run.
- Fail-closed: reject NaN/Inf quantized weights at load (PMAT-895, Pillar-4, `OBLIG-GGUF-LOAD-NANINF`): `OwnedQuantizedModel::from_mapped` now rejects a Q4_0/Q4_K block whose f16 scale is NaN/+Inf (which poisons every dequantized element); llama.cpp loads it by default (`check_tensors=false`).
- `LinearDiscriminantAnalysis` + `QuadraticDiscriminantAnalysis` (PMAT-892, Pillar-1, `F-QDA-PARITY-001`/`F-LDA-PARITY-004`): new estimators with scikit-learn predict-parity via a LAPACK-free per-class / pooled-covariance Cholesky fit.
- cuda-oxide pure-Rust `#[kernel]` ports: RMSNorm + SwiGLU (PMAT-893/894, GB10 Blackwell sm_121). Bit-parity (cos = 1.0) vs hand-PTX; RMSNorm beats hand-PTX 1.4–8.9×, SwiGLU is a parity tie (migrate-free). These are experiment harnesses; production promotion is gated behind a 3-way parity gate.
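Both fail-closed load checks above reduce to simple scans. The sketch assumes a row-major `f32` weight matrix and raw binary16 scale bits; `dead_rows` and `f16_bits_non_finite` are hypothetical helpers, and the exponent test also flags −Inf, which is slightly stricter than the NaN/+Inf wording.

```rust
/// Indices of rows (row-major, `cols` wide) whose entries are all exactly zero.
/// With no bias term, such an output row yields a logit of 0 for every input.
fn dead_rows(weights: &[f32], cols: usize) -> Vec<usize> {
    weights
        .chunks_exact(cols)
        .enumerate()
        .filter(|(_, row)| row.iter().all(|&w| w == 0.0))
        .map(|(i, _)| i)
        .collect()
}

/// True if raw binary16 bits encode NaN or +/-Inf (exponent field all ones).
fn f16_bits_non_finite(bits: u16) -> bool {
    (bits & 0x7C00) == 0x7C00
}
```

Note that `-0.0 == 0.0` in IEEE comparison, so a row of signed zeros is also reported as dead.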
🤖 Tagged autonomously per the beat-campaign release cadence. Follow-up: the v0.54 beat batch (gamma p-value finiteness #2194, f16-export RNE #2193) is already in the merge queue.
v0.51.0 — P0 .apr hotfix + PMAT-877..888 wave
[0.51.0] - 2026-06-21
Hotfix-driven release (brought forward from the Friday cadence by a P0). Each fix ships a named
proof-obligation + a RED-on-bug / GREEN-on-fix falsifier + a pv-validated contract.
Fixed
- P0: non-Gemma2 `.apr` inference produced garbage (PMAT-888, regressed in 0.50.0 via PMAT-810b). Every non-Gemma2 `.apr` (qwen2/llama/mistral/phi/deepseek/qwen3, the majority of models) generated garbage on inference (CPU and GPU) while the same model as GGUF was coherent. PMAT-810b added a Gemma2 post-attention-norm load keyed on the HF name `post_attention_layernorm.weight`, which is the FFN norm for all those architectures, without gating it by architecture, so a spurious extra RMSNorm was applied. Now gated on `config.is_gemma2()`, mirroring the GGUF loader. GGUF was never affected.
- `BatchNorm1d` never updated `running_mean`/`running_var` (PMAT-877, Pillar-2): they stayed at init (0/1) forever, so eval-mode normalization was wrong vs PyTorch. Now EMA-updated on each training forward (running = (1-momentum)·running + momentum·batch).
- `Linear` bias initialized to zeros (PMAT-878, Pillar-2): PyTorch uses U(±1/√fan_in); now matches (seed-deterministic).
- LoRA dropout never applied (PMAT-879, Pillar-3): `LoRALayer::forward` ignored the configured dropout, so fine-tuning trained with zero regularization. Now applies dropout to the input (y = Wx + s·B(A(dropout(x))), train-only), matching HF PEFT.
- Batched-GPU GQA fail-closed (PMAT-880, Pillar-4): `attention_with_cache_gqa` did not validate `kv_dim == num_kv_heads·head_dim` / cache consistency, silently reading wrong memory on a corrupt config; it now returns a clear error (zero false positives on valid models), where llama.cpp/Ollama run garbage.
Performance — GPU (Blackwell / GB10)
- First pure-Rust cuda-oxide `#[kernel]` to BEAT hand-PTX (PMAT-882): the incremental KV-cache attention kernel is bit-exact (cos = 1.0) and 1.7–2.9× faster than the production hand-PTX kernel on GB10 (true on-device A/B). FMA/softmax kernels are not DP4A-bound, so pure Rust competes and wins.
- Blackwell CUDA-graph replay fixed + re-enabled (PMAT-886a): the default sm_121 Q4K GEMV variant was not recorded into the manual graph, so graph replay dropped ~6 GEMVs/layer → stale buffers → garbage (cosine 0.53). Now recorded; parity 0.53→0.9934 (== eager, token-for-token), graph decode re-defaulted ON for Blackwell, +16% decode (96→112 tok/s).
- Blackwell decode throughput-floor guard (PMAT-885): a stale-binary / F2-false-fallback that silently drops the GPU path to ~10 tok/s CPU is now a falsifiable invariant (≥100 tok/s on GB10).
Infrastructure
- Pre-release Gate 11 (`cargo publish -p aprender --dry-run`): catches the two classes that broke the 0.50.0 cascade mid-publish (sibling path-deps missing a `version`; version-pinned sibling dev-deps forming publish cycles), which `cargo metadata` does not detect.
- Dogfood Gate 18 (fresh-convert `.apr` inference parity vs GGUF, CPU+GPU): catches the PMAT-888 class that `inspect`/`validate`/`tensors` and a stale pre-existing `.apr` all pass through.
v0.50.0 — 50 correctness beats (PMAT-827..876)
[0.50.0] - 2026-06-21
Fixed
Provable-correctness wave — fifty shipped-green correctness defects (PMAT-827..876),
each fixed with a named proof-obligation + a RED-on-bug / GREEN-on-fix falsifier + a
pv-validated contract. Spans all four pillars (replace+beat scikit-learn / PyTorch /
Unsloth / Ollama) plus eval/format/export and CI determinism. The first fifteen:
- `stats::incomplete_beta` extra `/a` (PMAT-827, Pillar-1): the regularized incomplete beta was wrong for `a != 1`, so every t-test (df ≤ 30) and ANOVA F-test p-value was too small (falsely significant); e.g. a one-sample t-test reported p=0.115 where scipy gives 0.230. Now matches `scipy.special.betainc`.
- rsLoRA adapter scale dropped on load (PMAT-828, Pillar-3): `LoRAAdapter::to_layer` recomputed Standard `alpha/rank` and discarded the serialized rsLoRA `alpha/sqrt(rank)` scale, silently re-scaling a saved adapter by `sqrt(rank)` (e.g. 4× at rank 16).
- `--grad-clip` silent no-op on the CPU trainer (PMAT-829, Pillar-2): `clip_and_step` computed the clip coefficient then discarded it (`let _ = scale`); the optimizer stepped on raw, unclipped gradients (divergence risk), while the WGPU path clipped correctly.
- `apr prune --sparsity` over-pruned (PMAT-830): `sparsity.max(target_ratio)` raised any `--sparsity` below the 0.5 `--target-ratio` default, so `--sparsity 0.3` zeroed 50% of weights (not 30%) and the output metadata misreported the sparsity actually applied.
- `GradientBoostingClassifier::predict_proba` saturated (PMAT-831, Pillar-1): the weak learner fit a classification tree to `sign(residual)` and added a fixed ±1 step instead of fitting a regression tree to the continuous residuals, so probabilities saturated to 0/1 (50/164 → P=0.99998 vs the correct 0.75). Now uses a `DecisionTreeRegressor` (Friedman gradient step).
- Q3_K GGUF dequant corrupted weights on import (PMAT-832): the 6-bit super-block scales were unpacked as 4-bit (offset −8 instead of −32) with the wrong quant/high-bit layout, so ~252/256 elements were wrong on any Q3_K_S/Q3_K_M model. Ported the correct GGML algorithm.
- MoE / `head_dim` dropped on SafeTensors import (PMAT-833): `load_model_config_from_json` hardcoded `num_experts`/`num_experts_per_tok`/`moe_intermediate_size`/`head_dim` to `None`, so a MoE model (Mixtral/Qwen3-MoE/DeepSeek) silently converted to a DENSE `.apr`, and an explicit `head_dim` was lost (wrong RoPE/attention dims for Qwen3/Gemma2/Phi3).
- ARIMA forecast wrong for `d >= 2` (PMAT-834, Pillar-1): reverse-differencing re-seeded every un-differencing pass with `y[n]` instead of the matching intermediate difference, so every forecast with two or more differencing orders overshot (e.g. 165 vs the correct 110).
- `apr eval` pass@k inflated under single greedy sampling (PMAT-835): the Chen et al. estimator was fed the problem count / solved count in its per-sample `(n, c)` slots, so a model solving 50/164 HumanEval reported pass@10=98% / pass@100=100% (correct: 30% for every k under one deterministic sample) in the CI-consumed JSON. Now collapses to pass@1.
- User `__metadata__` dropped on every `apr export` (PMAT-836): `extract_user_metadata` read a fabricated APR v2 header layout (length @ byte 8, JSON @ 16) instead of the real 64-byte header (`metadata_offset` @ 12, JSON @ `metadata_offset`), always returning empty, so the user's SafeTensors `__metadata__` was silently lost on re-export.
- GPT-2 byte-level BPE decode produced mojibake (PMAT-837, Pillar-4): `gpt2_char_to_byte` used a linear `code − 0x100` offset instead of the GPT-2 `byte_encoder` staircase, so 129/256 bytes failed round-trip and all non-ASCII serve output was garbled (中 → ä¸Ń). Now delegates to the correct unicode→byte map.
- GLM IRLS swapped the link / inverse-link derivative (PMAT-838, Pillar-1): the IRLS working response and weights used `Link::derivative` (the inverse-link derivative dμ/dη) where the link derivative dη/dμ is required, so coefficients were wrong for every non-identity link (logistic slope 1.033 vs the correct 1.127). Now inverts it.
- Gradient accumulation stepped on the SUM, not the MEAN (PMAT-839, Pillar-2): backward ops accumulate into shared grad cells, but the trainer stepped without dividing by the accumulation window, inflating the effective learning rate ×window (K-fold LR inflation / divergence). Now scales grads by `1/window` at the accumulation boundary.
- `cargo install aprender` broke on macOS (PMAT-840): `configure_parent_death_signal` used `libc::prctl(PR_SET_PDEATHSIG)` under `#[cfg(unix)]`, but that prctl form is Linux-only, so `aprender-orchestrate` (a dependency of `apr-cli`) failed to compile on `*-apple-darwin`, breaking the published binary for every macOS user. Now gated to `#[cfg(target_os = "linux")]`.
- Batched-GPU serving crashed on every GQA model (PMAT-841, Pillar-4): `batch_generate_gpu` dispatched ≥32-prompt batches into an MHA-only path that assumes `QKV = 3 × hidden_dim`, so every grouped-query-attention model (Qwen2 / Llama-3 / Mistral) crashed with a CUDA GEMM size mismatch (B expected 3·hidden·hidden). Now routes GQA through the per-prompt path.
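The GPT-2 "staircase" is the `bytes_to_unicode` table: the 188 printable Latin-1 bytes map to their own code point, and the remaining 68 bytes map to U+0100, U+0101, … in ascending byte order. A byte-level decoder must invert that table, not subtract a fixed offset. The sketch below is a std-only illustration with hypothetical function names (not apr's tokenizer API); it shows why 中 appears as ä¸Ń in byte-level form.

```rust
use std::collections::HashMap;

/// GPT-2 bytes_to_unicode table (the "staircase").
fn byte_encoder() -> [char; 256] {
    let mut table = ['\0'; 256];
    let mut n = 0u32;
    for b in 0u32..256 {
        let printable = (0x21..=0x7E).contains(&b)
            || (0xA1..=0xAC).contains(&b)
            || (0xAE..=0xFF).contains(&b);
        table[b as usize] = if printable {
            char::from_u32(b).unwrap()
        } else {
            n += 1; // non-printable bytes take the next code point from U+0100 upward
            char::from_u32(0x100 + n - 1).unwrap()
        };
    }
    table
}

/// Inverse map (char -> byte) used for decoding.
fn byte_decoder() -> HashMap<char, u8> {
    byte_encoder()
        .iter()
        .enumerate()
        .map(|(b, &c)| (c, b as u8))
        .collect()
}

/// UTF-8 text -> GPT-2 byte-level string.
fn encode_bytes(text: &str) -> String {
    let enc = byte_encoder();
    text.bytes().map(|b| enc[b as usize]).collect()
}

/// GPT-2 byte-level string -> UTF-8 text; None on an unknown char or invalid UTF-8.
fn decode_bytes(s: &str) -> Option<String> {
    let dec = byte_decoder();
    let bytes: Option<Vec<u8>> = s.chars().map(|c| dec.get(&c).copied()).collect();
    String::from_utf8(bytes?).ok()
}
```

For example, 'Ń' (U+0143) must decode to byte 0xAD, whereas a linear `code − 0x100` offset yields 0x43.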
The remaining thirty-five (PMAT-842..876), each with a falsifier + pv-validated contract:
Pillar-1 — scikit-learn parity: macro precision/recall/f1/jaccard/fbeta averaged over
max(label)+1 instead of present labels (844); silhouette_score scored singleton clusters
+1.0 instead of 0 (845); FastICA whitening matrix transposed → Cov(X_white) ≠ I (847);
Lasso/ElasticNet alpha ignored the 1/(2·n) loss normalization (848); Ward linkage used the
wrong Lance-Williams coefficient (849); tree/RandomForest feature_importances used raw sample
counts not impurity decrease/MDI (851); train_test_split used round not ceil for float
test_size (852); two-tailed t-test used a normal approximation for df>30 (853); Brandes
betweenness counted the source's own dependency (860); TfidfVectorizer omitted L2 row
normalization (861); ARIMA AR coefficients estimated on uncentered data (862); Bayesian-logistic
MAP converged to precision n·λ not λ (864); KNN tie-break used randomized HashMap order not
smallest-label (865); StratifiedKFold dumped every class remainder into the low folds (866);
isotonic regression interpolated inside pooled PAV blocks (870); Calinski-Harabasz/Davies-Bouldin
counted phantom empty clusters / not relabel-invariant (871).
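To illustrate the macro-averaging fix (844): scikit-learn averages over the labels that actually occur in y_true ∪ y_pred, so an unused label id in the middle of the range must not contribute a zero score. The sketch below is a hypothetical `f1_macro`, not apr's `metrics` API, and uses per-class F1 = 2tp / (2tp + fp + fn).

```rust
use std::collections::BTreeSet;

/// Macro-averaged F1 over the labels that occur in y_true or y_pred.
fn f1_macro(y_true: &[usize], y_pred: &[usize]) -> f64 {
    assert_eq!(y_true.len(), y_pred.len());
    let labels: BTreeSet<usize> = y_true.iter().chain(y_pred).copied().collect();
    let mut total = 0.0;
    for &l in &labels {
        let (mut tp, mut fp, mut fn_) = (0usize, 0usize, 0usize);
        for (&t, &p) in y_true.iter().zip(y_pred) {
            match (t == l, p == l) {
                (true, true) => tp += 1,
                (false, true) => fp += 1,
                (true, false) => fn_ += 1,
                (false, false) => {}
            }
        }
        // Denominator is > 0 because l occurs in y_true or y_pred.
        total += 2.0 * tp as f64 / (2 * tp + fp + fn_) as f64;
    }
    total / labels.len() as f64
}
```

With labels {0, 2} and perfect predictions this returns 1.0; dividing by max(label)+1 = 3 instead would report 2/3.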
v0.49.1 — dependency refresh
v0.49.1 — dependency-refresh maintenance release
A pure dependency refresh with no source changes. All registry/transitive
dependencies brought to their latest semver-compatible versions; in-tree
trueno/realizar/sibling path crates untouched (APR-MONO self-contained DAG
preserved — no crates.io duplicates reintroduced).
Dependency bumps (26 packages)
| Crate | From | To |
|---|---|---|
| wasm-bindgen (+ futures/test/macro/shared) | 0.2.123 | 0.2.125 |
| web-sys / js-sys | 0.3.100 | 0.3.102 |
| openssl / openssl-sys | 0.10.80 / 0.9.116 | 0.10.81 / 0.9.117 |
| zeroize / zeroize_derive | 1.8.2 / 1.4.3 | 1.9.0 / 1.5.0 |
| aws-sdk-s3 (+ sso/ssooidc/sts/runtime) | 1.135.0 | 1.136.0 |
| cc | 1.2.63 | 1.2.64 |
| fastembed | 5.16.0 | 5.16.1 |
| wasmparser / wasm-encoder / wast / wat | 251 | 252 |
| wasip2 | 1.0.3 | 1.0.4 |
Workspace version bumped 0.49.0 → 0.49.1 across all 145 crates.
Validation
- ✅ `cargo fmt` clean · `cargo deny check advisories` → no new advisories
- ✅ Contract lib tests 1406/0 · monorepo-invariants 8/8 (cross-crate version consistency) · readme-contract 10/10 · cli-commands 8/8
- ✅ `cargo install --path . --features cuda` → `apr 0.49.1` (CUDA) installed & dogfooded
- ✅ Dogfood GO: 17-gate sweep, no panics / no silent fallbacks; bad-input gates fail loud; export round-trip 3/3 no-panic
- ✅ 7B Q4_K on RTX 4090: ALL GATES PASSED. Golden Output ✓, 152.9 tok/s, 1.1× Ollama (Grade B), GPU 15.7× CPU, GGUF↔SafeTensors format parity, PTX parity 6/6
Full changelog: https://github.com/paiml/aprender/blob/main/CHANGELOG.md
🤖 Generated with Claude Code
v0.49.0
Added
- `ContractKind::BeatBenchmark` (PMAT-741): the measurement backbone for the four-pillar "replace AND beat" mission. A contract kind for committed, CI-wired head-to-head BEAT claims (apr ≥ incumbent on a canonical task, pinned baseline, fails CI on regression). Ships the pilot `contracts/beat-sklearn-iris-v1.yaml`. Marks the campaign's pivot from sklearn-parity breadth to BEATS-as-CI-artifacts.
v0.48.6
Added
- `metrics::explained_variance_score` + `metrics::adjusted_rand_score` (Pillar 1): the explained variance regression score (which differs from R² under biased residuals) and the chance-corrected Adjusted Rand Index for comparing clusterings, both matching `sklearn.metrics` within 1e-4.
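The "differs from R² under biased residuals" remark follows from the definitions: explained variance is 1 − Var(y − ŷ)/Var(y), which ignores a constant offset in the residuals, while R² = 1 − SS_res/SS_tot penalizes it. A minimal sketch with hypothetical free functions (not apr's signatures), assuming y is not constant:

```rust
fn mean(v: &[f64]) -> f64 {
    v.iter().sum::<f64>() / v.len() as f64
}

/// Population variance (ddof = 0).
fn variance(v: &[f64]) -> f64 {
    let m = mean(v);
    v.iter().map(|x| (x - m).powi(2)).sum::<f64>() / v.len() as f64
}

/// explained_variance = 1 - Var(y - y_hat) / Var(y); assumes Var(y) > 0.
fn explained_variance_score(y: &[f64], y_hat: &[f64]) -> f64 {
    let resid: Vec<f64> = y.iter().zip(y_hat).map(|(a, b)| a - b).collect();
    1.0 - variance(&resid) / variance(y)
}

/// R^2 = 1 - SS_res / SS_tot; also assumes y is not constant.
fn r2_score(y: &[f64], y_hat: &[f64]) -> f64 {
    let m = mean(y);
    let ss_res: f64 = y.iter().zip(y_hat).map(|(a, b)| (a - b).powi(2)).sum();
    let ss_tot: f64 = y.iter().map(|a| (a - m).powi(2)).sum();
    1.0 - ss_res / ss_tot
}
```

A prediction that is off by a constant +1 everywhere (y = [1, 2, 3], ŷ = [2, 3, 4]) scores 1.0 on explained variance but −0.5 on R².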