Commit b52a15e
chore(release): bump workspace to 2.3.0
Covers the ruvllm GPU optimization sweep (ADR-258 + post-merge):
- RDT / OpenMythos model (PR #589)
- Vectorized ACT halting — 4-21× GPU prefill speedup
- candle 0.9 + cudarc 0.19 (CUDA 13.0 native, RTX 5080 / SM 12.0)
- KV cache pre-allocation (GqaPrealloc, MlaPrealloc, RdtKvCache::Prealloc)
- On-device argmax (128KB→4B), GPU top-k sort (128KB→320B)
- Fused ACT CUDA kernel via nvrtc + zero-copy tensor pointer path
- True per-token streaming, RDT generate_sampled
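
The size pairs in the argmax and top-k bullet read most naturally as per-token device-to-host copy sizes, though the commit message does not say so. A minimal sketch of that arithmetic in Rust, assuming a 32,000-entry f32 vocabulary and k = 40 with one (f32 logit, u32 index) pair per candidate. Both figures and all function names here are hypothetical, chosen only because they reproduce the quoted numbers:

```rust
/// Bytes needed to copy one full row of f32 logits to the host.
pub fn logits_bytes(vocab_size: usize) -> usize {
    vocab_size * std::mem::size_of::<f32>()
}

/// Bytes copied when argmax runs on the device: a single u32 token id.
pub fn argmax_bytes() -> usize {
    std::mem::size_of::<u32>()
}

/// Bytes copied when top-k runs on the device:
/// k pairs of (f32 logit, u32 index). The pair layout is an assumption.
pub fn top_k_bytes(k: usize) -> usize {
    k * (std::mem::size_of::<f32>() + std::mem::size_of::<u32>())
}

/// CPU reference for greedy argmax, i.e. the work the host would do
/// after copying the full logits row. Returns None for an empty slice.
pub fn argmax(logits: &[f32]) -> Option<u32> {
    logits
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.total_cmp(b.1))
        .map(|(i, _)| i as u32)
}
```

Under these assumptions a full logits row is 128,000 bytes, the on-device argmax result is 4 bytes, and the top-k result is 320 bytes, which matches the 128KB→4B and 128KB→320B figures if KB is read as 1,000 bytes.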
Co-Authored-By: claude-flow <ruv@ruv.net>
1 parent 920d8cc commit b52a15e
2 files changed
Lines changed: 132 additions & 132 deletions