Commit 98cc1af


feat: Add streaming generation, predictive expert prefetcher, and compressed MLA KV cache

- Streaming generation API (generate_streaming) with a per-token callback, early stopping, and GenerationStats for throughput metrics
- ExpertPredictor: transition-matrix-based predictor that learns from routing history to predict the next experts, with Laplace smoothing
- CompressedMlaCache: stores compressed latents (c_kv + k_pe) instead of full K/V, achieving a ~17.8x memory reduction for GLM-4.7-Flash
- 15 new tests (203 total bitnet tests, all passing)

https://claude.ai/code/session_011nTcGcn49b8YKJRVoh4TaK
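The streaming API described above (per-token callback, early stopping, throughput stats) can be sketched roughly as follows. This is not the code in backend.rs: the signature, the `next_token` closure standing in for the model forward pass plus sampling, and the `GenerationStats` fields are all assumptions made for illustration. In this sketch the callback returns `false` to request early stopping.

```rust
use std::time::{Duration, Instant};

/// Hypothetical throughput metrics; the real GenerationStats fields may differ.
#[derive(Debug, Clone)]
pub struct GenerationStats {
    pub tokens_generated: usize,
    pub stopped_by_callback: bool,
    pub elapsed: Duration,
}

impl GenerationStats {
    /// Tokens per second over the whole generation loop (0.0 if no time elapsed).
    pub fn tokens_per_second(&self) -> f64 {
        let secs = self.elapsed.as_secs_f64();
        if secs > 0.0 { self.tokens_generated as f64 / secs } else { 0.0 }
    }
}

/// Sketch of streaming generation. `next_token` is a placeholder for
/// "run the model on the context and sample one token"; `on_token` is
/// invoked once per generated token and returns `false` to stop early.
pub fn generate_streaming<N, C>(
    prompt: &[u32],
    max_new_tokens: usize,
    eos_token: u32,
    mut next_token: N,
    mut on_token: C,
) -> (Vec<u32>, GenerationStats)
where
    N: FnMut(&[u32]) -> u32,
    C: FnMut(u32) -> bool,
{
    let start = Instant::now();
    let mut context: Vec<u32> = prompt.to_vec();
    let mut generated = Vec::new();
    let mut stopped_by_callback = false;

    for _ in 0..max_new_tokens {
        let tok = next_token(&context);
        context.push(tok);
        generated.push(tok);

        // Stream the token to the caller before checking stop conditions.
        if !on_token(tok) {
            stopped_by_callback = true;
            break;
        }
        // Natural end of sequence.
        if tok == eos_token {
            break;
        }
    }

    let stats = GenerationStats {
        tokens_generated: generated.len(),
        stopped_by_callback,
        elapsed: start.elapsed(),
    };
    (generated, stats)
}
```

Passing the stop decision back through the callback's return value keeps the caller in control (for example, stopping on a stop string) without the generator needing to know why it stopped.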
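A transition-matrix expert predictor with Laplace smoothing, as named in the commit message, could look like the sketch below. It assumes that "routing history" means pairs of consecutive routing decisions (for example, experts chosen at one layer and at the next) and counts every (from, to) pair; the field names, `observe`, `probability`, and `predict` are hypothetical, not the ExpertPredictor API. The smoothed estimate is P(to | from) = (count[from][to] + alpha) / (row_total[from] + alpha * num_experts), so experts never seen after `from` still get a small nonzero probability.

```rust
/// Hypothetical sketch of a transition-matrix expert predictor.
pub struct ExpertPredictor {
    num_experts: usize,
    alpha: f64,             // add-alpha smoothing; alpha = 1.0 is classic Laplace smoothing
    counts: Vec<Vec<u32>>,  // counts[from][to] = observed transitions
    row_totals: Vec<u32>,   // row_totals[from] = sum of counts[from][*]
}

impl ExpertPredictor {
    pub fn new(num_experts: usize, alpha: f64) -> Self {
        Self {
            num_experts,
            alpha,
            counts: vec![vec![0; num_experts]; num_experts],
            row_totals: vec![0; num_experts],
        }
    }

    /// Record every (from, to) pair between two consecutive routing decisions.
    pub fn observe(&mut self, from: &[usize], to: &[usize]) {
        for &f in from {
            for &t in to {
                self.counts[f][t] += 1;
                self.row_totals[f] += 1;
            }
        }
    }

    /// Laplace-smoothed estimate of P(to | from).
    pub fn probability(&self, from: usize, to: usize) -> f64 {
        let num = self.counts[from][to] as f64 + self.alpha;
        let den = self.row_totals[from] as f64 + self.alpha * self.num_experts as f64;
        num / den
    }

    /// Top-k experts to prefetch, scoring each candidate by its summed
    /// probability over the currently active experts.
    pub fn predict(&self, current: &[usize], k: usize) -> Vec<usize> {
        let mut scores: Vec<(usize, f64)> = (0..self.num_experts)
            .map(|to| {
                let s = current.iter().map(|&from| self.probability(from, to)).sum::<f64>();
                (to, s)
            })
            .collect();
        // Highest score first; ties broken by lower expert id for determinism.
        scores.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
        scores.into_iter().take(k).map(|(e, _)| e).collect()
    }
}
```

With alpha = 1.0 and four experts, an expert that followed expert 0 in 3 of 4 observed transitions gets (3 + 1) / (4 + 4) = 0.5, and a row with no history falls back to a uniform 1/4 for every expert.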
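The memory saving of caching MLA latents instead of full K/V comes from the per-token element counts. A full cache stores num_heads * (qk_nope_head_dim + qk_rope_head_dim) elements for K and num_heads * v_head_dim for V, while the compressed cache stores only kv_lora_rank (c_kv) plus qk_rope_head_dim (the shared rotary key k_pe); full K/V are recovered from c_kv by up-projection at attention time. The sketch below computes that ratio and shows a minimal per-layer latent cache. The dimension names, the `MlaDims` struct, and the `CompressedMlaCache` fields are assumptions, not the backend.rs types. With assumed dimensions of 20 heads, qk_nope_head_dim = 192, qk_rope_head_dim = 64, v_head_dim = 256, and kv_lora_rank = 512 (not read from the GLM-4.7-Flash config), the ratio is 10240 / 576 ≈ 17.8.

```rust
/// Hypothetical per-layer MLA dimensions (element counts, not bytes).
#[derive(Debug, Clone, Copy)]
pub struct MlaDims {
    pub num_heads: usize,
    pub qk_nope_head_dim: usize,
    pub qk_rope_head_dim: usize,
    pub v_head_dim: usize,
    pub kv_lora_rank: usize,
}

impl MlaDims {
    /// Elements per token if full K and V are cached for every head.
    pub fn full_kv_elems(&self) -> usize {
        let k = self.num_heads * (self.qk_nope_head_dim + self.qk_rope_head_dim);
        let v = self.num_heads * self.v_head_dim;
        k + v
    }

    /// Elements per token if only the latent c_kv and the shared rotary key k_pe are cached.
    pub fn compressed_elems(&self) -> usize {
        self.kv_lora_rank + self.qk_rope_head_dim
    }

    /// Memory reduction factor (same element dtype assumed for both layouts).
    pub fn compression_ratio(&self) -> f64 {
        self.full_kv_elems() as f64 / self.compressed_elems() as f64
    }
}

/// Minimal sketch of a compressed cache for one layer: one (c_kv, k_pe) row per token.
pub struct CompressedMlaCache {
    dims: MlaDims,
    c_kv: Vec<f32>, // len = tokens * kv_lora_rank
    k_pe: Vec<f32>, // len = tokens * qk_rope_head_dim
}

impl CompressedMlaCache {
    pub fn new(dims: MlaDims) -> Self {
        Self { dims, c_kv: Vec::new(), k_pe: Vec::new() }
    }

    /// Append one token's latent and rotary key.
    pub fn append(&mut self, c_kv: &[f32], k_pe: &[f32]) {
        assert_eq!(c_kv.len(), self.dims.kv_lora_rank);
        assert_eq!(k_pe.len(), self.dims.qk_rope_head_dim);
        self.c_kv.extend_from_slice(c_kv);
        self.k_pe.extend_from_slice(k_pe);
    }

    /// Number of cached tokens.
    pub fn len(&self) -> usize {
        self.c_kv.len() / self.dims.kv_lora_rank
    }
}
```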

1 parent 8093376, commit 98cc1af

2 files changed

crates/ruvllm/src/bitnet
- backend.rs
- mod.rs
