Summary
Add HMA (Hybrid Memory Architecture) support to the llmd-fs-backend by adopting the new vllm offloading connector HMA interfaces.
This enables KV cache offloading for models with multiple KV cache groups, such as full attention + sliding window + Mamba (e.g., Jamba, Zamba, Command-A).
Related to vllm-project/vllm#33689 (KV Offloading Roadmap — HMA Support).
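To illustrate why hybrid models need this: layers with the same cache spec form one KV cache group, so a hybrid model ends up with several groups that must be offloaded independently. The sketch below is a hedged illustration of that grouping idea, not the actual vLLM implementation; the layer specs are made up.

```python
from collections import defaultdict

# Hypothetical layer layout for a hybrid model (illustrative only):
# each layer is tagged with its cache spec, and layers sharing a spec
# belong to the same KV cache group.
layer_specs = [
    ("full_attention", 0), ("sliding_window", 1), ("full_attention", 2),
    ("sliding_window", 3), ("mamba", 4),
]

groups = defaultdict(list)
for spec, layer_id in layer_specs:
    groups[spec].append(layer_id)

# Three groups -> three independent sets of blocks the FS backend
# must track when offloading.
assert groups == {"full_attention": [0, 2],
                  "sliding_window": [1, 3],
                  "mamba": [4]}
```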
What needs to happen
- Adopt the new vllm HMA interfaces: `CanonicalKVCaches`, `OffloadKey`, and the updated `OffloadingManager` API
- Update FS connector worker, manager, spec, and mediums modules
- Update tests
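The adoption work above can be sketched roughly as follows. This is a toy stand-in, not the real vLLM API: the class and field names mirror the interfaces listed above, but the signatures, the `group_id`/`block_hash` fields, and the `store`/`lookup` methods are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class OffloadKey:
    """Hypothetical key for one offloaded block: which KV cache group
    it belongs to (full attention, sliding window, Mamba, ...) plus the
    block's content hash."""
    group_id: int
    block_hash: str


class FSOffloadingManager:
    """Toy stand-in for an FS-backed offloading manager that keys blocks
    per KV cache group, so hybrid models don't collide across groups."""

    def __init__(self) -> None:
        self._store: Dict[OffloadKey, bytes] = {}

    def store(self, key: OffloadKey, block: bytes) -> None:
        self._store[key] = block

    def lookup(self, key: OffloadKey) -> Optional[bytes]:
        return self._store.get(key)


# Usage: the same block hash in two different groups must map to
# distinct entries, which a single flat key space could not express.
mgr = FSOffloadingManager()
mgr.store(OffloadKey(group_id=0, block_hash="abc"), b"full-attn block")
mgr.store(OffloadKey(group_id=1, block_hash="abc"), b"sliding-window block")
assert mgr.lookup(OffloadKey(0, "abc")) != mgr.lookup(OffloadKey(1, "abc"))
```

The design point is simply that the offload key must include the KV cache group, since hybrid models produce blocks with the same hash but different layouts per group.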