Add HMA support to FS connector via vllm offloading connector HMA support #472

@kfirtoledo

Description

Summary

Add HMA (Hybrid Memory Allocator) support to the llmd-fs-backend by adopting the new HMA interfaces of the vLLM offloading connector.

This enables KV cache offloading for models with multiple KV cache groups, that is, models that combine layer types such as full attention, sliding-window attention, and Mamba (e.g., Jamba, Zamba, Command-A).

Related to vllm-project/vllm#33689 (KV Offloading Roadmap — HMA Support).

What needs to happen

  • Adopt the new vLLM HMA interfaces: CanonicalKVCaches, OffloadKey, and the updated OffloadingManager API
  • Update the FS connector's worker, manager, spec, and mediums modules
  • Update tests
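
To illustrate why a group-aware key matters for a filesystem backend, here is a minimal sketch of one way blocks could be laid out on disk once several KV cache groups exist. Everything in it is an assumption made for illustration: `GroupBlockKey` and `block_path` are hypothetical names, not the vLLM `OffloadKey` or any existing llmd-fs-backend function, and the directory layout is not the backend's actual one.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class GroupBlockKey:
    """Hypothetical per-group key; NOT the real vLLM OffloadKey definition."""
    group_idx: int     # index of the KV cache group (e.g. full attention, sliding window, Mamba)
    block_hash: bytes  # content hash identifying the block within that group


def block_path(root: Path, key: GroupBlockKey) -> Path:
    """Map a key to a file path, keeping each KV cache group in its own directory."""
    hex_hash = key.block_hash.hex()
    # Assumed layout: shard by the first two hex characters to keep directories small.
    return root / f"group_{key.group_idx}" / hex_hash[:2] / f"{hex_hash}.bin"


if __name__ == "__main__":
    root = Path("/tmp/kv")
    same_hash = bytes.fromhex("abcd")
    # The same block hash in two groups must not collide on disk.
    print(block_path(root, GroupBlockKey(0, same_hash)))
    print(block_path(root, GroupBlockKey(1, same_hash)))
```

Including the group index in the key means two groups that happen to produce the same block hash are stored as separate files, which is the property a multi-group offloading path would need regardless of the final interface.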
