Context
vLLM has upstreamed a pure-Python fs_python secondary tier for the OffloadingConnector
in vllm-project/vllm#41735 — a minimal
baseline equivalent of our kv_connectors/llmd_fs_backend. Before we can deprecate
llmd_fs_backend in favor of the upstream tier, the following gaps need to be either
upstreamed into vLLM or kept as an extension in this repo.
Reference
- vllm/v1/kv_offload/tiering/fs/{manager,io,thread_pool}.py, vllm/v1/kv_offload/file_mapper.py
- kv_connectors/llmd_fs_backend/

Gap checklist
- BlockStored events over ZMQ PUB in vLLM's msgpack wire format (event_publisher.py); […]
- transfer_size, transfer_time, and throughput per completed transfer; upstream JobResult carries only (job_id, success).
- […] max_write_queued_seconds / EMA(write_latency) and drops excess writes (fix for the #457 deadlock, "llmd fs backend: EngineCore deadlock under SharedStorageOffloadingSpec + mp executor at high concurrency", verified on L40S and H100); upstream's deques are unbounded.
- […] block_size for the storage tier and actually packs multiple GPU blocks per file; […]
- […] (disabled, read_only, write_only, read_write, and bb_* bounce-buffer variants) via C++ GdsFileIO with POSIX fallback; upstream is POSIX-only with O_DIRECT.
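
To make the write-backpressure item concrete, here is a minimal sketch of one way a bound of max_write_queued_seconds / EMA(write_latency) could be applied. It is not the llmd_fs_backend code. It assumes the ratio is read as a queue-depth limit, and the class name WriteAdmission, the alpha smoothing factor, and the initial_latency seed are all hypothetical names introduced for illustration.

```python
from collections import deque


class WriteAdmission:
    """Hypothetical admission control for queued writes (illustrative only)."""

    def __init__(self, max_write_queued_seconds: float,
                 alpha: float = 0.2, initial_latency: float = 0.01):
        self.max_write_queued_seconds = max_write_queued_seconds
        self.alpha = alpha                  # EMA smoothing factor (assumed)
        self.ema_latency = initial_latency  # seconds per write (assumed seed)
        self.queue = deque()                # pending write job ids
        self.dropped = 0                    # writes rejected by the bound

    def limit(self) -> int:
        # Queue depth whose estimated drain time equals the time budget.
        return max(1, int(self.max_write_queued_seconds / self.ema_latency))

    def submit(self, job_id: int) -> bool:
        # Drop the write instead of queueing it when the bound is reached.
        if len(self.queue) >= self.limit():
            self.dropped += 1
            return False
        self.queue.append(job_id)
        return True

    def complete(self, latency: float) -> int:
        # Pop the oldest write and fold its latency into the moving average.
        job_id = self.queue.popleft()
        self.ema_latency = (self.alpha * latency
                            + (1 - self.alpha) * self.ema_latency)
        return job_id
```

In this sketch the limit tightens automatically as observed write latency grows, so a slow storage tier sheds writes rather than accumulating an ever-longer backlog as an unbounded deque would.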