[fs-backend] Track feature gaps in upstream vLLM fs_python secondary tier vs llmd_fs_backend #616

@kfirtoledo

Context

vLLM has upstreamed a pure-Python fs_python secondary tier for the OffloadingConnector
in vllm-project/vllm#41735 — a minimal
baseline equivalent of our kv_connectors/llmd_fs_backend. Before we can deprecate
llmd_fs_backend in favor of the upstream tier, the following gaps need to be either
upstreamed into vLLM or kept as an extension in this repo.

Reference

Gap checklist

  • 1. KV-event publishing: llmd_fs_backend emits BlockStored events over ZMQ PUB in vLLM's msgpack wire format (event_publisher.py).
  • 2. Per-job metrics: llmd_fs_backend reports transfer_size, transfer_time, and throughput per completed transfer; upstream JobResult carries only (job_id, success).
  • 3. Partial file read/write (HMA): llmd_fs_backend allows partial reads and writes to a file.
  • 4. Adaptive write-queue cap with EMA-based dropping: llmd_fs_backend caps queue depth at max_write_queued_seconds / EMA(write_latency) and drops excess writes (fix for the deadlock in #457, "llmd fs backend: EngineCore deadlock under SharedStorageOffloadingSpec + mp executor at high concurrency"; verified on L40S and H100); upstream's deques are unbounded.
  • 5. Independent storage vs CPU-tier block-size knob: llmd_fs_backend exposes a separate block_size for the storage tier and actually packs multiple GPU blocks per file.
  • 6. GDS support: llmd_fs_backend supports 7 modes (disabled, read_only, write_only, read_write, and bb_* bounce-buffer variants) via C++ GdsFileIO with POSIX fallback; upstream is POSIX-only with O_DIRECT.
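
For reference, the cap rule in item 4 can be sketched roughly as below. This is only an illustration of the formula max_write_queued_seconds / EMA(write_latency), not the actual llmd_fs_backend code: the class name AdaptiveWriteQueue, the alpha smoothing factor, the initial_latency seed, and the floor of one queued write are all assumptions made for the example.

```python
from collections import deque


class AdaptiveWriteQueue:
    """Hypothetical sketch: a write queue whose depth cap follows an EMA of write latency."""

    def __init__(self, max_write_queued_seconds: float,
                 alpha: float = 0.2, initial_latency: float = 0.01) -> None:
        self.max_write_queued_seconds = max_write_queued_seconds
        self.alpha = alpha                  # EMA smoothing factor (assumed value)
        self.ema_latency = initial_latency  # seconds per write (assumed seed)
        self.queue = deque()
        self.dropped = 0

    def record_latency(self, seconds: float) -> None:
        # Standard exponential moving average update over observed write latencies.
        self.ema_latency = self.alpha * seconds + (1 - self.alpha) * self.ema_latency

    @property
    def cap(self) -> int:
        # Queue depth such that queued work is worth about max_write_queued_seconds.
        # The floor of 1 (an assumption) keeps at least one write admissible.
        return max(1, int(self.max_write_queued_seconds / self.ema_latency))

    def try_enqueue(self, job) -> bool:
        # Drop the write instead of blocking the caller when the cap is reached.
        if len(self.queue) >= self.cap:
            self.dropped += 1
            return False
        self.queue.append(job)
        return True
```

With a cap like this, slower storage shrinks the admissible queue depth automatically, so the submitting side sees a dropped write rather than an ever-growing backlog; an unbounded deque has no such backpressure.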
