llmd fsconnector metadata cache by saikat-royc · Pull Request #621 · llm-d/llm-d-kv-cache

saikat-royc · 2026-05-29T22:08:35Z

1. Overview

This PR implements a client-side metadata cache for llmd_fs_backend that replaces direct filesystem calls (os.path.exists) during the scheduler's lookup phase. At scale, the cache can reduce lookup latency that stems from the underlying storage system. To bound the eventual-consistency window introduced by asynchronous external eviction of files (e.g., pvc_evictor), positive cache entries expire under a hard Time-To-Live (TTL) policy.

2. Key Design and Features

A. Tiered Positive Caching & Filesystem Fallback (metadata_cache.py, manager.py)

• Metadata Cache Layer: Introduces an in-memory MetadataCache that uses an OrderedDict to index block keys confirmed to exist on the filesystem.
• Fallback Resolution Workflow:
  • Lookups check the cache first (Tier 1 hit).
  • On a cache miss, the manager falls back to a physical check (os.path.exists). If the file exists, the manager back-fills the positive cache to accelerate subsequent queries.
• Stateless I/O and Safety: Writes (prepare/complete store actions) insert records into the cache. Load paths bypass the cache entirely and read directly from the physical layout paths, so a stale cache entry cannot corrupt an active read.
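The lookup workflow above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the function name `lookup`, the plain `set` standing in for the cache, and the injectable `exists` callable are all hypothetical. Only the outcome labels (`mem_hit`, `fs_hit`, `fs_miss`) and the use of `os.path.exists` come from the description.

```python
import os
from collections import Counter
from typing import Callable, Set


def lookup(
    path: str,
    cache: Set[str],
    outcomes: Counter,
    exists: Callable[[str], bool] = os.path.exists,
) -> bool:
    """Return True if the block file at `path` is believed to exist.

    `cache` is a hypothetical stand-in for the positive metadata cache;
    `outcomes` tallies results under the labels named in the PR.
    """
    # Tier 1: in-memory positive cache.
    if path in cache:
        outcomes["mem_hit"] += 1
        return True

    # Tier 2: fall back to the filesystem on a cache miss.
    if exists(path):
        cache.add(path)  # back-fill so the next lookup is a memory hit
        outcomes["fs_hit"] += 1
        return True

    # Only positive results are cached; a miss is re-checked next time.
    outcomes["fs_miss"] += 1
    return False
```

Passing `exists` as a parameter is a choice made here for testability; it lets the fallback be exercised without touching a real filesystem.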

B. Bounded Hard TTL Expiration Strategy

• Eventual Consistency Window: Each positive record stores a monotonic timestamp (time.monotonic()). In contains and batch_contains queries, entries older than the configured TTL are popped and reported as a cache miss, which falls back to the filesystem tier.
• Hard TTL Boundaries: Re-inserting a pre-existing key preserves its original timestamp instead of resetting or extending its lifetime. This prevents long-lived hot keys from extending their expiration window indefinitely, which could hide files deleted by an external evictor.
• Infinite TTL (-1): Setting metadata_cache_ttl_secs = -1 disables time-based expiration entirely, so positive keys stay cached indefinitely (subject to the standard LRU size bound). This suits setups with no external eviction, for example when the background evictor is disabled.
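A small sketch may help show how the hard TTL differs from a sliding one. The class name `HardTTLCache`, its constructor parameters, and the injectable `clock` are hypothetical. Two behaviors here are assumptions rather than facts from the PR: a repeated insert or a hit refreshes the entry's LRU position, and any negative TTL is treated as infinite. The PR describes only the -1 value.

```python
import time
from collections import OrderedDict
from typing import Callable, Hashable


class HardTTLCache:
    """Positive-only cache with LRU capacity and a non-extendable TTL."""

    def __init__(
        self,
        max_entries: int,
        ttl_secs: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._entries: "OrderedDict[Hashable, float]" = OrderedDict()
        self._max_entries = max_entries
        self._ttl_secs = ttl_secs  # negative (e.g. -1) disables expiration
        self._clock = clock

    def insert(self, key: Hashable) -> None:
        if key in self._entries:
            # Hard TTL: keep the original timestamp. Only the LRU
            # position is refreshed (an assumption of this sketch).
            self._entries.move_to_end(key)
            return
        self._entries[key] = self._clock()
        while len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict least recently used

    def contains(self, key: Hashable) -> bool:
        stamp = self._entries.get(key)
        if stamp is None:
            return False
        if self._ttl_secs >= 0 and self._clock() - stamp > self._ttl_secs:
            del self._entries[key]  # expired: report a miss
            return False
        self._entries.move_to_end(key)
        return True
```

With a 10-second TTL, a key inserted at t=0 and re-inserted at t=8 still expires after t=10, because the second insert does not reset the stored timestamp.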

C. Prometheus Metrics Instrumentation (metrics.py, manager.py, metadata_cache.py)

• vllm_llmd_fs_metadata_cache_lookup_duration_seconds (Histogram): Tracks single-key metadata lookup latency.
• vllm_llmd_fs_metadata_cache_lookup_blocks (Counter, labeled result=["mem_hit", "fs_hit", "fs_miss"]): Categorizes the outcome of each manager lookup step.
• vllm_llmd_fs_metadata_cache_entries (Gauge): Monitors the in-memory size of the positive cache.
• vllm_llmd_fs_metadata_cache_evictions (Counter, labeled type=["lru", "ttl"]): Distinguishes capacity-driven LRU evictions from TTL expirations.

Verification:

• Inference perf benchmarks with the metadata cache enabled
• Unit tests

fix CUDA version mismatch and dev headers symlink

- Update default CUDA_TOOLKIT_PKG to cuda-toolkit-13-0 to match the CUDA 13.0 base image and prevent PyTorch compilation version mismatch.
- Explicitly parse and update the standard /usr/local/cuda symlink after GKE package installation to resolve missing dev headers (cusparse.h) during compilation

Signed-off-by: Saikat Roychowdhury <saikat.royc85@gmail.com>

1. metadata cache implementation
2. metadata cache metrics instrumentation
3. unit tests

Signed-off-by: Saikat Roychowdhury <saikat.royc85@gmail.com>

saikat-royc · 2026-05-30T18:37:05Z

/cc @miroslavln request a first pass review for the changes.

note: commit in this PR will be removed, once #620 is submitted

saikat-royc · 2026-06-01T21:06:12Z

/cc @kfirtoledo request a review for this PR

saikat-royc · 2026-06-15T20:29:43Z

Closing this PR, since there is now a PR for a metadata cache for the multi-tier offloading connector: vllm-project/vllm#44193

saikat-royc requested review from dannyharnik, kfirtoledo, liu-cong and vMaroon as code owners May 29, 2026 22:08

github-actions Bot added the size/XL label (denotes a PR that changes 500-999 lines, ignoring generated files) May 29, 2026

github-actions Bot requested review from hyeongyun0916, sagearc and yankay May 29, 2026 22:08

saikat-royc force-pushed the fs-conn-client-05-28 branch from 8edee4e to b156c39 Compare May 29, 2026 22:10

saikat-royc changed the title ~~[WIP] llmd fsconnector metadata cache~~ [WIP DO NOT REVIEW] llmd fsconnector metadata cache May 29, 2026

saikat-royc force-pushed the fs-conn-client-05-28 branch 2 times, most recently from a2ce2c6 to 124f262 Compare May 30, 2026 06:06

fsconnector metadata cache

6577720


saikat-royc force-pushed the fs-conn-client-05-28 branch from 124f262 to 6577720 Compare May 30, 2026 18:35

saikat-royc changed the title ~~[WIP DO NOT REVIEW] llmd fsconnector metadata cache~~ llmd fsconnector metadata cache May 30, 2026

saikat-royc closed this Jun 15, 2026


llmd fsconnector metadata cache #621
saikat-royc wants to merge 2 commits into llm-d:main from saikat-royc:fs-conn-client-05-28
