llmd fsconnector metadata cache#621

Closed
saikat-royc wants to merge 2 commits into llm-d:main from saikat-royc:fs-conn-client-05-28

@saikat-royc saikat-royc commented May 29, 2026

Contributor

1. Overview

This PR implements a client-side metadata cache for llmd_fs_backend that replaces direct filesystem calls (os.path.exists) during the scheduler's lookup phase. The cache is intended to reduce the lookup latency that the underlying storage system adds at scale. Because files can be evicted asynchronously by an external process (e.g., pvc_evictor), cached entries can become stale; to bound this eventual-consistency window, positive entries expire under a hard Time-To-Live (TTL) policy.

2. Key Design and Features

A. Tiered Positive Caching & Filesystem Fallback (metadata_cache.py, manager.py)

• Metadata Cache Layer: Introduces an in-memory MetadataCache, backed by an OrderedDict, that indexes block keys confirmed to exist on the filesystem.
• Fallback Resolution Workflow:
  • Lookups check the cache first (Tier 1 hit).
  • On a cache miss, the manager falls back to a physical check (os.path.exists). If the file exists, the manager back-fills the positive cache to speed up subsequent lookups.
• Stateless I/O and Safety: Store paths (prepare/complete store actions) insert records into the cache. Load paths bypass the cache entirely and read directly from the on-disk paths, so a stale cache entry cannot corrupt an active read.
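
The lookup workflow above can be sketched roughly as follows. This is an illustration only, not the PR's code: PositiveCache, lookup, load, and the one-file-per-key layout under root are all hypothetical stand-ins for MetadataCache and the manager.

```python
import os


class PositiveCache:
    """Hypothetical stand-in for MetadataCache: remembers keys known to exist."""

    def __init__(self):
        self._keys = set()

    def contains(self, key):
        return key in self._keys

    def insert(self, key):
        self._keys.add(key)


def lookup(cache, root, key):
    """Return True if the block for `key` is believed to exist."""
    if cache.contains(key):            # Tier 1: in-memory hit
        return True
    path = os.path.join(root, key)     # assumed layout: one file per key
    if os.path.exists(path):           # Tier 2: filesystem check
        cache.insert(key)              # back-fill so the next lookup hits memory
        return True
    return False                       # misses are not cached (positive-only)


def load(root, key):
    """Loads bypass the cache and read the file directly."""
    with open(os.path.join(root, key), "rb") as f:
        return f.read()
```

In this sketch a key whose file is deleted externally can still be reported as present by lookup until it leaves the cache, while load fails immediately because it never consults the cache.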

B. Bounded Hard TTL Expiration Strategy

• Eventual Consistency Window: Each positive record stores a monotonic timestamp (time.monotonic()). In contains and batch_contains queries, entries older than the configured TTL are popped and reported as a cache miss, which triggers the fallback to the filesystem tier.
• Hard TTL Boundaries: Re-inserting a key that is already cached preserves its original timestamp instead of resetting or extending its lifetime. This prevents long-lived hot keys from extending their expiration window indefinitely, which could hide files deleted by an external evictor.
• Infinite TTL (-1): Setting metadata_cache_ttl_secs = -1 disables time-based expiration entirely, so positive keys stay cached until the standard LRU size limit evicts them. This suits setups with no external eviction, i.e. when the background evictor is disabled.
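
A minimal sketch of the hard-TTL behavior described above, assuming an OrderedDict that maps each key to its insertion timestamp. The class name HardTTLCache, its constructor parameters, the injectable clock, and the choice to refresh LRU order on a hit are assumptions made for illustration; the PR's MetadataCache may differ.

```python
import time
from collections import OrderedDict


class HardTTLCache:
    """Illustrative positive cache with LRU capacity and a hard TTL."""

    def __init__(self, max_entries, ttl_secs, clock=time.monotonic):
        self._entries = OrderedDict()    # key -> first-insertion timestamp
        self._max = max_entries
        self._ttl = ttl_secs             # -1 disables time-based expiry
        self._clock = clock
        self.evictions = {"lru": 0, "ttl": 0}

    def insert(self, key):
        if key in self._entries:
            # Hard TTL: keep the original timestamp, only refresh LRU order.
            self._entries.move_to_end(key)
            return
        self._entries[key] = self._clock()
        while len(self._entries) > self._max:
            self._entries.popitem(last=False)   # drop least recently used
            self.evictions["lru"] += 1

    def contains(self, key):
        ts = self._entries.get(key)
        if ts is None:
            return False
        if self._ttl >= 0 and self._clock() - ts > self._ttl:
            del self._entries[key]              # expired: report a miss
            self.evictions["ttl"] += 1
            return False
        self._entries.move_to_end(key)          # assumed: hits refresh LRU order
        return True

    def batch_contains(self, keys):
        return [self.contains(k) for k in keys]
```

Passing a fake clock makes the expiry easy to exercise: insert a key at t=0 with a 10-second TTL, re-insert it at t=8, and contains still reports a miss after t=10 because the re-insert did not move the timestamp.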

C. Prometheus Metrics Instrumentation (metrics.py, manager.py, metadata_cache.py)

• vllm_llmd_fs_metadata_cache_lookup_duration_seconds (Histogram): Tracks single-key metadata lookup latency.
• vllm_llmd_fs_metadata_cache_lookup_blocks (Counter labeled result=["mem_hit", "fs_hit", "fs_miss"]): Categorizes the outcome of each manager lookup.
• vllm_llmd_fs_metadata_cache_entries (Gauge): Tracks the number of entries held in the in-memory positive cache.
• vllm_llmd_fs_metadata_cache_evictions (Counter labeled type=["lru", "ttl"]): Distinguishes capacity-driven LRU evictions from TTL expirations.
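
To make the result labels concrete, here is a rough sketch of how one lookup might be classified and timed. It deliberately uses a plain collections.Counter and a list as stand-ins for the Prometheus counter and histogram, so it is not the PR's metrics.py; record_lookup and its two callables are hypothetical.

```python
import time
from collections import Counter

lookup_blocks = Counter()    # stand-in for the counter labeled by `result`
lookup_durations = []        # stand-in for the histogram's observations


def record_lookup(in_memory, on_disk):
    """Classify one lookup and record how long the check took.

    `in_memory` and `on_disk` are zero-argument callables returning bool;
    `on_disk` is only called when the in-memory check misses.
    """
    start = time.perf_counter()
    if in_memory():
        result = "mem_hit"
    elif on_disk():
        result = "fs_hit"
    else:
        result = "fs_miss"
    lookup_durations.append(time.perf_counter() - start)
    lookup_blocks[result] += 1
    return result
```

With real Prometheus instrumentation, the increment would target the labeled counter and the elapsed time would be observed on the histogram instead.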

Verification:

  1. Inference perf benchmarks with the metadata cache enabled
  2. Unit tests

fix CUDA version mismatch and dev headers symlink
- Update default CUDA_TOOLKIT_PKG to cuda-toolkit-13-0 to
  match the CUDA 13.0 base image and prevent PyTorch compilation
  version mismatch.
- Explicitly parse and update the standard /usr/local/cuda symlink
  after GKE package installation to resolve missing dev headers
  (cusparse.h) during compilation

Signed-off-by: Saikat Roychowdhury <saikat.royc85@gmail.com>
@github-actions github-actions Bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label May 29, 2026
@saikat-royc saikat-royc force-pushed the fs-conn-client-05-28 branch from 8edee4e to b156c39 Compare May 29, 2026 22:10
@saikat-royc saikat-royc changed the title [WIP] llmd fsconnector metadata cache [WIP DO NOT REVIEW] llmd fsconnector metadata cache May 29, 2026
@saikat-royc saikat-royc force-pushed the fs-conn-client-05-28 branch 2 times, most recently from a2ce2c6 to 124f262 Compare May 30, 2026 06:06
1. metadata cache implementation
2. metadata cache metrics instrumentation
3. unit tests

Signed-off-by: Saikat Roychowdhury <saikat.royc85@gmail.com>
@saikat-royc saikat-royc force-pushed the fs-conn-client-05-28 branch from 124f262 to 6577720 Compare May 30, 2026 18:35
@saikat-royc saikat-royc changed the title [WIP DO NOT REVIEW] llmd fsconnector metadata cache llmd fsconnector metadata cache May 30, 2026
@saikat-royc

Contributor Author

/cc @miroslavln requesting a first-pass review of the changes.

note: commit in this PR will be removed, once #620 is submitted

@saikat-royc

Contributor Author

/cc @kfirtoledo requesting a review of this PR.

@saikat-royc

Contributor Author

Closing this PR, since there is a PR for a metadata cache in the multi-tier offloading connector: vllm-project/vllm#44193.
