feat: add --overlay flag for shared compilation caches #98

Open
dacorvo wants to merge 12 commits into huggingface:main from dacorvo:feat/overlay-mode

@dacorvo dacorvo commented Apr 3, 2026

Summary

Adds --overlay flag that provides true overlay semantics: pre-existing local files are visible through the mount, new writes persist locally in their original path layout, and nothing is pushed to the remote bucket. This enables a provider / consumer model for shared compilation caches.

The problem

Modern ML inference stacks compile models into hardware-specific artifacts before serving. This compilation is expensive — minutes to hours depending on the model — and the results are deterministic: the same model + compiler version + hardware always produces the same artifacts.

Today, every new machine recompiles from scratch. Teams work around this by manually copying cache directories between instances, setting up shared NFS mounts, or building custom S3 sync scripts. None of these integrate naturally with the Hub ecosystem.

The solution

hf-mount --overlay turns any HF Storage Bucket into a shared, read-through compilation cache with zero infrastructure:

Cache providers (few — CI, dedicated build machines) mount read-write and compile. Artifacts go directly to the bucket:

hf-mount start bucket my-org/compilation-cache /path/to/cache
# compile, artifacts land in bucket automatically

Cache consumers (many — dev machines, inference fleet, autoscaling pods) mount with --overlay. Cached artifacts are served lazily from the bucket. Cache misses compile locally without polluting the shared bucket:

hf-mount start --overlay bucket my-org/compilation-cache /path/to/cache
# reads from bucket, local compilations stay local

No write access required for consumers. No coordination between machines.

The provider / consumer model

  1. A cache provider compiles the most popular models on the Hub — the top models people actually deploy — and the artifacts land in the bucket. This is a curated list, not every possible variant.
  2. Every consumer mounts with --overlay and starts warm for any model in the curated cache — artifacts are served lazily from the bucket on first access.
  3. Users deploying a less common variant or a custom configuration compile locally. Their artifacts stay in the local overlay — useful for their own subsequent runs, but they don't pollute the shared cache with niche configurations.

Consider a new model launch day: without shared caches, every instance in an autoscaling fleet recompiles independently — 100 pods means 100 identical compilations, hours of wasted accelerator time. With overlay, the fleet goes from cold to serving in the time it takes to mount + load, not compile.

Who benefits

AWS Neuron (Trainium / Inferentia)

The Neuron compiler writes NEFF binaries to /var/tmp/neuron-compile-cache. Compilation of a single LLM can take 30–60 minutes. The Neuron Persistent Cache documentation recommends sharing cache directories across instances, but the only built-in option is S3 — which requires AWS-specific configuration.

hf-mount start --overlay bucket aws-neuron/neff-cache /var/tmp/neuron-compile-cache

Note: The Neuron compiler hard-fails if its cache directory is read-only — a plain read-only mount is not an option. Overlay is required so cache misses can compile to the local directory.

torch.compile / TorchInductor

torch.compile caches compiled kernels in TORCHINDUCTOR_CACHE_DIR. Cold compilation adds significant startup latency — Replicate reports that caching eliminates redundant compilation entirely. The PyTorch documentation recommends copying cache directories for deployment.

hf-mount start --overlay bucket my-org/torch-compile-cache $TORCHINDUCTOR_CACHE_DIR

Note: torch.compile hard-fails with PermissionError if TORCHINDUCTOR_CACHE_DIR points to a read-only directory — there is no graceful fallback. Verified with PyTorch 2.11 and Llama-3.2-1B: a read-only cache dir causes an InductorError on the first forward pass. A plain read-only mount is not an option — overlay is required so cache misses can compile to the local directory.

vLLM

vLLM compiles during cold start and saves artifacts to ~/.cache/vllm/torch_compile_cache. The vLLM documentation recommends sharing this directory across instances. Tensorfuse reports a 70% reduction in cold start time (294s → 82s) from cache sharing.

hf-mount start --overlay bucket my-org/vllm-cache ~/.cache/vllm/torch_compile_cache

JAX/XLA (TPU and GPU)

JAX's persistent compilation cache compiles HLO programs for TPUs/GPUs. The documentation explicitly states the cache "should be in a shared file system (e.g., NFS)" for distributed workloads.

hf-mount start --overlay bucket my-org/xla-cache /path/to/jax_cache

Triton (OpenAI) kernel cache

Triton compiles GPU kernels to ~/.triton/cache, used by torch.compile and vLLM internally. Each new instance rebuilds the entire kernel cache.

hf-mount start --overlay bucket my-org/triton-cache ~/.triton/cache

Why overlay for consumers

| | Read-write mount (provider) | Overlay mount (consumer) |
|---|---|---|
| Write access | Required | Not needed |
| Writes go to | Bucket (shared) | Local dir (private) |
| Cache misses | Compile + push to bucket | Compile + keep locally |
| Concurrent use | Needs coordination | Fully independent |
| Bucket growth | Controlled by provider | No consumer pollution |

Overlay semantics

Clean (remote) entries are immutable from the consumer's perspective:

| Operation on clean remote entry | Result |
|---|---|
| read / readdir | Served from remote (lazy fetch) |
| open(writable, O_TRUNC) | Creates local shadow (override) |
| open(writable, no truncate) | COW: download + local write |
| unlink | EPERM |
| rename | EPERM |
| rmdir | EPERM |

Dirty (local/COW'd) entries have full read-write semantics. Deleting a local shadow reveals the remote original on remount — standard overlay behavior without whiteouts.
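
The table and the dirty-entry rule above can be read as a single decision function. The following is a minimal sketch of that mapping, not the code from this PR: the `EntryState`, `Op`, `Outcome` types and the `resolve` function are hypothetical names introduced only for illustration.

```rust
/// Hypothetical: whether an entry is still remote-only (clean) or has a local copy (dirty).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EntryState {
    Clean,
    Dirty,
}

/// Hypothetical: the operations listed in the semantics table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Op {
    Read,
    Readdir,
    OpenTruncate,
    OpenWrite,
    Unlink,
    Rename,
    Rmdir,
}

/// Hypothetical: what the overlay does in response.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Outcome {
    /// Served from the bucket with a lazy fetch.
    ServeRemote,
    /// A fresh local file overrides the remote one.
    LocalShadow,
    /// Remote content is downloaded, then written locally (copy-on-write).
    CopyOnWrite,
    /// The operation is refused with EPERM.
    Eperm,
    /// Ordinary local read-write behaviour.
    LocalReadWrite,
}

/// Maps (entry state, operation) to the outcome described in the table above.
pub fn resolve(state: EntryState, op: Op) -> Outcome {
    match (state, op) {
        // Dirty entries live on local disk and keep full read-write semantics.
        (EntryState::Dirty, _) => Outcome::LocalReadWrite,
        (EntryState::Clean, Op::Read | Op::Readdir) => Outcome::ServeRemote,
        (EntryState::Clean, Op::OpenTruncate) => Outcome::LocalShadow,
        (EntryState::Clean, Op::OpenWrite) => Outcome::CopyOnWrite,
        // Clean remote entries are immutable from the consumer's side.
        (EntryState::Clean, Op::Unlink | Op::Rename | Op::Rmdir) => Outcome::Eperm,
    }
}
```

For example, `resolve(EntryState::Clean, Op::Unlink)` yields `Outcome::Eperm`, while the same call on a dirty entry yields `Outcome::LocalReadWrite`.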

Kubernetes deployment via CSI driver

--overlay is fully compatible with the hf-csi-driver — no driver changes needed. Mount options are forwarded as CLI flags to hf-mount-fuse:

# PersistentVolume with overlay mode
apiVersion: v1
kind: PersistentVolume
spec:
  mountOptions:
    - overlay
  csi:
    volumeAttributes:
      sourceType: bucket
      sourceId: my-org/compilation-cache

Or for inline ephemeral volumes:

volumeAttributes:
  sourceType: bucket
  sourceId: my-org/compilation-cache
  mountFlags: "overlay"

This is the natural deployment path for the autoscaling fleet scenario: every pod gets a warm compilation cache via a standard PVC, no per-pod compilation needed.

How it works

--overlay reuses hf-mount's existing advanced-write infrastructure, redirecting staging paths from opaque inode-based names to the file's original path under the mount point. A pre-mount file descriptor preserves access to the underlying local directory after the mount shadows it.
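
To make the staging-path redirection concrete, here is a rough sketch of how the two naming schemes could differ. `StagingDir`, `overlay_root` and `staging_path(ino, full_path)` are named in this PR, but the field layout, the `session` field and the exact `ino_N_session` format string below are assumptions for illustration only.

```rust
use std::path::PathBuf;

/// Simplified stand-in for the real StagingDir; the fields are assumed.
pub struct StagingDir {
    /// Directory holding opaque staging files in non-overlay mode.
    pub dir: PathBuf,
    /// Set only in overlay mode: the local directory under the mount point.
    pub overlay_root: Option<PathBuf>,
    /// Hypothetical session identifier used in inode-based names.
    pub session: String,
}

impl StagingDir {
    /// Pure path computation with no I/O side effects.
    pub fn staging_path(&self, ino: u64, full_path: &str) -> PathBuf {
        match &self.overlay_root {
            // Overlay: the file keeps its original relative path.
            Some(root) => root.join(full_path),
            // Default advanced-writes mode: opaque inode-based name (format assumed).
            None => self.dir.join(format!("ino_{}_{}", ino, self.session)),
        }
    }
}
```

With `overlay_root` set to `/cache`, a write to `a/b.neff` would land at `/cache/a/b.neff`, which is why it survives an unmount/remount cycle.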

  • Reads: check local dir first (via pre-mount fd), then remote bucket (lazy fetch)
  • Writes: go to the local dir at original paths (via advanced-write staging)
  • Readdir: merges remote listing with local directory entries (local wins on conflict, including type conflicts)
  • Persistence: local files survive unmount/remount (real paths, not ephemeral staging)
  • No FlushManager: writes are never uploaded to the bucket
  • No remote mutations: unlink/rename/rmdir of clean entries return EPERM; no whiteout support
  • No write token: overlay consumers use read-only credentials
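
As an illustration of the readdir rule above (local wins on conflict, including type conflicts), a merge could look roughly like the sketch below. The `Kind`, `Source` and `merge_listing` names are hypothetical, and the real VFS works on its inode table rather than on plain maps.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    File,
    Dir,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Source {
    Remote,
    Local,
}

/// Merges a remote listing with local directory entries.
/// Local entries are inserted last, so they replace a remote entry of the
/// same name even when the kinds differ (file vs. directory).
pub fn merge_listing(
    remote: &[(&str, Kind)],
    local: &[(&str, Kind)],
) -> BTreeMap<String, (Kind, Source)> {
    let mut merged = BTreeMap::new();
    for (name, kind) in remote {
        merged.insert((*name).to_string(), (*kind, Source::Remote));
    }
    for (name, kind) in local {
        merged.insert((*name).to_string(), (*kind, Source::Local));
    }
    merged
}
```

Given a remote file `b` and a local directory `b`, the merged view reports `b` as a local directory, and entries present on only one side pass through unchanged.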

Test plan

  • Integration tests: mount/write/unmount/remount persistence (NFS + FUSE)
  • Integration tests: pre-existing local files visible through overlay mount
  • Integration tests: writes land at original paths (not inode-based staging)
  • 23 overlay unit tests covering: readdir merge, local reads, remote reads, local-overrides-remote, write path preservation, mkdir/rmdir persistence, no flush, no fsync upload, nested dirs, write-then-reread, EPERM guards (unlink/rename/rmdir for clean entries), type-conflict resolution, OS junk filtering, symlink skipping, setattr truncation path, local shadow xet_hash clearing
  • All existing unit tests pass (overlay defaults to false)
  • 239 total unit tests

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings April 3, 2026 09:58
@dacorvo dacorvo marked this pull request as draft April 3, 2026 10:02
@dacorvo dacorvo force-pushed the feat/overlay-mode branch from 295b220 to c75cd11 Compare April 3, 2026 10:04

Copilot AI left a comment

Pull request overview

This PR adds an --overlay mount mode intended to enable “read-through, write-local” semantics for shared compilation caches: existing local files remain visible, remote bucket contents are readable on demand, and new writes persist locally without being uploaded back to the bucket.

Changes:

  • Introduces an overlay-aware staging path strategy (original-path staging under an overlay root).
  • Adds overlay configuration and local directory merging logic to the VFS (plus overlay-specific behaviors like skipping remote flush manager).
  • Adds test helpers and a new suite of unit tests for overlay behavior.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 8 comments.

| File | Description |
|---|---|
| src/setup.rs | Adds --overlay CLI flag and opens a pre-mount FD to access the shadowed mountpoint directory. |
| src/xet.rs | Extends StagingDir with optional overlay_root and overlay-aware staging path resolution. |
| src/virtual_fs/mod.rs | Adds overlay to VfsConfig, disables FlushManager in overlay, merges local overlay entries during readdir loading, and routes advanced-write staging paths via staging_path(). |
| src/test_mocks.rs | Extends test config for overlay and adds an overlay-specific VFS test builder. |
| src/virtual_fs/tests.rs | Adds overlay mode unit tests (readdir merge, local/remote reads, local override, write persistence, etc.). |

dacorvo and others added 2 commits April 3, 2026 10:24
Reuses the advanced-writes staging infrastructure but redirects staging
paths from inode-based (ino_N_session) to the file's original path under
the mount point directory. This provides overlayfs-like behavior:

- Pre-existing local files are visible through the mount (merged view)
- New writes persist in original path layout in the local dir
- Writes survive unmount/remount cycles
- Reads merge local files with remote bucket (local takes precedence)

Implementation:
- StagingDir gains overlay_path(full_path) and staging_path(ino, full_path)
- Pre-mount fd kept via /proc/self/fd/N to access shadowed local dir
- readdir merges remote listing with local dir entries
- Startup scan registers pre-existing local files in inode table
- FlushManager skipped in overlay mode (writes stay local)
- fsync skips remote upload in overlay mode

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
10 tests covering: readdir merge, local file reads, local-overrides-remote,
write path preservation, mkdir on disk, no flush manager, no fsync upload,
nested directories, remote file reads, and write-then-reread.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@dacorvo dacorvo force-pushed the feat/overlay-mode branch from c75cd11 to 74f1066 Compare April 3, 2026 10:27

dacorvo commented Apr 3, 2026

The CI integration tests fail because fork PRs don't have access to upstream secrets (HF_TOKEN). I ran them locally and they all pass:

# Unit tests (226 passed)
cargo test --lib

# Bucket integration (FUSE + NFS)
cargo test --release --test fuse_ops --test nfs_ops -- --test-threads=1 --nocapture

# Repo integration
cargo test --release --test repo_ops -- --test-threads=1 --nocapture

Also verified with manual integration tests (NFS + FUSE): overlay mount/unmount persistence, pre-existing local files visible through mount, writes at original paths, and local-overrides-remote.

- Clear xet_hash when local file overrides remote (prevents stale Hub ops)
- Create mount point dir before opening pre-mount fd
- Use /dev/fd instead of /proc/self/fd (macOS compatible)
- fsync syncs local fd for durability (only skips remote upload)
- Use symlink_metadata + real unix permissions in overlay merge
- Skip symlinks in overlay scan
- Fix doc comment on make_overlay_test_vfs_with_root

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 12 comments.

Comments suppressed due to low confidence (1)

src/virtual_fs/mod.rs:156

  • Setting flush_manager to None in overlay mode is not sufficient to guarantee “no remote mutations”. Several code paths treat flush_manager == None as “simple mode” and will still call hub_client.batch_operations directly (e.g., unlink deletes remote immediately; rename sends AddFile/DeleteFile for clean remote files). Overlay mode should explicitly disable all remote write operations (rename/unlink/rmdir/etc.), regardless of flush_manager presence, and restrict changes to the local overlay only.
        let flush_manager = if !config.read_only && config.advanced_writes && !config.overlay {
            let sd = staging_dir
                .as_ref()
                .expect("--advanced-writes requires a staging directory");
            Some(flush::FlushManager::new(
                xet_sessions.clone(),
                sd.clone(),
                hub_client.clone(),
                inodes.clone(),
                &runtime,
                config.flush_debounce,
                config.flush_max_batch_window,
            ))
        } else {
            None
        };

- Use remote_read_only for token/upload config (overlay doesn't need
  write tokens, VFS stays read-write for local writes)
- Platform-specific fd path: /proc/self/fd on Linux, /dev/fd elsewhere
- Filter OS junk files in overlay merge (respects filter_os_files)
- Propagate create_dir_all errors instead of ignoring them
- Move overlay import to top of test file
- Use attr.ino from create() return instead of hard-coded inode 2

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
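
For readers unfamiliar with the pre-mount fd trick mentioned in this commit, the following is a small sketch of the idea, assuming a Unix target. The helper names `fd_dir_path` and `open_premount` are hypothetical; only the two path prefixes (/proc/self/fd on Linux, /dev/fd elsewhere) come from the commit message.

```rust
use std::fs::File;
use std::os::unix::io::AsRawFd;
use std::path::{Path, PathBuf};

/// Path through which an already-open directory fd stays reachable
/// even after a mount shadows the original directory.
pub fn fd_dir_path(fd: i32) -> PathBuf {
    if cfg!(target_os = "linux") {
        PathBuf::from(format!("/proc/self/fd/{}", fd))
    } else {
        PathBuf::from(format!("/dev/fd/{}", fd))
    }
}

/// Creates the mount point if needed, opens it before mounting, and returns
/// the open handle together with the fd-based path to the local directory.
/// The File must be kept alive for as long as the path is used.
pub fn open_premount(mount_point: &Path) -> std::io::Result<(File, PathBuf)> {
    // Errors are propagated rather than ignored.
    std::fs::create_dir_all(mount_point)?;
    let dir = File::open(mount_point)?;
    let path = fd_dir_path(dir.as_raw_fd());
    Ok((dir, path))
}
```

The returned path would then play the role of the overlay root, so reads and writes under it reach the underlying local directory rather than the mounted view.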

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.


Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.


Add `overlay` field to VirtualFs for early gating of remote mutations:
- unlink: skip remote delete entirely in overlay mode
- rename: skip rename_remote() entirely in overlay mode
- setattr: use staging_path() instead of path() for correct overlay paths
- mkdir: propagate create_dir_all errors instead of ignoring them
Simplify existing overlay checks to use self.overlay directly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.


- Make staging_path() a pure path computation (no I/O side effects);
  add ensure_staging_parents() called only from write/create/truncate paths
- Propagate overlay rename errors (mkdir + fs::rename) instead of warn
- Clean stale test overlay dirs via fresh_overlay_dir() helper

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.


- Add debug_assert validating overlay staging paths are safe relative paths
- Fix doc comments on OverlayTestVfs and make_overlay_test_vfs_with_root
- Handle symlink creation failure gracefully in overlay_skips_symlinks test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

src/virtual_fs/mod.rs:158

  • VirtualFs::new() assumes overlay implies advanced_writes (see streaming_commit debug_assert), but this invariant isn’t enforced. If a caller constructs VfsConfig { overlay: true, advanced_writes: false } (or passes staging_dir: None), some write paths may fall back to streaming mode and either attempt remote upload or fail with confusing EIOs due to missing upload config. Consider asserting/forcing advanced_writes (and requiring staging_dir with overlay_root) whenever config.overlay is true so overlay behavior is well-defined for all call sites.
    pub fn new(
        runtime: tokio::runtime::Handle,
        hub_client: Arc<dyn HubOps>,
        xet_sessions: Arc<dyn XetOps>,
        staging_dir: Option<StagingDir>,
        config: VfsConfig,
    ) -> Arc<Self> {
        let inodes = Arc::new(RwLock::new(InodeTable::new()));
        let negative_cache = Arc::new(RwLock::new(HashMap::new()));

        let flush_manager = if !config.read_only && config.advanced_writes && !config.overlay {
            let sd = staging_dir
                .as_ref()
                .expect("--advanced-writes requires a staging directory");
            Some(flush::FlushManager::new(
                xet_sessions.clone(),
                sd.clone(),
                hub_client.clone(),
                inodes.clone(),
                &runtime,
                config.flush_debounce,
                config.flush_max_batch_window,
            ))
        } else {
            None
        };
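
One possible shape for the invariant check suggested above is sketched here. It is not the fix adopted in this PR: `validate_overlay` is a hypothetical helper, and `VfsConfig` is trimmed to the three flags that appear in the snippet.

```rust
/// Trimmed stand-in for the real VfsConfig; only the flags relevant here.
#[derive(Debug, Clone, Default)]
pub struct VfsConfig {
    pub read_only: bool,
    pub advanced_writes: bool,
    pub overlay: bool,
}

/// Rejects overlay configurations that would leave write paths undefined.
/// `has_overlay_root` stands for "a staging directory with an overlay root was supplied".
pub fn validate_overlay(config: &VfsConfig, has_overlay_root: bool) -> Result<(), String> {
    if !config.overlay {
        // Non-overlay mounts are not constrained by this check.
        return Ok(());
    }
    if !config.advanced_writes {
        return Err("overlay requires advanced_writes".to_string());
    }
    if !has_overlay_root {
        return Err("overlay requires a staging directory with an overlay root".to_string());
    }
    Ok(())
}
```

Calling such a check at the top of the constructor would turn a misconfigured caller into an immediate, descriptive error instead of a later EIO.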

- Broaden rename EPERM guard to all clean entries (not just xet_hash),
  covering remote directories and etag-based files
- Mark overlay-created directories as dirty so they survive stale-child
  pruning on poll/reload
- Return EPERM when overlay rename source is not materialized on disk
  instead of silently succeeding

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.



dacorvo commented Apr 3, 2026

Design decision: path traversal validation in overlay mode

Copilot has flagged staging_path() and overlay_root.join() across multiple rounds for potential path traversal (absolute paths, .. components escaping the overlay root). We're keeping the current debug_assert! approach and not adding runtime validation. Here's why:

full_path is never derived from user input or untrusted external data. It is always constructed internally by the VFS from one of three sources:

  1. Remote entries: paths come from the Hub API listing. The Hub validates paths server-side — you cannot upload a file with .. or absolute path components to a bucket.
  2. Local overlay entries: read_dir() returns file_name() which is a single component (no slashes, no ..). full_path is then built by inode::child_path(parent_path, name).
  3. FUSE/NFS-created entries: the kernel passes a single name component to create()/mkdir() — never a full path.

Path traversal would require either a malicious Hub backend (in which case the entire mount is compromised, not just overlay) or a bug in child_path() (which would affect the entire VFS, not just overlay).

The debug_assert! catches programming errors during development. A runtime check would be defensive overhead against an input vector that doesn't exist in practice.
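
For reference, the kind of "safe relative path" condition such a debug_assert! could express is sketched below. The `is_safe_relative` and `overlay_join` names are hypothetical and the actual assertion in the PR may be written differently.

```rust
use std::path::{Component, Path, PathBuf};

/// True when `path` is non-empty and made only of normal components:
/// no leading `/`, no `..`, no leading `.`, no Windows prefix.
pub fn is_safe_relative(path: &str) -> bool {
    !path.is_empty()
        && Path::new(path)
            .components()
            .all(|c| matches!(c, Component::Normal(_)))
}

/// Joins an internally built VFS path onto the overlay root.
/// The check runs only in debug builds, matching the decision described above.
pub fn overlay_join(overlay_root: &Path, full_path: &str) -> PathBuf {
    debug_assert!(is_safe_relative(full_path), "unsafe overlay path: {}", full_path);
    overlay_root.join(full_path)
}
```

Because `Path::join` replaces the base entirely when given an absolute path, a bug that produced one would escape the overlay root; the assertion is there to surface that during development.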

dacorvo and others added 2 commits April 3, 2026 18:57
Match the rename guard: block unlink of any clean (non-dirty) entry in
overlay mode, not just xet_hash-backed files. This covers etag-based
and other remote representations that lack xet_hash.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- rmdir removes on-disk overlay directory so it doesn't reappear on remount
- Overlay merge handles type conflicts: if local has a directory where
  remote has a file (or vice versa), remove the old entry and reinsert
  so local kind wins cleanly in release builds

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.


- Update mode in overlay merge so local file permissions are reflected
- Upgrade streaming_commit debug_assert to assert for fail-fast in prod
- Reuse staging_path() in ensure_staging_parents() for consistent
  path validation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.


- rmdir now returns EPERM for clean remote directories (matches unlink/rename)
- Extract overlay_root() helper to reduce boilerplate across 4 call sites
- Update overlay merge to also set mode from local file metadata

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

dacorvo commented Apr 8, 2026

cc @ErikKaum, who is using mounted buckets for the torch.compile cache on Inference Endpoints.

@dacorvo dacorvo marked this pull request as ready for review April 8, 2026 16:47