feat: add --overlay flag for shared compilation caches #98
dacorvo wants to merge 12 commits into huggingface:main
295b220 to c75cd11
Pull request overview
This PR adds an --overlay mount mode intended to enable “read-through, write-local” semantics for shared compilation caches: existing local files remain visible, remote bucket contents are readable on demand, and new writes persist locally without being uploaded back to the bucket.
Changes:
- Introduces an overlay-aware staging path strategy (original-path staging under an overlay root).
- Adds overlay configuration and local directory merging logic to the VFS (plus overlay-specific behaviors like skipping remote flush manager).
- Adds test helpers and a new suite of unit tests for overlay behavior.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| src/setup.rs | Adds --overlay CLI flag and opens a pre-mount FD to access the shadowed mountpoint directory. |
| src/xet.rs | Extends StagingDir with optional overlay_root and overlay-aware staging path resolution. |
| src/virtual_fs/mod.rs | Adds overlay to VfsConfig, disables FlushManager in overlay, merges local overlay entries during readdir loading, and routes advanced-write staging paths via staging_path(). |
| src/test_mocks.rs | Extends test config for overlay and adds an overlay-specific VFS test builder. |
| src/virtual_fs/tests.rs | Adds overlay mode unit tests (readdir merge, local/remote reads, local override, write persistence, etc.). |
Reuses the advanced-writes staging infrastructure but redirects staging paths from inode-based (ino_N_session) to the file's original path under the mount point directory. This provides overlayfs-like behavior:
- Pre-existing local files are visible through the mount (merged view)
- New writes persist in original path layout in the local dir
- Writes survive unmount/remount cycles
- Reads merge local files with remote bucket (local takes precedence)

Implementation:
- StagingDir gains overlay_path(full_path) and staging_path(ino, full_path)
- Pre-mount fd kept via /proc/self/fd/N to access shadowed local dir
- readdir merges remote listing with local dir entries
- Startup scan registers pre-existing local files in inode table
- FlushManager skipped in overlay mode (writes stay local)
- fsync skips remote upload in overlay mode

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
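The staging-path redirection described in this commit can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the `StagingDir` fields shown here, the `session` tag, and the exact `ino_N_session` formatting are assumptions made for the example.

```rust
use std::path::PathBuf;

/// Hypothetical, simplified stand-in for the PR's `StagingDir`.
pub struct StagingDir {
    /// Directory holding inode-named staging files (regular advanced-writes mode).
    pub root: PathBuf,
    /// Set only in overlay mode: the local directory shadowed by the mount.
    pub overlay_root: Option<PathBuf>,
    /// Assumed per-mount session tag used in inode-based names.
    pub session: String,
}

impl StagingDir {
    /// Pure path computation: nothing is created on disk here.
    pub fn staging_path(&self, ino: u64, full_path: &str) -> PathBuf {
        match &self.overlay_root {
            // Overlay mode: keep the file's original layout under the overlay root.
            Some(root) => root.join(full_path.trim_start_matches('/')),
            // Otherwise: opaque inode-based name inside the staging directory.
            None => self.root.join(format!("ino_{}_{}", ino, self.session)),
        }
    }
}
```

With `overlay_root` unset the same call yields an inode-based name, so callers would not need to know which mode is active.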
10 tests covering: readdir merge, local file reads, local-overrides-remote, write path preservation, mkdir on disk, no flush manager, no fsync upload, nested directories, remote file reads, and write-then-reread.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
c75cd11 to 74f1066
The CI integration tests fail because fork PRs don't have access to upstream secrets. Also verified with manual integration tests (NFS + FUSE): overlay mount/unmount persistence, pre-existing local files visible through mount, writes at original paths, and local-overrides-remote.
- Clear xet_hash when local file overrides remote (prevents stale Hub ops)
- Create mount point dir before opening pre-mount fd
- Use /dev/fd instead of /proc/self/fd (macOS compatible)
- fsync syncs local fd for durability (only skips remote upload)
- Use symlink_metadata + real unix permissions in overlay merge
- Skip symlinks in overlay scan
- Fix doc comment on make_overlay_test_vfs_with_root

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 12 comments.
Comments suppressed due to low confidence (1)
src/virtual_fs/mod.rs:156
- Setting flush_manager to None in overlay mode is not sufficient to guarantee “no remote mutations”. Several code paths treat `flush_manager == None` as “simple mode” and will still call `hub_client.batch_operations` directly (e.g., unlink deletes remote immediately; rename sends AddFile/DeleteFile for clean remote files). Overlay mode should explicitly disable all remote write operations (rename/unlink/rmdir/etc.), regardless of flush_manager presence, and restrict changes to the local overlay only.
    let flush_manager = if !config.read_only && config.advanced_writes && !config.overlay {
        let sd = staging_dir
            .as_ref()
            .expect("--advanced-writes requires a staging directory");
        Some(flush::FlushManager::new(
            xet_sessions.clone(),
            sd.clone(),
            hub_client.clone(),
            inodes.clone(),
            &runtime,
            config.flush_debounce,
            config.flush_max_batch_window,
        ))
    } else {
        None
    };
- Use remote_read_only for token/upload config (overlay doesn't need write tokens, VFS stays read-write for local writes)
- Platform-specific fd path: /proc/self/fd on Linux, /dev/fd elsewhere
- Filter OS junk files in overlay merge (respects filter_os_files)
- Propagate create_dir_all errors instead of ignoring them
- Move overlay import to top of test file
- Use attr.ino from create() return instead of hard-coded inode 2

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
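The platform-specific fd path mentioned in this commit could look something like the sketch below. The function name `fd_path` is hypothetical; only the two base directories come from the commit message.

```rust
use std::path::PathBuf;

/// Build the path through which an already-open directory fd can be reached
/// again after the mount point shadows the original directory.
/// Hypothetical helper: the PR may structure this differently.
pub fn fd_path(fd: i32) -> PathBuf {
    // /proc/self/fd on Linux, /dev/fd elsewhere (e.g. macOS).
    let base = if cfg!(target_os = "linux") {
        "/proc/self/fd"
    } else {
        "/dev/fd"
    };
    PathBuf::from(base).join(fd.to_string())
}
```

In use, the directory would be opened before mounting, and the returned path would then serve as the overlay root for local reads and writes.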
9cc17e2 to 145be6c
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Add `overlay` field to VirtualFs for early gating of remote mutations:
- unlink: skip remote delete entirely in overlay mode
- rename: skip rename_remote() entirely in overlay mode
- setattr: use staging_path() instead of path() for correct overlay paths
- mkdir: propagate create_dir_all errors instead of ignoring them

Simplify existing overlay checks to use self.overlay directly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
b1b6220 to e774d06
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
- Make staging_path() a pure path computation (no I/O side effects); add ensure_staging_parents() called only from write/create/truncate paths
- Propagate overlay rename errors (mkdir + fs::rename) instead of warn
- Clean stale test overlay dirs via fresh_overlay_dir() helper

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
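The split between a pure path computation and a separate directory-creating step might be sketched like this. The signature is an assumption for illustration; the actual `ensure_staging_parents()` in the PR may take different arguments.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Create the parent directories of a staging path and propagate any error.
/// Meant to be called only from write/create/truncate paths, so that read-only
/// operations never touch the disk. Hypothetical signature.
pub fn ensure_staging_parents(staging_path: &Path) -> io::Result<()> {
    match staging_path.parent() {
        // A non-empty parent exists: create it (and any missing ancestors).
        Some(parent) if !parent.as_os_str().is_empty() => fs::create_dir_all(parent),
        // Bare file name or filesystem root: nothing to create.
        _ => Ok(()),
    }
}
```

Keeping the I/O out of the path computation means a lookup or `getattr` on a file that was never written leaves no empty directories behind.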
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
- Add debug_assert validating overlay staging paths are safe relative paths
- Fix doc comments on OverlayTestVfs and make_overlay_test_vfs_with_root
- Handle symlink creation failure gracefully in overlay_skips_symlinks test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
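One way to express the "safe relative path" condition that the debug_assert checks is sketched below. The helper name `is_safe_relative` and the exact rules (no root, no prefix, no `..`, at least one normal component) are assumptions; the PR's actual check may differ.

```rust
use std::path::{Component, Path};

/// Return true when `path` is relative and cannot escape the directory it is
/// joined onto. Hypothetical helper, not taken from the PR.
pub fn is_safe_relative(path: &Path) -> bool {
    let mut seen_normal = false;
    for component in path.components() {
        match component {
            Component::Normal(_) => seen_normal = true,
            // A leading "./" is harmless.
            Component::CurDir => {}
            // RootDir, Prefix (Windows) and ParentDir could all leave the overlay root.
            _ => return false,
        }
    }
    // Reject the empty path as well.
    seen_normal
}
```

A call site could then use `debug_assert!(is_safe_relative(Path::new(full_path)))` before joining the path onto the overlay root.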
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
src/virtual_fs/mod.rs:158
`VirtualFs::new()` assumes overlay implies `advanced_writes` (see `streaming_commit` debug_assert), but this invariant isn’t enforced. If a caller constructs `VfsConfig { overlay: true, advanced_writes: false }` (or passes `staging_dir: None`), some write paths may fall back to streaming mode and either attempt remote upload or fail with confusing EIOs due to missing upload config. Consider asserting/forcing `advanced_writes` (and requiring `staging_dir` with `overlay_root`) whenever `config.overlay` is true so overlay behavior is well-defined for all call sites.
    pub fn new(
        runtime: tokio::runtime::Handle,
        hub_client: Arc<dyn HubOps>,
        xet_sessions: Arc<dyn XetOps>,
        staging_dir: Option<StagingDir>,
        config: VfsConfig,
    ) -> Arc<Self> {
        let inodes = Arc::new(RwLock::new(InodeTable::new()));
        let negative_cache = Arc::new(RwLock::new(HashMap::new()));
        let flush_manager = if !config.read_only && config.advanced_writes && !config.overlay {
            let sd = staging_dir
                .as_ref()
                .expect("--advanced-writes requires a staging directory");
            Some(flush::FlushManager::new(
                xet_sessions.clone(),
                sd.clone(),
                hub_client.clone(),
                inodes.clone(),
                &runtime,
                config.flush_debounce,
                config.flush_max_batch_window,
            ))
        } else {
            None
        };
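The invariant this review comment asks for could be enforced up front, for instance along the lines of the sketch below. The trimmed-down `VfsConfig`, the function names, and the error strings are all illustrative assumptions; only the three boolean fields and the FlushManager condition come from the snippet above.

```rust
/// Trimmed-down stand-in for the real `VfsConfig` (only the fields used here).
#[derive(Debug, Clone, Default)]
pub struct VfsConfig {
    pub read_only: bool,
    pub advanced_writes: bool,
    pub overlay: bool,
}

/// Reject configurations where overlay mode is not well-defined.
/// Hypothetical helper; the PR instead upgrades a debug_assert to an assert.
pub fn check_overlay_invariants(
    config: &VfsConfig,
    has_staging_dir: bool,
    has_overlay_root: bool,
) -> Result<(), String> {
    if !config.overlay {
        return Ok(());
    }
    if !config.advanced_writes {
        return Err("overlay requires advanced_writes".to_string());
    }
    if !has_staging_dir || !has_overlay_root {
        return Err("overlay requires a staging dir with an overlay root".to_string());
    }
    Ok(())
}

/// Same condition as in the snippet above: no FlushManager in overlay mode.
pub fn wants_flush_manager(config: &VfsConfig) -> bool {
    !config.read_only && config.advanced_writes && !config.overlay
}
```

Running such a check at the top of the constructor would turn a confusing EIO at write time into an immediate, descriptive failure at mount time.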
- Broaden rename EPERM guard to all clean entries (not just xet_hash), covering remote directories and etag-based files
- Mark overlay-created directories as dirty so they survive stale-child pruning on poll/reload
- Return EPERM when overlay rename source is not materialized on disk instead of silently succeeding

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Design decision: path traversal validation in overlay mode
Copilot has flagged
Path traversal would require either a malicious Hub backend (in which case the entire mount is compromised, not just overlay) or a bug in The
Match the rename guard: block unlink of any clean (non-dirty) entry in overlay mode, not just xet_hash-backed files. This covers etag-based and other remote representations that lack xet_hash.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- rmdir removes on-disk overlay directory so it doesn't reappear on remount
- Overlay merge handles type conflicts: if local has a directory where remote has a file (or vice versa), remove the old entry and reinsert so local kind wins cleanly in release builds

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
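The "local kind wins" merge rule can be illustrated with a small self-contained sketch. The `Kind` and `Entry` types and the `merge_listing` function are hypothetical; the real merge operates on the inode table rather than on a plain map.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    File,
    Dir,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Entry {
    pub kind: Kind,
    /// True when the entry comes from the local overlay directory.
    pub local: bool,
}

/// Merge a remote listing with local overlay entries; local takes precedence.
pub fn merge_listing(
    remote: &[(&str, Kind)],
    local: &[(&str, Kind)],
) -> BTreeMap<String, Entry> {
    let mut merged = BTreeMap::new();
    for (name, kind) in remote {
        merged.insert((*name).to_string(), Entry { kind: *kind, local: false });
    }
    for (name, kind) in local {
        // insert() replaces any remote entry with the same name, so the local
        // kind wins even when remote has a file where local has a directory.
        merged.insert((*name).to_string(), Entry { kind: *kind, local: true });
    }
    merged
}
```

Replacing the whole entry, rather than patching fields on the existing one, is what keeps a file/directory mismatch from leaving a half-updated entry behind.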
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
- Update mode in overlay merge so local file permissions are reflected
- Upgrade streaming_commit debug_assert to assert for fail-fast in prod
- Reuse staging_path() in ensure_staging_parents() for consistent path validation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
- rmdir now returns EPERM for clean remote directories (matches unlink/rename)
- Extract overlay_root() helper to reduce boilerplate across 4 call sites
- Update overlay merge to also set mode from local file metadata

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
cc @ErikKaum, who is using mounted buckets for the torch.compile cache on Inference Endpoints.
Summary
Adds an `--overlay` flag that provides true overlay semantics: pre-existing local files are visible through the mount, new writes persist locally in their original path layout, and nothing is pushed to the remote bucket. This enables a provider / consumer model for shared compilation caches.
The problem
Modern ML inference stacks compile models into hardware-specific artifacts before serving. This compilation is expensive — minutes to hours depending on the model — and the results are deterministic: the same model + compiler version + hardware always produces the same artifacts.
Today, every new machine recompiles from scratch. Teams work around this by manually copying cache directories between instances, setting up shared NFS mounts, or building custom S3 sync scripts. None of these integrate naturally with the Hub ecosystem.
The solution
`hf-mount --overlay` turns any HF Storage Bucket into a shared, read-through compilation cache with zero infrastructure:
Cache providers (few: CI, dedicated build machines) mount read-write and compile. Artifacts go directly to the bucket:

    hf-mount start bucket my-org/compilation-cache /path/to/cache
    # compile, artifacts land in bucket automatically

Cache consumers (many: dev machines, inference fleet, autoscaling pods) mount with `--overlay`. Cached artifacts are served lazily from the bucket. Cache misses compile locally without polluting the shared bucket:

    hf-mount start --overlay bucket my-org/compilation-cache /path/to/cache
    # reads from bucket, local compilations stay local

No write access required for consumers. No coordination between machines.
The provider / consumer model
`--overlay` and starts warm for any model in the curated cache: artifacts are served lazily from the bucket on first access.
Consider a new model launch day: without shared caches, every instance in an autoscaling fleet recompiles independently, so 100 pods means 100 identical compilations and hours of wasted accelerator time. With overlay, the fleet goes from cold to serving in the time it takes to mount + load, not compile.
Who benefits
AWS Neuron (Trainium / Inferentia)
The Neuron compiler writes NEFF binaries to `/var/tmp/neuron-compile-cache`. Compilation of a single LLM can take 30–60 minutes. The Neuron Persistent Cache documentation recommends sharing cache directories across instances, but the only built-in option is S3, which requires AWS-specific configuration.
torch.compile / TorchInductor
`torch.compile` caches compiled kernels in `TORCHINDUCTOR_CACHE_DIR`. Cold compilation adds significant startup latency; Replicate reports that caching eliminates redundant compilation entirely. The PyTorch documentation recommends copying cache directories for deployment.

    hf-mount start --overlay bucket my-org/torch-compile-cache $TORCHINDUCTOR_CACHE_DIR

vLLM
vLLM compiles during cold start and saves artifacts to `~/.cache/vllm/torch_compile_cache`. The vLLM documentation recommends sharing this directory across instances. Tensorfuse reports a 70% reduction in cold start time (294s → 82s) from cache sharing.

    hf-mount start --overlay bucket my-org/vllm-cache ~/.cache/vllm/torch_compile_cache

JAX/XLA (TPU and GPU)
JAX's persistent compilation cache compiles HLO programs for TPUs/GPUs. The documentation explicitly states the cache "should be in a shared file system (e.g., NFS)" for distributed workloads.
Triton (OpenAI) kernel cache
Triton compiles GPU kernels to `~/.triton/cache`, used by torch.compile and vLLM internally. Each new instance rebuilds the entire kernel cache.

    hf-mount start --overlay bucket my-org/triton-cache ~/.triton/cache

Why overlay for consumers
Overlay semantics
Clean (remote) entries are immutable from the consumer's perspective:
Dirty (local/COW'd) entries have full read-write semantics. Deleting a local shadow reveals the remote original on remount — standard overlay behavior without whiteouts.
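The clean/dirty rule above, combined with the EPERM guards described in the commits for unlink, rename, and rmdir, can be summarized in a short sketch. The function name and the bare `i32` errno are assumptions made to keep the example self-contained; the real code works on inode entries and its own error type.

```rust
/// Value of EPERM on Linux and macOS.
pub const EPERM: i32 = 1;

/// Decide whether a destructive operation (unlink, rename, rmdir) may proceed
/// on an entry. Hypothetical helper illustrating the rule, not the PR's code.
pub fn guard_clean_entry(overlay: bool, dirty: bool) -> Result<(), i32> {
    if overlay && !dirty {
        // Clean (remote-backed) entries are immutable for overlay consumers.
        Err(EPERM)
    } else {
        // Dirty entries live only in the local overlay, so they are fully
        // mutable; outside overlay mode this guard does not apply at all.
        Ok(())
    }
}
```

Because there are no whiteouts, the guard is what prevents a consumer from appearing to delete a remote file that would simply reappear on the next mount.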
Kubernetes deployment via CSI driver
`--overlay` is fully compatible with the hf-csi-driver; no driver changes are needed. Mount options are forwarded as CLI flags to hf-mount-fuse:
Or for inline ephemeral volumes:
This is the natural deployment path for the autoscaling fleet scenario: every pod gets a warm compilation cache via a standard PVC, no per-pod compilation needed.
How it works
`--overlay` reuses hf-mount's existing advanced-write infrastructure, redirecting staging paths from opaque inode-based names to the file's original path under the mount point. A pre-mount file descriptor preserves access to the underlying local directory after the mount shadows it.
Test plan
🤖 Generated with Claude Code