Skip to content

Commit 6b17b68

Browse files
erneestocclaudeMarcusSorealheis
authored
macOS: skip set_readwrite_recursive walk after clonefile (~1.85x materialization speedup) (#2349)
* fs_util: skip set_readwrite_recursive walk after clonefile The macOS clonefile fast path was followed by a recursive chmod walk that made every file in the cloned tree writable (0o644 / 0o755). On real Bazel input shapes (~2000-file SwiftCompile) that walk accounted for ~46% of materialization time — ~33 µs per file, ~67 ms per action. Replace the walk with a single chmod(2) on the destination root. Existing entries inherit the source's read-only mode (0o555 dirs, 0o444 files). The worker can still create the action's declared output files inside the root because the root itself is 0o755. This matches the hermeticity contract enforced by Bazel's local sandbox (linux-sandbox bind-mounts inputs read-only; darwin-sandbox / sandbox-exec denies writes outside declared output paths) and the REAPI Action.output_files / output_directories semantics: actions write only to declared outputs, never mutate inputs. An action that does try to mutate an input now hits EACCES, which is the correct REAPI behavior — same failure mode as on Bazel's own sandbox. Bench (nativelink-util/benches/chmod_strategy.rs on the bench branch), toplevel_only vs full walk: shape walk toplevel_only speedup small_flat (64 files) 4.66 ms 2.61 ms 1.79x pcm_cluster (219 files) 15.17 ms 8.19 ms 1.85x medium_flat (635 files) 46.36 ms 25.10 ms 1.85x large_flat (1978 files) 147.39 ms 80.17 ms 1.84x set_readwrite_recursive stays public — directory_cache.rs:451 still uses it on the source side during eviction. Tests: - test_clonefile_root_writable_inputs_readonly: root 0o755, subdirs 0o555, files 0o444 (replaces the old test_clonefile_dest_is_writable which assumed subdirs would be made writable). - test_clonefile_root_accepts_new_files: worker can create outputs at the root even though everything inside the clone is read-only. - test_clonefile_input_mutation_fails: writes to existing input files fail with PermissionDenied — encodes the hermeticity contract. * fs_util: remove dead set_readwrite_recursive after post-#2347 cleanup After #2347 (DirectoryCache cleanup uses set_dir_writable_recursive) and this PR (post-clonefile path uses chmod_dir_writable), the generic set_readwrite_recursive helper has zero remaining callers. Both replacements are intentionally narrower - set_dir_writable_recursive chmods only directories so file inodes aren't mutated, and chmod_dir_writable chmods only the destination root so the clone tree stays read-only inside. Delete the old helper and its private companion set_readwrite_one_path. Addresses palfrey's review feedback on #2347 ("set_readwrite_recursive and set_readwrite_one_path can be dropped as part of this") - sequenced on this PR instead because both PRs had to land before the helpers became dead. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * worker: make the APFS clonefile directory-cache path work end-to-end PR #2338's clonefile fast path is effectively dead in production — it materialized 1 of 1771 actions in a real build. Three coupled bugs kept the directory cache silently falling back to the slow download path, and two of them only surface once the fast path actually fires, so they must land together. Bug 1: prepare_action_inputs received a work directory the caller had already created, but hardlink_directory_tree and clonefile(2) both require the destination to NOT exist. Every cache attempt failed its precondition and fell back to download_to_directory. Fix: remove the empty pre-created directory before invoking the cache, and recreate it on cache failure so the download fallback (which needs an existing destination) still works. Adds fs::remove_dir for the empty-dir removal. Bug 2: set_readonly_recursive chmod'd files to 0o444, stripping the execute bit from cached executables. Once a tree is cloned into a workspace this makes an action's interpreter/wrapper script fail with EACCES. Fix: mark files 0o555 instead of 0o444 — read + execute, still no write bit, so the hermeticity contract is unchanged. Bug 3: the clonefile path chmods only the destination root writable; cloned subdirectories keep the source's 0o555 mode. Bazel actions declare outputs at paths nested inside input subdirectories, and creating those files needs write permission on the parent directory. Fix: after a cache hit, set_dir_writable_recursive makes every directory in the materialized tree writable. Files stay read-only — they may be CAS-hardlinked and chmoding them would corrupt the shared inode. Adds regression tests for nested output creation, which the existing root-only clonefile tests did not cover. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Marcus Eagan <marcuseagan@gmail.com>
1 parent 41c7789 commit 6b17b68

4 files changed

Lines changed: 234 additions & 78 deletions

File tree

nativelink-util/src/fs.rs

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -369,6 +369,13 @@ pub async fn remove_file(path: impl AsRef<Path>) -> Result<(), Error> {
369369
call_with_permit(move |_| std::fs::remove_file(path).map_err(Into::<Error>::into)).await
370370
}
371371

372+
/// Removes an empty directory. Errors if the directory is not empty; use
373+
/// [`remove_dir_all`] when the contents should be removed too.
374+
pub async fn remove_dir(path: impl AsRef<Path>) -> Result<(), Error> {
375+
let path = path.as_ref().to_owned();
376+
call_with_permit(move |_| std::fs::remove_dir(path).map_err(Into::<Error>::into)).await
377+
}
378+
372379
pub async fn canonicalize(path: impl AsRef<Path>) -> Result<PathBuf, Error> {
373380
let path = path.as_ref().to_owned();
374381
call_with_permit(move |_| std::fs::canonicalize(path).map_err(Into::<Error>::into)).await

nativelink-util/src/fs_util.rs

Lines changed: 188 additions & 72 deletions
Original file line numberDiff line numberDiff line change
@@ -51,10 +51,14 @@ pub enum CloneMethod {
5151
/// # Platform Support
5252
/// - macOS: Tries APFS `clonefile(2)` first (O(1), copy-on-write). On failure
5353
/// (e.g., cross-volume EXDEV, or any unexpected errno) falls back to per-file
54-
/// `fs::hard_link`. After a successful clone, the destination tree is made
55-
/// writable (0o755 / 0o644) because the clone inherits the source's
56-
/// permissions, and cached subtrees are 0o555 / 0o444. The COW semantics of
57-
/// `clonefile(2)` mean writes to the destination do not affect the source.
54+
/// `fs::hard_link`. After a successful clone, only the destination root is
55+
/// chmod'd to 0o755 so the worker can create the action's declared output
56+
/// files inside it. Existing entries inherit the source's read-only mode
57+
/// (0o555) — this matches the hermeticity contract
58+
/// enforced by Bazel's local sandbox and the REAPI `Action.output_files`
59+
/// semantics: actions can only write to declared outputs, never mutate
60+
/// inputs. The COW semantics of `clonefile(2)` mean any writes the worker
61+
/// does make to the destination do not affect the source.
5862
/// - Linux: Per-file `fs::hard_link` (directory hardlinks are not supported on
5963
/// ext4/btrfs without root). Always returns `CloneMethod::Hardlink`.
6064
/// - Windows: Per-file `fs::hard_link` (requires NTFS). Always returns
@@ -97,13 +101,17 @@ pub async fn hardlink_directory_tree(src_dir: &Path, dst_dir: &Path) -> Result<C
97101

98102
match try_clonefile(src_dir, dst_dir).await {
99103
Ok(()) => {
100-
// The clone inherits the source's permissions. Cached subtrees
101-
// are 0o555 / 0o444, but actions need to write outputs into
102-
// their input tree, so make the clone writable. COW means
103-
// these writes do not affect the source.
104-
set_readwrite_recursive(dst_dir)
104+
// Only chmod the destination root so the worker can create
105+
// the action's declared output files inside it. Existing
106+
// entries (subdirs and files) inherit the source's
107+
// read-only mode (0o555) — that's the hermeticity
108+
// contract. Skipping the per-file chmod walk avoids an
109+
// O(N) syscall sweep that, on real Bazel SwiftCompile
110+
// shapes (~2000 inputs), accounts for ~46% of
111+
// materialization time.
112+
chmod_dir_writable(dst_dir)
105113
.await
106-
.err_tip(|| "Failed to make cloned tree writable")?;
114+
.err_tip(|| "Failed to chmod cloned tree root")?;
107115
return Ok(CloneMethod::Clonefile);
108116
}
109117
Err(e) => {
@@ -178,6 +186,21 @@ async fn try_clonefile(src: &Path, dst: &Path) -> std::io::Result<()> {
178186
.map_err(std::io::Error::other)?
179187
}
180188

189+
/// Sets the directory `dir`'s mode to 0o755 so callers can create new
190+
/// entries inside it. Used after `clonefile(2)` on the materialized
191+
/// destination root: the clone inherits the source's read-only mode
192+
/// (0o555) but the worker needs to drop the action's declared output
193+
/// files into the root. Existing entries inside `dir` are intentionally
194+
/// left at their cloned read-only perms — that's the hermeticity
195+
/// contract.
196+
#[cfg(target_os = "macos")]
197+
async fn chmod_dir_writable(dir: &Path) -> Result<(), Error> {
198+
use std::os::unix::fs::PermissionsExt;
199+
fs::set_permissions(dir, std::fs::Permissions::from_mode(0o755))
200+
.await
201+
.err_tip(|| format!("Failed to chmod {} to 0o755", dir.display()))
202+
}
203+
181204
/// Internal recursive function to hardlink directory contents
182205
fn hardlink_directory_tree_recursive<'a>(
183206
src: &'a Path,
@@ -271,21 +294,6 @@ pub async fn set_readonly_recursive(dir: &Path) -> Result<(), Error> {
271294
set_perms_recursive_impl(dir.to_path_buf(), set_readonly_one_path).await
272295
}
273296

274-
/// Sets a directory tree to read-write for the current user recursively.
275-
/// This is done so we can delete directories we're evicting.
276-
///
277-
/// # Arguments
278-
/// * `dir` - Directory to make read-write
279-
///
280-
/// # Platform Notes
281-
/// - Unix: Sets permissions to 0o755 (rwxr-xr-x)
282-
/// - Windows: Clears `FILE_ATTRIBUTE_READONLY`
283-
pub async fn set_readwrite_recursive(dir: &Path) -> Result<(), Error> {
284-
error_if!(!dir.exists(), "Directory does not exist: {}", dir.display());
285-
286-
set_perms_recursive_impl(dir.to_path_buf(), set_readwrite_one_path).await
287-
}
288-
289297
/// Sets only the **directories** in a tree to writable for the current user,
290298
/// leaving files untouched. This is the safe variant for cleanup paths that
291299
/// need to delete a tree containing CAS-hardlinked files.
@@ -319,10 +327,14 @@ fn set_readonly_one_path(
319327
use std::os::unix::fs::PermissionsExt;
320328
let mut perms = metadata.permissions();
321329

322-
// If it's a directory, set to r-xr-xr-x (555)
323-
// If it's a file, set to r--r--r-- (444)
324-
let mode = if metadata.is_dir() { 0o555 } else { 0o444 };
325-
perms.set_mode(mode);
330+
// Both directories and files get r-xr-xr-x (0o555): read and
331+
// execute for everyone, write for no one. Files use 0o555 rather
332+
// than 0o444 so the execute bit survives on cached executables —
333+
// a stripped +x bit makes an action's interpreter or wrapper
334+
// script fail with EACCES once the tree is materialized into a
335+
// workspace. The write bit stays cleared, so the hermeticity
336+
// contract (inputs are immutable) is unchanged.
337+
perms.set_mode(0o555);
326338

327339
fs::set_permissions(&path, perms)
328340
.await
@@ -343,41 +355,6 @@ fn set_readonly_one_path(
343355
})
344356
}
345357

346-
fn set_readwrite_one_path(
347-
path: PathBuf,
348-
metadata: Metadata,
349-
) -> Pin<Box<dyn Future<Output = Result<(), Error>> + Send>> {
350-
Box::pin(async move {
351-
// Set the file/directory to read-write for the current user
352-
#[cfg(unix)]
353-
{
354-
use std::os::unix::fs::PermissionsExt;
355-
let mut perms = metadata.permissions();
356-
357-
// If it's a directory, set to rwxr-xr-x (755)
358-
// If it's a file, set to rw-r--r-- (644)
359-
let mode = if metadata.is_dir() { 0o755 } else { 0o644 };
360-
perms.set_mode(mode);
361-
362-
fs::set_permissions(&path, perms)
363-
.await
364-
.err_tip(|| format!("Failed to set permissions for: {}", path.display()))?;
365-
}
366-
367-
#[cfg(windows)]
368-
{
369-
let mut perms = metadata.permissions();
370-
perms.set_readonly(false);
371-
372-
fs::set_permissions(&path, perms)
373-
.await
374-
.err_tip(|| format!("Failed to set permissions for: {}", path.display()))?;
375-
}
376-
377-
Ok(())
378-
})
379-
}
380-
381358
fn set_dir_writable_one_path(
382359
path: PathBuf,
383360
metadata: Metadata,
@@ -590,39 +567,103 @@ mod tests {
590567

591568
#[cfg(target_os = "macos")]
592569
#[nativelink_test("crate")]
593-
async fn test_clonefile_dest_is_writable() -> Result<(), Error> {
570+
async fn test_clonefile_root_writable_inputs_readonly() -> Result<(), Error> {
594571
use std::os::unix::fs::PermissionsExt;
595572

596573
let (temp_dir, src_dir) = create_test_directory().await?;
597-
// Source mimics the directory cache: 0o555 dirs, 0o444 files.
574+
// Source mimics the directory cache: 0o555 dirs and files.
598575
set_readonly_recursive(&src_dir).await?;
599576

600577
let dst_dir = temp_dir.path().join("clone_dst");
601578
hardlink_directory_tree(&src_dir, &dst_dir).await?;
602579

603-
let src_subdir_mode = fs::metadata(src_dir.join("subdir"))
580+
// Root: writable, so the worker can drop the action's declared
581+
// outputs inside it.
582+
let root_mode = fs::metadata(&dst_dir).await?.permissions().mode() & 0o777;
583+
assert_eq!(root_mode, 0o755, "destination root must be writable");
584+
585+
// Subdir and existing file: stay read-only. Hermeticity contract —
586+
// inputs are not writable. Matches Bazel's local-sandbox model and
587+
// REAPI Action.output_files semantics: actions can only write to
588+
// declared outputs, not mutate inputs.
589+
let dst_subdir_mode = fs::metadata(dst_dir.join("subdir"))
604590
.await?
605591
.permissions()
606592
.mode()
607593
& 0o777;
608594
assert_eq!(
609-
src_subdir_mode, 0o555,
610-
"source dir should still be readonly after clone"
595+
dst_subdir_mode, 0o555,
596+
"cloned subdirs must inherit source read-only mode"
611597
);
612598

613-
let dst_subdir_mode = fs::metadata(dst_dir.join("subdir"))
599+
let dst_file_mode = fs::metadata(dst_dir.join("file1.txt"))
614600
.await?
615601
.permissions()
616602
.mode()
617603
& 0o777;
618604
assert_eq!(
619-
dst_subdir_mode, 0o755,
620-
"cloned dir should be writable so actions can write outputs"
605+
dst_file_mode, 0o555,
606+
"cloned files must inherit source read-only mode"
607+
);
608+
609+
// Source untouched.
610+
let src_subdir_mode = fs::metadata(src_dir.join("subdir"))
611+
.await?
612+
.permissions()
613+
.mode()
614+
& 0o777;
615+
assert_eq!(
616+
src_subdir_mode, 0o555,
617+
"source dir should still be readonly after clone"
621618
);
622619

623620
Ok(())
624621
}
625622

623+
#[cfg(target_os = "macos")]
624+
#[nativelink_test("crate")]
625+
async fn test_clonefile_root_accepts_new_files() -> Result<(), Error> {
626+
let (temp_dir, src_dir) = create_test_directory().await?;
627+
set_readonly_recursive(&src_dir).await?;
628+
629+
let dst_dir = temp_dir.path().join("clone_dst");
630+
hardlink_directory_tree(&src_dir, &dst_dir).await?;
631+
632+
// The worker creates declared output files at the action's
633+
// working directory root. Verify a new file can be created there
634+
// even though everything inside the clone is read-only (0o555).
635+
let new_output = dst_dir.join("new_output.bin");
636+
fs::write(&new_output, b"action output").await?;
637+
assert_eq!(fs::read(&new_output).await?, b"action output");
638+
639+
Ok(())
640+
}
641+
642+
#[cfg(target_os = "macos")]
643+
#[nativelink_test("crate")]
644+
async fn test_clonefile_input_mutation_fails() -> Result<(), Error> {
645+
let (temp_dir, src_dir) = create_test_directory().await?;
646+
set_readonly_recursive(&src_dir).await?;
647+
648+
let dst_dir = temp_dir.path().join("clone_dst");
649+
hardlink_directory_tree(&src_dir, &dst_dir).await?;
650+
651+
// Hermeticity: actions cannot mutate inputs. A write to an input
652+
// file in the cloned tree must fail with EACCES, mirroring what
653+
// Bazel's linux-sandbox / darwin-sandbox would do.
654+
let input_file = dst_dir.join("file1.txt");
655+
let err = fs::write(&input_file, b"mutated")
656+
.await
657+
.expect_err("input file write should fail (file is 0o555, no write bit)");
658+
assert_eq!(err.kind(), std::io::ErrorKind::PermissionDenied);
659+
660+
// Source must be untouched.
661+
let src_content = fs::read_to_string(src_dir.join("file1.txt")).await?;
662+
assert_eq!(src_content, "Hello, World!");
663+
664+
Ok(())
665+
}
666+
626667
#[cfg(target_os = "macos")]
627668
#[nativelink_test("crate")]
628669
async fn test_clonefile_cow_isolation() -> Result<(), Error> {
@@ -647,6 +688,50 @@ mod tests {
647688
Ok(())
648689
}
649690

691+
/// Regression test for the clonefile directory-cache path. Bazel actions
692+
/// declare outputs at paths nested inside input subdirectories, but
693+
/// `hardlink_directory_tree` chmods only the destination *root* writable
694+
/// — cloned subdirs keep the source's read-only mode (0o555). This test
695+
/// mirrors the two-step `prepare_action_inputs` performs after a cache
696+
/// hit (materialize, then `set_dir_writable_recursive`) and proves a
697+
/// nested output can be created only after the recursive walk.
698+
#[cfg(target_os = "macos")]
699+
#[nativelink_test("crate")]
700+
async fn test_clonefile_nested_output_after_dir_writable() -> Result<(), Error> {
701+
use std::os::unix::fs::PermissionsExt;
702+
703+
let (temp_dir, src_dir) = create_test_directory().await?;
704+
set_readonly_recursive(&src_dir).await?;
705+
706+
let dst_dir = temp_dir.path().join("clone_dst");
707+
hardlink_directory_tree(&src_dir, &dst_dir).await?;
708+
709+
// The clone leaves "subdir" at 0o555, so creating an output nested
710+
// inside it fails — this is the gap a root-only chmod ships.
711+
let nested_output = dst_dir.join("subdir").join("nested_output.o");
712+
let err = fs::write(&nested_output, b"x")
713+
.await
714+
.expect_err("nested write must fail while the subdir is 0o555");
715+
assert_eq!(err.kind(), std::io::ErrorKind::PermissionDenied);
716+
717+
// This is what prepare_action_inputs does after a cache hit: make
718+
// every directory in the materialized tree writable.
719+
set_dir_writable_recursive(&dst_dir).await?;
720+
721+
fs::write(&nested_output, b"action output").await?;
722+
assert_eq!(fs::read(&nested_output).await?, b"action output");
723+
724+
// Files inside the tree stay read-only — hermeticity holds.
725+
let file_mode = fs::metadata(dst_dir.join("subdir").join("file2.txt"))
726+
.await?
727+
.permissions()
728+
.mode()
729+
& 0o777;
730+
assert_eq!(file_mode, 0o555, "input files must remain read-only");
731+
732+
Ok(())
733+
}
734+
650735
#[nativelink_test("crate")]
651736
async fn test_set_readonly_recursive() -> Result<(), Error> {
652737
let (_temp_dir, test_dir) = create_test_directory().await?;
@@ -663,6 +748,37 @@ mod tests {
663748
Ok(())
664749
}
665750

751+
/// `set_dir_writable_recursive` must make *every* directory in a tree
752+
/// writable — including nested subdirs — so actions can create outputs
753+
/// at nested declared-output paths. Files are left read-only because
754+
/// they may share a CAS inode via hardlink.
755+
#[cfg(unix)]
756+
#[nativelink_test("crate")]
757+
async fn test_set_dir_writable_recursive_walks_nested_dirs() -> Result<(), Error> {
758+
use std::os::unix::fs::PermissionsExt;
759+
760+
let (_temp_dir, test_dir) = create_test_directory().await?;
761+
// Lock the tree down the way the directory cache does post-construction.
762+
set_readonly_recursive(&test_dir).await?;
763+
set_dir_writable_recursive(&test_dir).await?;
764+
765+
// Every directory — the root and the nested subdir — must be writable.
766+
for dir in [test_dir.clone(), test_dir.join("subdir")] {
767+
let mode = fs::metadata(&dir).await?.permissions().mode() & 0o777;
768+
assert_eq!(mode, 0o755, "{} must be writable", dir.display());
769+
}
770+
771+
// Files stay read-only — chmoding them would corrupt a shared CAS inode.
772+
let file_mode = fs::metadata(test_dir.join("subdir/file2.txt"))
773+
.await?
774+
.permissions()
775+
.mode()
776+
& 0o777;
777+
assert_eq!(file_mode, 0o555, "files must remain read-only");
778+
779+
Ok(())
780+
}
781+
666782
#[nativelink_test("crate")]
667783
async fn test_calculate_directory_size() -> Result<(), Error> {
668784
let (_temp_dir, test_dir) = create_test_directory().await?;

nativelink-worker/src/directory_cache.rs

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -451,8 +451,8 @@ impl DirectoryCache {
451451
// CRITICAL: only chmod directories writable, never files. Cached
452452
// files share an inode with FilesystemStore CAS entries via
453453
// hardlink (see `download_to_directory` in running_actions_manager).
454-
// Calling `set_readwrite_recursive` here would chmod those files
455-
// to 0o644, mutating the CAS inode's mode for every other in-flight
454+
// A naive recursive chmod here would mutate those files to 0o644,
455+
// changing the CAS inode's mode for every other in-flight
456456
// action that has hardlinked the same blob and causing EACCES on
457457
// exec (e.g. cc_wrapper.sh) or EPERM on open. Directory write
458458
// permission is sufficient on unix to unlink files inside.
@@ -733,8 +733,8 @@ mod tests {
733733
.unwrap();
734734

735735
// Mark the cached tree read-only the way DirectoryCache does after
736-
// construction (set_readonly_recursive). This drops the file to 0o444
737-
// and the directories to 0o555.
736+
// construction (set_readonly_recursive). This sets every file and
737+
// directory to 0o555 (read + execute, no write).
738738
set_readonly_recursive(&cache_entry_dir).await?;
739739

740740
// Simulate an in-flight action workspace that has hardlinked the
@@ -758,7 +758,7 @@ mod tests {
758758
.permissions()
759759
.mode()
760760
& 0o777;
761-
assert_eq!(workspace_mode_before, 0o444);
761+
assert_eq!(workspace_mode_before, 0o555);
762762

763763
// Run the cleanup that `evict_lru` runs before removing the tree.
764764
// After the fix this only chmods directories; before the fix this

0 commit comments

Comments
 (0)