Skip to content

worker: fix two silent output-corruption bugs (absolute symlinks + zero-digest files)#2346

Merged
palfrey merged 6 commits into
TraceMachina:mainfrom
erneestoc:ec/pr2243-output-upload-correctness
May 20, 2026
Merged

worker: fix two silent output-corruption bugs (absolute symlinks + zero-digest files)#2346
palfrey merged 6 commits into
TraceMachina:mainfrom
erneestoc:ec/pr2243-output-upload-correctness

Conversation

@erneestoc
Copy link
Copy Markdown
Contributor

@erneestoc erneestoc commented May 18, 2026

Description

Two correctness fixes on the worker's output-upload and input-materialization paths. Both are silent-corruption bugs — actions complete "successfully" but the output artifacts the client receives are wrong or missing.

What's fixed

Absolute symlinks in output upload (running_actions_manager.rs::inner_upload_results)

When a Bazel action emits an absolute symlink as one of its declared outputs (e.g. output_link → /usr/lib/libfoo.so or → /tmp/nativelink/cache/...), the pre-fix worker uploads the literal target path string to the AC. The client (or a different worker that downloads the AC result) follows that symlink on its own filesystem and finds nothing — the worker-local path doesn't exist on the requester.

The fix branches on Path::is_absolute() in the symlink handler. Absolute targets are resolved — directory targets get uploaded as Tree protos via upload_directory + serialize_and_upload_message, file targets via upload_file. Relative symlinks fall through to the existing upload_symlink codepath unchanged (relative symlinks travel correctly with the tree).

This aligns the actual worker behavior with our advertised symlink_absolute_path_strategy: Disallowed in CacheCapabilities.

Zero-digest files missing from worker exec dirs (FilesystemStore::get_file_entry_for_digest)

The pre-fix function returned a synthetic FileEntry for the zero digest pointing at a content path that never exists on disk (zero-byte files aren't stored — nothing to store). Downstream hardlink calls failed silently — the zero-byte file went missing from the action's output tree. Swift's incremental compile produces empty .o files routinely; same for .swiftdoc placeholders.

Fix: return Code::NotFound. Callers must take the explicit empty-file path (fs::create_file + write_all(&[])). Companion change in download_to_directory wraps the zero-digest branch with err_tip so any future failures are diagnosable rather than silent.

Provenance

Equivalent to upstream commits 08071533b and 19d7e2052 from #2243, ported atomically to current main. A third upstream commit (cf801c2b, fix spawn_upload_to_remote missing output dir blobs) was intentionally not ported because it patches a function that doesn't exist on main; that bug is latent until the deferred fast/slow upload split lands.

Non-overlap with #2338

PR #2338 (8051ca9e) fixed zero-byte handling in the DirectoryCache (input-fetch materialization path). This PR fixes the related bug in FilesystemStore + running_actions_manager (output-materialization + worker exec-dir path). Different stores, different code paths, different test files.

Note on backward compatibility

Not a breaking change in the semver sense (no API surface change, no config surface change). However, the wire shape of ActionResult changes for actions that produce absolute symlinks: those entries now appear in output_files / output_folders (resolved content) rather than output_file_symlinks (worker-local target). This brings actual behavior into compliance with our advertised symlink_absolute_path_strategy: Disallowed. Clients depending on the prior (contract-violating) behavior will see different output entries; the byte semantics are equivalent.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

  • cargo build -p nativelink-store -p nativelink-worker clean
  • cargo test -p nativelink-store --test filesystem_store_test — 22 passed
  • cargo test -p nativelink-worker --test running_actions_manager_test — 33 passed
  • cargo clippy -p nativelink-store --lib --tests -- -D warnings clean (2 pre-existing clippy errors on main in newly-merged r2_store.rs confirmed reproducible on plain origin/main)
  • cargo clippy -p nativelink-worker --lib -- -D warnings clean
  • cargo fmt --check clean for both packages

New tests:

  • upload_absolute_symlink_resolves_contents — symlink-to-file resolution branch
  • upload_absolute_symlink_to_directory_uploads_tree — symlink-to-directory branch, asserts Tree proto upload + nested blob in CAS
  • download_to_directory_zero_digest_empty_file_test — extended with sibling + nested-in-subdir zero-digest files to exercise the recursive caller path
  • get_file_entry_for_zero_digest_returns_not_found — FilesystemStore API contract

Updated upload_dir_and_symlink_test — retains the ln -s /dev/null empty_sym fixture; assertion updated to expect the now-resolved-to-empty-file behavior (canonical empty digest e3b0c44.../0).

Checklist

  • Updated documentation if needed (no public API change; in-code doc comments updated)
  • Tests added/amended
  • bazel test //... passes locally (verified via cargo only; bazel-aspect clippy clean for changed files per grep against clippy.toml disallowed-methods)
  • PR is contained in a single commit (multiple atomic commits + review-feedback follow-ups; happy to squash before merge)

This change is Reviewable

erneestoc and others added 3 commits May 18, 2026 15:45
…argets

When the worker's work directory is populated via DirectoryCache, output
paths can be absolute symlinks pointing into the cache directory
(/var/.../directory_cache/...). The previous output collection code
uploaded these as raw SymlinkInfo with absolute targets that are
meaningless on the Bazel client, causing "No such file or directory"
errors when the client materialised the action result.

Detect absolute symlinks in output paths and resolve them: upload
directory contents as Tree protos and file contents as regular files.
Relative symlinks (intentionally created by the action) are still
preserved as symlinks.

Updates upload_dir_and_symlink_test to use a relative symlink (the
previous /dev/null absolute symlink is now resolved, breaking the old
assertion) and adds a new upload_absolute_symlink_resolves_contents
regression test that verifies an out-of-tree payload reachable via an
absolute symlink lands in CAS as a file.

Ported from upstream
TraceMachina/nativelink@0807153 (PR
TraceMachina#2243).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
FilesystemStore::get_file_entry_for_digest previously returned a
synthetic FileEntry for zero-digest blobs pointing to a content_path
that never exists on disk (FilesystemStore deliberately never persists
zero-byte files). Downstream worker output-materialisation code that
took the prefetched hardlink path would try to hard_link from this
non-existent source, fail, and either silently produce a missing file
or fall back without enough context to diagnose the failure.

This is the OUTPUT-materialisation companion to PR TraceMachina#2338's DirectoryCache
zero-byte fix, which lives on the input-fetch path. The two changes are
complementary: TraceMachina#2338 covers the directory cache short-circuit; this
covers the FilesystemStore API and the running_actions_manager fallback.

Changes:
- FilesystemStore::get_file_entry_for_digest now returns
  Code::NotFound for zero digests instead of a synthetic FileEntry,
  forcing callers to materialise empty files via fs::create_file rather
  than hard_linking from a phantom source.
- running_actions_manager::download_to_directory adds err_tip context
  on the zero-digest fs::create_file / write_all so a failure here is
  attributable to the empty-file path.
- Updates the existing get_file_entry_for_zero_digest test to assert
  the new NotFound behaviour and renames it to
  get_file_entry_for_zero_digest_returns_not_found.
- Adds download_to_directory_zero_digest_empty_file_test that
  exercises an empty file declared inside an input directory and
  verifies it materialises on disk with zero bytes. This test
  intentionally does NOT collide with PR TraceMachina#2338's DirectoryCache
  zero-byte test (which validates a different short-circuit path).

Ported from upstream
TraceMachina/nativelink@19d7e20 (PR
TraceMachina#2243).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Audit of the two ported commits (e57b675b, b1dc3a1c) flagged two
correctness-relevant gaps. Both are now closed with strict assertions so
the underlying bugs cannot silently regress.

1. upload_absolute_symlink_to_directory_uploads_tree
   The original e57b675b regression test covers absolute-symlink-to-file
   but not absolute-symlink-to-directory, which exercises an entirely
   separate code path (upload_directory + Tree proto serialisation). The
   new test creates an out-of-tree directory containing inner.txt,
   absolute-symlinks it into the work dir, runs the action, and:
   - asserts output_directory_symlinks and output_file_symlinks are
     BOTH empty (the symlink must NOT be preserved),
   - asserts output_folders contains exactly one entry at the symlink
     path, proving Tree-upload happened,
   - walks the uploaded Tree, locates inner.txt, fetches its blob from
     CAS and verifies the content. This proves the directory was
     actually traversed and uploaded — not stubbed.

2. download_to_directory_zero_digest_empty_file_test (strengthened)
   Extended the existing single-file test to cover:
   - a second sibling zero-digest file (proves the path is not a
     single-file fluke),
   - a zero-digest file nested inside a subdirectory (proves the
     recursive download_to_directory caller also handles NotFound from
     get_file_entry_for_digest — not just the top-level invocation),
   - strict per-file assertions: is_file, !is_symlink, len == 0, and
     a read-back that confirms the file is actually empty rather than
     a phantom dirent.

Verdict on the dropped checks:
- Dangling absolute symlink: source returns OutputType::None which is
  the correct graceful behaviour, but writing a test for it requires
  fabricating a broken symlink mid-action and is comparatively low
  value; left uncovered intentionally.
- File mode on zero-digest materialisation: the worker uses
  fs::create_file with no explicit mode, so the result follows umask
  and would be brittle across CI environments. Length + type
  assertions are the stronger invariant.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread nativelink-worker/tests/running_actions_manager_test.rs Outdated
Comment thread nativelink-worker/tests/running_actions_manager_test.rs
Comment thread nativelink-worker/tests/running_actions_manager_test.rs Outdated
erneestoc added 2 commits May 19, 2026 10:38
Two fixes for the new tests in upload_dir_and_symlink_test and the
absolute-symlink tests:

1. Restore the absolute-symlink case in upload_dir_and_symlink_test.
   The previous commit replaced `empty_sym -> /dev/null` (absolute) with
   `rel_sym -> dir1/file` (relative), dropping coverage for the absolute
   branch in that integration test. Re-add `empty_sym` pointing to an
   out-of-tree zero-byte payload we create explicitly. This exercises
   both the relative-symlink-preserved path (via `rel_sym`) and the
   new absolute-symlink-resolved path (via `empty_sym`) without
   relying on `/dev/null`'s character-device semantics.

   `empty_sym` now appears in `output_files` (resolved, zero bytes,
   sha256 = e3b0c44...) rather than `output_file_symlinks`.

2. Replace the factually-wrong "Windows does not support symlinks"
   comments on the two new tests with an accurate explanation:
   Windows supports symlinks via std::os::windows::fs::symlink_file /
   symlink_dir, but the fixtures use `ln -s`, which is the Unix-only
   reason these tests are gated to non-Windows targets.
…_test

Previous iterations of this PR sidestepped `/dev/null` by swapping the
absolute-symlink fixture for either a relative symlink (lost coverage)
or an out-of-tree real empty file (added scaffolding to dodge a corner
case we should just test directly).

Restore the original `ln -s /dev/null empty_sym` fixture. The post-fix
worker code resolves the absolute symlink via `fs::metadata` (follows),
sees a character device (`is_dir() == false`), and falls into the
"upload as file" branch. `upload_file` opens `/dev/null` and reads to
EOF — `/dev/null`'s contract is that reads return 0 bytes immediately
— producing the canonical sha256 empty digest (e3b0c44…) with size 0.

That's well-defined, harmless behavior and a real-world Bazel pattern
(rules sometimes use `ln -sf /dev/null x` to create empty outputs).
Test now locks in this contract directly, no scaffolding required.

Drops the `external_root` / `external_file` setup since it was only
needed to avoid `/dev/null`.
@erneestoc erneestoc requested a review from palfrey May 19, 2026 18:08
@palfrey palfrey merged commit 74aa906 into TraceMachina:main May 20, 2026
42 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants