
No safe concurrent build cache: disk exhaustion from worktree isolation and no artifact lifecycle #16804

@jroth1111

Description


Problem

Cargo has two independent gaps that compound under concurrent use:

Problem A: No safe concurrent build cache. Multiple Cargo instances cannot share build artifacts safely. The per-target-directory lock serializes builds, and race conditions corrupt artifacts when builds overlap. The only safe workaround — per-worktree isolation via CARGO_TARGET_DIR — eliminates sharing entirely.

Problem B: No artifact lifecycle. Every cargo build creates 2–10 GB of intermediate files. The final binary is typically 30–100 MB. The other 95–99% is scaffolding that serves no purpose after the build succeeds. Cargo has no mechanism to delete any of it, and no eviction policy bounds accumulation.

These problems are independent but compound: isolation (the workaround for Problem A) multiplies accumulation (Problem B) by the number of concurrent build sessions.

This hurts anyone who:

  • Works on multiple Rust projects — each project accumulates its own full target/ directory, growing indefinitely
  • Uses git worktrees or multiple branches — each worktree needs its own target/, and sharing one is unsafe (corruption, lock contention), so disk usage multiplies by the number of active branches
  • Runs CI or automated builds — build agents fill disk over time with no automatic cleanup
  • Runs parallel builds — Cargo's per-directory locking serializes concurrent builds and risks artifact corruption when they overlap

Agent-driven development is a growing use case that exposes these long-standing gaps. The underlying problems affect all users who use worktrees, CI, or multiple projects — agent workflows just make the failure mode acute. Each agent session operates on its own git worktree, runs cargo check / cargo test / cargo clippy independently, and treats target/ as pure build output. The problem is volume and multiplication, not misuse.

Observed impact: 457 disk-exhaustion events, 3,703 compilation failures from artifact corruption, and 22,958 instances where agents had to manually isolate target/ directories to avoid concurrent access failures — across ~40 Rust projects observed over 4,440 sessions.

What accumulates

Every build produces a full set of intermediate artifacts:

| File type | Purpose | Size per crate | Needed after build? |
|---|---|---|---|
| `.rlib` | Static library archive | 1–50 MB | Only if rebuilding dependents |
| `.rmeta` | Metadata for separate compilation | 0.1–5 MB | Only if recompiling dependents |
| `.o` | Object code (intermediate) | 0.1–10 MB | No — already linked |
| `.pdb` / `.dSYM` | Debug symbols | 1–100 MB | Only if debugging |
| `.fingerprint/*` | Build cache metadata | Small | Only if incrementally rebuilding |
| `incremental/*` | Incremental compilation state | 10–500 MB | Only if incrementally rebuilding |

Artifacts from previous toolchain versions, different feature sets, or removed dependencies accumulate indefinitely — Cargo has no mechanism to prune them. Native dependencies (boring-sys, candle-onnx) amplify the problem: their build.rs scripts compile C/C++ and produce even larger intermediates, and their build failures often leave partial artifacts behind.

The Catch-22

These failures are tolerable in isolation — a single developer working in one branch can absorb a 5 GB target/ directory, and occasional cargo clean is an acceptable cost. The problem becomes severe when multiple Cargo instances run concurrently. Parallel worktrees face two bad options:

  • Share a target/ → Cargo's per-directory lock serializes builds, and race conditions corrupt artifacts (.rmeta/.rlib files overwritten while another rustc is reading them)
  • Isolate with separate CARGO_TARGET_DIR per worktree → no corruption, but each worktree gets a full copy of all dependency artifacts, multiplying disk usage without bound

There is no third option in Cargo today. A shared build cache with safe concurrent access does not exist.

The multiplier formula: N worktrees × M toolchains × K profiles × (2–10 GB per instance). For 6 worktrees × 2 toolchains × 2 profiles: 48–240 GB of build artifacts — mostly duplicate compilations of the same dependencies. A developer with 3 active Rust projects, each running 4 parallel agent sessions, is carrying 72–480 GB of Cargo intermediates at any given time.
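The arithmetic can be made explicit. A minimal sketch of the multiplier formula, where the function name is illustrative and the numbers come from the scenario above:

```rust
// Sketch of the disk-multiplication formula: total footprint scales with
// the product of isolation dimensions. Names and values are illustrative.
fn footprint_gb(worktrees: u64, toolchains: u64, profiles: u64, gb_per_build: u64) -> u64 {
    worktrees * toolchains * profiles * gb_per_build
}
```

For 6 worktrees × 2 toolchains × 2 profiles, `footprint_gb(6, 2, 2, 2)` through `footprint_gb(6, 2, 2, 10)` gives the 48–240 GB range cited above.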

Reproduction

```shell
# Clone a medium-sized workspace (e.g., one using serde, syn, tokio)
git clone https://github.com/<example-workspace> repro && cd repro

# Create 5 worktrees, each on a new branch
for branch in feat-a feat-b feat-c feat-d feat-e; do
  git worktree add -b "$branch" "../repro-$branch"
done

# === Reproduce corruption from shared target/ ===
# Terminal 1 (main):
cargo build 2>&1 | tee /tmp/build-main.log

# Terminal 2 (feat-a, sharing the same target/):
cargo build 2>&1 | tee /tmp/build-feat-a.log
# Observe: "extern location for <crate> does not exist:
#   target/debug/deps/lib<crate>-<hash>.rmeta"
# One build overwrites .rmeta/.rlib files another is reading.

# === Reproduce lock contention from shared target/ ===
# Run both simultaneously from different worktrees using the same target/:
# Observe: "Blocking waiting for file lock on build directory"
# One lock (.cargo-lock, .cargo-build-lock) covers every package,
# profile, and target triple. Parallel builds serialize.

# === Reproduce disk exhaustion from isolation workaround ===
# Each worktree gets its own target dir to avoid corruption:
for branch in feat-a feat-b feat-c feat-d feat-e; do
  (cd "../repro-$branch" && CARGO_TARGET_DIR="target-$branch" cargo build) &
done
wait

# Observe disk multiplication:
du -sh ../repro-*/target-* target
# main target/ + 5 isolated dirs = 6 build dirs × ~5 GB ≈ 30 GB
# of mostly duplicate dependency artifacts

# === Observe cascading failure ===
# Fill disk with isolated builds, then attempt a build in any project:
df -h /  # check available space
# Next build fails with:
# "error: No space left on device (os error 28)"
# "rustc-LLVM ERROR: IO failure on output stream"
```

Why sccache doesn't solve this

sccache is the most obvious existing tool for shared compilation caching, but it doesn't address the core problems:

  • No Cargo dependency graph awareness. sccache caches individual rustc invocations by hashing inputs, but doesn't understand Cargo's .fingerprint state or dependency resolution. It can't coordinate which artifacts are stale vs. current.
  • No incremental compilation support. sccache doesn't manage incremental/ directories or .fingerprint/ state — it's a compiler-wrapper cache, not a build-system cache.
  • Latency overhead. Cache lookup + deserialization adds 50–200ms per rustc invocation. For cargo check in a tight edit loop, this is unacceptable.
  • Different source = different hash. Two worktrees with different source code for the same crate produce different sccache keys. They can't share artifacts by design — sccache is for identical compilations, not for "same dependency, different dependents."

sccache solves "don't recompile identical inputs" across CI builds. It does not solve "multiple divergent source trees sharing a dependency cache safely."

Observed impact

Data from 4,440 session logs across 105 projects (~40+ Rust projects), totaling 16 GB of session data and 1.37M tool outputs:

Failure categories

| Category | Count |
|---|---|
| CARGO_TARGET_DIR isolation overrides | 25,900 |
| Retry/rebuild loops | 14,199 |
| File lock blocking (serialized builds) | 6,065 |
| "could not compile" errors (all causes) | 3,703 |
| Preflight build checks (build, detect corruption, clean, retry) | 1,529 |
| Incremental compilation corruption | 1,029 |
| rm -rf target (manual recovery) | 999 |
| Disk exhaustion (No space left on device) | 457 |
| cargo clean (reactive recovery) | 418 |
| Custom native build failures (boring-sys, candle-onnx) | 287 |
| Extern location missing (artifact vanished mid-compile) | 127 |
| Permission denied on target/debug/deps/ | 48 |
| Toolchain mismatch (stale incremental state) | 35 |
| SIGSEGV in toolchain (mostly llvm-cov export crashes) | 25 |
| Duplicate lang item (mixed stable/nightly toolchains) | 20 |
| Proc-macro dylib staleness (dangling pointer) | 7 |
Per-worktree disk accumulation example

A single project with 26 worktrees accumulated 105 GB of build artifacts:

| Directory | Size |
|---|---|
| worktrees/branch-a/target | 28 GB |
| worktrees/branch-b/target | 19 GB |
| worktrees/branch-c/target | 12 GB |
| worktrees/branch-d/target | 9.7 GB |
| worktrees/branch-e/target | 5.9 GB |
| main target/ | 12 GB |
Cargo commands invoked by agents

| Command | Count | Role |
|---|---|---|
| cargo test | 2,468 | Most common — the primary verification step |
| cargo check | 1,126 | Fast verification before full build |
| cargo build | 1,125 | Standard compilation |
| cargo clippy | 441 | Lint checking |
| cargo clean | 103 | Reactive recovery, never proactive |

Users built a resilience layer

The strongest evidence that this is a real gap rather than user error: users built the error detection and recovery infrastructure that Cargo should provide.

Artifact corruption detection — a regex heuristic that distinguishes transient corruption from real code errors, a distinction Cargo itself does not expose:

```shell
has_transient_cargo_failure() {
  local log_file="$1"
  # Keep the alternation on one line: whitespace inside the pattern would
  # become part of the regex and silently break matching.
  rg -q \
    'extern location .* does not exist|No such file or directory \(os error 2\)|could not compile|Blocking waiting for file lock on build directory|Blocking waiting for file lock on package cache|Blocking waiting for file lock on artifact directory' \
    "$log_file"
}
```

Retry-with-clean loop — observed 1,529 times:

```shell
preflight_build_validation_binaries() {
  local max_attempts="${CARGO_BUILD_MAX_ATTEMPTS:-2}"
  local build_log rc
  build_log="$(mktemp)"
  for ((attempt = 1; attempt <= max_attempts; attempt++)); do
    cargo test --quiet --no-run ... 2>&1 | tee "$build_log"
    rc="${PIPESTATUS[0]}"  # exit status of cargo, not of tee
    if [[ "$rc" -eq 0 ]]; then return 0; fi
    if has_transient_cargo_failure "$build_log"; then
      cargo clean && sleep 1 && continue
    fi
    return "$rc"
  done
  return "$rc"
}
```

Per-worktree isolation — observed 22,958 times, the single most common workaround:

```shell
# $target_root and $safe_branch are set by the caller
# (branch name sanitized for use in a path)
target="$target_root/${safe_branch}"
CARGO_TARGET_DIR="$target" cargo check ...
```

The available cleanup tools can't solve this:

  • cargo clean is all-or-nothing — destroys incremental compilation state along with intermediates
  • cargo-sweep (third-party) uses time-based heuristics, not semantic knowledge of what's final vs. transient
  • Automatic GC (-Zgc, Rust 1.88) handles the global download cache (~/.cargo/registry/, ~/.cargo/git/), not build artifacts in target/

Root causes

  1. No safe concurrent access. The lock mechanism (.cargo-lock, .cargo-build-lock) is per-target-directory and designed for single-process serial builds. Fine-grained locking (#16155) allows shared reads but doesn't address write contention and is not stable. There is no concurrency model for divergent source trees sharing a build cache.

  2. No artifact lifecycle. There is no code path in Cargo that deletes an intermediate artifact after a successful build. The export_dir mechanism in compilation_files.rs (line 129) copies final artifacts when --artifact-dir is used — purely additive, originals untouched. The OutputFile struct tracks where files were copied (export_path) but not what to delete afterward. The automatic GC system (gc.rs) handles only ~/.cargo/registry/ and ~/.cargo/git/, not build artifacts in target/. The fingerprint system (fingerprint/mod.rs) tracks whether crates need recompiling, not whether artifacts are safe to delete.

  3. No transient failure taxonomy. Cargo surfaces the same error messages whether the code is wrong or the artifact cache got corrupted. No Cargo issue exists for this gap — it is unrecognized.

  4. No disk awareness. Cargo has no concept of disk budget, no mechanism to monitor accumulated artifact size, and no coordination between concurrent Cargo instances consuming shared disk. It fills the disk and then fails with an error it could have prevented.

  5. Brittle incremental state. .fingerprint/ and incremental/ are invalidated by toolchain changes (hash includes rustc version), but the error surfaces only after a failed build. The user sees consider running cargo clean first rather than Cargo proactively invalidating the cache. (fingerprint/mod.rs line 668, DirtyReason::RustcChanged)

  6. No build-and-extract code path. The build.build-dir split already distinguishes final outputs from intermediates in the directory layout — the code knows which files are which. But no code path acts on this distinction to clean up after a build. (compilation_files.rs line 129, cargo_clean.rs)

Relationship to existing work

Complementary work that doesn't close these gaps:

| Solution | Status | What it does | What remains |
|---|---|---|---|
| build.build-dir (#14125) | Stable | Separates intermediates from outputs | Doesn't delete intermediates; doesn't prevent multiplication |
| Automatic GC (#12633) | Rust 1.88 | Cleans global download cache | Does NOT clean build artifacts in target/ |
| v2 build-dir layout (#15010) | Testing | Per-package artifact paths | Enables per-package GC but doesn't implement it |
| --artifact-dir (#6790) | Nightly | Copies binaries to a directory | Doesn't delete intermediates; additive only |
| Fine-grained locking (#16155) | Merged | Shared read locks on build dir | Doesn't address write contention; not stable |
| Move build-dir to global cache (#16147) | Open | Intermediates in ~/.cache/cargo/ | Blocked on build script migration (#13663) |

Does not address the problem:

| Solution | Why |
|---|---|
| cargo clean --workspace (#14720) | Manual, all-or-nothing, reactive |
| split-debuginfo | Separates debug info but intermediates still accumulate |
| Shared artifact cache (#5931) | About sharing across projects, not cleaning |
| Compression of intermediates (#16462) | Reduces size but keeps everything; treats symptom |
| Auto-clean stale deps (#6435) | One specific accumulation path, not the systemic issue |
| sccache | Compiler-wrapper cache; no Cargo dependency graph awareness, no incremental support, can't share across divergent source trees |

What's missing

A solved system would:

  1. Retire artifacts automatically. After a build completes, intermediate files are cleaned up (or evicted under a budget).
  2. Support concurrent builds safely. Multiple Cargo instances can share a build cache without corrupting each other.
  3. Stay within disk budget. Accumulated artifacts never exceed a configurable limit; Cargo evicts old data proactively.
  4. Distinguish transient from real failures. When a build fails because artifacts got corrupted, Cargo says so explicitly.

None of this exists in Cargo today, and no combination of currently planned features delivers all four properties simultaneously.

Four specific gaps are unrecognized or unscoped:

  1. No concurrency model for divergent source trees. Fine-grained locking (#16155) allows shared read access — but was designed for the same project building the same dependencies, not for multiple divergent source trees (different worktrees with different modifications) sharing one cache. The v2 per-package layout (#15010) is a prerequisite for safe concurrent access, but the concurrency protocol itself hasn't been designed. Nobody is working on the "multiple divergent source trees" case.

  2. No transient failure taxonomy. Cargo surfaces the same error messages whether the code is wrong or the artifact cache got corrupted. Users had to build has_transient_cargo_failure() as a regex heuristic to distinguish them. This is an unrecognized gap — no Cargo issue exists for it.

  3. The structural fix is blocked. Moving build-dir out of the project tree to a global cache location (#16147) would solve the multiplication problem structurally — intermediates would live in ~/.cache/cargo/ instead of per-worktree target/ directories. But it's blocked on build script migration (#13663), because existing build.rs scripts hardcode paths inside target/. This is the single change that would address the root cause, and it can't ship yet.

  4. No disk budget or eviction policy. GC exists for individual artifacts (#12633, shipped in Rust 1.88) but whole-target-directory GC (#13136) and configurable size limits (#13062) are still open. Even when they ship, they address eventual cleanup, not proactive budget enforcement — a build can still exhaust disk before GC runs.


Proposed solutions

These are ordered by impact and independence — each can ship separately.

1. Transient failure taxonomy (near-term, high impact)

The most actionable improvement. Add a diagnostic category for artifact-corruption failures so users (and tools) can distinguish them from real compilation errors.

Concrete changes:

  • New error code or diagnostic hint when rustc reports a missing .rmeta/.rlib that Cargo's fingerprint system expected to exist — distinct from source-level compile errors
  • Structured error output (--message-format=json) with a "reason": "artifact_missing" field, separate from "reason": "compile_error"
  • A cargo doctor subcommand (or cargo build --check-health) that detects stale fingerprints, missing artifacts, and toolchain mismatches before attempting a build

This directly eliminates the need for the has_transient_cargo_failure() regex heuristic and the 1,529 preflight-check workarounds. It's shippable independently of the concurrency or cleanup work.
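A hedged sketch of what such a classifier could look like. The `FailureKind` type and `classify` function are illustrative, not Cargo API; the signature strings are the ones users already match on, and a real implementation would key off structured diagnostics rather than message text:

```rust
// Illustrative taxonomy: split a build-failure message into artifact
// corruption, lock contention, or a genuine compile error. String matching
// is a stand-in for structured diagnostic metadata.
#[derive(Debug, PartialEq)]
enum FailureKind {
    ArtifactMissing,
    LockContention,
    CompileError,
}

fn classify(message: &str) -> FailureKind {
    let artifact_signatures = [
        "extern location for",
        "No such file or directory (os error 2)",
    ];
    if message.contains("Blocking waiting for file lock") {
        FailureKind::LockContention
    } else if artifact_signatures.iter().any(|s| message.contains(s)) {
        FailureKind::ArtifactMissing
    } else {
        FailureKind::CompileError
    }
}
```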

2. cargo gc --target-dir (near-term, medium impact)

Extend the existing GC infrastructure (#12633) to operate on target/, not just ~/.cargo/registry/:

```shell
# Remove artifacts older than 7 days, keeping final outputs
cargo gc --target-dir ./target --older-than 7d --keep-final

# Enforce a size budget with LRU eviction
cargo gc --target-dir ./target --max-size 10G
```

This is a smaller, more conservative proposal than --cleanup and aligns with the Cargo team's existing GC infrastructure. It solves accumulation without destroying incremental builds. It does not solve concurrency but makes the isolation workaround (per-worktree CARGO_TARGET_DIR) tolerable by bounding disk usage.
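The eviction half of this proposal can be sketched in a few lines. This is illustrative only, assuming modification time as a stand-in for LRU access time and a flat directory; a real implementation would recurse and know which artifacts are final outputs:

```rust
use std::io;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

// Delete the least-recently-modified files until the directory's total size
// fits the budget. Returns the bytes remaining after eviction.
fn evict_to_budget(dir: &Path, max_bytes: u64) -> io::Result<u64> {
    let mut files: Vec<(SystemTime, u64, PathBuf)> = Vec::new();
    let mut total: u64 = 0;
    for entry in std::fs::read_dir(dir)? {
        let entry = entry?;
        let meta = entry.metadata()?;
        if meta.is_file() {
            total += meta.len();
            files.push((meta.modified()?, meta.len(), entry.path()));
        }
    }
    files.sort_by_key(|&(mtime, _, _)| mtime); // oldest first
    for (_, len, path) in files {
        if total <= max_bytes {
            break;
        }
        std::fs::remove_file(&path)?;
        total -= len;
    }
    Ok(total)
}
```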

3. cargo build --cleanup (CI-focused)

A build mode that, on successful compilation, extracts the final outputs and discards intermediates. This is a CI/final-artifact feature, not an interactive workflow flag.

```shell
cargo build --cleanup          # CLI flag
cargo test --cleanup           # after tests pass, discard intermediates
```

Or via config:

```toml
[build]
post-build-cleanup = true
```

What it does on success:

  1. Identify final outputs — binaries, cdylibs, staticlibs. The build.build-dir layout already separates these from intermediates.
  2. Extract debug symbols (if split-debuginfo is configured). DWARF/dSYM/PDB data is separated and preserved alongside the binary.
  3. Delete intermediate artifacts that are not needed for incremental rebuilds: .o files, incremental/ state, build-script outputs.
  4. Preserve .rlib/.rmeta for dependencies (crates.io/git deps) so the next incremental build doesn't require a full recompilation.

What it does not do:

  • Does not delete on failure (intermediates needed for diagnosis)
  • Does not delete .rlib/.rmeta for dependencies (needed for incremental rebuilds)
  • Does not delete across all target directories (scoped to the current build)
  • Does not require nightly (unlike --artifact-dir)

Cost of cleanup on incremental builds: With post-build-cleanup = true, a one-line change in a workspace with 200 dependencies requires recompiling only the changed crate (~5 seconds), because dependency .rlib/.rmeta files are preserved. Only the workspace's own intermediates (.o, incremental/) are cleaned. This is acceptable for both CI and interactive development.
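The keep/delete split described above can be sketched as a predicate. The rules and paths here are illustrative, not an actual Cargo code path: object files and incremental/ state go; dependency .rlib/.rmeta and final outputs stay:

```rust
use std::path::Path;

// Returns true for intermediates safe to delete after a successful build.
fn should_delete(path: &Path) -> bool {
    if path.components().any(|c| c.as_os_str() == "incremental") {
        return true; // incremental compilation state
    }
    match path.extension().and_then(|e| e.to_str()) {
        Some("o") => true,               // object code: already linked
        Some("rlib" | "rmeta") => false, // kept for incremental rebuilds
        _ => false,                      // binaries, cdylibs, etc.: final outputs
    }
}
```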

Why this matters for the concurrency problem: If each build session cleans up after itself, the disk multiplication from per-worktree isolation is bounded. Today, N worktrees × 5 GB grows without limit. With post-build cleanup, each worktree's footprint collapses after the build completes:

```
Before:  N worktrees × 2–10 GB = unbounded
After:   N worktrees × (30–100 MB final outputs + dependency .rlibs) = bounded
```

This doesn't solve the concurrency problem (you still need per-worktree isolation to avoid corruption), but it removes the catastrophic consequence of isolation — disk exhaustion.

4. Shared content-addressed build cache (long-term, solves concurrency)

The design space for a shared, content-addressed Cargo build cache — the "third option" that doesn't exist today:

  • Content addressing: Hash (crate_source, dependencies, compiler_flags, rustc_version) → artifact key. Two worktrees building the same dependency with the same compiler produce the same key → share the artifact.
  • Separation from target/: Shared cache lives in ~/.cache/cargo/build-cache/. Local target/ contains only references.
  • Concurrency: Content-addressed stores are naturally concurrent — two writers producing the same key produce the same file, so atomic rename is sufficient. Different keys don't contend. This is how Bazel's remote cache and sccache work.
  • Eviction: LRU by last-access time, with configurable size budget.

This requires the v2 per-package layout (#15010) as a prerequisite and is blocked on build script migration (#13663). But it's the only proposal that solves the concurrency problem structurally — eliminating the need for per-worktree isolation entirely.
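The core mechanism can be sketched in a few lines: derive an artifact key from the build inputs, then publish via write-to-temp plus atomic rename so two writers racing on the same key cannot leave a torn file behind. All names are illustrative, and `DefaultHasher` stands in for a real stable content hash (e.g. SHA-256):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::io;
use std::path::{Path, PathBuf};

// Content address: same inputs always produce the same key.
fn artifact_key(crate_src: &str, deps: &[&str], flags: &str, rustc: &str) -> u64 {
    let mut h = DefaultHasher::new();
    (crate_src, deps, flags, rustc).hash(&mut h);
    h.finish()
}

// Publish an artifact under its key: write to a private temp file, then
// rename into place. Rename is atomic within one filesystem, so concurrent
// writers of the same key converge on identical bytes.
fn publish(cache: &Path, key: u64, artifact: &[u8]) -> io::Result<PathBuf> {
    let dst = cache.join(format!("{key:016x}.rlib"));
    let tmp = cache.join(format!(".tmp-{key:016x}-{}", std::process::id()));
    std::fs::write(&tmp, artifact)?;
    std::fs::rename(&tmp, &dst)?;
    Ok(dst)
}
```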
