Skip to content

WIP: make nix flake check use the eval cache for derivation outputs#1

Draft
Andrew Gazelka (andrewgazelka) wants to merge 1 commit into
masterfrom
codex/flake-check-eval-cache
Draft

WIP: make nix flake check use the eval cache for derivation outputs#1
Andrew Gazelka (andrewgazelka) wants to merge 1 commit into
masterfrom
codex/flake-check-eval-cache

Conversation

@andrewgazelka

@andrewgazelka Andrew Gazelka (andrewgazelka) commented May 31, 2026

Copy link
Copy Markdown
Member

Status: WIP / draft. Compiles, correct on passing and failing flakes, and the eval-cache speedup is demonstrated (numbers below). Opened as a draft so review happens through a protected-branch PR.

What

CmdFlakeCheck::run (src/nix/flake.cc) carried // FIXME: rewrite to use EvalCache. for ~5 years and force-evaluated every output on every run, so a repeat nix flake check on an unchanged committed tree paid the full eval cost again, unlike nix build / nix flake show, which read the eval cache via eval_cache::AttrCursor.

This routes the heavy derivation-bearing outputs (checks, packages, devShells, formatter) through the eval cache (isDerivation() + forceDerivation()), and lets the existing value-based traversal handle the cheap structural outputs (apps, overlays, modules, …), which it skips for the names the cache handled (cachedOutputs).

Benchmark

Built from this branch (meson/ninja), nix 2.35, aarch64-darwin.

Small flake (3 nixpkgs derivations + a check), committed tree, nix flake check:

run time cache
cold (cache cleared) 6.9s populates
warm (run 2) 0.35s using cached attribute=10, evaluating uncached=0

~20× on the warm run; identical derivations checked; exit parity with stock.

Large flake (indexable-inc/index, ~140 outputs incl. cross-system OCI closures): cold full check is ~20–25 min (intrinsic eval cost, unrelated to this change); the warm cold→warm number from a no-cache-wipe run is being measured and will be appended here.

Important caveat (inherent nix eval-cache limitation, not this patch)

The speedup applies only to a repeat check on an unchanged, committed tree. A dirty working tree — or any new commit — changes the flake fingerprint, so the cache misses and the run is cold again (same as nix build/flake show). So the winners are CI re-checking the same commit, or running the check twice without changes; the active edit→check loop is unchanged.

Correctness validation

  • ✅ Passing flake: cold then warm, identical results, warm fully cache-served.
  • ✅ Broken output (throw): shows the real error on cold and warm, with and without --keep-going (cached failure of attribute … never leaks). Fixed by catching CachedEvalError in the cursor path and calling e.force() to re-surface the real error, mirroring src/nix/main.cc.
  • --keep-going: a malformed/failed system-level attrset is reported and skipped, not fatal (cursor traversal wrapped in try/reportError).
  • ✅ Dirty tree: no fingerprint → cursors fall through to full eval → same behaviour as stock.

Why hybrid, not a full rewrite

Forcing a Value is the cost, so the traversal must be cursor-based; you cannot bolt caching onto the existing forceValue loop. The cursor API serves derivation checks cleanly but not structural ones (an apps type/program shape, a 2-arg overlays function, importable modules), which still need a real Value via AttrCursor::forceValue().

Follow-ups before upstreamable

  • Cursor-path errors lose source positions (addTrace(nullptr, …), noPosat «none»); the eval cache exposes no PosIdx. Decide: extend the cache, or accept/document. (This was the sticking point on prior attempts, Cache evaluation for eval and flake check NixOS/nix#4279, Flake schemas NixOS/nix#8892.)
  • make_ref<flake::LockedFlake>(flake) deep-copies; hold flake as a ref like CmdFlakeShow.
  • Functional tests: cache-hit on 2nd run; error parity on a broken output run twice; build phase still builds the right drvs.
  • reportError's throw e; on a const Error& slices; minor, fix when adding tests.

Refs: indexable-inc/index#405, NixOS#4279, NixOS#8892.


Made with Claude Code (Opus 4.8).

@andrewgazelka

Copy link
Copy Markdown
Member Author

Adversarial review (max-effort, blank-context agent)

Generated by Claude Code (Opus 4.8) as an independent reviewer; relayed verbatim.

Bottom line: approach is sound, diff is small and reviewable. Dirty-tree fallback is correct (null fingerprint → null AttrDb → cursors fall through to forceValue), and the build phase records the same DerivedPaths as stock. One substantive regression plus missing tests are the blockers.

Blockers

B1 — warm-run errors for broken cached outputs are opaque. The eval cache stores failures (failed_t, committed on AttrDb destruction), so the design premise "a cache hit means it previously succeeded" is wrong. On the 2nd run a broken packages/checks/devShells/formatter output throws CachedEvalError ("cached failure of attribute ...") instead of the real error. Stock nix build/flake show recover the real message by letting CachedEvalError reach src/nix/main.cc (e.force() re-eval); this PR defeats that twice: (a) checkDerivationCursor catches Error& locally, and (b) reportError's throw e; on a const Error& object-slices the CachedEvalError so main.cc no longer matches it. Stays red (no false green) but the message regresses on every warm run. Fix: don't catch CachedEvalError in the cursor path (or dynamic_cast + rethrow unmodified; for --keep-going, call e.force() yourself to log the real error). Also fix the reportError slicing.

B2 — no tests. Add functional tests: cache-hit on 2nd run (assert via -vvv "using cached attribute" / NIX_ALLOW_EVAL=0); error parity on a broken output run twice (B1), with and without --keep-going; build phase still builds the right drvs; a committed non-derivation packages.<sys>.x = "foo" case to exercise the isDerivation()==false path under a populated cache.

Major

M3 — getAttrs() in perSystemDrvs is outside any try, so a malformed packages.<sys> or a cached-failed level aborts the whole run even under --keep-going. Stock wraps each output/leaf. Fix: wrap the per-system/per-leaf cursor traversal in try/reportError.

M4 — blanket enableImportFromDerivation.setDefault(true) in the cache block (early, global, no hydraJobs carve-out). Confirmed it's not a new failure (stock re-flips IFD true for non-hydraJobs outputs too, and setDefault respects user --option), but it's imprecise under --no-build. Scope it per-output or guard on build.

Minor / nits

  • m5 source positions lost on the cursor path (addTrace(nullptr,...), noPos) → at «none». Inherent to the cursor API (no PosIdx); decide whether to extend the cache or document. This has been the sticking point on prior attempts (Cache evaluation for eval and flake check NixOS/nix#4279, Flake schemas NixOS/nix#8892).
  • m6 make_ref<flake::LockedFlake>(flake) deep-copies the whole LockedFlake; hold flake as a ref like CmdFlakeShow and *flake for callFlake.
  • m7 double callFlake on cold runs (cache rootLoader + legacy block); modest overhead, note in benchmark.
  • m8 cursor isDerivation() is shallower than legacy getDerivation (won't catch a broken name/assertion at eval time); matches nix build semantics, document it.
  • n9 lingering half-done FIXME; n10 inline formatter block vs a perSystemDrv helper; n11 cachedOutputs set vs inline name check.

Thread-safety

reportError/hasErrors/attrPathsByDrv captured-by-ref and mutated from the cursor lambdas; fine single-threaded, would need synchronization if flake check is ever parallelized. Add a comment.

Top 3 before upstreamable

  1. B1 — stop swallowing CachedEvalError (+ fix reportError slicing) so warm runs of broken flakes still show the real error.
  2. B2/M3 — add cache-hit + error-parity tests; wrap getAttrs() traversals so --keep-going is honored.
  3. m5/m6/M4 — decide the position-info story; drop the LockedFlake copy; tighten the IFD default.

…WIP)

Route checks/packages/devShells/formatter through eval-cache attribute
cursors (isDerivation/forceDerivation) so a repeat `nix flake check` on an
unchanged committed tree is served from ~/.cache/nix/eval-cache-*, like
nix build and nix flake show already are. Cheap structural outputs keep the
existing value-based path. Addresses the long-standing FIXME in CmdFlakeCheck.

See flake-check-eval-cache-DESIGN.md. Refs indexable-inc/index#405, NixOS#4279.

Made with Claude Code (Opus 4.8).
@andrewgazelka

Copy link
Copy Markdown
Member Author

Benchmark note — large flake (indexable-inc/index)

I couldn't capture a clean cold→warm number on the large flake, and it's worth saying why rather than posting a bogus figure. The index flake's full check is ~25 min cold, and the repo was being actively committed to during that window (the macos-vm PR landed mid-run), so the store snapshot/fingerprint shifted and the cold leg hit an unrelated eval error (lib.fileset.unions: … packages/macos-vm/Cargo.toml … does not exist — a file not yet in the snapshot). An erroring run doesn't commit the eval cache, so the "warm" leg re-ran cold. That's a property of benchmarking a 25-min check on a moving target, not of this change.

The small-flake numbers above stand as the proof (cold 6.9s → warm 0.35s, fully cache-served, identical derivations). The mechanism is identical at scale: after one clean cold run on a committed, unchanging tree, the warm run is served from the cache. The honest caveat from the PR body still applies — the win is for repeat checks at the same commit; any new commit is cold again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant