WIP: make nix flake check use the eval cache for derivation outputs by andrewgazelka · Pull Request #1 · indexable-inc/nix

Andrew Gazelka (andrewgazelka) · 2026-05-31T04:36:11Z

Status: WIP / draft. Compiles, correct on passing and failing flakes, and the eval-cache speedup is demonstrated (numbers below). Opened as a draft so review happens through a protected-branch PR.

What

CmdFlakeCheck::run (src/nix/flake.cc) carried // FIXME: rewrite to use EvalCache. for ~5 years and force-evaluated every output on every run, so a repeat nix flake check on an unchanged committed tree paid the full eval cost again, unlike nix build / nix flake show, which read the eval cache via eval_cache::AttrCursor.

This routes the heavy derivation-bearing outputs (checks, packages, devShells, formatter) through the eval cache (isDerivation() + forceDerivation()), and lets the existing value-based traversal handle the cheap structural outputs (apps, overlays, modules, …), which it skips for the names the cache handled (cachedOutputs).

Benchmark

Built from this branch (meson/ninja), nix 2.35, aarch64-darwin.

Small flake (3 nixpkgs derivations + a check), committed tree, nix flake check:

run	time	cache
cold (cache cleared)	6.9s	populates
warm (run 2)	0.35s	`using cached attribute`=10, `evaluating uncached`=0

~20× on the warm run; identical derivations checked; exit parity with stock.

Large flake (indexable-inc/index, ~140 outputs incl. cross-system OCI closures): cold full check is ~20–25 min (intrinsic eval cost, unrelated to this change); the warm cold→warm number from a no-cache-wipe run is being measured and will be appended here.

Important caveat (inherent nix eval-cache limitation, not this patch)

The speedup applies only to a repeat check on an unchanged, committed tree. A dirty working tree — or any new commit — changes the flake fingerprint, so the cache misses and the run is cold again (same as nix build/flake show). So the winners are CI re-checking the same commit, or running the check twice without changes; the active edit→check loop is unchanged.

Correctness validation

✅ Passing flake: cold then warm, identical results, warm fully cache-served.
✅ Broken output (throw): shows the real error on cold and warm, with and without --keep-going (cached failure of attribute … never leaks). Fixed by catching CachedEvalError in the cursor path and calling e.force() to re-surface the real error, mirroring src/nix/main.cc.
✅ --keep-going: a malformed/failed system-level attrset is reported and skipped, not fatal (cursor traversal wrapped in try/reportError).
✅ Dirty tree: no fingerprint → cursors fall through to full eval → same behaviour as stock.

Why hybrid, not a full rewrite

Forcing a Value is the cost, so the traversal must be cursor-based; you cannot bolt caching onto the existing forceValue loop. The cursor API serves derivation checks cleanly but not structural ones (an apps type/program shape, a 2-arg overlays function, importable modules), which still need a real Value via AttrCursor::forceValue().

Follow-ups before upstreamable

Cursor-path errors lose source positions (addTrace(nullptr, …), noPos → at «none»); the eval cache exposes no PosIdx. Decide: extend the cache, or accept/document. (This was the sticking point on prior attempts, Cache evaluation for eval and flake check NixOS/nix#4279, Flake schemas NixOS/nix#8892.)
make_ref<flake::LockedFlake>(flake) deep-copies; hold flake as a ref like CmdFlakeShow.
Functional tests: cache-hit on 2nd run; error parity on a broken output run twice; build phase still builds the right drvs.
reportError's throw e; on a const Error& slices; minor, fix when adding tests.

Refs: indexable-inc/index#405, NixOS#4279, NixOS#8892.

Made with Claude Code (Opus 4.8).

Andrew Gazelka (andrewgazelka) · 2026-05-31T04:44:35Z

Adversarial review (max-effort, blank-context agent)

Generated by Claude Code (Opus 4.8) as an independent reviewer; relayed verbatim.

Bottom line: approach is sound, diff is small and reviewable. Dirty-tree fallback is correct (null fingerprint → null AttrDb → cursors fall through to forceValue), and the build phase records the same DerivedPaths as stock. One substantive regression plus missing tests are the blockers.

Blockers

B1 — warm-run errors for broken cached outputs are opaque. The eval cache stores failures (failed_t, committed on AttrDb destruction), so the design premise "a cache hit means it previously succeeded" is wrong. On the 2nd run a broken packages/checks/devShells/formatter output throws CachedEvalError ("cached failure of attribute ...") instead of the real error. Stock nix build/flake show recover the real message by letting CachedEvalError reach src/nix/main.cc (e.force() re-eval); this PR defeats that twice: (a) checkDerivationCursor catches Error& locally, and (b) reportError's throw e; on a const Error& object-slices the CachedEvalError so main.cc no longer matches it. Stays red (no false green) but the message regresses on every warm run. Fix: don't catch CachedEvalError in the cursor path (or dynamic_cast + rethrow unmodified; for --keep-going, call e.force() yourself to log the real error). Also fix the reportError slicing.

B2 — no tests. Add functional tests: cache-hit on 2nd run (assert via -vvv "using cached attribute" / NIX_ALLOW_EVAL=0); error parity on a broken output run twice (B1), with and without --keep-going; build phase still builds the right drvs; a committed non-derivation packages.<sys>.x = "foo" case to exercise the isDerivation()==false path under a populated cache.

Major

M3 — getAttrs() in perSystemDrvs is outside any try, so a malformed packages.<sys> or a cached-failed level aborts the whole run even under --keep-going. Stock wraps each output/leaf. Fix: wrap the per-system/per-leaf cursor traversal in try/reportError.

M4 — blanket enableImportFromDerivation.setDefault(true) in the cache block (early, global, no hydraJobs carve-out). Confirmed it's not a new failure (stock re-flips IFD true for non-hydraJobs outputs too, and setDefault respects user --option), but it's imprecise under --no-build. Scope it per-output or guard on build.

Minor / nits

m5 source positions lost on the cursor path (addTrace(nullptr,...), noPos) → at «none». Inherent to the cursor API (no PosIdx); decide whether to extend the cache or document. This has been the sticking point on prior attempts (Cache evaluation for eval and flake check NixOS/nix#4279, Flake schemas NixOS/nix#8892).
m6 make_ref<flake::LockedFlake>(flake) deep-copies the whole LockedFlake; hold flake as a ref like CmdFlakeShow and *flake for callFlake.
m7 double callFlake on cold runs (cache rootLoader + legacy block); modest overhead, note in benchmark.
m8 cursor isDerivation() is shallower than legacy getDerivation (won't catch a broken name/assertion at eval time); matches nix build semantics, document it.
n9 lingering half-done FIXME; n10 inline formatter block vs a perSystemDrv helper; n11 cachedOutputs set vs inline name check.

Thread-safety

reportError/hasErrors/attrPathsByDrv captured-by-ref and mutated from the cursor lambdas; fine single-threaded, would need synchronization if flake check is ever parallelized. Add a comment.

Top 3 before upstreamable

B1 — stop swallowing CachedEvalError (+ fix reportError slicing) so warm runs of broken flakes still show the real error.
B2/M3 — add cache-hit + error-parity tests; wrap getAttrs() traversals so --keep-going is honored.
m5/m6/M4 — decide the position-info story; drop the LockedFlake copy; tighten the IFD default.

…WIP) Route checks/packages/devShells/formatter through eval-cache attribute cursors (isDerivation/forceDerivation) so a repeat `nix flake check` on an unchanged committed tree is served from ~/.cache/nix/eval-cache-*, like nix build and nix flake show already are. Cheap structural outputs keep the existing value-based path. Addresses the long-standing FIXME in CmdFlakeCheck. See flake-check-eval-cache-DESIGN.md. Refs indexable-inc/index#405, NixOS#4279. Made with Claude Code (Opus 4.8).

Andrew Gazelka (andrewgazelka) · 2026-05-31T06:08:35Z

Benchmark note — large flake (indexable-inc/index)

I couldn't capture a clean cold→warm number on the large flake, and it's worth saying why rather than posting a bogus figure. The index flake's full check is ~25 min cold, and the repo was being actively committed to during that window (the macos-vm PR landed mid-run), so the store snapshot/fingerprint shifted and the cold leg hit an unrelated eval error (lib.fileset.unions: … packages/macos-vm/Cargo.toml … does not exist — a file not yet in the snapshot). An erroring run doesn't commit the eval cache, so the "warm" leg re-ran cold. That's a property of benchmarking a 25-min check on a moving target, not of this change.

The small-flake numbers above stand as the proof (cold 6.9s → warm 0.35s, fully cache-served, identical derivations). The mechanism is identical at scale: after one clean cold run on a committed, unchanging tree, the warm run is served from the cache. The honest caveat from the PR body still applies — the win is for repeat checks at the same commit; any new commit is cold again.

Andrew Gazelka (andrewgazelka) mentioned this pull request May 31, 2026

RFC: nix flake check is slow and uncached — adopt nix-fast-build, upstream the eval-cache fix, or fork Nix? indexable-inc/index#405

Closed

Andrew Gazelka (andrewgazelka) force-pushed the codex/flake-check-eval-cache branch from 4301c89 to 9541de3 Compare May 31, 2026 05:26

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WIP: make nix flake check use the eval cache for derivation outputs#1

WIP: make nix flake check use the eval cache for derivation outputs#1
Andrew Gazelka (andrewgazelka) wants to merge 1 commit into
masterfrom
codex/flake-check-eval-cache

Andrew Gazelka (andrewgazelka) commented May 31, 2026 •

edited

Loading

Uh oh!

Andrew Gazelka (andrewgazelka) commented May 31, 2026

Uh oh!

Andrew Gazelka (andrewgazelka) commented May 31, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Andrew Gazelka (andrewgazelka) commented May 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

Benchmark

Important caveat (inherent nix eval-cache limitation, not this patch)

Correctness validation

Why hybrid, not a full rewrite

Follow-ups before upstreamable

Uh oh!

Andrew Gazelka (andrewgazelka) commented May 31, 2026

Adversarial review (max-effort, blank-context agent)

Blockers

Major

Minor / nits

Thread-safety

Top 3 before upstreamable

Uh oh!

Andrew Gazelka (andrewgazelka) commented May 31, 2026

Benchmark note — large flake (indexable-inc/index)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Andrew Gazelka (andrewgazelka) commented May 31, 2026 •

edited

Loading