ci: gate images on the system closure, not the OCI tar #1332

Merged
Andrew Gazelka (andrewgazelka) merged 1 commit into main from worktree-image-checks-toplevel
Jun 18, 2026

@andrewgazelka Andrew Gazelka (andrewgazelka) commented Jun 18, 2026

Member

What

The per-image CI checks (image-* in .#ciChecks) built the full OCI tar
(streamLayeredImage → oci-image-builder → <name>-oci.tar). Packing that tar
(serializing and compressing the whole NixOS closure, then running the
layer-efficiency pass) takes ~60-100s per image and dominated CI wall-clock time.

This repoints the checks at the image's system closure
(config.system.build.toplevel) instead of the tar. The tar is still the
package output, built only at release where a registry push actually consumes
the bytes.

  • lib/image/oci-layer.nix: expose passthru.toplevel on the image derivation.
  • lib/per-system.nix: imageChecks build .toplevel for Nix images. Non-Nix
    example images (mkNonNixImage, a pulled Debian/Ubuntu base) have no Nix
    toplevel, so their check stays the assembled archive.
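
A minimal sketch of the selection described above, not the actual contents of lib/per-system.nix. Plain attrsets stand in for real derivations so it evaluates without nixpkgs; `mkNixImage`, `checkTarget`, the `debian-example` image name, and the `or` fallback are assumptions made for illustration. Only `passthru.toplevel`, `mkNonNixImage`, `imageChecks`, and the `image-*` naming come from this PR.

```nix
# Hypothetical sketch: stand-in attrsets replace real derivations.
let
  # Stand-in for an image from lib/image/oci-layer.nix. The value itself
  # represents the OCI tar; `toplevel` represents what passthru.toplevel
  # exposes (config.system.build.toplevel).
  mkNixImage = name: {
    name = "${name}-oci.tar";
    toplevel = { name = "nixos-system-${name}"; };
  };

  # Stand-in for a mkNonNixImage result: there is no Nix toplevel.
  mkNonNixImage = name: { name = "${name}-oci.tar"; };

  images = {
    minecraft = mkNixImage "minecraft";
    debian-example = mkNonNixImage "debian-example"; # hypothetical name
  };

  # Gate on the closure when the image exposes one, else on the archive.
  checkTarget = v: v.toplevel or v;

  imageChecks = builtins.listToAttrs (map (n: {
    name = "image-${n}";
    value = checkTarget images.${n};
  }) (builtins.attrNames images));
in
# Show which target each check would build.
builtins.mapAttrs (_: v: v.name) imageChecks
```

Evaluated with `nix-instantiate --eval --strict`, this maps `image-minecraft` to the `nixos-system-minecraft` stand-in and `image-debian-example` to its archive, mirroring the Nix / non-Nix split in the list above.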

Why

The gate's only signal from an image check is "this image's closure builds." It
does not boot the VM or test any behavior, and the OCI assembly itself is
deterministic plumbing. So building the tar in the gate spends ~60-100s/image to
re-prove something the per-package checks already cover.

Worse, the closure includes frequently changing packages, notably the
base-profile mcp (present in every image). Editing any mcp file changes every
image's closure, so all ~15 tars were re-packed on essentially every commit. Verified
on a Linux builder: a one-line comment in packages/mcp/ix_notebook_mcp/runtime.py
changed the image-minecraft derivation hash, and a darwin-cask / prompt-only PR
re-packed all 15 images in CI.
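
The cascade can be illustrated with a toy model. This is a sketch under stated assumptions, not how Nix computes derivation hashes in detail: `mcpHash` and `closureHash` are hypothetical helpers that only mimic the input-addressed property that a dependent's hash changes whenever a dependency's hash changes.

```nix
# Toy model (hypothetical helpers): every image closure hashes over the
# shared mcp package, so any mcp edit changes every image hash.
let
  hash = builtins.hashString "sha256";

  # Stand-in for the mcp package hash, derived from its source text.
  mcpHash = src: hash src;

  # Stand-in for an image closure hash that depends on mcp.
  closureHash = mcp: imageName: hash "${mcp}:${imageName}";

  imageNames = [ "minecraft" "neovim-ci" "kernel-dev" ];

  closures = mcp: builtins.listToAttrs (map (n: {
    name = n;
    value = closureHash mcp n;
  }) imageNames);

  before = closures (mcpHash "def run(): pass\n");
  # Same source with only a one-line comment added.
  after = closures (mcpHash "# a one-line comment\ndef run(): pass\n");
in
# true for each image whose closure hash changed
builtins.mapAttrs (n: h: h != after.${n}) before
```

Every image reports `true`: a comment-only edit to the shared package invalidates all dependents, which is why each tar was re-packed.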

Building the closure is a relink over already-built store paths: ~2s vs
~100s for the tar (measured on a warm Linux builder after an mcp edit).

Impact

This removes the tar-pack cascade from the gate. Image-touching PRs (and the
merge-queue run) drop from ~3.5-5 min to roughly the eval floor (~1.5-2 min),
about 2-3x faster. It does not touch eval cost (genuinely uncacheable across
commits) or the release/registry path.

Validation

On a Linux builder against this branch:

  • Flake evaluates; all image-* checks present; no check-name collisions.
  • .#ciChecks.x86_64-linux.image-minecraft.drvPath now resolves to
    nixos-system-...drv (the closure), not minecraft-oci.tar.drv.
  • .#packages.x86_64-linux still carries the image tar derivations (release
    path intact).

Authored with Claude (Opus). Review the eval-cost follow-up separately;
caching can't reduce the remaining eval floor.

Note

Gate image CI checks on the NixOS system closure instead of the OCI archive

  • Adds passthru.toplevel to the OCI layer derivation in oci-layer.nix, exposing the NixOS system closure as a separate build target.
  • Updates imageChecks in per-system.nix so Nix image checks depend on v.toplevel (the closure) rather than v (the OCI tar). Non-Nix example image checks are unchanged.
  • Behavioral Change: CI no longer builds the full OCI archive to gate image checks for Nix images — only the system closure is required.

Macroscope summarized ec04fc2.

@chatgpt-codex-connector

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@github-actions

Contributor

Blast radius

23 of 1478 checks would rebuild between base 934991b and head ab7c15c.

pie showData title Rebuilt checks by category
  "image" : 15
  "rust" : 2
  "site" : 2
  "agent" : 1
  "blast" : 1
  "eval" : 1
  "lint" : 1
flowchart LR
  c0["ix-mcp"]
  c1["blast-radius-test"]
  c2["agent-skills"]
  c3["lint"]
  c4["site-test"]
  c5["site-case-tests"]
  c0 --> k0["agent-skills"]
  c0 --> k2["eval"]
  c0 --> k3["image-development-base"]
  c0 --> k4["image-kernel-dev"]
  c0 --> k5["image-minecraft"]
changed checks (23)
  • agent-skills
  • blast-radius-test
  • eval
  • image-development-base
  • image-kernel-dev
  • image-minecraft
  • image-minecraft-bedrock
  • image-minecraft-status
  • image-minecraft_1.21.11-fabric
  • image-minecraft_1.21.11-paper
  • image-minecraft_26.1.2-fabric
  • image-minecraft_26.1.2-paper
  • image-minecraft_26w17a-fabric
  • image-minestom
  • image-neovim-ci
  • image-remote-desktop
  • image-symphony-codex
  • image-test-cluster-bootstrap
  • lint
  • rust-mcp.evalSmoke
  • rust-mcp.requirementsSmoke
  • site-case-tests
  • site-test

@github-actions github-actions Bot left a comment

Contributor

AI review found issues in this pull request.

Verdict: patch is incorrect
Confidence: 0.87

The patch removes PR CI coverage for building Nix OCI image archives and their efficiency checks, leaving only system-closure builds for discovered images.

  • P1 lib/per-system.nix:1079 Nix image checks no longer build the OCI archive

Comment thread lib/per-system.nix
@andrewgazelka

Member Author

Intentional trade-off, accepted: the whole point of this PR is to keep the ~150s/image OCI tar build out of PR CI (it dominates flake-check, ~2400s CPU). Gating on v.toplevel still realizes the system closure and keeps blast-radius's drvPath signal, and the archive derivation is still built when an image is actually pushed/deployed (and buildable on demand via nix build .#packages.<image>). The PR-CI coverage gap for streamLayeredImage/layer-efficiency is the deliberate cost of the speedup. Resolving.

(sent by an AI agent via Claude Code)

@andrewgazelka Andrew Gazelka (andrewgazelka) added this pull request to the merge queue Jun 18, 2026
Merged via the queue into main with commit 876298b Jun 18, 2026
11 of 14 checks passed
@andrewgazelka Andrew Gazelka (andrewgazelka) deleted the worktree-image-checks-toplevel branch June 18, 2026 08:57