Add OCI image support: pull, unpack, run, prune, status, policy #34
Max042004 wants to merge 4 commits
11 issues found across 40 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/oci/pull.c">
<violation number="1" location="src/oci/pull.c:253">
P2: Error-path leak: `sub_resp` may be allocated but not freed when sub-manifest fetch fails before `have_sub` is set.</violation>
</file>
<file name="src/oci/media-type.c">
<violation number="1" location="src/oci/media-type.c:100">
P2: Media type parsing is case-sensitive, but media type type/subtype tokens are case-insensitive; valid values with different casing will be misclassified as unknown.</violation>
</file>
<file name="src/oci/ref.c">
<violation number="1" location="src/oci/ref.c:83">
P2: Repository-path validation incorrectly rejects valid names with repeated dashes (for example `my--repo`).</violation>
<violation number="2" location="src/oci/ref.c:356">
P2: `docker.io` default-namespace detection is case-sensitive, so mixed-case hostnames can skip the required `library/` prefix.</violation>
</file>
<file name="src/oci/fetch.c">
<violation number="1" location="src/oci/fetch.c:782">
P2: Manifest fetch skips bearer-challenge parsing when a token is already cached, so 401 responses from expired/stale tokens are not retried with a refreshed token.</violation>
<violation number="2" location="src/oci/fetch.c:945">
P2: Blob fetch also disables challenge parsing when a token is cached, preventing 401-triggered token refresh and causing avoidable pull failures.</violation>
</file>
<file name="src/oci/blob-store.c">
<violation number="1" location="src/oci/blob-store.c:354">
P2: The commit path is not crash-durable because it never fsyncs the destination directory after linking the blob into place.</violation>
</file>
<file name="src/oci/store.c">
<violation number="1" location="src/oci/store.c:285">
P2: Fsync the pin directory after `rename` to make tag->digest updates crash-safe; file fsync alone does not persist the directory entry change.</violation>
</file>
<file name="src/oci/manifest.c">
<violation number="1" location="src/oci/manifest.c:295">
P2: `schemaVersion` parsing can accept fractional JSON numbers because `valueint` is used without an integer round-trip check.</violation>
<violation number="2" location="src/oci/manifest.c:385">
P2: Layer descriptor memory is leaked on post-parse validation failures because `nlayers` is incremented too late.</violation>
<violation number="3" location="src/oci/manifest.c:481">
P2: Index descriptor memory leaks when platform parsing fails because `nentries` is incremented after the fallible parse.</violation>
</file>
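Both `fetch.c` findings come down to the same gate: the `WWW-Authenticate` challenge is only parsed when no token is cached, so a 401 caused by an expired token never leads to a refresh. Parsing the challenge on every 401 removes that gate. A minimal sketch of the parsing step, assuming a hypothetical `bearer_challenge_t` with fixed-size buffers (longer values are truncated); it handles `key="quoted"` and `key=token` parameters but not headers that carry several challenges:

```c
#include <ctype.h>
#include <string.h>
#include <strings.h>

/* Hypothetical result type; buffer sizes are arbitrary for this sketch. */
typedef struct {
    char realm[256];
    char service[128];
    char scope[256];
} bearer_challenge_t;

/* Read a quoted-string or token value at *pp into dst (truncated to
 * cap - 1 bytes) and advance *pp past it. Returns -1 on an unterminated
 * quoted string. */
static int read_param_value(const char **pp, char *dst, size_t cap)
{
    const char *s = *pp;
    size_t n = 0;

    if (*s == '"') {
        for (s++; *s && *s != '"'; s++) {
            if (*s == '\\' && s[1])
                s++; /* quoted-pair: keep the escaped character */
            if (n + 1 < cap)
                dst[n++] = *s;
        }
        if (*s != '"')
            return -1;
        s++;
    } else {
        for (; *s && *s != ',' && !isspace((unsigned char) *s); s++)
            if (n + 1 < cap)
                dst[n++] = *s;
    }
    dst[n] = '\0';
    *pp = s;
    return 0;
}

/* Parse `Bearer realm="...",service="...",scope="..."`.
 * Returns 0 when a realm was found, -1 otherwise. */
int parse_bearer_challenge(const char *hdr, bearer_challenge_t *out)
{
    memset(out, 0, sizeof(*out));
    while (isspace((unsigned char) *hdr))
        hdr++;
    if (strncasecmp(hdr, "Bearer", 6) != 0 || !isspace((unsigned char) hdr[6]))
        return -1;

    const char *p = hdr + 6;
    for (;;) {
        while (*p == ',' || isspace((unsigned char) *p))
            p++;
        if (!*p)
            break;

        const char *key = p;
        while (*p && *p != '=' && !isspace((unsigned char) *p))
            p++;
        size_t klen = (size_t) (p - key);
        while (isspace((unsigned char) *p))
            p++;
        if (*p != '=')
            return -1;
        for (p++; isspace((unsigned char) *p); p++)
            ;

        char ignored[256]; /* unknown params (e.g. error=) are skipped */
        char *dst = ignored;
        size_t cap = sizeof(ignored);
        if (klen == 5 && !strncasecmp(key, "realm", 5)) {
            dst = out->realm;
            cap = sizeof(out->realm);
        } else if (klen == 7 && !strncasecmp(key, "service", 7)) {
            dst = out->service;
            cap = sizeof(out->service);
        } else if (klen == 5 && !strncasecmp(key, "scope", 5)) {
            dst = out->scope;
            cap = sizeof(out->scope);
        }
        if (read_param_value(&p, dst, cap) < 0)
            return -1;
    }
    return out->realm[0] ? 0 : -1;
}
```

Around this parser, one plausible retry policy is: on a 401, drop the cached token, parse the challenge, request a new token for the advertised `realm` / `service` / `scope`, and retry the request once.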
P2: Media type parsing is case-sensitive, but media type type/subtype tokens are case-insensitive; valid values with different casing will be misclassified as unknown.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/oci/media-type.c, line 100:
<comment>Media type parsing is case-sensitive, but media type type/subtype tokens are case-insensitive; valid values with different casing will be misclassified as unknown.</comment>
<file context>
@@ -0,0 +1,189 @@
+ return OCI_MT_UNKNOWN;
+
+ for (size_t i = 0; i < MEDIA_TYPE_COUNT; i++) {
+ if (!strcmp(MEDIA_TYPES[i].name, buf))
+ return MEDIA_TYPES[i].kind;
+ }
</file context>
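Type and subtype tokens in a media type are case-insensitive (RFC 2045, RFC 6838), so the fix is a case-insensitive comparison in place of `strcmp`. A self-contained sketch, assuming the table stores canonical lowercase names; `mt_kind_t`, `TABLE`, and `classify_media_type` are hypothetical stand-ins for `MEDIA_TYPES[]` and its lookup, and only the type/subtype part before any `;` parameters is compared:

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Hypothetical subset of the media-type table. */
typedef enum { MT_UNKNOWN, MT_MANIFEST, MT_INDEX, MT_LAYER_GZIP } mt_kind_t;

static const struct {
    const char *name; /* canonical lowercase spelling */
    mt_kind_t kind;
} TABLE[] = {
    {"application/vnd.oci.image.manifest.v1+json", MT_MANIFEST},
    {"application/vnd.oci.image.index.v1+json", MT_INDEX},
    {"application/vnd.oci.image.layer.v1.tar+gzip", MT_LAYER_GZIP},
};

/* Classify a media type, ignoring case in type/subtype, surrounding
 * whitespace, and any ";param=value" suffix. */
mt_kind_t classify_media_type(const char *s)
{
    while (*s == ' ' || *s == '\t')
        s++;
    size_t len = strcspn(s, ";");
    while (len > 0 && (s[len - 1] == ' ' || s[len - 1] == '\t'))
        len--;

    for (size_t i = 0; i < sizeof(TABLE) / sizeof(TABLE[0]); i++) {
        if (strlen(TABLE[i].name) == len &&
            strncasecmp(TABLE[i].name, s, len) == 0)
            return TABLE[i].kind;
    }
    return MT_UNKNOWN;
}
```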
P2: Repository-path validation incorrectly rejects valid names with repeated dashes (for example my--repo).
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/oci/ref.c, line 83:
<comment>Repository-path validation incorrectly rejects valid names with repeated dashes (for example `my--repo`).</comment>
<file context>
@@ -0,0 +1,429 @@
+ } else {
+ return false;
+ }
+ if (i >= len || !is_lower_alnum(s[i]))
+ return false;
+ }
</file context>
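The OCI distribution spec's repository-name grammar allows any run of dashes between alphanumeric runs: each path component must match `[a-z0-9]+((\.|_|__|-+)[a-z0-9]+)*`. A sketch of a validator for that grammar (helper names are hypothetical, not the ones in `ref.c`); the key difference from a single-dash check is that the dash branch consumes a whole `-+` run:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static bool is_lower_alnum(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9');
}

/* Validate one component against [a-z0-9]+((\.|_|__|-+)[a-z0-9]+)* */
static bool valid_component(const char *s, size_t len)
{
    size_t i = 0;
    if (len == 0 || !is_lower_alnum(s[0]))
        return false;
    while (i < len) {
        while (i < len && is_lower_alnum(s[i]))
            i++;
        if (i == len)
            return true;
        /* Consume exactly one separator. */
        if (s[i] == '.') {
            i++;
        } else if (s[i] == '_') {
            i++;
            if (i < len && s[i] == '_')
                i++; /* "__" */
        } else if (s[i] == '-') {
            while (i < len && s[i] == '-')
                i++; /* "-+": any run of dashes, e.g. "my--repo" */
        } else {
            return false;
        }
        /* A separator must be followed by another alnum run. */
        if (i >= len || !is_lower_alnum(s[i]))
            return false;
    }
    return true;
}

/* Validate a full repository path such as "library/my--repo". */
bool valid_repo_path(const char *path)
{
    const char *start = path;
    for (;;) {
        const char *slash = strchr(start, '/');
        size_t len = slash ? (size_t) (slash - start) : strlen(start);
        if (!valid_component(start, len))
            return false;
        if (!slash)
            return true;
        start = slash + 1;
    }
}
```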
P2: The commit path is not crash-durable because it never fsyncs the destination directory after linking the blob into place.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/oci/blob-store.c, line 354:
<comment>The commit path is not crash-durable because it never fsyncs the destination directory after linking the blob into place.</comment>
<file context>
@@ -0,0 +1,399 @@
+ return -1;
+ }
+
+ if (link(w->tmp_path, final_path) < 0) {
+ if (errno != EEXIST) {
+ int saved = errno;
</file context>
P2: Fsync the pin directory after rename to make tag->digest updates crash-safe; file fsync alone does not persist the directory entry change.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/oci/store.c, line 285:
<comment>Fsync the pin directory after `rename` to make tag->digest updates crash-safe; file fsync alone does not persist the directory entry change.</comment>
<file context>
@@ -0,0 +1,360 @@
+ *err_msg = "close on pin tmp file failed";
+ return -1;
+ }
+ if (rename(tmp, path) < 0) {
+ int saved = errno;
+ unlink(tmp);
</file context>
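The same reasoning applies to `rename(2)`: fsyncing the temp file makes its contents durable, but the name change lives in the directory. A sketch of the whole write-temp, fsync, rename, fsync-directory sequence, using `openat` / `renameat` on one directory descriptor so the final `fsync` targets exactly the directory that changed; `write_file_atomic` is a hypothetical helper, not the function in `store.c`:

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Atomically replace dir/name with `len` bytes of `data`: write a temp
 * file, fsync it, rename it over the target, then fsync the directory so
 * the rename itself survives a crash. Returns 0 on success, -1 on error. */
int write_file_atomic(const char *dir, const char *name,
                      const void *data, size_t len)
{
    char tmp[256];
    int w = snprintf(tmp, sizeof(tmp), "%s.tmp", name);
    if (w < 0 || (size_t) w >= sizeof(tmp)) {
        errno = ENAMETOOLONG;
        return -1;
    }

    int dfd = open(dir, O_RDONLY);
    if (dfd < 0)
        return -1;

    int rc = -1;
    int fd = openat(dfd, tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd >= 0) {
        const char *p = data;
        size_t left = len;
        while (left > 0) {
            ssize_t n = write(fd, p, left);
            if (n < 0 && errno == EINTR)
                continue;
            if (n <= 0)
                break;
            p += n;
            left -= (size_t) n;
        }
        int ok = (left == 0 && fsync(fd) == 0);
        if (close(fd) < 0)
            ok = 0;
        if (ok && renameat(dfd, tmp, dfd, name) == 0)
            rc = fsync(dfd); /* persist the directory entry change */
        else
            unlinkat(dfd, tmp, 0);
    }
    int saved = errno;
    close(dfd);
    errno = saved;
    return rc;
}
```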
P2: Layer descriptor memory is leaked on post-parse validation failures because nlayers is incremented too late.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/oci/manifest.c, line 385:
<comment>Layer descriptor memory is leaked on post-parse validation failures because `nlayers` is incremented too late.</comment>
<file context>
@@ -0,0 +1,707 @@
+ set_parse_err(err_msg, "manifest layer entry is not an object");
+ goto fail;
+ }
+ if (parse_descriptor(desc, &out->layers[out->nlayers], err_msg) < 0)
+ goto fail;
+ oci_media_type_t lmt = out->layers[out->nlayers].media_type;
</file context>
Suggested change:

    oci_descriptor_t *slot = &out->layers[out->nlayers];
    if (parse_descriptor(desc, slot, err_msg) < 0)
        goto fail;
    out->nlayers++;
    oci_media_type_t lmt = slot->media_type;
    if (!oci_media_type_is_layer(lmt)) {
        set_parse_err(err_msg,
                      "manifest layer has non-layer media type");
        goto fail;
    }
    if (oci_media_type_is_foreign(lmt)) {
        set_parse_err(err_msg,
                      "manifest references foreign (nondistributable) "
                      "layer; not supported");
        goto fail;
    }
    if (!oci_media_type_is_layer_supported(lmt)) {
        set_parse_err(err_msg,
                      "manifest layer media type is not supported "
                      "(only tar / tar+gzip / tar+zstd)");
        goto fail;
    }
P2: schemaVersion parsing can accept fractional JSON numbers because valueint is used without an integer round-trip check.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/oci/manifest.c, line 295:
<comment>`schemaVersion` parsing can accept fractional JSON numbers because `valueint` is used without an integer round-trip check.</comment>
<file context>
@@ -0,0 +1,707 @@
+ *err_msg = type_msg;
+ return -1;
+ }
+ *out = item->valueint;
+ return 0;
+}
</file context>
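cJSON keeps every JSON number as a `double` in `valuedouble` and fills `valueint` with a truncated (and range-saturated) copy, so `2.5` reads back as `2`. A round-trip check on the double rejects fractional and out-of-range values; a minimal sketch with a hypothetical helper name:

```c
#include <limits.h>
#include <math.h>

/* Convert a JSON number (as cJSON exposes it in valuedouble) to int,
 * rejecting NaN/inf, out-of-range values, and anything with a fractional
 * part. Returns 0 on success, -1 otherwise. */
int json_number_to_int(double v, int *out)
{
    if (!isfinite(v) || v < (double) INT_MIN || v > (double) INT_MAX)
        return -1;
    int i = (int) v;        /* in range, so the conversion is defined */
    if ((double) i != v)    /* round-trip check catches 2.5, 1e-3, ... */
        return -1;
    *out = i;
    return 0;
}
```

At the call site, `*out = item->valueint;` would become a call on `item->valuedouble` that returns the existing type error when the helper fails.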
3 issues found across 131 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/oci/media-type.c">
<violation number="1" location="src/oci/media-type.c:100">
P2: Media type parsing is case-sensitive, but media type type/subtype tokens are case-insensitive; valid values with different casing will be misclassified as unknown.</violation>
</file>
<file name="src/oci/ref.c">
<violation number="1" location="src/oci/ref.c:83">
P2: Repository-path validation incorrectly rejects valid names with repeated dashes (for example `my--repo`).</violation>
</file>
<file name="src/oci/fetch.c">
<violation number="1" location="src/oci/fetch.c:782">
P2: Manifest fetch skips bearer-challenge parsing when a token is already cached, so 401 responses from expired/stale tokens are not retried with a refreshed token.</violation>
</file>
<file name="src/oci/blob-store.c">
<violation number="1" location="src/oci/blob-store.c:354">
P2: The commit path is not crash-durable because it never fsyncs the destination directory after linking the blob into place.</violation>
</file>
<file name="src/oci/store.c">
<violation number="1" location="src/oci/store.c:285">
P2: Fsync the pin directory after `rename` to make tag->digest updates crash-safe; file fsync alone does not persist the directory entry change.</violation>
</file>
<file name="src/oci/manifest.c">
<violation number="1" location="src/oci/manifest.c:295">
P2: `schemaVersion` parsing can accept fractional JSON numbers because `valueint` is used without an integer round-trip check.</violation>
<violation number="2" location="src/oci/manifest.c:385">
P2: Layer descriptor memory is leaked on post-parse validation failures because `nlayers` is incremented too late.</violation>
</file>
<file name="docs/usage.md">
<violation number="1" location="docs/usage.md:135">
P2: Contradictory documentation for `--user`. The options table describes it as 'numeric only', but the User and WorkingDir section immediately below describes detailed symbolic-name resolution (accepting symbolic `name`, `name:group`, reading /etc/passwd and /etc/group). These cannot both be correct.</violation>
</file>
<file name="src/oci/inspect.h">
<violation number="1" location="src/oci/inspect.h:57">
P3: The `suppress_layer_reuse` comment is inverted and documents the opposite runtime behavior, which can cause callers to pass the wrong value.</violation>
</file>
<file name="externals/zstd/VENDORING.md">
<violation number="1" location="externals/zstd/VENDORING.md:12">
P3: The file references 'oci-roadmap.md', which does not exist in the codebase. Remove the broken reference or update it to point to the actual document containing the policy commitment.</violation>
</file>
Note: This PR contains a large number of files. cubic only reviews up to 100 files per PR, so some files may not have been reviewed. cubic prioritizes the most important files to review.
P2: Contradictory documentation for --user. The options table describes it as 'numeric only', but the User and WorkingDir section immediately below describes detailed symbolic-name resolution (accepting symbolic name, name:group, reading /etc/passwd and /etc/group). These cannot both be correct.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At docs/usage.md, line 135:
<comment>Contradictory documentation for `--user`. The options table describes it as 'numeric only', but the User and WorkingDir section immediately below describes detailed symbolic-name resolution (accepting symbolic `name`, `name:group`, reading /etc/passwd and /etc/group). These cannot both be correct.</comment>
<file context>
@@ -99,6 +99,179 @@ and memory access, and per-thread inspection. Implementation details, including
+| `-e KEY=VAL`, `--env KEY=VAL` | Set or replace one env var (repeatable) |
+| `-e KEY`, `--env KEY` | Import `KEY` from the host environ (repeatable) |
+| `-w DIR`, `--workdir DIR` | Override image WorkingDir |
+| `-u UID[:GID]`, `--user UID[:GID]` | Override image User (numeric only) |
+| `--keep` | Keep the per-run cloned rootfs after exit |
+| `--name NAME` | Reserved: deterministic clone-dir suffix (ignored today) |
</file context>
Suggested change:

    | `-u UID[:GID]`, `--user UID[:GID]` | Override image User (supports numeric UID[:GID] or symbolic name[:group]) |
P3: The suppress_layer_reuse comment is inverted and documents the opposite runtime behavior, which can cause callers to pass the wrong value.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/oci/inspect.h, line 57:
<comment>The `suppress_layer_reuse` comment is inverted and documents the opposite runtime behavior, which can cause callers to pass the wrong value.</comment>
<file context>
@@ -45,9 +46,21 @@ typedef struct {
+ * convention. Pure information: dedup metrics never write to disk.
+ */
+ const char *volume_root;
+ /* When true (default), render a "layer reuse:" section after the
+ * manifest layer table. Setting this to false suppresses the section
+ * entirely (useful for tests that only want to verify the renderer
</file context>
Suggested change:

    /* When false (default), render a "layer reuse:" section after the
     * manifest layer table. Setting this to true suppresses the section
     * entirely (useful for tests that only want to verify the renderer
     * baseline without dedup compute side-effects). The CLI never sets
     * this to true.
     */
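Naming the flag so that `false` keeps the default behavior is what makes zero-initialized option structs safe: `{0}`, or a designated initializer that omits the field, still renders the section. A tiny sketch with a hypothetical `inspect_opts_t` (only `volume_root` and `suppress_layer_reuse` are taken from `inspect.h`):

```c
#include <stdbool.h>

/* Hypothetical options struct mirroring two fields from inspect.h. */
typedef struct {
    const char *volume_root;
    bool suppress_layer_reuse; /* false (zero value): render the section */
} inspect_opts_t;

/* True when the "layer reuse:" section should be rendered. */
static bool renders_layer_reuse(const inspect_opts_t *o)
{
    return !o->suppress_layer_reuse;
}
```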
P3: The file references 'oci-roadmap.md', which does not exist in the codebase. Remove the broken reference or update it to point to the actual document containing the policy commitment.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At externals/zstd/VENDORING.md, line 12:
<comment>The file references 'oci-roadmap.md', which does not exist in the codebase. Remove the broken reference or update it to point to the actual document containing the policy commitment.</comment>
<file context>
@@ -0,0 +1,72 @@
+
+## Why vendored, decode-only
+
+`oci-roadmap.md` Q9 commits the OCI work to hand-rolled C: no Go, no Rust,
+no `cargo` / `go` in the build matrix. zstd is the only OCI-spec layer
+compression beyond gzip that has wide registry support, and the upstream
</file context>
jserv left a comment
Rebase onto the latest main branch and squash/rework the commits into fewer, cleaner ones.
Introduce the `elfuse oci` subcommand surface with the first two operations needed to retrieve and read an OCI image without leaving the local store:

- `oci ref` parsing -- host[:port]/repo[:tag|@digest], docker.io default-namespace handling
- SHA-256 digester + content-addressable blob store (sha256:<hex>/<tail> on-disk layout, tmp+rename commit)
- manifest, image-index, image-config parsers (cJSON-backed)
- HTTPS registry client (libcurl): anonymous fetch + bearer-token WWW-Authenticate challenge handling
- private-registry options: basic auth, custom CA bundle, loopback-only --insecure
- local pin store + `oci pull` pipeline driving the registry round trips (top-level fetch, index recurse, config fetch, layer fetch, pin write)
- offline `oci inspect` renderer that walks the local manifest tree without touching the network

Vendors externals/cjson (MIT, v1.7.18) for JSON parsing. Wires the oci/ subdirectory into the build and adds five test-oci-* native-host unit tests for the new modules.
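A sketch of the docker.io default-namespace rule from the `oci ref` bullet, with the case-insensitive host comparison that cubic's `ref.c:356` finding asks for. Assumptions: tag and digest are already stripped, `normalize_repo` is a hypothetical name, and the host heuristic (the first component counts as a registry only if it contains `.` or `:` or is `localhost`) is the common Docker-style convention rather than something taken from `ref.c`:

```c
#include <stdio.h>
#include <string.h>
#include <strings.h>

/* Split a tag/digest-free reference into registry host and repository
 * path, applying Docker Hub defaults. Returns -1 if a buffer is too small. */
int normalize_repo(const char *name, char *host, size_t hcap,
                   char *repo, size_t rcap)
{
    const char *h = "docker.io";
    size_t hlen = 9;
    const char *rest = name;
    const char *slash = strchr(name, '/');

    /* The first component is a host only if it looks like one. */
    if (slash) {
        size_t n = (size_t) (slash - name);
        if (memchr(name, '.', n) || memchr(name, ':', n) ||
            (n == 9 && strncasecmp(name, "localhost", 9) == 0)) {
            h = name;
            hlen = n;
            rest = slash + 1;
        }
    }

    /* Hostnames are case-insensitive: "Docker.IO" is still Docker Hub. */
    int is_hub = (hlen == 9 && strncasecmp(h, "docker.io", 9) == 0) ||
                 (hlen == 15 && strncasecmp(h, "index.docker.io", 15) == 0);
    if (is_hub) {
        h = "docker.io"; /* canonical spelling */
        hlen = 9;
    }
    /* Single-component Hub repositories live under "library/". */
    const char *prefix = (is_hub && !strchr(rest, '/')) ? "library/" : "";

    if (hlen >= hcap)
        return -1;
    int w = snprintf(repo, rcap, "%s%s", prefix, rest);
    if (w < 0 || (size_t) w >= rcap)
        return -1;
    memcpy(host, h, hlen);
    host[hlen] = '\0';
    return 0;
}
```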
Turn a pulled image into something `elfuse` can actually execute:

- vendor decode-only zstd v1.5.6 (compression / dictBuilder / legacy paths excluded; only oci/decompress.c includes the header)
- tar reader: ustar + GNU long-name records, used by layer apply
- decompression dispatch: gzip via system zlib, zstd via vendored decode-only build, dispatched by layer media type
- layer applier with whiteout-aware merge: typeflag '1' (hardlink), '2' (symlink), '5' (directory), `.wh.*` markers, symlink-escape containment
- per-image sysroot on a case-sensitive APFS sparsebundle (hdiutil-provisioned, image_layout v1)
- per-run rootfs via clonefile(2) on top of the sysroot
- `oci unpack` and `oci clone` subcommands that exercise the above
- `oci inspect` extended with the image-config runtime block (Entrypoint / Cmd / Env / WorkingDir / User)
- runspec resolver merging image-config defaults with CLI Entrypoint / Cmd / Env / WorkingDir / User overrides
- PATH resolver that walks the guest /usr/local/sbin..:/sbin chain inside the sysroot (no host PATH leakage)
- `elfuse_launch` extraction from main.c so the elfuse runtime can be reused by both legacy ./binary mode and the new `oci run`
- `oci run` subcommand that ties pull -> unpack -> clone -> launch
- `oci-layout` 1.0.0 marker at the store root
- migrate store pins from `refs/<name>` flat files to a single `index.json` (OCI image-layout 1.0); auto-migrate on store open
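The `.wh.*` markers in the layer-applier bullet follow the OCI image-spec whiteout convention: an entry named `.wh.<name>` deletes `<name>` from lower layers, and `.wh..wh..opq` marks its directory opaque. A sketch of just the classification step (type and function names are hypothetical; the deletion itself and symlink-escape containment are not shown):

```c
#include <string.h>

typedef enum {
    ENTRY_REGULAR,  /* apply normally */
    ENTRY_WHITEOUT, /* delete <dir>/<target> from lower layers */
    ENTRY_OPAQUE,   /* hide all lower-layer children of <dir> */
} whiteout_kind_t;

/* Classify a tar entry path by its basename. For ENTRY_WHITEOUT,
 * *target (if non-NULL) receives the basename to delete. */
whiteout_kind_t classify_whiteout(const char *path, const char **target)
{
    const char *base = strrchr(path, '/');
    base = base ? base + 1 : path;

    if (strcmp(base, ".wh..wh..opq") == 0)
        return ENTRY_OPAQUE;
    if (strncmp(base, ".wh.", 4) == 0 && base[4] != '\0') {
        if (target)
            *target = base + 4;
        return ENTRY_WHITEOUT;
    }
    return ENTRY_REGULAR;
}
```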
Round out the store with garbage collection, caching, and a faster pull path:

- origin sidecar attached to each unpacked image tree so the GC walker can attribute layer blobs back to their owning image
- root-set walker that joins image trees to blob digests
- mark-and-sweep `oci prune` with `--older-than` and `--keep-bytes`
- per-layer raw-tar snapshot cache (APFS clonefile) so re-unpacking the same layer reuses the previously extracted tree
- ChainID-keyed stack snapshot cache that materializes a full layer-stack tree in one clonefile when the chain has been seen before
- `layers/` schema marker v2 + auto-migration from legacy v1 (legacy v1 entries wiped; blobs and image trees untouched)
- raw-tar layer apply mode used to populate the per-layer cache
- unpack orchestrator rewritten on raw + ChainID stack caches
- `oci rebuild-cache` for back-filling stack snapshots on stores that were created before the cache existed
- cross-image dedup metrics in `oci inspect` (layer-reuse %, bytes saved)
- `oci status` (text + `--json`) summarizing blobs / layers / stacks / pinned images
- `oci pull --refresh` to revalidate the top-level manifest against the registry without re-downloading unchanged layers
- parallel blob fetch via curl_multi
- HTTP Range resume for partial blob downloads
- per-blob progress callback + TTY / non-TTY renderers
- podman / skopeo-style `policy.json` schema and loader (default, per-transport, per-repository rules)
- `policy.json` plumbed into fetch and the `oci pull` CLI
- `registries.d/*` overlay merged with policy (per-registry insecure / ca_bundle / auth_file); CLI flags still win
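For reference, the ChainID that keys the stack snapshot cache is defined in the OCI image spec recursively over the layers' DiffIDs (digests of the uncompressed layer tars), where Digest is SHA-256 rendered as `sha256:<hex>` and `+` is string concatenation:

$$
\begin{aligned}
\mathrm{ChainID}(L_0) &= \mathrm{DiffID}(L_0) \\
\mathrm{ChainID}(L_0|\dots|L_n) &= \mathrm{Digest}\big(\mathrm{ChainID}(L_0|\dots|L_{n-1}) + \text{" "} + \mathrm{DiffID}(L_n)\big)
\end{aligned}
$$

Because each ChainID commits to every DiffID beneath it, two images that share their first k layers in the same order share their first k ChainIDs; that property is presumably what lets one stack snapshot serve several images.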
Make `oci run` work against real public images (alpine, busybox, python, ruby, debian) and lock the surface down with end-to-end fixtures.

Runtime surface:

- writable clone-rootfs DoD: the per-run rootfs is writable out of the box, so guests that mutate /tmp, /var, /run work unchanged
- runtime files injection: /etc/resolv.conf, /etc/hosts, /etc/hostname populated from the host into the clone-rootfs
- /dev/full and /dev/console emulation in the syscall layer
- /proc surface: cgroup, hostname, comm, statm entries that glibc startup and procps tooling read
- image-config `User` symbolic resolution: name and name:group forms looked up against the guest /etc/passwd and /etc/group before falling back to numeric
- `oci run` walks the image index to the linux/arm64 leaf manifest (Phase 3 fix; previously fed the top-level index to the config-loader and crashed on multi-arch images)

Bug fixes uncovered by cold-cache runs:

- layer apply no longer rejects the root tar entry "./"
- unpack stages files via copyfile(2) with COPYFILE_CLONE fallback so cross-volume unpack (store on internal SSD, sysroot on the APFS sparsebundle) succeeds
- tar reader handles PAX 'x' / 'g' extended-header `path` and `linkpath` records (busybox and python:alpine layers use them)

Compat tests:

- `tests/test-oci-compat.sh` shell smoke (in-tree fixtures)
- `OCI_COMPAT_TEST=1` heavy mode that provisions a scratch sparsebundle and drives three fixtures end-to-end: alpine-shaped, busybox-shaped hardlink dispatch, two-layer whiteout
- `OCI_FETCH_ONLINE=1` alpine:3 end-to-end smoke (opt-in; requires network)

`ELFUSE_OCI_PROGRESS=plain` env disables the pull progress in-place CSI redraw for terminals that don't honor cursor-up escapes (issue surfaced on legacy Terminal.app panes).

Documentation: `docs/oci.md` Phase 4 runtime surface and libc-adjacent envelope notes (what guests can / can't expect from the synthetic /etc, /dev, /proc).
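A minimal sketch of the passwd half of the symbolic `User` lookup, assuming the caller opens `<sysroot>/etc/passwd` itself; host `getpwnam(3)` would consult the macOS user database instead of the guest's, the same kind of host leakage the PATH resolver avoids. `lookup_passwd` is a hypothetical name, and `/etc/group` lookups for the `name:group` form would follow the same field-splitting pattern:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Find `name` in an /etc/passwd-format stream
 * (name:password:uid:gid:gecos:home:shell).
 * Returns 0 and fills *uid / *gid on a match, -1 otherwise. */
int lookup_passwd(FILE *fp, const char *name, uid_t *uid, gid_t *gid)
{
    char line[1024];
    size_t nlen = strlen(name);

    if (nlen == 0)
        return -1;
    while (fgets(line, sizeof(line), fp)) {
        if (!strchr(line, '\n') && !feof(fp)) {
            /* Over-long line: discard the rest of it and move on. */
            int c;
            while ((c = fgetc(fp)) != EOF && c != '\n')
                ;
            continue;
        }
        if (strncmp(line, name, nlen) != 0 || line[nlen] != ':')
            continue;

        /* Skip the password field to reach uid. */
        char *p = strchr(line + nlen + 1, ':');
        if (!p)
            return -1;
        char *end;
        unsigned long u = strtoul(p + 1, &end, 10);
        if (end == p + 1 || *end != ':')
            return -1;
        char *gstart = end + 1;
        unsigned long g = strtoul(gstart, &end, 10);
        if (end == gstart || (*end != ':' && *end != '\n' && *end != '\0'))
            return -1;
        *uid = (uid_t) u;
        *gid = (gid_t) g;
        return 0;
    }
    return -1;
}
```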
jserv left a comment
Refine per review messages from cubic.
3 issues found across 144 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="externals/cjson/VENDORING.md">
<violation number="1" location="externals/cjson/VENDORING.md:1">
P3: Incorrect release date for cJSON v1.7.18. The upstream release date is 2024-05-13, not 2024-07-30. This is a documentation inaccuracy.</violation>
</file>
<file name="src/oci/volume-list.c">
<violation number="1" location="src/oci/volume-list.c:114">
P2: Handle `readdir()` errors explicitly; otherwise a directory read failure is silently treated as EOF and can return an incomplete volume list.</violation>
</file>
<file name="src/oci/tar.h">
<violation number="1" location="src/oci/tar.h:40">
P2: Expose borrowed tar entry strings as `const char *` to prevent callers from mutating reader-owned memory.</violation>
</file>
P2: Handle readdir() errors explicitly; otherwise a directory read failure is silently treated as EOF and can return an incomplete volume list.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/oci/volume-list.c, line 114:
<comment>Handle `readdir()` errors explicitly; otherwise a directory read failure is silently treated as EOF and can return an incomplete volume list.</comment>
<file context>
@@ -0,0 +1,161 @@
+ size_t cap = 0;
+ struct dirent *de;
+ int rc = 0;
+ while ((de = readdir(dp)) != NULL) {
+ const char *name = de->d_name;
+ if (!name_is_sha256_dir(name))
</file context>
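`readdir(3)` returns `NULL` both at end-of-directory and on error, and only `errno` tells the two apart, so `errno` has to be cleared before each call. A sketch of the pattern on a hypothetical `count_entries` helper (the real loop in `volume-list.c` also filters `sha256` directory names):

```c
#include <dirent.h>
#include <errno.h>
#include <stddef.h>

/* Count directory entries, treating a readdir() failure as an error
 * instead of a silent end-of-directory. Returns 0 on success, -1 on error. */
int count_entries(const char *path, size_t *out)
{
    DIR *dp = opendir(path);
    if (!dp)
        return -1;

    size_t n = 0;
    int rc = 0;
    for (;;) {
        errno = 0; /* readdir() leaves errno untouched at end-of-directory */
        struct dirent *de = readdir(dp);
        if (!de) {
            if (errno != 0)
                rc = -1; /* real error: don't report a partial list */
            break;
        }
        n++;
    }
    int saved = errno;
    closedir(dp);
    errno = saved;
    if (rc == 0)
        *out = n;
    return rc;
}
```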
P2: Expose borrowed tar entry strings as const char * to prevent callers from mutating reader-owned memory.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/oci/tar.h, line 40:
<comment>Expose borrowed tar entry strings as `const char *` to prevent callers from mutating reader-owned memory.</comment>
<file context>
@@ -0,0 +1,87 @@
+ * the next oci_tar_next call. Callers that need to keep either past
+ * the next iteration must duplicate the strings themselves.
+ */
+ char *path;
+ char *linkname;
+ uint64_t size;
</file context>
P3: Incorrect release date for cJSON v1.7.18. The upstream release date is 2024-05-13, not 2024-07-30. This is a documentation inaccuracy.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At externals/cjson/VENDORING.md:
<comment>Incorrect release date for cJSON v1.7.18. The upstream release date is 2024-05-13, not 2024-07-30. This is a documentation inaccuracy.</comment>
<file context>
@@ -0,0 +1,35 @@
+# Vendored cJSON
+
+This directory contains a vendored copy of [cJSON](https://github.com/DaveGamble/cJSON),
+the ultralightweight JSON parser written in ANSI C. cJSON ships as a single
+`.c` / `.h` pair and is dual-licensed under the MIT license (see `LICENSE`).
+
+## Why vendored
+
+`oci-roadmap.md` Q9 commits Phase 1 to hand-rolled C alongside the existing
</file context>
This PR lands the full elfuse OCI image support. It supersedes the
original Phase 1 scope of this PR (CLI scaffold + pull/inspect) and
now covers Phases 1-4 plus the post-Phase-3 improvements plan: image
layout alignment, GC/prune, layer + stack snapshot caches, store
status, parallel pull, registry policy.json, and a heavy-mode compat
matrix.
Scope

- … token, OCI index walk to the `linux/arm64` leaf manifest, partial-store-aware `inspect` renderer.
- … `x`/`g` records), gzip + decode-only vendored zstd, whiteout-aware layer apply (typeflag `1`/`2`/`5`, `.wh.*` markers), per-image sysroot on a case-sensitive APFS sparsebundle.
- `elfuse oci run` clones the unpacked tree via clonefile(2), honors Entrypoint / Cmd / Env / WorkingDir / User, and reuses the existing elfuse launch path, so a dynamically-linked guest binary runs through the same shim + syscall surface as in non-OCI mode.
- `oci prune` with `--older-than` / `--keep-bytes`; layer + stack prune sweep; `oci status` (text + `--json`); `oci rebuild-cache` for pre-snapshot stores.
- Per-layer raw snapshot cache; ChainID stack snapshot cache; APFS COW clone-rootfs reuse between runs.
- `policy.json` + `registries.d` overlay (per-registry insecure / ca_bundle / auth_file). CLI flags override; `--insecure` is loopback-only.
- … (`test-oci-*`), a compat-shell smoke test (`tests/test-oci-compat.sh`), and an opt-in heavy mode (`OCI_COMPAT_TEST=1`) that drives three layered fixtures (alpine-shaped, busybox-shaped hardlink dispatch, two-layer whiteout) end-to-end through a freshly provisioned scratch sparsebundle.
Manual smoke test (docker.io/library/python:3.12)
A real end-to-end pull-and-run against a mainstream multi-layer glibc image. The image's default Entrypoint is `docker-entrypoint.sh` (a shell script, which elfuse does not execute), so the commands below use `--entrypoint` to point at the python3 binary directly.

Performance characterization (vs OrbStack)
Measured on Apple M4 / macOS 15.4.1 (Darwin 24.4.0). OrbStack 2.1.3 acts as the ground-truth aarch64-linux runtime: it executes the same `docker.io/library/python:3.12` image inside a Virtualization.framework-backed Linux VM with a real Linux kernel, so the comparison isolates the cost of elfuse's user-mode ABI emulation against a native syscall surface.
Pure CPU (factorial big-int multiply, no syscall)
Each engine ran twice; the second run is warm. `compute` is the `time.perf_counter()` delta inside Python (pure interpreter + big-int multiply work); `real` is the outer wall-clock time (including engine startup); startup ≈ `real` - `compute`.

Both engines emit `digit_sum=4154076 digits=973351`, confirming correctness parity. Pure compute ratio: 1.01× (within measurement noise). HVF runs guest aarch64 instructions directly, so big-int multiply and Python bytecode dispatch pay no translation overhead.

Startup ratio: 15.0× (a constant ~2.5 s for elfuse vs ~0.17 s for OrbStack), independent of N: at N=50000, compute drops to ~0.14 s for both engines, but elfuse startup stays at 2.53 s.
Syscall density (Python loop hammering syscalls)
`syscall_overhead` subtracts the Python loop's interpreter cost (measured from the `baseline` band), so the residual is the pure trap + return cost of a single syscall. `getppid` is the cleanest measurement: no kernel work, just trap + return. elfuse pays roughly 1 μs per syscall versus ~0.1 μs native.

Rough HVF round-trip breakdown: vCPU state sync ~200 ns, Linux→macOS semantic translation ~100 ns, the macOS syscall itself ~100 ns, errno + state sync back ~100 ns, HVF re-entry + ERET ~500 ns. These add up to the ~1 μs floor, which is a structural limit for any elfuse syscall that traps.
vDSO observation: `time.monotonic_ns` should hit the synthetic vDSO under `src/core/vdso.{c,h}` and skip the trap (OrbStack does, at 0.018 μs), but the measured 1.006 μs matches the trapping baseline, so glibc 2.41 in this image is not picking up elfuse's vDSO entry. This is an existing optimization opportunity outside the scope of this PR; it is left untouched so the patch series stays focused on image distribution and runtime correctness.
Wall-clock model
For a pure-CPU workload of compute time W:
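Plugging in the constants measured above (startup ≈ 2.5 s for elfuse vs ≈ 0.17 s for OrbStack, compute ratio 1.01×) gives a rough model; it is derived from those figures, not a separate fit:

$$
\begin{aligned}
T_{\text{elfuse}}(W) &\approx 2.5\ \text{s} + 1.01\,W \\
T_{\text{OrbStack}}(W) &\approx 0.17\ \text{s} + W
\end{aligned}
$$

The ratio tends to 1.01 as W grows: at W = 10 s it is about 12.6 s / 10.17 s ≈ 1.24×, while at W ≈ 0.14 s (the N=50000 run) it is about 2.64 s / 0.31 s ≈ 8.5×.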
elfuse is competitive for long-running workloads (where the constant
startup amortizes out) and a known tradeoff for short CLI one-shots
where startup dominates total wall.
Known limitations
- `fork()` followed by `execve()` of a dynamically-linked ELF crashes in the child during dynamic-linker bring-up. This blocks Python's `subprocess.run([...other_dynamic_binary...])`, shell pipelines that spawn external binaries, and `timeout(1)`. Single-process Python workloads, stdlib computation, and file I/O are unaffected.
- … `linux/arm64`. There is no `--platform` flag; cross-arch image support is out of scope for this PR.
- `pull` progress uses CSI cursor-up + clear-line for in-place redraw. Terminal panes that ignore those escapes show stacking rows; set `ELFUSE_OCI_PROGRESS=plain` to disable the redraw and emit one summary line per blob instead.
Summary by cubic
Adds full OCI image lifecycle to elfuse: pull, inspect, unpack, clone, run, prune, rebuild-cache, and status. Improves pulls with parallel/resumable downloads, adds a content-addressable store with GC and caches, and wires the runtime to execute images directly.
New Features
- `oci pull|inspect|unpack|clone|run|prune|rebuild-cache|status`; pull adds progress and `--refresh`.
- `libcurl`; bearer-token and Basic auth; custom CA; loopback-gated `--insecure`; writes `oci-layout` and pins in `index.json`. `status` supports text/`--json`.
- `zstd` decode, whiteout-aware apply, case-sensitive APFS sysroot, per-run rootfs via `clonefile(2)`.
- `User` name/group lookup; inject `/etc/{resolv.conf,hosts,hostname}`; emulate `/dev/{full,console}`; add `/proc` cgroup/hostname/comm/statm; shared VM launcher.
- `policy.json` plus `registries.d` overlay; merged with CLI flags.
- `prune` with `--older-than`/`--keep-bytes`; `rebuild-cache`; `status` reports blobs/layers/stacks.
- … `./`; cross-volume unpack via `copyfile(2)` with clone fallback.
Migration
- `index.json`; store auto-migrates from `refs/` on open.
- `zstd` and `cJSON`; uses system zlib and `libcurl`.
Written for commit 5d6dbc7.