feat: integrate zstd:chunked and chunkah for content-based layers#139
castrojo wants to merge 2 commits into ublue-os:main
- Add Justfile for local build orchestration (just build-boxkit, just clean)
- Create scripts/chunkah-tag.sh with multi-distro support (apk, rpm, dpkg, pacman)
- Rename ContainerFiles/ → Containerfiles/ for standardization
- Update Containerfiles to use chunkah multi-stage builds
  - Local builds target 'builder' stage (fast, no chunkah)
  - CI builds complete full pipeline with content-based layering
- Update GitHub Actions workflow:
  - Add SOURCE_DATE_EPOCH for reproducible builds
  - Add --skip-unused-stages=false for chunkah pattern
  - Add --compression-format=zstd:chunked to push step
- Update documentation (README.md) with:
  - zstd:chunked benefits and how it works
  - Local vs CI build differences
  - New Justfile commands

Benefits:
- 60-80% bandwidth reduction on image updates
- Intelligent package-based layer splitting (up to 64 layers)
- Works with any OCI-compliant registry
- Simple local development workflow

Tested: Local podman build successful, container runs with all tools installed
Pull request overview
Integrates chunkah-driven content-based layer splitting and zstd:chunked compression into the image build pipeline, while restructuring Containerfiles and adding local developer build orchestration.
Changes:
- Added `chunkah` tagging script (`scripts/chunkah-tag.sh`) and a local `Justfile` for fast, non-chunkah local builds.
- Replaced legacy `ContainerFiles/*` with multi-stage `Containerfiles/*` that can rechunk in CI.
- Updated GitHub Actions workflow to pass `SOURCE_DATE_EPOCH` and push with `zstd:chunked` compression; updated README documentation accordingly.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| scripts/chunkah-tag.sh | New cross-distro file xattr tagging to enable chunkah’s content-based layering. |
| README.md | Documents new Containerfiles/ layout, Justfile usage, and chunkah/zstd:chunked behavior. |
| Justfile | Adds local build/run/cleanup commands that target the fast builder stage. |
| Containerfiles/boxkit | New multi-stage build: builder stage + chunkah rechunk stage producing an OCI archive. |
| Containerfiles/fedora-example | Same pattern as boxkit, with Fedora toolbox base and example metadata. |
| ContainerFiles/boxkit | Removed legacy single-stage ContainerFile. |
| ContainerFiles/fedora-example | Removed legacy single-stage ContainerFile. |
| .github/workflows/build-boxkit.yml | Adds reproducibility arg and switches to pushing with zstd:chunked compression; points to Containerfiles/. |

    apk info -q | while read pkgname; do
        tag_files "$pkgname" $(apk info -L "$pkgname" 2>/dev/null | sed 's|^|/|')
    done

Unquoted command substitution here causes word-splitting/globbing and can break on paths containing whitespace; it can also hit ARG_MAX for packages with many files. Prefer reading the file list line-by-line (and use read -r) rather than expanding the whole list into arguments.
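
The difference between the two approaches can be sketched without a real package manager. In this illustration `tag_files` and `list_pkg_files` are hypothetical stand-ins (the PR's actual `tag_files` body is not shown in this review), and the paths are made up; the point is only that unquoted expansion splits a path containing a space, while a `read -r` loop delivers it intact.

```shell
#!/bin/sh
set -eu

tag_log=$(mktemp)

# Hypothetical stand-in for the PR's tag_files helper (its real body is not shown
# in the review): records one "package<TAB>path" line per path it receives.
tag_files() {
    pkg=$1
    shift
    for path in "$@"; do
        printf '%s\t%s\n' "$pkg" "$path" >> "$tag_log"
    done
}

# Hypothetical stand-in for a package manager's file listing; one path has a space.
list_pkg_files() {
    printf '%s\n' 'usr/bin/demo' 'usr/share/demo/read me.txt'
}

# Unquoted expansion, as in the reviewed code: two paths become three words.
set -- $(list_pkg_files demo)
unsafe_count=$#

# Streaming alternative: one path per iteration, so no word-splitting and no
# oversized argument list.
list_pkg_files demo | sed 's|^|/|' | while IFS= read -r filepath; do
    [ -n "$filepath" ] || continue
    tag_files demo "$filepath"
done

streamed_count=$(grep -c . "$tag_log")
printf 'unquoted: %s words, streamed: %s paths\n' "$unsafe_count" "$streamed_count"
```

Because the loop body writes to a file rather than setting variables, it does not matter that the `while` runs in a pipeline subshell.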

    rpm -qa --qf '%{NAME}\n' | while read pkgname; do
        tag_files "$pkgname" $(rpm -ql "$pkgname" 2>/dev/null)
    done

Same issue as above: $(rpm -ql ...) is expanded unquoted into arguments, which is unsafe for whitespace and can exceed ARG_MAX for large packages. Prefer streaming the file list line-by-line (and use read -r).

    dpkg-query -W -f='${Package}\n' | while read pkgname; do
        tag_files "$pkgname" $(dpkg -L "$pkgname" 2>/dev/null)
    done

Same issue as above: $(dpkg -L ...) is expanded unquoted into arguments, which is unsafe for whitespace and can exceed ARG_MAX for packages with many files. Prefer streaming the file list line-by-line (and use read -r).

    pacman -Qq | while read pkgname; do
        tag_files "$pkgname" $(pacman -Ql "$pkgname" 2>/dev/null | awk '{print $2}')

Same issue as above: $(pacman -Ql ...) output is expanded unquoted into arguments, which is unsafe for whitespace and can exceed ARG_MAX. Prefer streaming the file list line-by-line (and use read -r).

Suggested change:

    -pacman -Qq | while read pkgname; do
    -    tag_files "$pkgname" $(pacman -Ql "$pkgname" 2>/dev/null | awk '{print $2}')
    +pacman -Qq | while read -r pkgname; do
    +    pacman -Qlq "$pkgname" 2>/dev/null | while read -r filepath; do
    +        tag_files "$pkgname" "$filepath"
    +    done

.github/workflows/build-boxkit.yml
Outdated

    # Compute SOURCE_DATE_EPOCH for reproducible builds
    - name: Compute SOURCE_DATE_EPOCH
      id: epoch
      run: echo "epoch=$(date -d '${{ github.event.head_commit.timestamp }}' +%s)" >> $GITHUB_OUTPUT

github.event.head_commit.timestamp is not present for pull_request, schedule, or workflow_dispatch events, so this step will fail (and block PR builds). Compute SOURCE_DATE_EPOCH from the checked-out commit instead (e.g., via git show -s --format=%ct $GITHUB_SHA or similar), with a safe fallback when the timestamp cannot be determined.

Suggested change:

    -run: echo "epoch=$(date -d '${{ github.event.head_commit.timestamp }}' +%s)" >> $GITHUB_OUTPUT
    +run: |
    +  commit_ts=$(git show -s --format=%ct "$GITHUB_SHA" 2>/dev/null || echo '')
    +  if [ -z "$commit_ts" ]; then
    +    echo "Warning: could not determine commit timestamp, falling back to current time"
    +    commit_ts=$(date +%s)
    +  fi
    +  echo "epoch=$commit_ts" >> "$GITHUB_OUTPUT"


    # Tag files with package component names for chunkah content-based layering
    # Supports: apk (Alpine), rpm (Fedora/RHEL), dpkg (Debian/Ubuntu), pacman (Arch)

    set -euo pipefail

This script declares #!/bin/sh but uses set -euo pipefail; pipefail is not supported by all /bin/sh implementations (notably dash on Debian/Ubuntu, which you claim to support), causing the script to exit immediately. Either switch to a shell that supports pipefail (and ensure it exists in all target images) or make pipefail optional/conditional.

Suggested change:

    -set -euo pipefail
    +set -eu
    +# Enable pipefail if supported by the current /bin/sh implementation
    +if (set -o pipefail) 2>/dev/null; then
    +    set -o pipefail
    +fi

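
To see what the guard buys a script built around pipelines such as `apk info -q | while read ...`, the following sketch (not part of the PR) probes whether a failure on the left-hand side of a pipe is visible to the running shell. `pipe_status` is a name chosen for this illustration.

```shell
#!/bin/sh
set -eu

# Enable pipefail only where the running shell supports it.
if (set -o pipefail) 2>/dev/null; then
    set -o pipefail
fi

# `false | true`: without pipefail the pipeline's status is that of `true` (0);
# with pipefail the failure of `false` is reported instead.
if false | true; then
    pipe_status=ignored
else
    pipe_status=propagated
fi

printf 'left-hand failure was %s\n' "$pipe_status"
```

On a shell without `pipefail` this reports `ignored`, which is the situation where a failing package query feeding a `while read` loop would go unnoticed.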

    # For local podman builds, use: podman build --target=builder -f Containerfiles/fedora-example -t fedora-example:local .

    # https://github.com/coreos/chunkah
    FROM quay.io/jlebon/chunkah AS chunkah

This stage pulls the quay.io/jlebon/chunkah image by a mutable tag, introducing a third-party build-time dependency that has full control over chunkah build and the generated out.ociarchive image. If this image or registry namespace is ever compromised, an attacker could execute arbitrary code during CI builds and ship backdoored artifacts without changes to this repo. To mitigate this supply-chain risk, pin the image to a specific immutable digest (and/or an officially maintained vendor namespace) and update it explicitly as needed instead of relying on a mutable tag.

    # For local podman builds, use: podman build --target=builder -f Containerfiles/boxkit -t boxkit:local .

    # https://github.com/coreos/chunkah
    FROM quay.io/jlebon/chunkah AS chunkah

This stage pulls the quay.io/jlebon/chunkah image by a mutable tag, introducing a third-party build-time dependency that has full control over chunkah build and the resulting out.ociarchive image. If this image or its registry namespace is compromised, an attacker could run arbitrary code in your CI build and produce malicious boxkit images without modifying this repository. To reduce supply-chain risk, pin this dependency to a specific image digest (or vetted vendor image) instead of a mutable tag and update the digest deliberately.
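
One possible way to act on the pinning advice, sketched under assumptions: look up the digest once, then rewrite the `FROM` line to reference it. The digest below is a placeholder rather than the real chunkah digest, and `pin_from` is a hypothetical helper written for this illustration.

```shell
#!/bin/sh
set -eu

# Placeholder digest for illustration only; NOT the real chunkah digest.
# A real value could be looked up with, for example:
#   skopeo inspect --format '{{.Digest}}' docker://quay.io/jlebon/chunkah:latest
digest='sha256:0000000000000000000000000000000000000000000000000000000000000000'

# Hypothetical helper: reads Containerfile lines on stdin and rewrites
# "FROM <image>[:tag] AS <name>" to "FROM <image>@<digest> AS <name>".
pin_from() {
    image=$1
    pinned=$2
    sed "s|^FROM ${image}\(:[^ @]*\)\{0,1\} AS |FROM ${image}@${pinned} AS |"
}

pinned_line=$(printf '%s\n' 'FROM quay.io/jlebon/chunkah AS chunkah' \
    | pin_from quay.io/jlebon/chunkah "$digest")
printf '%s\n' "$pinned_line"
```

With a digest in place, updating chunkah becomes an explicit, reviewable change to the Containerfile instead of something that happens silently when the tag moves.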
…tamp Resolves Copilot review feedback - github.event.head_commit.timestamp is not available for pull_request, schedule, or workflow_dispatch events. Now uses git show to get timestamp from checked-out commit with fallback.
As per IRL discussion with Kyle:
This PR integrates chunkah for content-based layer splitting and `zstd:chunked` compression, following the pattern established in ublue-os/toolboxes PR #582.