
feat: integrate zstd:chunked and chunkah for content-based layers #139

Open
castrojo wants to merge 2 commits into ublue-os:main from castrojo:main

castrojo (Contributor) commented Mar 7, 2026

As per IRL discussion with Kyle:

This PR integrates chunkah for content-based layer splitting and zstd:chunked compression, following the pattern established in ublue-os/toolboxes PR #582.

- Add Justfile for local build orchestration (just build-boxkit, just clean)
- Create scripts/chunkah-tag.sh with multi-distro support (apk, rpm, dpkg, pacman)
- Rename ContainerFiles/ → Containerfiles/ for standardization
- Update Containerfiles to use chunkah multi-stage builds
  - Local builds target 'builder' stage (fast, no chunkah)
  - CI builds complete full pipeline with content-based layering
- Update GitHub Actions workflow:
  - Add SOURCE_DATE_EPOCH for reproducible builds
  - Add --skip-unused-stages=false for chunkah pattern
  - Add --compression-format=zstd:chunked to push step
- Update documentation (README.md) with:
  - zstd:chunked benefits and how it works
  - Local vs CI build differences
  - New Justfile commands
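
The local and CI paths described in the list above can be sketched as a dry run. This is a minimal sketch rather than the workflow's actual steps: it only prints the commands, the image reference `ghcr.io/example/boxkit:latest` is a hypothetical placeholder, and tagging the CI build with `-t` is an assumption. The flags themselves are the ones named in this description.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of running them,
# so podman does not need to be installed.
set -eu

# Hypothetical image reference, not taken from the PR.
IMAGE="${IMAGE:-ghcr.io/example/boxkit:latest}"

local_build_cmd() {
    # Local builds stop at the 'builder' stage and skip chunkah.
    echo "podman build --target=builder -f Containerfiles/boxkit -t boxkit:local ."
}

ci_build_cmd() {
    # CI keeps every stage so the chunkah stages are not pruned.
    # The -t tag here is an assumption for illustration.
    echo "podman build --skip-unused-stages=false -f Containerfiles/boxkit -t $IMAGE ."
}

ci_push_cmd() {
    # The push step re-compresses layers as zstd:chunked.
    echo "podman push --compression-format=zstd:chunked $IMAGE"
}

local_build_cmd
ci_build_cmd
ci_push_cmd
```

The real workflow also passes SOURCE_DATE_EPOCH to the build; that part is left out here because the exact argument form is not shown in this thread.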

Benefits:
- 60-80% bandwidth reduction on image updates
- Intelligent package-based layer splitting (up to 64 layers)
- Works with any OCI-compliant registry
- Simple local development workflow

Tested: Local podman build successful, container runs with all tools installed

Copilot AI review requested due to automatic review settings on March 7, 2026 at 15:44
Copilot AI left a comment

Pull request overview

Integrates chunkah-driven content-based layer splitting and zstd:chunked compression into the image build pipeline, while restructuring Containerfiles and adding local developer build orchestration.

Changes:

  • Added chunkah tagging script (scripts/chunkah-tag.sh) and a local Justfile for fast, non-chunkah local builds.
  • Replaced legacy ContainerFiles/* with multi-stage Containerfiles/* that can rechunk in CI.
  • Updated GitHub Actions workflow to pass SOURCE_DATE_EPOCH and push with zstd:chunked compression; updated README documentation accordingly.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| scripts/chunkah-tag.sh | New cross-distro file xattr tagging to enable chunkah’s content-based layering. |
| README.md | Documents new Containerfiles/ layout, Justfile usage, and chunkah/zstd:chunked behavior. |
| Justfile | Adds local build/run/cleanup commands that target the fast builder stage. |
| Containerfiles/boxkit | New multi-stage build: builder stage + chunkah rechunk stage producing an OCI archive. |
| Containerfiles/fedora-example | Same pattern as boxkit, with Fedora toolbox base and example metadata. |
| ContainerFiles/boxkit | Removed legacy single-stage ContainerFile. |
| ContainerFiles/fedora-example | Removed legacy single-stage ContainerFile. |
| .github/workflows/build-boxkit.yml | Adds reproducibility arg and switches to pushing with zstd:chunked compression; points to Containerfiles/. |


Comment on lines +20 to +22
apk info -q | while read pkgname; do
    tag_files "$pkgname" $(apk info -L "$pkgname" 2>/dev/null | sed 's|^|/|')
done
Copilot AI Mar 7, 2026

Unquoted command substitution here causes word-splitting/globbing and can break on paths containing whitespace; it can also hit ARG_MAX for packages with many files. Prefer reading the file list line-by-line (and use read -r) rather than expanding the whole list into arguments.
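
A minimal sketch of the line-by-line approach, assuming a per-file tagging helper. `tag_file` and `list_files` are hypothetical stand-ins (the real script's `tag_files` and the package manager's listing command are not reproduced here); the helper just records each path so the behavior with spaces and glob characters is visible.

```shell
#!/bin/sh
set -eu

# Hypothetical stand-in for the script's tagging helper: records one
# "component<TAB>path" line per call instead of setting an xattr.
TAGGED="$(mktemp)"
trap 'rm -f "$TAGGED"' EXIT
tag_file() {
    printf '%s\t%s\n' "$1" "$2" >> "$TAGGED"
}

# Hypothetical stand-in for a package file listing such as `rpm -ql`.
list_files() {
    printf '%s\n' '/usr/bin/tool' '/usr/share/my app/read me.txt' '/etc/tool.d/*.conf'
}

# IFS= and -r keep leading whitespace and backslashes intact; quoting
# "$filepath" prevents word splitting and glob expansion.
list_files | while IFS= read -r filepath; do
    [ -n "$filepath" ] || continue
    tag_file "demo-pkg" "$filepath"
done

tagged_count=$(wc -l < "$TAGGED" | tr -d ' ')
echo "tagged $tagged_count paths"
```

Because each path is handled as it is read, the full file list never has to fit on a single command line, which sidesteps the ARG_MAX concern as well.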

Copilot uses AI. Check for mistakes.
Comment on lines +36 to +38
rpm -qa --qf '%{NAME}\n' | while read pkgname; do
    tag_files "$pkgname" $(rpm -ql "$pkgname" 2>/dev/null)
done
Copilot AI Mar 7, 2026

Same issue as above: $(rpm -ql ...) is expanded unquoted into arguments, which is unsafe for whitespace and can exceed ARG_MAX for large packages. Prefer streaming the file list line-by-line (and use read -r).

Comment on lines +52 to +54
dpkg-query -W -f='${Package}\n' | while read pkgname; do
    tag_files "$pkgname" $(dpkg -L "$pkgname" 2>/dev/null)
done
Copilot AI Mar 7, 2026

Same issue as above: $(dpkg -L ...) is expanded unquoted into arguments, which is unsafe for whitespace and can exceed ARG_MAX for packages with many files. Prefer streaming the file list line-by-line (and use read -r).

Comment on lines +61 to +62
pacman -Qq | while read pkgname; do
    tag_files "$pkgname" $(pacman -Ql "$pkgname" 2>/dev/null | awk '{print $2}')
Copilot AI Mar 7, 2026

Same issue as above: $(pacman -Ql ...) output is expanded unquoted into arguments, which is unsafe for whitespace and can exceed ARG_MAX. Prefer streaming the file list line-by-line (and use read -r).

Suggested change
- pacman -Qq | while read pkgname; do
-     tag_files "$pkgname" $(pacman -Ql "$pkgname" 2>/dev/null | awk '{print $2}')
+ pacman -Qq | while read -r pkgname; do
+     pacman -Qlq "$pkgname" 2>/dev/null | while read -r filepath; do
+         tag_files "$pkgname" "$filepath"
+     done

# Compute SOURCE_DATE_EPOCH for reproducible builds
- name: Compute SOURCE_DATE_EPOCH
  id: epoch
  run: echo "epoch=$(date -d '${{ github.event.head_commit.timestamp }}' +%s)" >> $GITHUB_OUTPUT
Copilot AI Mar 7, 2026

github.event.head_commit.timestamp is not present for pull_request, schedule, or workflow_dispatch events, so this step will fail (and block PR builds). Compute SOURCE_DATE_EPOCH from the checked-out commit instead (e.g., via git show -s --format=%ct $GITHUB_SHA or similar), with a safe fallback when the timestamp cannot be determined.

Suggested change
-   run: echo "epoch=$(date -d '${{ github.event.head_commit.timestamp }}' +%s)" >> $GITHUB_OUTPUT
+   run: |
+     commit_ts=$(git show -s --format=%ct "$GITHUB_SHA" 2>/dev/null || echo '')
+     if [ -z "$commit_ts" ]; then
+       echo "Warning: could not determine commit timestamp, falling back to current time"
+       commit_ts=$(date +%s)
+     fi
+     echo "epoch=$commit_ts" >> "$GITHUB_OUTPUT"
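
The same fallback can be tried outside of Actions with a small standalone sketch. This is an illustration, not the workflow step itself: `resolve_epoch` is a hypothetical helper name, `HEAD` is assumed as a stand-in when `GITHUB_SHA` is unset, and the extra numeric check is an addition that treats any unexpected `git` output the same way as an empty result.

```shell
#!/bin/sh
set -eu

# Resolve a commit timestamp, falling back to the current time.
# GITHUB_SHA is set by GitHub Actions; HEAD is used when it is absent.
resolve_epoch() {
    ts=$(git show -s --format=%ct "${GITHUB_SHA:-HEAD}" 2>/dev/null || true)
    case "$ts" in
        # Empty or non-numeric output: fall back to the current time.
        ''|*[!0-9]*) ts=$(date +%s) ;;
    esac
    printf '%s\n' "$ts"
}

epoch=$(resolve_epoch)
echo "SOURCE_DATE_EPOCH=$epoch"
```

Note that falling back to the current time keeps the build going but gives up reproducibility for that run, so the warning in the suggested change is worth keeping.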

# Tag files with package component names for chunkah content-based layering
# Supports: apk (Alpine), rpm (Fedora/RHEL), dpkg (Debian/Ubuntu), pacman (Arch)

set -euo pipefail
Copilot AI Mar 7, 2026

This script declares #!/bin/sh but uses set -euo pipefail; pipefail is not supported by all /bin/sh implementations (notably dash on Debian/Ubuntu, which you claim to support), causing the script to exit immediately. Either switch to a shell that supports pipefail (and ensure it exists in all target images) or make pipefail optional/conditional.

Suggested change
- set -euo pipefail
+ set -eu
+ # Enable pipefail if supported by the current /bin/sh implementation
+ if (set -o pipefail) 2>/dev/null; then
+     set -o pipefail
+ fi
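
To see what the conditional changes in practice, the following sketch probes for pipefail and then reports the exit status of a pipeline whose first command fails. It is a demonstration under the assumption that it runs under whatever `/bin/sh` the image provides; the variable names are illustrative only.

```shell
#!/bin/sh
set -eu

# Probe in a subshell so an unsupported option cannot abort the script.
if (set -o pipefail) 2>/dev/null; then
    set -o pipefail
    pipefail_supported=yes
else
    pipefail_supported=no
fi

# `false | true`: the first command fails, the last one succeeds.
# Inside an `if` condition, set -e does not abort on the failure.
if false | true; then
    status=0
else
    status=1
fi

echo "pipefail_supported=$pipefail_supported pipeline_status=$status"
```

Where pipefail is available the pipeline reports failure; where it is not, the failure of the first command is silently ignored, which matters for the `... | while read` loops in this script.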

# For local podman builds, use: podman build --target=builder -f Containerfiles/fedora-example -t fedora-example:local .

# https://github.com/coreos/chunkah
FROM quay.io/jlebon/chunkah AS chunkah
Copilot AI Mar 7, 2026

This stage pulls the quay.io/jlebon/chunkah image by a mutable tag, introducing a third-party build-time dependency that has full control over chunkah build and the generated out.ociarchive image. If this image or registry namespace is ever compromised, an attacker could execute arbitrary code during CI builds and ship backdoored artifacts without changes to this repo. To mitigate this supply-chain risk, pin the image to a specific immutable digest (and/or an officially maintained vendor namespace) and update it explicitly as needed instead of relying on a mutable tag.
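
One way to keep such a pin from regressing is a small check that an image reference ends in a sha256 digest. The sketch below is illustrative: `is_digest_pinned` is a hypothetical helper, and the all-zero digest is a placeholder, not a real digest of the chunkah image.

```shell
#!/bin/sh
set -eu

# Succeeds only when the reference ends in an immutable sha256 digest.
is_digest_pinned() {
    printf '%s\n' "$1" | grep -Eq '@sha256:[0-9a-f]{64}$'
}

# Placeholder digest (64 zeros) used purely for illustration.
placeholder_digest="sha256:$(printf '%064d' 0)"

for ref in \
    "quay.io/jlebon/chunkah" \
    "quay.io/jlebon/chunkah@$placeholder_digest"
do
    if is_digest_pinned "$ref"; then
        echo "pinned:   $ref"
    else
        echo "unpinned: $ref"
    fi
done
```

A check like this only verifies the form of the reference; the digest still has to be obtained from a trusted source and updated deliberately.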

# For local podman builds, use: podman build --target=builder -f Containerfiles/boxkit -t boxkit:local .

# https://github.com/coreos/chunkah
FROM quay.io/jlebon/chunkah AS chunkah
Copilot AI Mar 7, 2026

This stage pulls the quay.io/jlebon/chunkah image by a mutable tag, introducing a third-party build-time dependency that has full control over chunkah build and the resulting out.ociarchive image. If this image or its registry namespace is compromised, an attacker could run arbitrary code in your CI build and produce malicious boxkit images without modifying this repository. To reduce supply-chain risk, pin this dependency to a specific image digest (or vetted vendor image) instead of a mutable tag and update the digest deliberately.

…tamp

Resolves Copilot review feedback - github.event.head_commit.timestamp
is not available for pull_request, schedule, or workflow_dispatch events.
Now uses git show to get timestamp from checked-out commit with fallback.
2 participants