
Feat/base image zstd compression#2868

Open
jiridanek wants to merge 16 commits into opendatahub-io:main from jiridanek:feat/base-image-zstd-compression

Conversation

@jiridanek
Member

@jiridanek jiridanek commented Jan 26, 2026

https://issues.redhat.com/browse/STONEBLD-4276

Description

Our CUDA and ROCm base images are large (8-15GB), resulting in slow container startup times.
When deploying workbenches on Kubernetes/OpenShift, the image pull time significantly impacts
user experience, especially for the first deployment on a node.

The zstd:chunked compression format, developed by Red Hat engineers Giuseppe Scrivano and
Dan Walsh, enables "partial pulls" where container runtimes can fetch only the layers and
files actually needed, rather than downloading the entire image.

Key benefits of zstd:chunked (source):

  • Lazy pulling: Individual file chunks can be fetched via HTTP range requests
  • Deduplication: Files already present locally are relinked, not re-downloaded
  • Better compression: zstd compresses better and faster than gzip

How Has This Been Tested?


Benchmark Results: zstd:chunked vs gzip

Tested across 3 GitHub Actions runs with varying network conditions.

Image Sizes

| Image | gzip | zstd:chunked | Savings |
|---|---|---|---|
| CPU C9S | 534 MB | 525 MB | 2% smaller |
| CUDA C9S | 6.9 GB | 5.0 GB | 28% smaller |

Pull/Build Performance (averaged)

| Metric | zstd:chunked | gzip | Improvement |
|---|---|---|---|
| CPU cold pull | 29.2s | 32.8s | 11% faster |
| CUDA cold pull | 80.9s | 173.9s | 53% faster |
| CUDA warm pull | 221.5s | 389.5s | 43% faster |
| Build FROM CPU | 11.0s | 20.4s | 46% faster |
| Build FROM CUDA | 62.0s | 174.4s | 64% faster |
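
For anyone re-checking the Improvement column: it appears to be the time saved relative to the gzip baseline, rounded to a whole percent. A minimal sketch of that arithmetic follows; `improvement_pct` is a hypothetical helper name, not something used in the benchmark workflow.

```bash
# improvement_pct BASELINE_SECONDS NEW_SECONDS
# Prints the reduction relative to the baseline as a whole percent.
improvement_pct() {
  awk -v baseline="$1" -v current="$2" \
    'BEGIN { printf "%.0f\n", (baseline - current) / baseline * 100 }'
}

improvement_pct 173.9 80.9   # CUDA cold pull, prints 53
improvement_pct 174.4 62.0   # Build FROM CUDA, prints 64
```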

Key Findings

  • CUDA images benefit most: 53-64% faster pulls/builds, 1.9 GB size reduction
  • CPU image benefits vary by network: 1-23% faster depending on conditions
  • Build times significantly improved: 46-64% faster due to faster base image pulls

Detailed Results

  • Run #3 - Best network conditions (23% CPU improvement)
  • Run #2 - Moderate conditions (15% CPU improvement)
  • Run #1 - Congested conditions (1% CPU improvement; the CUDA base-image build had not yet finished)

Self checklist (all need to be checked):

  • Ensure that you have run make test (gmake on macOS) before asking for review
  • Changes to everything except Dockerfile.konflux files should be done in odh/notebooks and automatically synced to rhds/notebooks. For Konflux-specific changes, modify Dockerfile.konflux files directly in rhds/notebooks as these require special attention in the downstream repository and flow to the upcoming RHOAI release.

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work

Summary by CodeRabbit

  • New Features
    • Added multi-architecture base-image build pipelines with platform-matrix builds, automatic compression conversion (manifest-list vs single-image), SBOM generation, publishable image results, and failure notifications.
  • Chores
    • Updated pipeline run configurations to use the new pipelines, introduced platform parameters where needed, and increased compute resources for compression and scan steps to improve reliability.


@openshift-ci
Contributor

openshift-ci Bot commented Jan 26, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@coderabbitai
Contributor

coderabbitai Bot commented Jan 26, 2026

Walkthrough

Adds two new Tekton Pipelines for multi-arch base-image build/push flows and updates ~22 PipelineRun manifests to reference them; several PipelineRuns also add build-platforms params or taskRunSpecs to increase compute for convert-compression.

Changes

| Cohort / File(s) | Summary |
|---|---|
| New Pipeline Definitions: .tekton/base-image-multiarch-pull-request-pipeline.yaml, .tekton/base-image-multiarch-push-pipeline.yaml | Introduces two public Tekton Pipeline resources implementing multi-arch build workflows (matrix builds → index → compression conversion → downstream scans/tags/push), with many params, results, workspaces and a finally block. |
| PipelineRun: PR ref updates: .tekton/odh-base-image-cpu-py312-c9s-pull-request.yaml, .tekton/odh-base-image-cpu-py312-ubi9-pull-request.yaml, .tekton/odh-base-image-cuda-12-8-py312-c9s-pull-request.yaml, .tekton/odh-base-image-cuda-12-8-py312-ubi9-pull-request.yaml, .tekton/odh-base-image-cuda-13-0-py312-c9s-pull-request.yaml, .tekton/odh-base-image-cuda-py311-c9s-pull-request.yaml, .tekton/odh-base-image-cuda-py312-c9s-pull-request.yaml, .tekton/odh-base-image-cuda-py312-ubi9-pull-request.yaml | Repoints spec.pipelineRef.name from multiarch-pull-request-pipeline to base-image-multiarch-pull-request-pipeline. |
| PipelineRun: PR ref + compute overrides: .tekton/odh-base-image-rocm-6-3-py312-c9s-pull-request.yaml, .tekton/odh-base-image-rocm-6-3-py312-ubi9-pull-request.yaml, .tekton/odh-base-image-rocm-6-4-py312-c9s-pull-request.yaml, .tekton/odh-base-image-rocm-6-4-py312-ubi9-pull-request.yaml, .tekton/odh-base-image-rocm-py312-c9s-pull-request.yaml, .tekton/odh-base-image-rocm-py312-ubi9-pull-request.yaml | Switches to base-image-multiarch-pull-request-pipeline and inserts spec.taskRunSpecs entries to raise CPU/memory for convert-compression (and related scan task overrides). |
| PipelineRun: push ref updates: .tekton/odh-base-image-cpu-py312-c9s-push.yaml, .tekton/odh-base-image-cpu-py312-ubi9-push.yaml, .tekton/odh-base-image-cuda-12-8-py312-c9s-push.yaml, .tekton/odh-base-image-cuda-12-8-py312-ubi9-push.yaml, .tekton/odh-base-image-cuda-13-0-py312-c9s-push.yaml, .tekton/odh-base-image-cuda-py311-c9s-push.yaml, .tekton/odh-base-image-cuda-py312-c9s-push.yaml, .tekton/odh-base-image-cuda-py312-ubi9-push.yaml | Repoints spec.pipelineRef.name from multiarch-push-pipeline to base-image-multiarch-push-pipeline. |
| PipelineRun: push ref + build-platforms param (singlearch→multiarch): .tekton/odh-base-image-rocm-6-3-py312-c9s-push.yaml, .tekton/odh-base-image-rocm-6-3-py312-ubi9-push.yaml, .tekton/odh-base-image-rocm-6-4-py312-c9s-push.yaml, .tekton/odh-base-image-rocm-6-4-py312-ubi9-push.yaml, .tekton/odh-base-image-rocm-py312-c9s-push.yaml, .tekton/odh-base-image-rocm-py312-ubi9-push.yaml | Changes pipelineRef from singlearch-push-pipeline to base-image-multiarch-push-pipeline and adds build-platforms param (e.g., linux-mxlarge/amd64). |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • daniellutz
  • atheo89
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'Feat/base image zstd compression' accurately summarizes the main change: adding zstd:chunked compression support for base images. It is concise, clear, and specific about the primary feature being introduced. |
| Description check | ✅ Passed | The PR description is comprehensive with a clear problem statement, explanation of the solution (zstd:chunked compression), detailed benchmark results, and references to related issues and documentation. However, the self-checklist items and merge criteria checkboxes remain unchecked, indicating incomplete verification by the author. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


@openshift-ci
Contributor

openshift-ci Bot commented Jan 26, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign jiridanek for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@github-actions github-actions Bot added the review-requested GitHub Bot creates notification on #pr-review-ai-ide-team slack channel label Jan 26, 2026
@openshift-ci openshift-ci Bot added size/xxl and removed size/xxl labels Jan 26, 2026
@jiridanek jiridanek force-pushed the feat/base-image-zstd-compression branch from 1f8a3e5 to b654e5d Compare January 28, 2026 17:38
@openshift-ci openshift-ci Bot added size/xxl and removed size/xxl labels Jan 28, 2026
Add new pipelines for base images that enable zstd:chunked compression
for faster image pulls. This is especially important for CUDA images
which can be 10-15GB+.

New pipelines:
- base-image-multiarch-push-pipeline.yaml
- base-image-multiarch-pull-request-pipeline.yaml

These pipelines add a convert-compression task that runs after
build-image-index to re-push images with zstd:chunked compression
and --force-compression flag (required for CUDA base layers).

Updated one example PipelineRun to reference the new pipeline:
- odh-base-image-cuda-py312-ubi9-push.yaml
- odh-base-image-cuda-py312-ubi9-pull-request.yaml

Related: konflux-ci/build-definitions#3188
…pipeline

Update all base-image PipelineRuns to reference the new pipelines with
zstd:chunked compression support:

- Push pipelines: base-image-multiarch-push-pipeline
- Pull-request pipelines: base-image-multiarch-pull-request-pipeline

For ROCm images that were using singlearch-push-pipeline, switched to
multiarch pipeline with explicit linux-mxlarge/amd64 platform.

This enables faster image pulls for all base images including:
- CPU base images (c9s and ubi9)
- CUDA base images (11, 12.6, 12.8, 13.0)
- ROCm base images (6.2, 6.3, 6.4)
Updated the base-image pipelines to convert architecture-specific images to zstd:chunked compression **before** creating the manifest index. Refactored the pipeline flow to `build-images → convert-compression (per arch) → build-image-index`. Adjusted dependent tasks to consume updated compression results for better consistency and optimization.
…ask names

- Move convert-compression task before build-image-index so each arch
  image is compressed before the manifest index is created
- Change from matrix task to single task processing array to avoid
  extremely long TaskRun names that break the Konflux UI
- Add unified multiarch-pipeline.yaml with optional compression support
- Compression is now processed sequentially but with cleaner task names
Use jq -cn with $ARGS.positional to properly format the JSON array
for Tekton typed array results. The previous approach with printf
and jq -R was causing malformed array values.
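
As a rough illustration of the jq approach this commit describes, the sketch below turns shell arguments into a compact JSON array. It assumes jq 1.6 or newer (for `--args` and `$ARGS`); `to_json_array` is a hypothetical helper name and the image references in the usage comment are placeholders, not values from the pipeline.

```bash
# Sketch: emit a compact JSON array of strings from positional shell arguments.
# -n: do not read stdin; -c: compact single-line output.
to_json_array() {
  jq -cn '$ARGS.positional' --args "$@"
}

# Usage (placeholder references):
# to_json_array "quay.io/example/base@sha256:aaa" "quay.io/example/base@sha256:bbb"
# The result is a single-line JSON array suitable for writing to a Tekton array result file.
```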
Remove the extra array wrapping (- prefix) when passing array results.
The syntax `value: $(tasks.X.results.ARRAY[*])` expands directly into
the array parameter, while `value: [- $(tasks.X.results.ARRAY[*])]`
incorrectly wraps the expansion in another array level.
The Tekton typed array result expansion doesn't work correctly when
passing to build-image-index IMAGES parameter. The array result from
convert-compression task was only passing the last element.

Revert to the simpler approach where compression happens AFTER
build-image-index. When buildah manifest push is called with
--compression-format --force-compression --all, it re-pushes all
architecture images with the new compression.
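
A minimal sketch of the post-index re-push described in this commit, assuming the manifest list is already present in local buildah storage. `push_zstd_chunked` is a hypothetical wrapper name and the names in the usage comment are placeholders; the actual task script in the pipeline may differ.

```bash
# Sketch: re-push a manifest list so every per-arch image is recompressed.
# --all pushes the images the list references, not only the list itself;
# --force-compression recompresses layers that already exist at the destination.
push_zstd_chunked() {
  local manifest_list="$1" destination="$2"
  buildah manifest push --all \
    --compression-format zstd:chunked \
    --force-compression \
    "$manifest_list" "docker://${destination}"
}

# Usage (placeholder names):
# push_zstd_chunked localhost/base-index quay.io/example/base:latest
```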
@openshift-ci openshift-ci Bot added size/xxl and removed size/xxl labels Jan 31, 2026
jiridanek added a commit that referenced this pull request Jan 31, 2026
This test branch uses the zstd:chunked compressed base images built in
PR #2868 to measure whether GHA builds are faster with the new compression
format.

Base images updated to digest-pinned zstd:chunked versions:
- CPU C9S: sha256:2580cb333...
- CUDA C9S (13.0): sha256:2a4cb3a49...
- CUDA UBI9 (12.8): sha256:7032289e4...
- ROCm C9S (6.4): sha256:f76c06cbc...
- ROCm UBI9 (6.4): sha256:13f2bc5bd...

Expected improvements based on benchmarks:
- CUDA image pulls: ~53% faster
- CUDA builds: ~64% faster
- CPU image pulls: ~18% faster

Co-authored-by: Cursor <cursoragent@cursor.com>
- Added local manifest list creation and platform addition for multiarch images.
- Updated `buildah manifest inspect` to remove `docker://` transport.
- Enhanced cleanup process after manifest push for robustness.
@openshift-ci openshift-ci Bot added size/xxl and removed size/xxl labels Jan 31, 2026
@jiridanek
Member Author

/kf-build base-images/cpu/c9s-python-3.12

@jiridanek
Member Author

/kfbuild base-images/cpu/c9s-python-3.12

- Strip tags from IMAGE_URL to avoid "tag@digest" issues in buildah manifest inspect.
- Prevents incorrect resolution of tag over digest due to single-arch manifests.
@openshift-ci openshift-ci Bot added size/xxl and removed size/xxl labels Jan 31, 2026
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @.tekton/base-image-multiarch-pull-request-pipeline.yaml:
- Around line 339-344: The current removal of tag via
IMAGE_BASE="${IMAGE_URL%:*}" can strip registry ports; change to first isolate
the last path segment and only strip a tag if that segment contains a colon:
derive NAME="${IMAGE_URL##*/}", if NAME contains ":" set
IMAGE_BASE="${IMAGE_URL%:*}" else set IMAGE_BASE="${IMAGE_URL}" and then
construct IMAGE_REF="${IMAGE_BASE}@${IMAGE_DIGEST}"; update uses of
IMAGE_URL/IMAGE_BASE/IMAGE_REF accordingly so ports are preserved.
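
The suggestion above could look roughly like the following sketch. `image_ref_from` is a hypothetical helper name and the registry hosts are placeholders; only the `IMAGE_URL` / `IMAGE_BASE` / `IMAGE_DIGEST` logic mirrors the review comment.

```bash
# Sketch: build "repo@digest" from an image URL that may carry a tag,
# without mistaking a registry port (host:5000/...) for a tag.
image_ref_from() {
  local image_url="$1" image_digest="$2"
  local name="${image_url##*/}"      # last path segment, e.g. "base:latest"
  local image_base="$image_url"
  if [[ "$name" == *:* ]]; then
    image_base="${image_url%:*}"     # colon is in the last segment, so it is a tag
  fi
  printf '%s@%s\n' "$image_base" "$image_digest"
}

image_ref_from "registry.example:5000/org/base:latest" "sha256:abc"
# prints registry.example:5000/org/base@sha256:abc
image_ref_from "registry.example:5000/org/base" "sha256:abc"
# prints registry.example:5000/org/base@sha256:abc
```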
🧹 Nitpick comments (5)
.tekton/base-image-multiarch-push-pipeline.yaml (4)

133-145: Consider adding meaningful descriptions to pipeline results.

All result descriptions are empty strings. Adding brief descriptions (e.g., "Final compressed image URL", "Final compressed image digest") would improve pipeline documentation and usability.


159-159: Consider pinning the Alpine image to a specific digest.

Using latest tag can lead to non-reproducible builds if the image changes upstream. For consistency with other tasks in this pipeline that use pinned digests, consider using a specific version or digest.


464-469: Consider adding CPU limit for resource consistency.

Memory limits are specified but CPU limit is missing. For large image compression operations, having explicit CPU limits helps with cluster resource planning and prevents potential resource contention.

🔧 Proposed fix
         resources:
           requests:
             memory: 4Gi
             cpu: 500m
           limits:
             memory: 8Gi
+            cpu: "2"

447-454: Add retry logic for large image push/pull operations.

The buildah pull and buildah push operations lack retry logic. For large CUDA/ROCm images (multi-GB), transient network failures could cause pipeline failures. Consider implementing a simple retry wrapper around these operations to improve reliability.
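
One possible shape for such a retry wrapper, as a sketch only: `retry` is a hypothetical helper, and the attempt count, delay, and image reference in the usage comment are placeholder values rather than settings from the pipeline.

```bash
# retry MAX_ATTEMPTS DELAY_SECONDS command [args...]
# Re-runs the command until it succeeds or MAX_ATTEMPTS is reached.
retry() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1
  until "$@"; do
    if (( attempt >= max_attempts )); then
      echo "failed after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "attempt ${attempt} failed, retrying in ${delay}s: $*" >&2
    sleep "$delay"
    attempt=$(( attempt + 1 ))
  done
}

# Usage (placeholder image reference):
# retry 3 10 buildah pull "docker://quay.io/example/base:latest"
```

A fixed delay keeps the sketch short; exponential backoff may suit congested registries better.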

.tekton/base-image-multiarch-pull-request-pipeline.yaml (1)

111-123: Add descriptions to pipeline results for clarity.

The result descriptions are empty. Adding meaningful descriptions improves pipeline documentation and helps users understand what each result provides.

📝 Suggested descriptions
   results:
-  - description: ""
+  - description: Fully qualified image URL after compression conversion
     name: IMAGE_URL
     value: $(tasks.convert-compression.results.IMAGE_URL)
-  - description: ""
+  - description: Image digest after compression conversion
     name: IMAGE_DIGEST
     value: $(tasks.convert-compression.results.IMAGE_DIGEST)
-  - description: ""
+  - description: Source repository URL for Tekton Chains provenance
     name: CHAINS-GIT_URL
     value: $(tasks.clone-repository.results.url)
-  - description: ""
+  - description: Git commit SHA for Tekton Chains provenance
     name: CHAINS-GIT_COMMIT
     value: $(tasks.clone-repository.results.commit)

Comment thread .tekton/base-image-multiarch-pull-request-pipeline.yaml
- Added logic to distinguish registry ports from tags in IMAGE_URL.
- Ensures correct formatting for buildah manifest inspect to avoid "tag@digest" compatibility issues.
…port

- Added user namespace setup for buildah tasks to align with Konflux build environment.
- Integrated manifest conversion and single-image push scripts with unshare for consistency.
- Introduced permission checks and debug logging for converted images.
@openshift-ci openshift-ci Bot added size/xxl and removed size/xxl labels Jan 31, 2026
@openshift-ci
Contributor

openshift-ci Bot commented Feb 1, 2026

@jiridanek: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/images | ac6d07f | link | true | /test images |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-ci
Contributor

openshift-ci Bot commented Mar 24, 2026

PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.


Labels

needs-rebase review-requested GitHub Bot creates notification on #pr-review-ai-ide-team slack channel size/xxl

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant