Feat/base image zstd compression by jiridanek · Pull Request #2868 · opendatahub-io/notebooks

jiridanek · 2026-01-26T17:41:36Z

https://issues.redhat.com/browse/STONEBLD-4276

Description

Our CUDA and ROCm base images are large (8-15GB), resulting in slow container startup times.
When deploying workbenches on Kubernetes/OpenShift, the image pull time significantly impacts
user experience, especially for the first deployment on a node.

The zstd:chunked compression format, developed by Red Hat engineers Giuseppe Scrivano and
Dan Walsh, enables "partial pulls" where container runtimes can fetch only the layers and
files actually needed, rather than downloading the entire image.

Key benefits of zstd:chunked:

Lazy pulling: Individual file chunks can be fetched via HTTP range requests
Deduplication: Files already present locally are relinked, not re-downloaded
Better compression: zstd compresses better and faster than gzip

How Has This Been Tested?

Benchmark Results: zstd:chunked vs gzip

Tested across 3 GitHub Actions runs with varying network conditions.

Image Sizes

Image	gzip	zstd:chunked	Savings
CPU C9S	534 MB	525 MB	2% smaller
CUDA C9S	6.9 GB	5.0 GB	28% smaller

Pull/Build Performance (averaged)

Metric	zstd:chunked	gzip	Improvement
CPU cold pull	29.2s	32.8s	11% faster
CUDA cold pull	80.9s	173.9s	53% faster
CUDA warm pull	221.5s	389.5s	43% faster
Build FROM CPU	11.0s	20.4s	46% faster
Build FROM CUDA	62.0s	174.4s	64% faster
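
The Improvement column can be re-derived from the two timing columns as (gzip - zstd:chunked) / gzip. A minimal sketch for checking the figures, assuming `awk` is available; `pct_faster` is a hypothetical helper name and is not part of this PR:

```shell
#!/bin/sh
# pct_faster NEW OLD: percentage by which NEW (seconds) beats OLD (seconds),
# rounded to a whole number. Hypothetical helper, used only to re-check the table.
pct_faster() {
  awk -v new="$1" -v old="$2" 'BEGIN { printf "%.0f\n", (old - new) / old * 100 }'
}

# Values copied from the averaged results: zstd:chunked first, gzip second.
echo "CPU cold pull:   $(pct_faster 29.2 32.8)% faster"
echo "CUDA cold pull:  $(pct_faster 80.9 173.9)% faster"
echo "CUDA warm pull:  $(pct_faster 221.5 389.5)% faster"
echo "Build FROM CPU:  $(pct_faster 11.0 20.4)% faster"
echo "Build FROM CUDA: $(pct_faster 62.0 174.4)% faster"
```

The same formula applied to the image sizes (5.0 GB against 6.9 GB) gives the 28% figure in the size table.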

Key Findings

CUDA images benefit most: 53-64% faster pulls/builds, 1.9 GB size reduction
CPU image benefits vary by network: 1-23% faster depending on conditions
Build times significantly improved: 46-64% faster due to faster base image pulls

Detailed Results

Run #3 - Best network conditions (23% CPU improvement)
Run #2 - Moderate conditions (15% CPU improvement)
Run #1 - Congested conditions (1% CPU improvement; the CUDA base-image build had not yet finished)

Self checklist (all need to be checked):

Ensure that you have run make test (gmake on macOS) before asking for review
Changes to everything except Dockerfile.konflux files should be done in odh/notebooks and automatically synced to rhds/notebooks. For Konflux-specific changes, modify Dockerfile.konflux files directly in rhds/notebooks as these require special attention in the downstream repository and flow to the upcoming RHOAI release.

Merge criteria:

The commits are squashed in a cohesive manner and have meaningful messages.
Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
The developer has manually tested the changes and verified that the changes work

Summary by CodeRabbit

New Features
- Added multi-architecture base-image build pipelines with platform-matrix builds, automatic compression conversion (manifest-list vs single-image), SBOM generation, publishable image results, and failure notifications.
Chores
- Updated pipeline run configurations to use the new pipelines, introduced platform parameters where needed, and increased compute resources for compression and scan steps to improve reliability.

openshift-ci · 2026-01-26T17:41:40Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

coderabbitai · 2026-01-26T17:41:43Z

Walkthrough

Adds two new Tekton Pipelines for multi-arch base-image build/push flows and updates ~22 PipelineRun manifests to reference them; several PipelineRuns also add build-platforms params or taskRunSpecs to increase compute for convert-compression.

Changes

Cohort / File(s)	Summary
New Pipeline Definitions `.tekton/base-image-multiarch-pull-request-pipeline.yaml`, `.tekton/base-image-multiarch-push-pipeline.yaml`	Introduces two public Tekton Pipeline resources implementing multi-arch build workflows (matrix builds → index → compression conversion → downstream scans/tags/push), with many params, results, workspaces and a finally block.
PipelineRun: PR ref updates `.tekton/odh-base-image-cpu-py312-c9s-pull-request.yaml`, `.tekton/odh-base-image-cpu-py312-ubi9-pull-request.yaml`, `.tekton/odh-base-image-cuda-12-8-py312-c9s-pull-request.yaml`, `.tekton/odh-base-image-cuda-12-8-py312-ubi9-pull-request.yaml`, `.tekton/odh-base-image-cuda-13-0-py312-c9s-pull-request.yaml`, `.tekton/odh-base-image-cuda-py311-c9s-pull-request.yaml`, `.tekton/odh-base-image-cuda-py312-c9s-pull-request.yaml`, `.tekton/odh-base-image-cuda-py312-ubi9-pull-request.yaml`	Repoints `spec.pipelineRef.name` from `multiarch-pull-request-pipeline` to `base-image-multiarch-pull-request-pipeline`.
PipelineRun: PR ref + compute overrides `.tekton/odh-base-image-rocm-6-3-py312-c9s-pull-request.yaml`, `.tekton/odh-base-image-rocm-6-3-py312-ubi9-pull-request.yaml`, `.tekton/odh-base-image-rocm-6-4-py312-c9s-pull-request.yaml`, `.tekton/odh-base-image-rocm-6-4-py312-ubi9-pull-request.yaml`, `.tekton/odh-base-image-rocm-py312-c9s-pull-request.yaml`, `.tekton/odh-base-image-rocm-py312-ubi9-pull-request.yaml`	Switches to `base-image-multiarch-pull-request-pipeline` and inserts `spec.taskRunSpecs` entries to raise CPU/memory for `convert-compression` (and related scan task overrides).
PipelineRun: push ref updates `.tekton/odh-base-image-cpu-py312-c9s-push.yaml`, `.tekton/odh-base-image-cpu-py312-ubi9-push.yaml`, `.tekton/odh-base-image-cuda-12-8-py312-c9s-push.yaml`, `.tekton/odh-base-image-cuda-12-8-py312-ubi9-push.yaml`, `.tekton/odh-base-image-cuda-13-0-py312-c9s-push.yaml`, `.tekton/odh-base-image-cuda-py311-c9s-push.yaml`, `.tekton/odh-base-image-cuda-py312-c9s-push.yaml`, `.tekton/odh-base-image-cuda-py312-ubi9-push.yaml`	Repoints `spec.pipelineRef.name` from `multiarch-push-pipeline` to `base-image-multiarch-push-pipeline`.
PipelineRun: push ref + build-platforms param (singlearch→multiarch) `.tekton/odh-base-image-rocm-6-3-py312-c9s-push.yaml`, `.tekton/odh-base-image-rocm-6-3-py312-ubi9-push.yaml`, `.tekton/odh-base-image-rocm-6-4-py312-c9s-push.yaml`, `.tekton/odh-base-image-rocm-6-4-py312-ubi9-push.yaml`, `.tekton/odh-base-image-rocm-py312-c9s-push.yaml`, `.tekton/odh-base-image-rocm-py312-ubi9-push.yaml`	Changes `pipelineRef` from `singlearch-push-pipeline` to `base-image-multiarch-push-pipeline` and adds `build-platforms` param (e.g., `linux-mxlarge/amd64`).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

RHAIENG-2460: Update x86_64 to amd64 #2778 — touches Tekton pipeline YAMLs and build-platforms/pipelineRef configurations.
[stable] RHAIENG-2777: chore(tekton): update and add pipeline definitions for multi-arch and targeted builds #2842 — edits PipelineRun pipelineRef wiring for multi-arch builds; closely related to these pipeline ref changes.

Suggested reviewers

daniellutz
atheo89

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'Feat/base image zstd compression' accurately summarizes the main change: adding zstd:chunked compression support for base images. It is concise, clear, and specific about the primary feature being introduced.
Description check	✅ Passed	The PR description is comprehensive with a clear problem statement, explanation of the solution (zstd:chunked compression), detailed benchmark results, and references to related issues and documentation. However, the self-checklist items and merge criteria checkboxes remain unchecked, indicating incomplete verification by the author.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

openshift-ci · 2026-01-26T17:41:43Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign jiridanek for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Add new pipelines for base images that enable zstd:chunked compression for faster image pulls. This is especially important for CUDA images which can be 10-15GB+.

New pipelines:
- base-image-multiarch-push-pipeline.yaml
- base-image-multiarch-pull-request-pipeline.yaml

These pipelines add a convert-compression task that runs after build-image-index to re-push images with zstd:chunked compression and the --force-compression flag (required for CUDA base layers).

Updated one example PipelineRun to reference the new pipeline:
- odh-base-image-cuda-py312-ubi9-push.yaml
- odh-base-image-cuda-py312-ubi9-pull-request.yaml

Related: konflux-ci/build-definitions#3188

…pipeline

Update all base-image PipelineRuns to reference the new pipelines with zstd:chunked compression support:
- Push pipelines: base-image-multiarch-push-pipeline
- Pull-request pipelines: base-image-multiarch-pull-request-pipeline

For ROCm images that were using singlearch-push-pipeline, switched to the multiarch pipeline with an explicit linux-mxlarge/amd64 platform.

This enables faster image pulls for all base images including:
- CPU base images (c9s and ubi9)
- CUDA base images (11, 12.6, 12.8, 13.0)
- ROCm base images (6.2, 6.3, 6.4)

Updated the base-image pipelines to convert architecture-specific images to zstd:chunked compression **before** creating the manifest index. Refactored the pipeline flow to `build-images → convert-compression (per arch) → build-image-index`. Adjusted dependent tasks to consume updated compression results for better consistency and optimization.

…ask names

- Move convert-compression task before build-image-index so each arch image is compressed before the manifest index is created
- Change from matrix task to single task processing array to avoid extremely long TaskRun names that break the Konflux UI
- Add unified multiarch-pipeline.yaml with optional compression support
- Compression is now processed sequentially but with cleaner task names

Use jq -cn with $ARGS.positional to properly format the JSON array for Tekton typed array results. The previous approach with printf and jq -R was causing malformed array values.
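
A minimal sketch of the `jq -cn` with `$ARGS.positional` idiom this commit describes, assuming jq 1.6 or newer. The function name `to_json_array`, the result name `IMAGES`, and the variable names in the usage comment are hypothetical and not taken from the pipeline:

```shell
#!/bin/sh
# to_json_array ARG...: print the arguments as one compact JSON array.
#   -n      read no input
#   -c      compact, single-line output
#   --args  treat the remaining arguments as strings in $ARGS.positional
to_json_array() {
  jq -cn '$ARGS.positional' --args "$@"
}

# Hypothetical use inside a Tekton step, where an array result is emitted by
# writing a JSON array string to the result path (result name is an assumption):
#   to_json_array "$IMAGE_AMD64" "$IMAGE_ARM64" > "$(results.IMAGES.path)"
```

Letting jq build the array means quoting and escaping of each element is handled by jq rather than by hand-assembled `printf` output.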

Remove the extra array wrapping (- prefix) when passing array results. The syntax `value: $(tasks.X.results.ARRAY[*])` expands directly into the array parameter, while `value: [- $(tasks.X.results.ARRAY[*])]` incorrectly wraps the expansion in another array level.

The Tekton typed array result expansion doesn't work correctly when passing to build-image-index IMAGES parameter. The array result from convert-compression task was only passing the last element. Revert to the simpler approach where compression happens AFTER build-image-index. When buildah manifest push is called with --compression-format --force-compression --all, it re-pushes all architecture images with the new compression.
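
For orientation, the push invocation this commit message refers to might look roughly like the following. This is a sketch under stated assumptions, not the pipeline's actual script: the function only prints the command rather than running it, and the function name, list name, and destination are hypothetical.

```shell
#!/bin/sh
# manifest_push_cmd LIST DEST: print (do not run) a buildah command that would
# re-push every architecture image referenced by manifest list LIST to DEST,
# recompressing layers as zstd:chunked.
#   --all                 push the referenced images, not just the list
#   --compression-format  target layer compression
#   --force-compression   recompress layers even if they already exist at DEST
manifest_push_cmd() {
  printf 'buildah manifest push --all --compression-format zstd:chunked --force-compression %s docker://%s\n' "$1" "$2"
}

# Hypothetical names, for illustration only.
manifest_push_cmd "local-list" "quay.io/example/base-image:tag"
```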

This test branch uses the zstd:chunked compressed base images built in PR #2868 to measure whether GHA builds are faster with the new compression format.

Base images updated to digest-pinned zstd:chunked versions:
- CPU C9S: sha256:2580cb333...
- CUDA C9S (13.0): sha256:2a4cb3a49...
- CUDA UBI9 (12.8): sha256:7032289e4...
- ROCm C9S (6.4): sha256:f76c06cbc...
- ROCm UBI9 (6.4): sha256:13f2bc5bd...

Expected improvements based on benchmarks:
- CUDA image pulls: ~53% faster
- CUDA builds: ~64% faster
- CPU image pulls: ~18% faster

Co-authored-by: Cursor <cursoragent@cursor.com>

- Added local manifest list creation and platform addition for multiarch images.
- Updated `buildah manifest inspect` to remove `docker://` transport.
- Enhanced cleanup process after manifest push for robustness.

jiridanek · 2026-01-31T12:36:05Z

/kf-build base-images/cpu/c9s-python-3.12

jiridanek · 2026-01-31T12:53:25Z

/kfbuild base-images/cpu/c9s-python-3.12

- Strip tags from IMAGE_URL to avoid "tag@digest" issues in buildah manifest inspect.
- Prevents incorrect resolution of tag over digest due to single-arch manifests.

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In @.tekton/base-image-multiarch-pull-request-pipeline.yaml:
- Around line 339-344: The current removal of tag via
IMAGE_BASE="${IMAGE_URL%:*}" can strip registry ports; change to first isolate
the last path segment and only strip a tag if that segment contains a colon:
derive NAME="${IMAGE_URL##*/}", if NAME contains ":" set
IMAGE_BASE="${IMAGE_URL%:*}" else set IMAGE_BASE="${IMAGE_URL}" and then
construct IMAGE_REF="${IMAGE_BASE}@${IMAGE_DIGEST}"; update uses of
IMAGE_URL/IMAGE_BASE/IMAGE_REF accordingly so ports are preserved.
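
The suggestion above can be sketched in POSIX shell as follows. The helper names `strip_tag` and `image_ref` are hypothetical; only the parameter expansions come from the review comment, and the sketch assumes the input carries no `@digest` suffix of its own.

```shell
#!/bin/sh
# strip_tag IMAGE_URL: drop a trailing ":tag" without touching a registry port.
# Only the last path segment is inspected, so "localhost:5000/org/img" is
# returned unchanged while "localhost:5000/org/img:v1" loses ":v1".
strip_tag() {
  image_url="$1"
  name="${image_url##*/}"          # last path segment, e.g. "img:v1"
  case "$name" in
    *:*) printf '%s\n' "${image_url%:*}" ;;   # segment has a tag: strip it
    *)   printf '%s\n' "$image_url" ;;        # no tag: keep as-is
  esac
}

# image_ref IMAGE_URL IMAGE_DIGEST: build a digest-pinned reference.
image_ref() {
  printf '%s@%s\n' "$(strip_tag "$1")" "$2"
}
```

For example, `image_ref "localhost:5000/org/img:v1" "sha256:abc"` yields `localhost:5000/org/img@sha256:abc`, whereas applying `${IMAGE_URL%:*}` directly to an untagged `localhost:5000/org/img` would cut the reference at the port.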

🧹 Nitpick comments (5)

.tekton/base-image-multiarch-push-pipeline.yaml (4)
133-145: Consider adding meaningful descriptions to pipeline results.

All result descriptions are empty strings. Adding brief descriptions (e.g., "Final compressed image URL", "Final compressed image digest") would improve pipeline documentation and usability.

159-159: Consider pinning the Alpine image to a specific digest.

Using latest tag can lead to non-reproducible builds if the image changes upstream. For consistency with other tasks in this pipeline that use pinned digests, consider using a specific version or digest.

464-469: Consider adding CPU limit for resource consistency.

Memory limits are specified but CPU limit is missing. For large image compression operations, having explicit CPU limits helps with cluster resource planning and prevents potential resource contention.
🔧 Proposed fix
         resources:
           requests:
             memory: 4Gi
             cpu: 500m
           limits:
             memory: 8Gi
+            cpu: "2"
447-454: Add retry logic for large image push/pull operations.

The buildah pull and buildah push operations lack retry logic. For large CUDA/ROCm images (multi-GB), transient network failures could cause pipeline failures. Consider implementing a simple retry wrapper around these operations to improve reliability.
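
A simple retry wrapper of the kind suggested here could look like the sketch below. The function name `retry`, the attempt count, and the delay are assumptions for illustration; the pipeline's actual handling may differ.

```shell
#!/bin/sh
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARG...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached, sleeping
# DELAY_SECONDS between attempts. Returns 0 on success, 1 on giving up.
retry() {
  max_attempts="$1"
  delay="$2"
  shift 2
  attempt=1
  while :; do
    "$@" && return 0
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "retry: giving up after $attempt attempt(s): $*" >&2
      return 1
    fi
    echo "retry: attempt $attempt failed, retrying in ${delay}s: $*" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
  done
}

# Hypothetical usage around the large-image operations (not executed here):
#   retry 3 10 buildah pull "$IMAGE_REF"
#   retry 3 10 buildah push "$IMAGE_REF" "docker://$TARGET"
```

A fixed delay keeps the wrapper short; exponential backoff would be a reasonable variation for congested registries.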
.tekton/base-image-multiarch-pull-request-pipeline.yaml (1)
111-123: Add descriptions to pipeline results for clarity.

The result descriptions are empty. Adding meaningful descriptions improves pipeline documentation and helps users understand what each result provides.
📝 Suggested descriptions
   results:
-  - description: ""
+  - description: Fully qualified image URL after compression conversion
     name: IMAGE_URL
     value: $(tasks.convert-compression.results.IMAGE_URL)
-  - description: ""
+  - description: Image digest after compression conversion
     name: IMAGE_DIGEST
     value: $(tasks.convert-compression.results.IMAGE_DIGEST)
-  - description: ""
+  - description: Source repository URL for Tekton Chains provenance
     name: CHAINS-GIT_URL
     value: $(tasks.clone-repository.results.url)
-  - description: ""
+  - description: Git commit SHA for Tekton Chains provenance
     name: CHAINS-GIT_COMMIT
     value: $(tasks.clone-repository.results.commit)

- Added logic to distinguish registry ports from tags in IMAGE_URL.
- Ensures correct formatting for buildah manifest inspect to avoid "tag@digest" compatibility issues.

…port

- Added user namespace setup for buildah tasks to align with Konflux build environment.
- Integrated manifest conversion and single-image push scripts with unshare for consistency.
- Introduced permission checks and debug logging for converted images.

openshift-ci · 2026-02-01T04:17:57Z

@jiridanek: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/images	`ac6d07f`	link	true	`/test images`

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-ci · 2026-03-24T03:47:12Z

PR needs rebase.

openshift-ci Bot added the do-not-merge/work-in-progress label Jan 26, 2026

openshift-ci Bot added the size/xxl label Jan 26, 2026

github-actions Bot added the review-requested label (GitHub Bot creates notification on #pr-review-ai-ide-team slack channel) Jan 26, 2026

jiridanek force-pushed the feat/base-image-zstd-compression branch from 1f8a3e5 to b654e5d January 28, 2026 17:38

jiridanek added 7 commits January 31, 2026 07:02

fix(tekton): use correct jq format for array results

a0ba70a

jiridanek mentioned this pull request Jan 31, 2026

Test: Use zstd:chunked base images to measure GHA build speedup #2890

Draft

2 tasks

fix(tekton): fix manifest list handling for compression tasks

d198bc0

fix(tekton): fix IMAGE_URL formatting for buildah manifest compatibility

84cd0e2

coderabbitai Bot reviewed Jan 31, 2026

jiridanek added 2 commits January 31, 2026 14:15

fix(tekton): improve IMAGE_URL parsing to handle ports vs tags

9c88ccc

openshift-ci Bot added the needs-rebase label Mar 24, 2026

Feat/base image zstd compression #2868
jiridanek wants to merge 16 commits into opendatahub-io:main from jiridanek:feat/base-image-zstd-compression
