Feat/base image zstd compression #2868
jiridanek wants to merge 16 commits into opendatahub-io:main
Conversation
Skipping CI for Draft Pull Request.
📝 Walkthrough
Adds two new Tekton Pipelines for multi-arch base-image build/push flows and updates ~22 PipelineRun manifests to reference them; several PipelineRuns also add …
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 3 passed
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: …
The full list of commands accepted by this bot can be found here.
1f8a3e5 to b654e5d
Add new pipelines for base images that enable zstd:chunked compression for faster image pulls. This is especially important for CUDA images, which can be 10-15GB+.

New pipelines:
- base-image-multiarch-push-pipeline.yaml
- base-image-multiarch-pull-request-pipeline.yaml

These pipelines add a convert-compression task that runs after build-image-index to re-push images with zstd:chunked compression and the --force-compression flag (required for CUDA base layers).

Updated example PipelineRuns to reference the new pipeline:
- odh-base-image-cuda-py312-ubi9-push.yaml
- odh-base-image-cuda-py312-ubi9-pull-request.yaml

Related: konflux-ci/build-definitions#3188
…pipeline

Update all base-image PipelineRuns to reference the new pipelines with zstd:chunked compression support:
- Push pipelines: base-image-multiarch-push-pipeline
- Pull-request pipelines: base-image-multiarch-pull-request-pipeline

For ROCm images that were using singlearch-push-pipeline, switched to the multiarch pipeline with an explicit linux-mxlarge/amd64 platform.

This enables faster image pulls for all base images, including:
- CPU base images (c9s and ubi9)
- CUDA base images (11, 12.6, 12.8, 13.0)
- ROCm base images (6.2, 6.3, 6.4)
Updated the base-image pipelines to convert architecture-specific images to zstd:chunked compression **before** creating the manifest index. Refactored the pipeline flow to `build-images → convert-compression (per arch) → build-image-index`. Adjusted dependent tasks to consume updated compression results for better consistency and optimization.
…ask names

- Move convert-compression task before build-image-index so each arch image is compressed before the manifest index is created
- Change from a matrix task to a single task processing an array, to avoid extremely long TaskRun names that break the Konflux UI
- Add unified multiarch-pipeline.yaml with optional compression support
- Compression is now processed sequentially but with cleaner task names
Use jq -cn with $ARGS.positional to properly format the JSON array for Tekton typed array results. The previous approach with printf and jq -R was causing malformed array values.
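A minimal sketch of the jq pattern this commit describes (the image references are placeholders, not values from the PR):

```shell
# Build a well-formed JSON array from shell arguments, suitable for a
# Tekton typed-array result file. -c emits compact output, -n takes no
# input, and --args collects the remaining arguments into
# $ARGS.positional as JSON strings.
IMAGES=("quay.io/example/img@sha256:aaa" "quay.io/example/img@sha256:bbb")
jq -cn '$ARGS.positional' --args "${IMAGES[@]}"
# → ["quay.io/example/img@sha256:aaa","quay.io/example/img@sha256:bbb"]
```

The earlier printf piped into `jq -R` emitted one JSON string per input line rather than a single array value, which is what malformed the result.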
Remove the extra array wrapping (- prefix) when passing array results. The syntax `value: $(tasks.X.results.ARRAY[*])` expands directly into the array parameter, while `value: [- $(tasks.X.results.ARRAY[*])]` incorrectly wraps the expansion in another array level.
The Tekton typed array result expansion doesn't work correctly when passing to build-image-index IMAGES parameter. The array result from convert-compression task was only passing the last element. Revert to the simpler approach where compression happens AFTER build-image-index. When buildah manifest push is called with --compression-format --force-compression --all, it re-pushes all architecture images with the new compression.
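The reverted approach can be sketched as a small wrapper around `buildah manifest push`; the function name and image reference are illustrative, and `BUILDAH` is overridable so the command line can be inspected without a registry:

```shell
# Hedged sketch: re-push an existing manifest list with zstd:chunked,
# recompressing every per-arch image behind it. BUILDAH defaults to the
# real binary; set BUILDAH=echo for a dry run.
BUILDAH="${BUILDAH:-buildah}"

convert_compression() {
  # $1: manifest list reference, e.g. quay.io/org/img:tag (placeholder)
  image="$1"
  # --all pushes every arch image in the list; --force-compression
  # recompresses layers even when the registry already holds gzip copies.
  "$BUILDAH" manifest push \
    --all \
    --compression-format zstd:chunked \
    --force-compression \
    "$image" "docker://$image"
}
```

Because the recompression happens on the manifest list itself, no typed-array result ever has to flow between tasks.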
This test branch uses the zstd:chunked compressed base images built in PR #2868 to measure whether GHA builds are faster with the new compression format.

Base images updated to digest-pinned zstd:chunked versions:
- CPU C9S: sha256:2580cb333...
- CUDA C9S (13.0): sha256:2a4cb3a49...
- CUDA UBI9 (12.8): sha256:7032289e4...
- ROCm C9S (6.4): sha256:f76c06cbc...
- ROCm UBI9 (6.4): sha256:13f2bc5bd...

Expected improvements based on benchmarks:
- CUDA image pulls: ~53% faster
- CUDA builds: ~64% faster
- CPU image pulls: ~18% faster

Co-authored-by: Cursor <cursoragent@cursor.com>
- Added local manifest list creation and platform addition for multiarch images.
- Updated `buildah manifest inspect` to remove `docker://` transport.
- Enhanced cleanup process after manifest push for robustness.
/kf-build base-images/cpu/c9s-python-3.12
/kfbuild base-images/cpu/c9s-python-3.12
- Strip tags from IMAGE_URL to avoid "tag@digest" issues in buildah manifest inspect. - Prevents incorrect resolution of tag over digest due to single-arch manifests.
Actionable comments posted: 1
In @.tekton/base-image-multiarch-pull-request-pipeline.yaml:
- Around line 339-344: The current removal of tag via
IMAGE_BASE="${IMAGE_URL%:*}" can strip registry ports; change to first isolate
the last path segment and only strip a tag if that segment contains a colon:
derive NAME="${IMAGE_URL##*/}", if NAME contains ":" set
IMAGE_BASE="${IMAGE_URL%:*}" else set IMAGE_BASE="${IMAGE_URL}" and then
construct IMAGE_REF="${IMAGE_BASE}@${IMAGE_DIGEST}"; update uses of
IMAGE_URL/IMAGE_BASE/IMAGE_REF accordingly so ports are preserved.
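A sketch of the suggested fix (the example URL and digest are placeholders): strip a trailing tag from IMAGE_URL without mangling a registry port, by inspecting only the last path segment, since that is the only place a tag can appear.

```shell
# Naive stripping with ${IMAGE_URL%:*} would turn
# registry.example.com:5000/team/image into registry.example.com.
IMAGE_URL="registry.example.com:5000/team/image:v1"   # placeholder
IMAGE_DIGEST="sha256:deadbeef"                        # placeholder

NAME="${IMAGE_URL##*/}"            # last path segment: image:v1
if [ "${NAME#*:}" != "$NAME" ]; then
  IMAGE_BASE="${IMAGE_URL%:*}"     # segment has a tag -> strip it
else
  IMAGE_BASE="$IMAGE_URL"          # no tag; a registry port stays intact
fi
IMAGE_REF="${IMAGE_BASE}@${IMAGE_DIGEST}"
echo "$IMAGE_REF"
# → registry.example.com:5000/team/image@sha256:deadbeef
```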
🧹 Nitpick comments (5)
.tekton/base-image-multiarch-push-pipeline.yaml (4)
133-145: Consider adding meaningful descriptions to pipeline results.
All result descriptions are empty strings. Adding brief descriptions (e.g., "Final compressed image URL", "Final compressed image digest") would improve pipeline documentation and usability.
159-159: Consider pinning the Alpine image to a specific digest.
Using the `latest` tag can lead to non-reproducible builds if the image changes upstream. For consistency with other tasks in this pipeline that use pinned digests, consider using a specific version or digest.
464-469: Consider adding CPU limit for resource consistency.
Memory limits are specified but CPU limit is missing. For large image compression operations, having explicit CPU limits helps with cluster resource planning and prevents potential resource contention.
🔧 Proposed fix
```diff
 resources:
   requests:
     memory: 4Gi
     cpu: 500m
   limits:
     memory: 8Gi
+    cpu: "2"
```
447-454: Add retry logic for large image push/pull operations.
The `buildah pull` and `buildah push` operations lack retry logic. For large CUDA/ROCm images (multi-GB), transient network failures could cause pipeline failures. Consider implementing a simple retry wrapper around these operations to improve reliability.
.tekton/base-image-multiarch-pull-request-pipeline.yaml (1)
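One possible shape for such a retry wrapper; the attempt count, backoff schedule, and the commented buildah calls are illustrative, not from the pipeline:

```shell
# retry <attempts> <command...>: rerun a flaky command with a simple
# linear backoff, giving up after <attempts> tries.
retry() {
  attempts="$1"; shift
  i=1
  while true; do
    "$@" && return 0
    if [ "$i" -ge "$attempts" ]; then
      echo "retry: giving up after $attempts attempts: $*" >&2
      return 1
    fi
    echo "retry: attempt $i failed; retrying in $((i * 5))s" >&2
    sleep $((i * 5))
    i=$((i + 1))
  done
}

# Hypothetical usage around the flaky steps:
# retry 3 buildah pull "docker://$IMAGE_REF"
# retry 3 buildah push "$IMAGE_REF" "docker://$IMAGE_URL"
```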
111-123: Add descriptions to pipeline results for clarity.
The result descriptions are empty. Adding meaningful descriptions improves pipeline documentation and helps users understand what each result provides.
📝 Suggested descriptions
```diff
 results:
-  - description: ""
+  - description: Fully qualified image URL after compression conversion
     name: IMAGE_URL
     value: $(tasks.convert-compression.results.IMAGE_URL)
-  - description: ""
+  - description: Image digest after compression conversion
     name: IMAGE_DIGEST
     value: $(tasks.convert-compression.results.IMAGE_DIGEST)
-  - description: ""
+  - description: Source repository URL for Tekton Chains provenance
     name: CHAINS-GIT_URL
     value: $(tasks.clone-repository.results.url)
-  - description: ""
+  - description: Git commit SHA for Tekton Chains provenance
     name: CHAINS-GIT_COMMIT
     value: $(tasks.clone-repository.results.commit)
```
- Added logic to distinguish registry ports from tags in IMAGE_URL. - Ensures correct formatting for buildah manifest inspect to avoid "tag@digest" compatibility issues.
…port - Added user namespace setup for buildah tasks to align with Konflux build environment. - Integrated manifest conversion and single-image push scripts with unshare for consistency. - Introduced permission checks and debug logging for converted images.
@jiridanek: The following test failed, say
Full PR test history. Your PR dashboard.
PR needs rebase.
https://issues.redhat.com/browse/STONEBLD-4276
Description
Our CUDA and ROCm base images are large (8-15GB), resulting in slow container startup times.
When deploying workbenches on Kubernetes/OpenShift, the image pull time significantly impacts
user experience, especially for the first deployment on a node.
The zstd:chunked compression format, developed by Red Hat engineers Giuseppe Scrivano and Dan Walsh, enables "partial pulls" where container runtimes can fetch only the layers and files actually needed, rather than downloading the entire image.
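One quick way to confirm an image was actually pushed with zstd is to list the layer media types in its manifest: zstd layers report `application/vnd.oci.image.layer.v1.tar+zstd` instead of `...tar+gzip`. In practice the manifest would come from `skopeo inspect --raw docker://IMAGE`; a canned manifest stands in below so the snippet is self-contained:

```shell
# Canned OCI manifest fragment standing in for `skopeo inspect --raw`
# output; only the fields used here are included.
manifest='{"layers":[
  {"mediaType":"application/vnd.oci.image.layer.v1.tar+zstd","size":123},
  {"mediaType":"application/vnd.oci.image.layer.v1.tar+zstd","size":456}
]}'
# Print the distinct layer media types.
echo "$manifest" | jq -r '.layers[].mediaType' | sort -u
# → application/vnd.oci.image.layer.v1.tar+zstd
```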
Key benefits of zstd:chunked (source):
How Has This Been Tested?
Benchmark Results: zstd:chunked vs gzip
Tested across 3 GitHub Actions runs with varying network conditions.
Image Sizes
Pull/Build Performance (averaged)
Key Findings
Detailed Results
Self checklist (all need to be checked):
- Run `make test` (`gmake` on macOS) before asking for review
- Changes to `Dockerfile.konflux` files should be done in odh/notebooks and automatically synced to rhds/notebooks. For Konflux-specific changes, modify `Dockerfile.konflux` files directly in rhds/notebooks, as these require special attention in the downstream repository and flow to the upcoming RHOAI release.

Merge criteria:
Summary by CodeRabbit