chore(automl,autorag): refresh embedded pipeline YAMLs from upstream #7307
Update compiled pipeline YAMLs from red-hat-data-services/pipelines-components rhoai-3.4 branch (matching pipelines-components#5). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
📝 Walkthrough
Three KFP pipeline YAML files were modified to replace embedded KFP component archives (the …
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Security and verification notes
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
/lgtm
/approve cancel
Switching to draft until we have successful run results for all pipelines.
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## main #7307 +/- ##
==========================================
- Coverage 65.04% 63.91% -1.14%
==========================================
Files 2458 2513 +55
Lines 76354 77939 +1585
Branches 19257 19818 +561
==========================================
+ Hits 49668 49812 +144
- Misses 26686 28127 +1441
See 80 files with indirect coverage changes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/autorag/bff/internal/pipelines/documents_rag_optimization_pipeline/pipeline.yaml (1)
1029-1040: ⚠️ Potential issue | 🟠 Major
CWE-22: Unfiltered `tarfile.extractall()` on embedded archives without a `filter=` argument enables path traversal and symlink attacks.
The embedded base64 tarballs in these KFP pipeline components are decoded and extracted via `__kfp_tar.extractall(path=__KFP_EMBEDDED_ASSET_DIR)` with no `filter=` parameter. Per PEP 706, Python 3.12 and 3.13 emit a `DeprecationWarning` when no filter is given, and Python 3.14 changes the default filter to `'data'`. More critically, on interpreters that still default to unfiltered extraction, any path-traversal, symlink, or device-file entry in the tarball (`../../etc/passwd`, absolute paths, `/dev/*`) will be honored and written outside the intended `__KFP_EMBEDDED_ASSET_DIR`, which is then prepended to `sys.path`, enabling arbitrary code execution at component import time.
Affects 5 locations across 3 files:
- packages/autorag/bff/internal/pipelines/documents_rag_optimization_pipeline/pipeline.yaml:1036
- packages/automl/bff/internal/pipelines/autogluon_tabular_training_pipeline/pipeline.yaml:185, 667
- packages/automl/bff/internal/pipelines/autogluon_timeseries_training_pipeline/pipeline.yaml:295, 807
These files are generated from red-hat-data-services/pipelines-components; the fix must be applied upstream so the codegen emits `filter='data'` in the `extractall()` call. Track in the linked RHOAIENG ticket to ensure the next component refresh includes this hardening.
Suggested fix for upstream codegen
- __kfp_tar.extractall(path=__KFP_EMBEDDED_ASSET_DIR)
+ __kfp_tar.extractall(path=__KFP_EMBEDDED_ASSET_DIR, filter='data')
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/autorag/bff/internal/pipelines/documents_rag_optimization_pipeline/pipeline.yaml` around lines 1029 - 1040, The extractall call on the embedded archive is unsafe (uses __kfp_tar.extractall(path=__KFP_EMBEDDED_ASSET_DIR)) and must be replaced with a safe extraction (either pass an explicit filter= callable per PEP 706 or emit a safe extraction helper) that: rejects absolute paths and any member with path components like '..', rejects symlinks and device files, and only allows extraction into __KFP_EMBEDDED_ASSET_DIR; update the codegen that writes extraction logic for symbols __KFP_EMBEDDED_ARCHIVE_B64, __kfp_tar, and __KFP_EMBEDDED_ASSET_DIR so generated pipeline.yaml uses the safe filter/helper instead of bare extractall.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@packages/autorag/bff/internal/pipelines/documents_rag_optimization_pipeline/pipeline.yaml`:
- Line 388: The pipeline.yaml entry for the image (the line containing
registry.redhat.io/rhoai/odh-autorag-rhel9@sha256:b51e1c7b2b4b857f4f5ea34654b10326196fdd1a0487012a9f7074ef092a63c5)
no longer matches upstream commit a71ed55 and the pinned digest cannot be found
in the Red Hat catalog; fix by (1) fetching the upstream file at commit a71ed55
and producing a git diff against our pipeline.yaml to show the exact drift, (2)
querying the Red Hat catalog/API for the expected image digest and replacing the
current sha256 value with the verified digest (or revert the entire image line
to the upstream value from a71ed55), and (3) include the diff output in your PR
description and add a short note in the commit message referencing the
verification step and catalog query used.
---
Outside diff comments:
In
`@packages/autorag/bff/internal/pipelines/documents_rag_optimization_pipeline/pipeline.yaml`:
- Around line 1029-1040: The extractall call on the embedded archive is unsafe
(uses __kfp_tar.extractall(path=__KFP_EMBEDDED_ASSET_DIR)) and must be replaced
with a safe extraction (either pass an explicit filter= callable per PEP 706 or
emit a safe extraction helper) that: rejects absolute paths and any member with
path components like '..', rejects symlinks and device files, and only allows
extraction into __KFP_EMBEDDED_ASSET_DIR; update the codegen that writes
extraction logic for symbols __KFP_EMBEDDED_ARCHIVE_B64, __kfp_tar, and
__KFP_EMBEDDED_ASSET_DIR so generated pipeline.yaml uses the safe filter/helper
instead of bare extractall.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited), Organization UI (inherited)
Review profile: CHILL
Plan: Pro Plus
Run ID: 539e9146-73a1-45b1-951f-f2fdce61807d
📒 Files selected for processing (3)
packages/automl/bff/internal/pipelines/autogluon_tabular_training_pipeline/pipeline.yaml
packages/automl/bff/internal/pipelines/autogluon_timeseries_training_pipeline/pipeline.yaml
packages/autorag/bff/internal/pipelines/documents_rag_optimization_pipeline/pipeline.yaml
\ as f:\n json.dump(descriptor, f, indent=2)\n\n logger.info(\"\
Documents descriptor written to %s\", descriptor_path)\n\n"
image: registry.redhat.io/rhoai/odh-autorag-rhel9@sha256:152c98afbf558367807d9dc0bd0a203a71936476039c7130eeecc079d0605cf2
image: registry.redhat.io/rhoai/odh-autorag-rhel9@sha256:b51e1c7b2b4b857f4f5ea34654b10326196fdd1a0487012a9f7074ef092a63c5
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# 1. Confirm all 7 executor images in this file resolve to the same digest
# (cheap local consistency check).
fd -t f 'pipeline.yaml' packages/autorag packages/automl --exec \
sh -c 'echo "=== {} ==="; grep -nE "odh-autorag-rhel9@sha256:|odh-autogluon" "{}" | sort -u'
# 2. Diff checked-in file against upstream a71ed55 to validate byte-for-byte claim.
REL="autorag/bff/internal/pipelines/documents_rag_optimization_pipeline/pipeline.yaml"
curl -fsSL "https://raw.githubusercontent.com/red-hat-data-services/pipelines-components/a71ed55/${REL}" -o /tmp/upstream.yaml \
&& diff -u /tmp/upstream.yaml "packages/${REL}" && echo "MATCH" || echo "DRIFT"
# 3. Verify the new digest is actually published for the tag the upstream pins.
curl -fsSL "https://catalog.redhat.com/api/containers/v1/repositories/registry/registry.redhat.io/repository/rhoai/odh-autorag-rhel9/images?filter=repositories.manifest_schema2_digest==sha256:b51e1c7b2b4b857f4f5ea34654b10326196fdd1a0487012a9f7074ef092a63c5" \
| jq '.data[0] | {tag: .repositories[0].tags[0].name, published: .creation_date, vulnerabilities: .certified}'
Repository: opendatahub-io/odh-dashboard
Length of output: 1446
Byte-for-byte match with upstream a71ed55 contradicted; image digest unverified in Red Hat catalog.
The PR claims this file mirrors red-hat-data-services/pipelines-components @ a71ed55 byte-for-byte, but diff against upstream shows DRIFT—the checked-in file does not match. Additionally, the pinned digest sha256:b51e1c7b2b4b857f4f5ea34654b10326196fdd1a0487012a9f7074ef092a63c5 cannot be verified in the Red Hat container catalog (404 on catalog API query).
Production pods will be pinned to an image that: (1) diverges from the claimed upstream version, (2) cannot be verified as legitimately published by Red Hat. This creates CWE-349 (untrusted image tag reuse) and CWE-295 (use of verify=False in fallback SSL paths) risk. Confirm the exact drift against a71ed55, verify the image digest is actually published, and provide the diff output showing what changed.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@packages/autorag/bff/internal/pipelines/documents_rag_optimization_pipeline/pipeline.yaml`
at line 388, The pipeline.yaml entry for the image (the line containing
registry.redhat.io/rhoai/odh-autorag-rhel9@sha256:b51e1c7b2b4b857f4f5ea34654b10326196fdd1a0487012a9f7074ef092a63c5)
no longer matches upstream commit a71ed55 and the pinned digest cannot be found
in the Red Hat catalog; fix by (1) fetching the upstream file at commit a71ed55
and producing a git diff against our pipeline.yaml to show the exact drift, (2)
querying the Red Hat catalog/API for the expected image digest and replacing the
current sha256 value with the verified digest (or revert the entire image line
to the upstream value from a71ed55), and (3) include the diff output in your PR
description and add a short note in the commit message referencing the
verification step and catalog query used.
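The local consistency check from step 1 of the script in this comment (do all executor images from one repository resolve to the same digest?) can also be sketched in Python. This is an illustrative sketch only: the function names are hypothetical, it scans the YAML as plain text for `image:` entries rather than parsing it, and it does not query any registry or catalog.

```python
import re
from collections import defaultdict

# Matches digest-pinned references of the form "image: <repository>@sha256:<64 hex chars>".
IMAGE_RE = re.compile(r"image:\s*([^\s@]+)@sha256:([0-9a-f]{64})")


def digests_by_repository(yaml_text: str) -> dict[str, set[str]]:
    """Group every pinned sha256 digest found in the text by image repository."""
    found: dict[str, set[str]] = defaultdict(set)
    for repo, digest in IMAGE_RE.findall(yaml_text):
        found[repo].add(digest)
    return dict(found)


def inconsistent_repositories(yaml_text: str) -> dict[str, set[str]]:
    """Return only the repositories that are pinned to more than one digest."""
    return {
        repo: digests
        for repo, digests in digests_by_repository(yaml_text).items()
        if len(digests) > 1
    }
```

An empty result from `inconsistent_repositories()` only shows that the file is internally consistent; whether the digest is actually published still has to be confirmed against the registry.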
Thanks @chrjones-rh, every flow works well on my end on a connected cluster.
✅ AutoML Binary Passing
✅ AutoML Multiclass Passing
✅ AutoML TimeSeries Passing
✅ AutoML Regression Passing
✅ AutoRAG Passing
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: GAUNSD
9677baa
into
opendatahub-io:main
…7307) (#7363) * chore(automl,autorag): refresh embedded pipeline YAMLs from upstream Update compiled pipeline YAMLs from red-hat-data-services/pipelines-components rhoai-3.4 branch (matching pipelines-components#5). * chore(automl,autorag): refresh embedded pipeline YAMLs from upstream --------- Co-authored-by: Christopher Jones <chrjones@redhat.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…pendatahub-io#7307) (opendatahub-io#7363) (#1801) * chore(automl,autorag): refresh embedded pipeline YAMLs from upstream Update compiled pipeline YAMLs from red-hat-data-services/pipelines-components rhoai-3.4 branch (matching pipelines-components#5). * chore(automl,autorag): refresh embedded pipeline YAMLs from upstream --------- Co-authored-by: Christopher Jones <chrjones@redhat.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>








https://issues.redhat.com/browse/RHOAIENG-58435
Description
Update compiled pipeline YAMLs from red-hat-data-services/pipelines-components rhoai-3.4 branch, matching pipelines-components#5.
This resolves a blocking issue with AutoML and AutoRAG run execution on disconnected (air-gapped) clusters where the embedded pipeline definitions referenced container images that were not available in the mirrored registry.
Files updated
- packages/automl/bff/internal/pipelines/autogluon_tabular_training_pipeline/pipeline.yaml
- packages/automl/bff/internal/pipelines/autogluon_timeseries_training_pipeline/pipeline.yaml
- packages/autorag/bff/internal/pipelines/documents_rag_optimization_pipeline/pipeline.yaml
How Has This Been Tested?
a71ed55) byte-for-byte
Test Impact
No tests added; pipeline YAML content is validated at runtime by the Kubeflow Pipelines server.