
chore(automl,autorag): refresh embedded pipeline YAMLs from upstream #7307

Merged
openshift-merge-bot[bot] merged 3 commits into opendatahub-io:main from chrjones-rh:RHOAIENG-58435-UI on Apr 22, 2026


@chrjones-rh (Contributor) commented Apr 17, 2026

https://issues.redhat.com/browse/RHOAIENG-58435

Description

Update compiled pipeline YAMLs from red-hat-data-services/pipelines-components rhoai-3.4 branch, matching pipelines-components#5.

This resolves a blocking issue with AutoML and AutoRAG run execution on disconnected (air-gapped) clusters where the embedded pipeline definitions referenced container images that were not available in the mirrored registry.
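On disconnected clusters, digest-based mirroring in OpenShift (ImageDigestMirrorSet, or the older ImageContentSourcePolicy) only redirects pulls that reference an image by digest. Each referenced repository also has to be present in the mirror. The Python sketch below is a rough pre-flight check for that. It assumes you already have the set of mirrored source repositories. `unmirrorable_images` and the `mirrored_repos` argument are hypothetical names. Scanning `image:` lines with a regex is a simplification of real YAML parsing, and it will not handle quoted values.

```python
import re

# Assumption: image references appear on lines of the form "image: <ref>",
# as in the executor specs of these compiled KFP pipeline YAMLs.
IMAGE_LINE = re.compile(r"^[ \t]*image:[ \t]*(\S+)", re.MULTILINE)


def unmirrorable_images(yaml_text: str, mirrored_repos: set) -> list:
    """Return image refs that are not digest-pinned or whose repository is not mirrored."""
    problems = []
    for ref in sorted(set(IMAGE_LINE.findall(yaml_text))):
        repo, sep, _digest = ref.partition("@")
        if not sep:
            # Tag-only references are not redirected by digest mirror rules.
            problems.append(f"not pinned by digest: {ref}")
        elif repo not in mirrored_repos:
            problems.append(f"repository not mirrored: {ref}")
    return problems
```

An empty result only means that every reference is digest-pinned and points at a repository in the supplied set. It does not prove that the specific digest was actually copied into the mirror.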

Files updated

  • packages/automl/bff/internal/pipelines/autogluon_tabular_training_pipeline/pipeline.yaml
  • packages/automl/bff/internal/pipelines/autogluon_timeseries_training_pipeline/pipeline.yaml
  • packages/autorag/bff/internal/pipelines/documents_rag_optimization_pipeline/pipeline.yaml

How Has This Been Tested?

  • Verified all three YAMLs match the upstream PR merge commit (a71ed55) byte-for-byte
  • No code changes — YAML-only update
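The byte-for-byte claim can be re-checked locally with a short Python sketch. It assumes the upstream file at commit a71ed55 has already been saved to a local path. `sha256_of` and `files_identical` are illustrative helpers, not part of either repository. The sketch compares raw bytes rather than parsed YAML on purpose, because re-serializing would hide whitespace or quoting drift.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's raw bytes."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        # Read in chunks so large embedded archives do not need to fit in memory twice.
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def files_identical(local: Path, upstream: Path) -> bool:
    """True only if both files contain exactly the same bytes."""
    # Different sizes can never match, so skip hashing in that case.
    if local.stat().st_size != upstream.stat().st_size:
        return False
    return sha256_of(local) == sha256_of(upstream)
```

For example, `files_identical(Path("packages/autorag/bff/internal/pipelines/documents_rag_optimization_pipeline/pipeline.yaml"), Path("/tmp/upstream.yaml"))` should return `True` if the claim holds for that file.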

Test Impact

No tests added — pipeline YAML content is validated at runtime by the Kubeflow Pipelines server.

Request review criteria:

Self checklist (all need to be checked):

  • The developer has manually tested the changes and verified that the changes work
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has added tests or explained why testing cannot be added
  • The code follows our Best Practices

If you have UI changes:

  • Included any necessary screenshots or gifs if it was a UI change.
  • Included tags to the UX team if it was a UI/UX change.

N/A -- YAML-only change, no UI impact.

After the PR is posted & before it merges:

  • The developer has tested their solution on a cluster by using the image produced by the PR to main

Summary by CodeRabbit

  • Chores
    • Updated container image versions for AutoGluon tabular training, AutoGluon timeseries training, and RAG optimization pipelines.
    • Refreshed embedded pipeline component archives to match upstream (red-hat-data-services/pipelines-components, rhoai-3.4 branch).

Update compiled pipeline YAMLs from red-hat-data-services/pipelines-components
rhoai-3.4 branch (matching pipelines-components#5).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai (bot, Contributor) commented Apr 17, 2026

📝 Walkthrough


Three KFP pipeline YAML files were modified to replace embedded KFP component archives (the __KFP_EMBEDDED_ARCHIVE_B64 base64 payloads) and to update container image digests for multiple executors. Files changed: autogluon tabular training pipeline, autogluon timeseries training pipeline, and autorag RAG optimization pipeline. No pipeline wiring, task definitions, parameters, resource settings, or public API signatures were altered.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Security and verification notes

  • Verify embedded archive integrity: decode each __KFP_EMBEDDED_ARCHIVE_B64 payload and inspect contents for unexpected binaries, scripts, or credential material. Check hashes and provenance. (Relevant: CWE-494 — Download of Code Without Integrity Check.)
  • Validate image digests and provenance: confirm each updated registry digest maps to an intended, signed image release and review registry metadata and image manifests. Scan images for known vulnerabilities before deployment.
  • Ensure cryptographic verification is present: there are no signatures or attestations in the diff; add or verify image/component signature checks (e.g., Notary/TUF/COSIGN) to prevent tampering. (Relevant: CWE-347 — Improper Verification of Cryptographic Signature.)
  • CI/source control audit: ensure the archive regeneration step is reproducible and logged in CI so changes to embedded payloads are auditable (supply-chain control point).
  • Actionable remediation steps:
    • Decode and review each embedded archive, validate file list and checksums.
    • Run container image vulnerability scans and record results; block images with critical/high CVEs.
    • Implement or verify use of signed images and signed component archives; enforce verification at runtime.
    • Add CI checks to prevent accidental embedding of secrets or unexpected binaries.
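The first remediation step is to decode each archive and review its file list and checksums. This can be done without writing anything to disk. The minimal Python sketch below assumes the `__KFP_EMBEDDED_ARCHIVE_B64` value has already been copied out of the pipeline YAML as a plain base64 string. `audit_embedded_archive` is a hypothetical helper. The two checks it flags are an illustrative policy, not an exhaustive one.

```python
import base64
import hashlib
import io
import tarfile


def _kind(member: tarfile.TarInfo) -> str:
    if member.isfile():
        return "file"
    if member.isdir():
        return "dir"
    if member.issym():
        return "symlink"
    if member.islnk():
        return "hardlink"
    return "special"  # device, FIFO, etc.


def audit_embedded_archive(archive_b64: str):
    """Inventory a base64 tar payload without extracting it to disk.

    Returns (inventory, findings): inventory holds (name, kind, sha256) tuples
    (sha256 is empty for non-files); findings lists members worth a manual look.
    """
    raw = base64.b64decode(archive_b64)
    inventory, findings = [], []
    # "r:*" lets tarfile auto-detect gzip/bz2/xz or an uncompressed archive.
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r:*") as tar:
        for member in tar.getmembers():
            kind = _kind(member)
            digest = ""
            if kind == "file":
                digest = hashlib.sha256(tar.extractfile(member).read()).hexdigest()
            inventory.append((member.name, kind, digest))
            # Absolute names or ".." components could escape the extraction dir.
            parts = member.name.replace("\\", "/").split("/")
            if member.name.startswith("/") or ".." in parts:
                findings.append(f"unsafe path: {member.name}")
            if kind not in ("file", "dir"):
                findings.append(f"non-regular member ({kind}): {member.name}")
    return inventory, findings
```

The SHA-256 values from the inventory can be recorded in CI and compared on the next refresh. This makes changes to the embedded payloads auditable.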
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
  • Title check: ✅ Passed. The title accurately summarizes the main change: updating embedded pipeline YAMLs from upstream, affecting automl and autorag packages.
  • Description check: ✅ Passed. The description includes issue reference, detailed explanation of changes, testing verification, and completed self-checklist addressing the template requirements comprehensively.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.


@nickmazzi (Contributor) commented:

/lgtm
/approve

@nickmazzi (Contributor) commented:

/approve cancel

@openshift-ci openshift-ci Bot removed the approved label Apr 17, 2026
@chrjones-rh chrjones-rh marked this pull request as draft April 17, 2026 20:39
@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress This PR is in WIP state label Apr 17, 2026
@chrjones-rh (Contributor, Author) commented:

Switching to draft until we have successful run results for all pipelines.

@openshift-ci openshift-ci Bot removed the lgtm label Apr 21, 2026
@codecov (bot) commented Apr 21, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 63.91%. Comparing base (1ce3f26) to head (08c83ff).
⚠️ Report is 11 commits behind head on main.

Additional details and impacted files


@@            Coverage Diff             @@
##             main    #7307      +/-   ##
==========================================
- Coverage   65.04%   63.91%   -1.14%     
==========================================
  Files        2458     2513      +55     
  Lines       76354    77939    +1585     
  Branches    19257    19818     +561     
==========================================
+ Hits        49668    49812     +144     
- Misses      26686    28127    +1441     

see 80 files with indirect coverage changes
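The headline percentages follow from the Hits and Lines rows above. This report shows no partial lines, so coverage = hits / lines. The Python sketch below reproduces the figures, assuming Codecov's default two-decimal precision with rounding down (configurable in codecov.yml). `round_down` and `coverage_pct` are illustrative helpers, not a Codecov API. Subtracting the already-rounded figures (63.91 - 65.04) would give -1.13, so the reported -1.14% is consistent with the delta being taken from the unrounded values.

```python
import math


def round_down(value: float, precision: int = 2) -> float:
    """Round toward negative infinity at `precision` decimal places."""
    scale = 10 ** precision
    return math.floor(value * scale) / scale


def coverage_pct(hits: int, lines: int) -> float:
    """Unrounded line coverage in percent (no partial lines in this report)."""
    return hits / lines * 100


base = coverage_pct(49668, 76354)  # main @ 1ce3f26
head = coverage_pct(49812, 77939)  # PR head @ 08c83ff
print(round_down(base), round_down(head), round_down(head - base))  # 65.04 63.91 -1.14
```

This PR changes only YAML, and the diff shows +55 files. The movement therefore presumably reflects other commits between the compared base and head, not these pipeline files.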


Last update b87f1ee...08c83ff.


Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@chrjones-rh chrjones-rh requested review from jefho-rh and nickmazzi and removed request for MatthewAThompson and NickGagan April 22, 2026 19:28
@chrjones-rh chrjones-rh marked this pull request as ready for review April 22, 2026 19:29
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress This PR is in WIP state label Apr 22, 2026
@coderabbitai (bot, Contributor) left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/autorag/bff/internal/pipelines/documents_rag_optimization_pipeline/pipeline.yaml (1)

1029-1040: ⚠️ Potential issue | 🟠 Major

CWE-22: Unfiltered tarfile.extractall() on embedded archives without filter= argument enables path traversal and symlink attacks.

The embedded base64 tarballs in these KFP pipeline components are decoded and extracted via __kfp_tar.extractall(path=__KFP_EMBEDDED_ASSET_DIR) with no filter= parameter. Per PEP 706, Python 3.12 and 3.13 emit a DeprecationWarning when extraction runs without an explicit filter, and Python 3.14 changes the default to the 'data' filter. More critically: on interpreters that extract without a filter, any path-traversal, symlink, or device-file entry in the tarball (../../etc/passwd, absolute paths, /dev/*) will be honored and written outside the intended __KFP_EMBEDDED_ASSET_DIR, then prepended to sys.path, enabling arbitrary code execution at component import time.

Affects 5 locations across 3 files:

  • packages/autorag/bff/internal/pipelines/documents_rag_optimization_pipeline/pipeline.yaml:1036
  • packages/automl/bff/internal/pipelines/autogluon_tabular_training_pipeline/pipeline.yaml:185, 667
  • packages/automl/bff/internal/pipelines/autogluon_timeseries_training_pipeline/pipeline.yaml:295, 807

These files are generated from red-hat-data-services/pipelines-components; the fix must be applied upstream so the codegen emits filter='data' in the extractall() call. Track in the linked RHOAIENG ticket to ensure the next component refresh includes this hardening.

Suggested fix for upstream codegen
-        __kfp_tar.extractall(path=__KFP_EMBEDDED_ASSET_DIR)
+        __kfp_tar.extractall(path=__KFP_EMBEDDED_ASSET_DIR, filter='data')
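For reference, here is a minimal Python sketch of what hardened extraction could look like in the generated component code. The actual codegen template is not shown here, and `extract_embedded_archive` is a hypothetical helper. `filter='data'` is only accepted on Python 3.12+ and on the 3.8 to 3.11 security releases that backported PEP 706. For that reason, the sketch falls back to a stricter manual check when `tarfile.data_filter` is absent.

```python
import base64
import io
import os
import tarfile


def extract_embedded_archive(archive_b64: str, asset_dir: str) -> None:
    """Decode a base64 tar payload and extract it without escaping asset_dir."""
    raw = base64.b64decode(archive_b64)
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r:*") as tar:
        if hasattr(tarfile, "data_filter"):
            # PEP 706 'data' filter: strips a leading '/', and raises on members
            # that would land outside asset_dir, links pointing outside it,
            # and device/FIFO files.
            tar.extractall(path=asset_dir, filter="data")
            return
        # Fallback for interpreters without PEP 706: validate every member first
        # and allow only regular files and directories (no links at all).
        base = os.path.realpath(asset_dir)
        for member in tar.getmembers():
            if not (member.isfile() or member.isdir()):
                raise tarfile.TarError(f"refusing non-regular member: {member.name}")
            target = os.path.realpath(os.path.join(base, member.name))
            if os.path.commonpath([base, target]) != base:
                raise tarfile.TarError(f"refusing path outside {base}: {member.name}")
        tar.extractall(path=asset_dir)
```

In both branches, a bad member raises a subclass of `tarfile.TarError`. The component therefore fails loudly instead of importing from a tampered directory.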
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/autorag/bff/internal/pipelines/documents_rag_optimization_pipeline/pipeline.yaml`
around lines 1029 - 1040, The extractall call on the embedded archive is unsafe
(uses __kfp_tar.extractall(path=__KFP_EMBEDDED_ASSET_DIR)) and must be replaced
with a safe extraction (either pass an explicit filter= callable per PEP 706 or
emit a safe extraction helper) that: rejects absolute paths and any member with
path components like '..', rejects symlinks and device files, and only allows
extraction into __KFP_EMBEDDED_ASSET_DIR; update the codegen that writes
extraction logic for symbols __KFP_EMBEDDED_ARCHIVE_B64, __kfp_tar, and
__KFP_EMBEDDED_ASSET_DIR so generated pipeline.yaml uses the safe filter/helper
instead of bare extractall.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@packages/autorag/bff/internal/pipelines/documents_rag_optimization_pipeline/pipeline.yaml`:
- Line 388: The pipeline.yaml entry for the image (the line containing
registry.redhat.io/rhoai/odh-autorag-rhel9@sha256:b51e1c7b2b4b857f4f5ea34654b10326196fdd1a0487012a9f7074ef092a63c5)
no longer matches upstream commit a71ed55 and the pinned digest cannot be found
in the Red Hat catalog; fix by (1) fetching the upstream file at commit a71ed55
and producing a git diff against our pipeline.yaml to show the exact drift, (2)
querying the Red Hat catalog/API for the expected image digest and replacing the
current sha256 value with the verified digest (or revert the entire image line
to the upstream value from a71ed55), and (3) include the diff output in your PR
description and add a short note in the commit message referencing the
verification step and catalog query used.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited), Organization UI (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: 539e9146-73a1-45b1-951f-f2fdce61807d

📥 Commits

Reviewing files that changed from the base of the PR and between b7a499a and 08c83ff.

📒 Files selected for processing (3)
  • packages/automl/bff/internal/pipelines/autogluon_tabular_training_pipeline/pipeline.yaml
  • packages/automl/bff/internal/pipelines/autogluon_timeseries_training_pipeline/pipeline.yaml
  • packages/autorag/bff/internal/pipelines/documents_rag_optimization_pipeline/pipeline.yaml

packages/autorag/bff/internal/pipelines/documents_rag_optimization_pipeline/pipeline.yaml, line 388:

  \ as f:\n json.dump(descriptor, f, indent=2)\n\n logger.info(\"\
  Documents descriptor written to %s\", descriptor_path)\n\n"
- image: registry.redhat.io/rhoai/odh-autorag-rhel9@sha256:152c98afbf558367807d9dc0bd0a203a71936476039c7130eeecc079d0605cf2
+ image: registry.redhat.io/rhoai/odh-autorag-rhel9@sha256:b51e1c7b2b4b857f4f5ea34654b10326196fdd1a0487012a9f7074ef092a63c5

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# 1. Confirm the executor images in each pipeline file resolve to a single digest
#    (cheap local consistency check). -o without -n lets identical refs collapse.
fd -t f 'pipeline.yaml' packages/autorag packages/automl --exec \
  sh -c 'echo "=== $1 ==="; grep -oE "[^ \"]*(odh-autorag-rhel9|odh-autogluon)[^ \"]*" "$1" | sort | uniq -c' _ {}

# 2. Diff checked-in file against upstream a71ed55 to validate byte-for-byte claim.
#    A failed fetch (e.g. 404 on a wrong path) is reported separately from real drift.
REL="autorag/bff/internal/pipelines/documents_rag_optimization_pipeline/pipeline.yaml"
if curl -fsSL "https://raw.githubusercontent.com/red-hat-data-services/pipelines-components/a71ed55/${REL}" -o /tmp/upstream.yaml; then
  diff -u /tmp/upstream.yaml "packages/${REL}" && echo "MATCH" || echo "DRIFT"
else
  echo "FETCH FAILED (check the upstream path and commit)"
fi

# 3. Verify the new digest is actually published for the tag the upstream pins.
curl -fsSL "https://catalog.redhat.com/api/containers/v1/repositories/registry/registry.redhat.io/repository/rhoai/odh-autorag-rhel9/images?filter=repositories.manifest_schema2_digest==sha256:b51e1c7b2b4b857f4f5ea34654b10326196fdd1a0487012a9f7074ef092a63c5" \
  | jq '.data[0] | {tag: .repositories[0].tags[0].name, published: .creation_date, certified: .certified}'

Repository: opendatahub-io/odh-dashboard

Length of output: 1446


Byte-for-byte match with upstream a71ed55 contradicted; image digest unverified in Red Hat catalog.

The PR claims this file mirrors red-hat-data-services/pipelines-components @ a71ed55 byte-for-byte, but diff against upstream shows DRIFT—the checked-in file does not match. Additionally, the pinned digest sha256:b51e1c7b2b4b857f4f5ea34654b10326196fdd1a0487012a9f7074ef092a63c5 cannot be verified in the Red Hat container catalog (404 on catalog API query).

Production pods will be pinned to an image that: (1) diverges from the claimed upstream version, (2) cannot be verified as legitimately published by Red Hat. This creates risk along the lines of CWE-349 (Acceptance of Extraneous Untrusted Data With Trusted Data) and CWE-295 (Improper Certificate Validation). Confirm the exact drift against a71ed55, verify the image digest is actually published, and provide the diff output showing what changed.
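The drift and digest checks requested here can also be expressed in a few lines of Python. This sketch assumes both versions of pipeline.yaml are available as local text, for example the checked-in file and a copy of the upstream file at a71ed55. It scans `image:` lines with a regex instead of a YAML parser. `image_refs`, `image_drift`, and `digests_by_repo` are hypothetical helpers.

```python
import re
from collections import Counter

# Assumption: executor images appear as "image: <ref>" lines, as in the diff context above.
IMAGE_LINE = re.compile(r"^[ \t]*image:[ \t]*(\S+)", re.MULTILINE)


def image_refs(yaml_text: str) -> Counter:
    """Count each distinct image reference in a compiled pipeline YAML."""
    return Counter(IMAGE_LINE.findall(yaml_text))


def image_drift(local_text: str, upstream_text: str):
    """Return (only_in_local, only_in_upstream) sets of image references."""
    local, upstream = set(image_refs(local_text)), set(image_refs(upstream_text))
    return local - upstream, upstream - local


def digests_by_repo(yaml_text: str) -> dict:
    """Map each repository to the set of digests it is pinned to."""
    repos = {}
    for ref in image_refs(yaml_text):
        repo, sep, digest = ref.partition("@")
        repos.setdefault(repo, set()).add(digest if sep else "<not pinned>")
    return repos
```

If `digests_by_repo` returns more than one digest for the same repository, the executors in that file disagree. Step 1 of the script greps for exactly this condition.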


@chrjones-rh (Contributor, Author) commented Apr 22, 2026

RAG run:

[screenshot]

ML tabular run:

[screenshot]

ML timeseries run:

[screenshot]

@jefho-rh (Contributor) commented:

Thanks @chrjones-rh, every flow works well on my end on a connected cluster

✅ AutoML Binary Passing

[screenshot, 2026-04-22 4:13:47 PM]

✅ AutoML Multiclass Passing

[screenshot, 2026-04-22 4:14:20 PM]

✅ AutoML TimeSeries Passing

[screenshot, 2026-04-22 4:25:31 PM]

✅ AutoML Regression Passing

[screenshot, 2026-04-22 4:29:38 PM]

✅ AutoRAG Passing

[screenshot, 2026-04-22 4:31:09 PM]

/lgtm

@GAUNSD (Contributor) commented Apr 22, 2026

/approve

@openshift-ci (bot, Contributor) commented Apr 22, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: GAUNSD


@openshift-merge-bot openshift-merge-bot Bot merged commit 9677baa into opendatahub-io:main Apr 22, 2026
58 checks passed
openshift-merge-bot Bot pushed a commit that referenced this pull request Apr 22, 2026
…7307) (#7363)

* chore(automl,autorag): refresh embedded pipeline YAMLs from upstream

Update compiled pipeline YAMLs from red-hat-data-services/pipelines-components
rhoai-3.4 branch (matching pipelines-components#5).



* chore(automl,autorag): refresh embedded pipeline YAMLs from upstream



---------

Co-authored-by: Christopher Jones <chrjones@redhat.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
NickGagan pushed a commit to red-hat-data-services/odh-dashboard that referenced this pull request Apr 22, 2026
…pendatahub-io#7307) (opendatahub-io#7363) (#1801)
