feat: update pipeline components to use latest universal image #97

Merged
openshift-merge-bot[bot] merged 1 commit into opendatahub-io:main from Slowlybomb:RHOAIENG-56970-update-universal-image
Jun 11, 2026

Slowlybomb commented May 26, 2026

  • bump base_image to odh-th06-cpu-torch291-py312:odh-3.4 across training, data, and deployment components
  • fix _train_func signature to use **kwargs instead of positional dict
  • fix eval component to use HF backend and patch lm_eval Task import for cpu compatibility

Description of your changes:

Checklist:

Pre-Submission Checklist

Additional Checklist Items for New or Updated Components/Pipelines

  • metadata.yaml includes fresh lastVerified timestamp
  • All required files are present and complete
  • OWNERS file lists appropriate maintainers
  • README provides clear documentation with usage examples
  • Component follows snake_case naming convention
  • No security vulnerabilities in dependencies
  • Containerfile included if using a custom base image

Summary by CodeRabbit

  • Chores

    • Updated container base images across dataset download, model registry, and training components to a new ODH image version.
    • Updated base-image allowlist validation to reflect the new approved image.
  • Refactor

    • Adjusted evaluation component integration for updated harness imports and task subclassing for compatibility.
    • Simplified training wrappers to accept and forward keyword parameters for more flexible argument handling.

coderabbitai Bot commented May 26, 2026

Warning

Review limit reached

@Slowlybomb, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 27 minutes and 34 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more credits in the billing tab to continue.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 1f65e0c5-9dfa-43ca-8737-fd630a47aa2a

📥 Commits

Reviewing files that changed from the base of the PR and between 36a6831 and aa2894c.

📒 Files selected for processing (7)
  • components/data_processing/dataset_download/component.py
  • components/deployment/README.md
  • components/deployment/kubeflow_model_registry/component.py
  • components/evaluation/lm_eval/component.py
  • components/training/finetuning/lora/component.py
  • components/training/finetuning/osft/component.py
  • components/training/finetuning/sft/component.py
📝 Walkthrough

This PR updates container base images in multiple components to quay.io/opendatahub/odh-th06-cpu-torch291-py312:odh-3.4 and updates the allowlist. The lm_eval component changes imports to use TaskConfig and Task from lm_eval.api.task and adjusts a custom Task subclass. LoRA, OSFT, and SFT training wrappers now accept keyword arguments (**p) and forward them to their respective training_hub functions.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes


Security & Quality Issues

  1. lm_eval import and Task subclass change: confirm behavior equivalence after switching imports; improper assumptions about backend wiring or object shapes can lead to runtime errors (CWE-20: Improper Input Validation).
  2. Training wrapper signature changes (LoRA/OSFT/SFT): switching from positional/dict-normalizing p or {} to **p can raise TypeError or change handling of None/missing payloads; validate all callers and payload shapes (CWE-20).
  3. Base image and allowlist update: new image may alter bundled libraries and dependency surface; treat as supply-chain/CI change and validate presence of required native libraries and runtimes (supply-chain risk).
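The second item above can be made concrete with a small sketch. The names here (`train_old`, `train_new`, `fake_training_hub_fn`) are hypothetical stand-ins, not the wrappers touched by this PR; the sketch only shows how a caller that still passes a positional dict behaves once a wrapper switches to `**p`.

```python
def fake_training_hub_fn(**kwargs):
    """Hypothetical stand-in for a training_hub entry point."""
    return dict(kwargs)


def train_old(p=None):
    # Old style: one positional dict, with None normalized to {}.
    return fake_training_hub_fn(**(p or {}))


def train_new(**p):
    # New style: keyword arguments are forwarded as-is.
    return fake_training_hub_fn(**p)


params = {"learning_rate": 1e-4, "num_epochs": 1}

old_result = train_old(params)
new_result = train_new(**params)

# A caller that still passes the dict positionally now fails.
try:
    train_new(params)
    positional_error = None
except TypeError as exc:
    positional_error = exc
```

Under these assumptions, every call site has to unpack its payload with `**`, and a `None` payload needs handling by the caller because the wrapper no longer normalizes it.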
🚥 Pre-merge checks | ✅ 10
✅ Passed checks (10 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | Title clearly summarizes the main change: updating pipeline components to use a new universal base image across multiple components. |
| Description check | ✅ Passed | Description provides specific details on the three main changes (base image bump, _train_func signature updates, and eval component fixes) and includes a completed pre-submission checklist. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Contribution Quality And Spam Detection | ✅ Passed | Legitimate platform maintenance with coherent base image/API updates, internal contributor (RedHat email), existing tests, and no spam indicators. No security theater detected. |
| No Hardcoded Secrets | ✅ Passed | No hardcoded secrets detected. All credentials properly sourced from environment variables; base images from public registries only. |
| No Weak Cryptography | ✅ Passed | No weak cryptographic primitives, custom crypto implementations, or non-constant-time secret comparisons detected in any modified files. |
| No Injection Vectors | ✅ Passed | No injection patterns found: CWE-89 (SQL), CWE-78 (shell), CWE-94 (eval), CWE-502 (pickle) all absent. Changes are base image updates and signature modifications. |
| No Privileged Containers | ✅ Passed | No privileged container configurations (privileged: true, hostNetwork, hostPID, hostIPC, SYS_ADMIN, allowPrivilegeEscalation, runAsUser: 0) found in modified KFP components or YAML files. |
| No Sensitive Data In Logs | ✅ Passed | No logging statements expose sensitive data like passwords, tokens, API keys, or PII. Token presence is logged without revealing the actual token value. |

coderabbitai Bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
components/evaluation/lm_eval/component.py (2)

534-556: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Default trust_remote_code to False to prevent RCE during model loading.

trust_remote_code=True instructs Transformers to download and execute arbitrary Python from the model repository/artifact during from_pretrained; if final_model_path is user-controlled, this enables code execution in the evaluation pod (CWE-829 / CWE-94). Default to False and only allow remote code behind an explicit, trusted opt-in (optionally with allowlist + pinned revision).

Suggested fix
-        hf_model_args = {
-            "pretrained": final_model_path,
-            "trust_remote_code": True,
-            "dtype": "auto",
-        }
+        hf_model_args = {
+            "pretrained": final_model_path,
+            "trust_remote_code": False,
+            "dtype": "auto",
+        }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@components/evaluation/lm_eval/component.py` around lines 534 - 556, The
hf_model_args currently sets "trust_remote_code": True which permits remote code
execution when loading models; change the default to "trust_remote_code": False
in the hf_model_args construction and ensure any path that sets it to True
requires an explicit, documented opt-in (e.g., a config flag or parameter)
before it's merged into hf_model_args; update the logic near hf_model_args,
final_model_path, and where model_class.create_from_arg_obj is called
(get_model("hf") / model_class) so only an explicit trusted flag can override
the False default (optionally add an allowlist/pinned revision check before
permitting True).
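One way the optional allowlist could look is sketched below. The helper name `build_hf_model_args`, the `TRUSTED_MODEL_SOURCES` set, and the example path are assumptions made for illustration; only the `hf_model_args` keys and `m_args.get("trust_remote_code")` come from the review text.

```python
# Hypothetical allowlist of model sources whose custom code has been reviewed.
TRUSTED_MODEL_SOURCES = frozenset({"/mnt/models/reviewed-model"})


def build_hf_model_args(final_model_path, m_args=None, trusted=TRUSTED_MODEL_SOURCES):
    """Build HF model args with trust_remote_code off unless explicitly allowed."""
    m_args = m_args or {}
    # Remote code needs both an explicit opt-in and an allowlisted source.
    opt_in = m_args.get("trust_remote_code") is True
    return {
        "pretrained": final_model_path,
        "trust_remote_code": opt_in and final_model_path in trusted,
        "dtype": "auto",
    }
```

The `is True` comparison is deliberate in this sketch: a truthy string such as `"true"` coming from a loosely typed pipeline parameter does not enable remote code.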

413-421: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Skip local config.json filesystem validation for HF Hub model IDs

model_path is documented to accept Hub IDs (e.g., ibm/granite-7b), and later the component passes final_model_path as HF pretrained; but lines ~413-421 unconditionally build os.path.join(final_model_path, "config.json") and fail early for non-existent “local” paths, blocking legitimate Hub resolution.

Suggested fix
-    # Verify model directory has config.json (required by HF)
-    config_path = os.path.join(final_model_path, "config.json")
-    if not os.path.exists(config_path):
-        logger.error(f"Model directory missing config.json: {final_model_path}")
-        if os.path.isdir(final_model_path):
-            logger.error(f"Directory contents: {os.listdir(final_model_path)}")
-        else:
-            logger.error("Path is NOT A DIRECTORY")
-        raise ValueError(f"Invalid model directory - no config.json found at {final_model_path}")
+    # Only local model directories can be validated via the filesystem here.
+    if os.path.exists(final_model_path):
+        if not os.path.isdir(final_model_path):
+            raise ValueError(f"Invalid model path: {final_model_path} is not a directory")
+        config_path = os.path.join(final_model_path, "config.json")
+        if not os.path.exists(config_path):
+            logger.error(f"Model directory missing config.json: {final_model_path}")
+            logger.error(f"Directory contents: {os.listdir(final_model_path)}")
+            raise ValueError(f"Invalid model directory - no config.json found at {final_model_path}")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@components/evaluation/lm_eval/component.py` around lines 413 - 421, The
current validation always treats final_model_path as a local filesystem
directory and fails when a HF Hub model ID is used; change the logic in the
block that builds config_path (using final_model_path, config_path, logger) so
it only performs the local filesystem check when final_model_path actually
exists on disk (e.g., os.path.exists(final_model_path) and
os.path.isdir(final_model_path)); if the path does not exist locally, skip the
local config.json check and allow the HF Hub identifier to be passed through to
the HF loader instead.
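A self-contained sketch of that check follows. The function name `check_local_model_dir` and its boolean return value are hypothetical; the component itself logs and raises inline rather than calling a helper.

```python
import os


def check_local_model_dir(final_model_path):
    """Validate config.json only for paths that exist locally.

    Returns True if a local directory was validated, and False if the value
    was passed through untouched (for example, a Hub ID such as
    "ibm/granite-7b").
    """
    if not os.path.exists(final_model_path):
        return False  # not on disk: leave resolution to the HF loader
    if not os.path.isdir(final_model_path):
        raise ValueError(f"Invalid model path: {final_model_path} is not a directory")
    if not os.path.exists(os.path.join(final_model_path, "config.json")):
        raise ValueError(f"No config.json found at {final_model_path}")
    return True
```

One caveat with this approach: a Hub ID that happens to match a relative path in the working directory would be treated as local, so the component may want to resolve local artifacts to absolute paths first.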
ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 717294d5-151a-4123-921e-7d1677c09277

📥 Commits

Reviewing files that changed from the base of the PR and between 9940c1e and c4f3db5.

📒 Files selected for processing (7)
  • components/data_processing/dataset_download/component.py
  • components/deployment/kubeflow_model_registry/component.py
  • components/evaluation/lm_eval/component.py
  • components/training/finetuning/lora/component.py
  • components/training/finetuning/osft/component.py
  • components/training/finetuning/sft/component.py
  • scripts/validate_base_images/base_image_allowlist.yaml

base_image="registry.access.redhat.com/ubi9/python-311:latest",
packages_to_install=[
"lm-eval[vllm]", # The core harness with vLLM backend
"lm-eval", # The core harness (CPU/HF backend — no ray/vLLM dependency)

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

What is the latest lm-eval release on PyPI, and do its release notes mention breaking changes to Task imports or Hugging Face backend loading?

💡 Result:

Latest lm-eval release on PyPI (as of 2026-05-26): v0.4.12, dated 2026-05-11. [1]

Do the release notes mention breaking changes to Task imports? No. Task-import-breaking changes are not mentioned in the available release-note snippet for v0.4.12; the v0.4.12 release notes we found focus on fixes/improvements and do not call out Task import path changes. [2]

Do the release notes mention breaking changes to Hugging Face backend loading? Not for v0.4.12 specifically. However, a previous release (v0.4.10) contains a breaking-change note that the base pip install lm_eval no longer installs the Hugging Face/torch stack by default, and that Hugging Face backend support requires installing extras (pip install lm_eval[hf]). [3]

So: v0.4.12 is the latest PyPI release, and its release notes (based on what we could retrieve) do not mention breaking Task imports or Hugging Face backend loading changes, though Hugging Face backend loading changes were explicitly called out in v0.4.10. [1][2][3]

Citations:


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="components/evaluation/lm_eval/component.py"

# Show file with line numbers around the referenced sections
nl -ba "$FILE" | sed -n '1,180p'
nl -ba "$FILE" | sed -n '380,470p'
nl -ba "$FILE" | sed -n '480,620p'

Repository: opendatahub-io/pipelines-components

Length of output: 122


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="components/evaluation/lm_eval/component.py"

nl -ba "$FILE" | sed -n '1,180p'
nl -ba "$FILE" | sed -n '380,470p'
nl -ba "$FILE" | sed -n '480,620p'

Repository: opendatahub-io/pipelines-components

Length of output: 122


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="components/evaluation/lm_eval/component.py"

# Print with line numbers using cat -n (nl is unavailable in the sandbox)
cat -n "$FILE" | sed -n '1,220p'
cat -n "$FILE" | sed -n '360,470p'
cat -n "$FILE" | sed -n '500,620p'

Repository: opendatahub-io/pipelines-components

Length of output: 20559


Critical: Remove unconditional remote-code execution and fix HF Hub ID loading

  • Security (Critical, CWE-94): components/evaluation/lm_eval/component.py sets trust_remote_code=True unconditionally when building hf_model_args (lines 534-537). If model_path/model_artifact is user-controlled and points to a malicious HF repo, model loading can execute arbitrary code (RCE). Default to False and require an explicit opt-in for trusted sources (e.g., trusted local dirs). Example:
    hf_model_args = {
        "pretrained": final_model_path,
        "trust_remote_code": False,  # default
        "dtype": "auto",
    }
    # optionally allow remote code only via an explicit, trusted flag/input
    if m_args.get("trust_remote_code") is True:
        hf_model_args["trust_remote_code"] = True
  • Correctness: The documented model_path supports HF Hub IDs (e.g., "ibm/granite-7b"), but the config.json filesystem check enforces a local directory (final_model_path/config.json) and raises ValueError for non-local IDs (lines 413-421). Only require config.json when final_model_path is an existing local directory; otherwise skip and let the HF backend resolve the Hub model.
  • Reproducibility: Pin lm-eval instead of installing unpinned "lm-eval" (line ~14), since upstream API drift is already handled; use the exact tested version (e.g., lm-eval==0.4.12) and consider installing HF extras (e.g., lm-eval[hf]) since the component loads get_model("hf").
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@components/evaluation/lm_eval/component.py` at line 14, The component
unconditionally enables remote-code execution and incorrectly requires a local
config.json for HF Hub IDs; fix by defaulting hf_model_args["trust_remote_code"]
= False and only set it True when an explicit trusted opt-in (e.g.,
m_args.get("trust_remote_code") is True) is provided; change the config.json
existence check around final_model_path so it only validates config.json if
os.path.isdir(final_model_path) (skip the check when final_model_path is a
non-local HF Hub ID and let the HF backend resolve it); and pin the lm-eval
dependency instead of installing the unpinned "lm-eval" (use the tested exact
version and include HF extras if needed since get_model("hf") is used).
Reference symbols: hf_model_args, m_args, final_model_path, and the top-level
lm-eval installation entry.

allowed_images:
- 'quay.io/opendatahub/odh-training-th04-cpu-torch29-py312-rhel9:cpu-3.3'
- 'registry.access.redhat.com/ubi9/python-311:latest'
- 'quay.io/opendatahub/odh-th06-cpu-torch291-py312:odh-3.4'

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Tag-based references in touched files:"
rg -n 'odh-th06-cpu-torch291-py312:odh-3\.4' \
  scripts/validate_base_images/base_image_allowlist.yaml \
  components/data_processing/dataset_download/component.py \
  components/deployment/kubeflow_model_registry/component.py \
  components/training/finetuning/lora/component.py \
  components/training/finetuning/osft/component.py \
  components/training/finetuning/sft/component.py

echo
echo "Digest-pinned references in touched files (expected after fix):"
rg -n '@sha256:' \
  scripts/validate_base_images/base_image_allowlist.yaml \
  components/data_processing/dataset_download/component.py \
  components/deployment/kubeflow_model_registry/component.py \
  components/training/finetuning/lora/component.py \
  components/training/finetuning/osft/component.py \
  components/training/finetuning/sft/component.py || true

Repository: opendatahub-io/pipelines-components

Length of output: 950


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Locate allowlist loader + validator:"
rg -n "load_base_image_allowlist|is_valid_base_image|validate_base_images\.py" -S \
  scripts/validate_base_images || true

echo
echo "Show implementation details (if present):"
rg -n "def load_base_image_allowlist|def is_valid_base_image" -S \
  scripts/validate_base_images/validate_base_images.py || true

echo
echo "Print the allowlist-loading snippet area (around matched lines):"
rg -n "load_base_image_allowlist" -n scripts/validate_base_images/validate_base_images.py -C 3 || true

Repository: opendatahub-io/pipelines-components

Length of output: 8766


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Find lib/base_image implementation files:"
fd -a "base_image*" scripts/validate_base_images -t f

echo
echo "Locate load_base_image_allowlist + is_valid_base_image definitions:"
rg -n "def load_base_image_allowlist|def is_valid_base_image" -S \
  scripts/validate_base_images/lib || true

echo
echo "Show the relevant implementation blocks (around definitions):"
for f in $(rg -l "def load_base_image_allowlist|def is_valid_base_image" -S scripts/validate_base_images/lib); do
  echo "---- $f ----"
  rg -n "def load_base_image_allowlist|def is_valid_base_image" -n "$f" -C 3
done

echo
echo "Check tests for digest usage/behavior:"
rg -n "@sha256:|digest|digests" scripts/validate_base_images/tests/test_validate_base_images.py || true

Repository: opendatahub-io/pipelines-components

Length of output: 686


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Search for BaseImageAllowlist / load_base_image_allowlist definitions across repo:"
rg -n "BaseImageAllowlist|load_base_image_allowlist|is_valid_base_image" -S . || true

echo
echo "Show where base_image.py lives (if any):"
fd -i "base_image.py" . || true

echo
echo "Search for '`@sha256`:' handling in code:"
rg -n "@sha256:|digest(s)?|sha256" -S . || true

echo
echo "Inspect allowlist matcher logic by opening candidate file(s):"
for f in $(rg -l "load_base_image_allowlist|is_valid_base_image" -S .); do
  echo "---- $f ----"
  sed -n '1,220p' "$f" | sed -n '1,220p' || true
done

Repository: opendatahub-io/pipelines-components

Length of output: 50392


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== scripts/validate_base_images/base_image_allowlist.yaml (top) ==="
sed -n '1,120p' scripts/validate_base_images/base_image_allowlist.yaml

echo
echo "=== scripts/lib/base_image.py: show load_base_image_allowlist + matcher ==="
sed -n '1,260p' scripts/lib/base_image.py

echo
echo "=== scripts/lib/base_image.py: show is_valid_base_image ==="
sed -n '260,420p' scripts/lib/base_image.py

echo
echo "=== scripts/validate_base_images/tests/test_validate_base_images.py: digest-related assertions ==="
rg -n "sha256:|digest" scripts/validate_base_images/tests/test_validate_base_images.py || true

echo
echo "=== scripts/validate_base_images/tests/test_validate_base_images.py: allowlist parsing behavior ==="
rg -n "load_base_image_allowlist|BaseImageAllowlist|allowlist" scripts/validate_base_images/tests/test_validate_base_images.py -n | head -n 80

Repository: opendatahub-io/pipelines-components

Length of output: 21121


Pin allowlisted base images to immutable digests (CWE-494, major).

The validator allows base images via exact string match / regex match only (image in allowed_images or allowed_image_patterns), with no digest/content integrity enforcement—so mutable tags like quay.io/opendatahub/odh-th06-cpu-torch291-py312:odh-3.4 will keep passing even if upstream retags.

Exploit scenario: the registry retags :odh-3.4 (or attacker swaps bytes for that tag) and component runtime images change while validate_base_images still accepts the original tag string.

Remediation code
 allowed_images:
-  - 'quay.io/opendatahub/odh-th06-cpu-torch291-py312:odh-3.4'
+  - 'quay.io/opendatahub/odh-th06-cpu-torch291-py312@sha256:<approved_digest>'

Update the base_image="..." literals currently using the tag:

  • components/data_processing/dataset_download/component.py
  • components/deployment/kubeflow_model_registry/component.py
  • components/training/finetuning/lora/component.py
  • components/training/finetuning/osft/component.py
  • components/training/finetuning/sft/component.py

to quay.io/opendatahub/odh-th06-cpu-torch291-py312@sha256:<approved_digest>.

Also remove/replace the tag-based allowed_image_patterns entry that matches ^quay\.io/opendatahub/odh-[\w-]+:odh-.+$ (otherwise unrelated retags can still pass via the regex).
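As a rough illustration of digest enforcement, the sketch below rejects any reference that does not end in a sha256 digest. It is not the repository's validator (the scripts above show that logic lives in scripts/lib/base_image.py); `is_digest_pinned` and `unpinned_images` are hypothetical helper names.

```python
import re

# A reference is treated as pinned only if it ends in @sha256:<64 hex chars>.
DIGEST_PINNED = re.compile(r"^[^@\s]+@sha256:[0-9a-f]{64}$")


def is_digest_pinned(image):
    """Return True when the reference names an immutable sha256 digest."""
    return DIGEST_PINNED.fullmatch(image) is not None


def unpinned_images(allowed_images):
    """List the allowlist entries that still rely on a mutable tag."""
    return [image for image in allowed_images if not is_digest_pinned(image)]
```

A check along these lines could run over `allowed_images` in CI and fail when `unpinned_images(...)` is non-empty, which would also flag the existing tag-based entries.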

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/validate_base_images/base_image_allowlist.yaml` at line 13, Update
the allowlist and source files to require immutable digests: replace the
tag-based entry 'quay.io/opendatahub/odh-th06-cpu-torch291-py312:odh-3.4' in
base_image_allowlist.yaml with the approved digest form
quay.io/opendatahub/odh-th06-cpu-torch291-py312@sha256:<approved_digest>, change
the hardcoded base_image literals in
components/data_processing/dataset_download/component.py,
components/deployment/kubeflow_model_registry/component.py,
components/training/finetuning/{lora,osft,sft}/component.py to use the same
`@sha256` digest form, and remove or tighten the tag-matching
allowed_image_patterns entry (the regex
^quay\.io/opendatahub/odh-[\w-]+:odh-.+$) so the validator
(validate_base_images) no longer accepts mutable tag-based images.

Fiona-Waters left a comment

Thanks @Slowlybomb
I tested this with the lora_minimal pipeline; the changes in the dataset_download, kubeflow_model_registry and lora components work as expected. I have left comments for the rest.

# - allowed_image_patterns: list of regex patterns matched against the full image
allowed_images:
- 'quay.io/opendatahub/odh-training-th04-cpu-torch29-py312-rhel9:cpu-3.3'
- 'registry.access.redhat.com/ubi9/python-311:latest'

Fiona-Waters May 29, 2026

I don't think 'registry.access.redhat.com/ubi9/python-311:latest' should be removed?

Slowlybomb (Author)

Yeah, I will put it back. I should have asked about it before removing it.

from lm_eval.evaluator import evaluate
from lm_eval.tasks import get_task_dict

# lm-eval >= 0.4 moved Task out of the tasks module — patch it back for compatibility

I don't think that this is needed. After adding the Task import you can just use that directly here: https://github.com/opendatahub-io/pipelines-components/blob/main/components/evaluation/lm_eval/component.py#L170 as class ChatHoldoutTask(Task)

Slowlybomb (Author)

I will trust you on this, but I am still worried that the import can break, so it is nice to have an explanation.

Apologies if I wasn't clear here. I am not asking you to remove only the comment, but to change the way that you have used Task.

base_image="registry.access.redhat.com/ubi9/python-311:latest",
packages_to_install=[
"lm-eval[vllm]", # The core harness with vLLM backend
"lm-eval", # The core harness (CPU/HF backend — no ray/vLLM dependency)

We shouldn't change this. It should be the vllm backend. I think we need to change the image that is used in this component, possibly to our 3.4 cuda image but that would need to be tested. When I ran it, I got errors related to missing cuda libs. This might be outside the scope of this PR.

Slowlybomb (Author)

If I am not mistaken, this was a problem with Ray, which is required for vllm, so I had to remove it.

So should I just bring back [vllm]?

Slowlybomb (Author)

@Fiona-Waters, thank you for the review!

Removing vllm changes the way in which this component works, which we don't want. You have switched from using GPU to using CPU - but this component is meant to run on GPU. We could try this with the cuda image, instead of the cpu one. This could possibly be done in a separate PR, but in this PR we should revert these changes that update lm-eval not to use vllm.

Slowlybomb force-pushed the RHOAIENG-56970-update-universal-image branch from c4f3db5 to 36a6831 on June 10, 2026 11:29

coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@components/evaluation/lm_eval/component.py`:
- Around line 79-87: The import block is unsorted causing Ruff I001; reorder
imports so stdlib (if any) come first, then third-party "import torch", then the
delayed lm_eval imports (e.g., from lm_eval.api.instance import Instance, from
lm_eval.api.metrics import mean, from lm_eval.api.registry import get_model,
from lm_eval.api.task import TaskConfig, Task, from lm_eval.evaluator import
evaluate, from lm_eval.tasks import get_task_dict), or simply run "ruff check
--fix" to auto-fix the ordering.
- Line 85: The project currently imports TaskConfig and Task from
lm_eval.api.task but the dependency lm-eval[vllm] is installed unpinned; update
the package entry in the packages_to_install list to pin lm-eval[vllm] to an
exact published version (e.g. "lm-eval[vllm]==X.Y.Z") and preferably include the
corresponding pip hash(s) to mitigate supply-chain/CWE-494 risks; ensure any
CI/deployment scripts or setup code that references packages_to_install (and any
requirement files) are updated to use the pinned spec so the TaskConfig/Task API
surface remains stable.
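As a rough illustration of the pinning finding above, a check along these lines could flag unpinned entries in a `packages_to_install` list. This is only a sketch under stated assumptions: `find_unpinned` is a hypothetical helper, the version `0.0.0` is a placeholder and not a real lm-eval release, and the regex handles only simple `name[extras]==version` specs rather than the full dependency-specifier grammar.

```python
import re

# Matches simple specs such as "pkg==1.2.3" or "pkg[extra]==1.2.3".
# Assumption: anything else (no version, ranges like ">=") counts as unpinned.
_PINNED = re.compile(
    r"^[A-Za-z0-9._-]+(\[[A-Za-z0-9._,\s-]+\])?==[A-Za-z0-9._+!-]+$"
)


def find_unpinned(packages):
    """Return the entries that are not pinned to an exact version."""
    return [spec for spec in packages if not _PINNED.match(spec.strip())]


packages_to_install = [
    "lm-eval[vllm]",         # unpinned, so it is flagged
    "lm-eval[vllm]==0.0.0",  # placeholder version, not a real release
]

unpinned = find_unpinned(packages_to_install)
```

A check like this only enforces the `==` form; it does not verify that the pinned version exists or add pip hashes, which the finding also recommends.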
ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 70351b48-5484-4fb4-8e39-191d4ccbac04

📥 Commits

Reviewing files that changed from the base of the PR and between c4f3db5 and 36a6831.

📒 Files selected for processing (7)
  • components/data_processing/dataset_download/component.py
  • components/deployment/kubeflow_model_registry/component.py
  • components/evaluation/lm_eval/component.py
  • components/training/finetuning/lora/component.py
  • components/training/finetuning/osft/component.py
  • components/training/finetuning/sft/component.py
  • scripts/validate_base_images/base_image_allowlist.yaml
✅ Files skipped from review due to trivial changes (2)
  • components/data_processing/dataset_download/component.py
  • components/deployment/kubeflow_model_registry/component.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • components/training/finetuning/osft/component.py
  • components/training/finetuning/sft/component.py
  • components/training/finetuning/lora/component.py

Comment thread components/evaluation/lm_eval/component.py
Comment thread components/evaluation/lm_eval/component.py Outdated

@Fiona-Waters Fiona-Waters left a comment

Just a couple more tiny things.
To fix the lint check failure, run `ruff check --fix components/evaluation/lm_eval/component.py` on your local machine.

@@ -236,7 +236,7 @@ def _params() -> Dict:

  params = _params()

- def _train_func(p):
+ def _train_func(**p):
      a = dict(p or {})

Can you update this to match the other please?

Suggested change
- a = dict(p or {})
+ a = dict(p)
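For context on this suggestion: once a function takes `**p`, Python always binds `p` to a dict (an empty one when no keyword arguments are passed), so the `p or {}` guard no longer has any effect. A minimal sketch, assuming the caller invokes the function with unpacked keyword arguments; the function names and parameter values below are hypothetical stand-ins, not the component's real training code:

```python
def train_positional(p):
    # Old style: the caller passes a single dict (or None), so a guard is needed.
    a = dict(p or {})
    return a


def train_kwargs(**p):
    # New style: **p is always a dict, never None,
    # so dict(p) is enough to take a shallow copy.
    a = dict(p)
    return a


params = {"learning_rate": 1e-4, "epochs": 1}  # hypothetical values

old_result = train_positional(params)
new_result = train_kwargs(**params)
empty_result = train_kwargs()  # no arguments: p is {}
```

Both styles produce the same copy of the parameters; the keyword form simply removes the case where `p` could be `None`.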

Could you remove this line? `self.task_name = task_name`

And this one also? `self.config.task = task_name`. Neither is required given the other Task-related changes.

@Slowlybomb Slowlybomb force-pushed the RHOAIENG-56970-update-universal-image branch 3 times, most recently from 64bc663 to c4b1ce4 Compare June 11, 2026 10:34
- bump base_image to odh-th06-cpu-torch291-py312:odh-3.4 across
  training, data processing, and deployment components
- fix _train_func signature to use **kwargs in lora, osft, sft
- fix lm_eval Task import for direct usage (from lm_eval.api.task)
- remove unnecessary self.task_name and self.config.task assignments
- regenerate deployment category README after upstream rebase

Signed-off-by: Slowlybomb <hslyusar@redhat.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
@Slowlybomb Slowlybomb force-pushed the RHOAIENG-56970-update-universal-image branch from c4b1ce4 to aa2894c Compare June 11, 2026 10:45

@Fiona-Waters Fiona-Waters left a comment

Ran the SFT minimal pipeline to test; it works as expected. A follow-on issue is required to update the image in the lm_eval component. Thanks for your work on this @Slowlybomb 🎉
/lgtm
/approve

@Fiona-Waters

@nsingla we need to merge this but I don't seem to have permissions any more. Can you help please?

@nsingla nsingla left a comment

/lgtm

@openshift-ci

openshift-ci Bot commented Jun 11, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Fiona-Waters, nsingla

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@nsingla

nsingla commented Jun 11, 2026

/ok-to-test

@openshift-merge-bot openshift-merge-bot Bot merged commit 71547ab into opendatahub-io:main Jun 11, 2026
30 of 31 checks passed