
Conversation

@tarilabs
Member

@tarilabs tarilabs commented Jul 24, 2025

labels are applied

/ok-to-test
/lgtm
/approve

Summary by CodeRabbit

  • New Features

    • Added support for downloading models from Hugging Face repositories via URI in the async-upload job.
    • Introduced integration tests for the async-upload job, including documentation and test infrastructure.
    • Added new command-line argument to specify source URI for model downloads.
  • Bug Fixes

    • Explicitly set namespaces for Kubernetes resources in sample job manifests to avoid deployment issues.
  • Chores

    • Improved .gitignore and .dockerignore handling for temporary files.
    • Updated workflow and Makefile logic for better image variable management and integration test execution.
    • Refined readiness probe frequency for model registry deployment.
  • Documentation

    • Added comprehensive integration test README for async-upload job.

jonburdo and others added 5 commits July 23, 2025 15:31
* add support to async job for huggingface source

This enables the model storage async job to have as the source a hf URI
in the following forms:

  hf://repo-name/model-name
  hf://repo-name/model-name:hash
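For illustration only, a minimal sketch of splitting such a URI into a repo id and an optional revision; parse_hf_uri is a hypothetical helper, not the job's actual code:

HF_URI_PREFIX = "hf://"

def parse_hf_uri(uri: str) -> tuple[str, str | None]:
    """Split hf://repo-name/model-name[:hash] into (repo_id, revision). Hypothetical helper."""
    if not uri.startswith(HF_URI_PREFIX):
        raise ValueError(f"not a Hugging Face URI: {uri}")
    repo_id, _, revision = uri[len(HF_URI_PREFIX):].partition(":")
    return repo_id, revision or None

# e.g. parse_hf_uri("hf://repo-name/model-name:abc123") -> ("repo-name/model-name", "abc123")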

Signed-off-by: Jon Burdo <[email protected]>

* use uri as source type for hf

Signed-off-by: Jon Burdo <[email protected]>

---------

Signed-off-by: Jon Burdo <[email protected]>
This is a temporary file written to in some Makefiles when running
clusters locally for testing.

Signed-off-by: Jon Burdo <[email protected]>
* chore(async-job): add script to setup and run sample job

Signed-off-by: Eric Dobroveanu <[email protected]>

* chore: adjust readiness probe for faster tests

Signed-off-by: Eric Dobroveanu <[email protected]>

* test(async-job): convert bash-based test to python-based

Signed-off-by: Eric Dobroveanu <[email protected]>

* test(async-job): add readme for integration tests

Signed-off-by: Eric Dobroveanu <[email protected]>

* chore(async-job): ensure correct make target is run in GH action

Signed-off-by: Eric Dobroveanu <[email protected]>

* chore(async-job): update lockfile and convert to use boto3

Signed-off-by: Eric Dobroveanu <[email protected]>

* test(async-job): simplify the integration tests

Signed-off-by: Eric Dobroveanu <[email protected]>

* chore(async-job): remove unused job-values.yaml

Signed-off-by: Eric Dobroveanu <[email protected]>

* chore(async-job): ensure async job has a separate env var from mr service

Signed-off-by: Eric Dobroveanu <[email protected]>

* chore(async-job): adjust e2e tests to be able to build the images

Signed-off-by: Eric Dobroveanu <[email protected]>

* chore(async-job): move env vars to the top level

Signed-off-by: Eric Dobroveanu <[email protected]>

---------

Signed-off-by: Eric Dobroveanu <[email protected]>
* ci: fix root Make image/push

Signed-off-by: Matteo Mortari <[email protected]>

* to trigger ci

Signed-off-by: Matteo Mortari <[email protected]>

* revert 8356c82

Signed-off-by: Matteo Mortari <[email protected]>

* fix async-job Makefile

Signed-off-by: Matteo Mortari <[email protected]>

* invoke script with expected IMG var

Signed-off-by: Matteo Mortari <[email protected]>

---------

Signed-off-by: Matteo Mortari <[email protected]>
Resolving conflicts on:
.github/workflows/async-upload-test.yml
Makefile

Signed-off-by: Matteo Mortari <[email protected]>
@openshift-ci

openshift-ci bot commented Jul 24, 2025

@tarilabs: you cannot LGTM your own PR.

Details

In response to this:

labels are applied

/ok-to-test
/lgtm
/approve

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@coderabbitai

coderabbitai bot commented Jul 24, 2025

Walkthrough

This update introduces support for a new source type ("uri") in the async-upload job, enabling Hugging Face model downloads, and adds comprehensive integration tests for the async-upload workflow within Kubernetes. It refactors Makefile and workflow image variable logic, updates ignore files, and enhances documentation and test configuration for integration scenarios.

Changes

File(s) Change Summary
.github/workflows/async-upload-test.yml, Makefile, jobs/async-upload/Makefile Refactored image variable logic, expanded workflow environment variables, updated job/test commands, and added new integration test targets and variables.
.gitignore, clients/python/.gitignore, jobs/async-upload/.dockerignore Updated ignore patterns to include/exclude .port-forwards.pid as appropriate.
jobs/async-upload/job/config.py, jobs/async-upload/job/download.py Added support for "uri" source type, including Hugging Face model download logic and corresponding configuration/validation.
jobs/async-upload/pyproject.toml, jobs/async-upload/tests/conftest.py Added integration test dependency group and pytest marker/option for integration tests.
jobs/async-upload/tests/integration/README.md, jobs/async-upload/tests/integration/__init__.py Added documentation and module initialization for integration tests.
jobs/async-upload/tests/integration/test_integration_async_upload.py Added comprehensive integration test for async-upload job involving Kubernetes, MinIO, and model registry.
jobs/async-upload/samples/sample_job_s3_to_oci.yaml Added explicit namespace: default to Kubernetes resource definitions.
manifests/kustomize/base/model-registry-deployment.yaml Changed readiness probe interval from 60s to 20s.

Sequence Diagram(s)

sequenceDiagram
    participant Tester as Integration Test
    participant K8s as Kubernetes Cluster
    participant MR as Model Registry
    participant MinIO as S3 Storage
    participant Job as Async-Upload Job

    Tester->>MR: Create model, version, placeholder artifact
    Tester->>MinIO: Upload model file
    Tester->>K8s: Apply async-upload Job manifest
    K8s->>Job: Schedule Job pod
    Job->>MinIO: Download model file from S3
    Job->>MR: Update artifact with URI/state
    Tester->>K8s: Poll for Job completion
    Tester->>MR: Verify artifact updated
    Tester->>K8s: Cleanup Job resource
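For context, the "Upload model file" step above boils down to an S3 upload against the in-cluster MinIO endpoint; a minimal boto3 sketch, where the endpoint, credentials, and bucket name are illustrative rather than the test's actual values:

import boto3

# Illustrative endpoint/credentials; the integration test reads its own configuration.
s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:9000",
    aws_access_key_id="minioadmin",
    aws_secret_access_key="minioadmin",
)
s3.upload_file("mnist-8.onnx", "default-bucket", "my-model/mnist-8.onnx")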
sequenceDiagram
    participant Job as Async-Upload Job
    participant HF as Hugging Face Hub

    Job->>Job: Parse config (source type: "uri")
    Job->>HF: Download model via huggingface_hub
    Job->>Job: Store model in destination
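The Hugging Face path in the second diagram amounts to a snapshot download; a minimal sketch, assuming the hf:// URI has already been split into a repo id and optional revision (the job's actual call may differ):

from huggingface_hub import snapshot_download

repo_id = "repo-name/model-name"   # assumed to come from the hf:// source URI
revision = None                    # or the :hash suffix, if present

local_path = snapshot_download(
    repo_id=repo_id,
    revision=revision,
    local_dir="/tmp/model-download",
)
print(f"Model downloaded to {local_path}")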

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

ok-to-test, Area/GitHub, Area/MR Python client, Area/Manifests

Suggested reviewers

  • pboyd

Poem

🐇
In clusters where models hop and play,
New URIs guide the async way.
Hugging Face joins the MinIO stream,
Integration tests fulfill the dream.
With Makefile tweaks and probes anew,
The registry’s ready—thanks to you!
🌱

Note

⚡️ Unit Test Generation is now available in beta!

Learn more here, or try it out under "Finishing Touches" below.

✨ Finishing Touches
  • 📝 Generate Docstrings
  • 🧪 Generate unit tests
    • Create PR with unit tests
    • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai generate unit tests to generate unit tests for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@openshift-ci openshift-ci bot requested review from Al-Pragliola and rareddy July 24, 2025 12:53
@openshift-ci

openshift-ci bot commented Jul 24, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: tarilabs

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

2 similar comments


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 8

🧹 Nitpick comments (5)
jobs/async-upload/samples/sample_job_s3_to_oci.yaml (1)

5-7: Hard-coded namespace: default limits sample portability

Embedding namespace: default forces users to edit the file before applying it in any namespace-scoped environment (e.g., GitOps overlays, multi-tenant clusters).

Optional refactor:

-  namespace: default
+  # namespace: default

or document that users should patch the namespace when applying.

Also applies to: 17-19, 28-30

.github/workflows/async-upload-test.yml (1)

78-80: Fix misleading step name.

The step name says "Execute Sample Job E2E test" but it's actually running integration tests via make test-integration. Consider updating the name for clarity.

-      - name: Execute Sample Job E2E test
+      - name: Execute Sample Job Integration test
jobs/async-upload/Makefile (1)

21-21: Track the consistency issue in a GitHub issue

The TODO mentions an important consistency issue (RHOAIENG-30453). Consider creating a GitHub issue to track this if not already done.

Would you like me to create an issue to track this IMG tag consistency problem?

jobs/async-upload/tests/integration/test_integration_async_upload.py (2)

224-224: Use explicit exception chaining

Follow Python best practices for exception handling.

Apply this diff:

-        raise Exception(f"Failed to upload to MinIO: {e}")
+        raise Exception(f"Failed to upload to MinIO: {e}") from e

296-296: Remove unnecessary f-string prefix

The string doesn't contain any placeholders.

-        key = f"my-model/mnist-8.onnx"
+        key = "my-model/mnist-8.onnx"
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 66cd3a5 and f9917de.

⛔ Files ignored due to path filters (1)
  • jobs/async-upload/poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (15)
  • .github/workflows/async-upload-test.yml (2 hunks)
  • .gitignore (1 hunks)
  • Makefile (1 hunks)
  • clients/python/.gitignore (0 hunks)
  • jobs/async-upload/.dockerignore (1 hunks)
  • jobs/async-upload/Makefile (3 hunks)
  • jobs/async-upload/job/config.py (8 hunks)
  • jobs/async-upload/job/download.py (2 hunks)
  • jobs/async-upload/pyproject.toml (2 hunks)
  • jobs/async-upload/samples/sample_job_s3_to_oci.yaml (3 hunks)
  • jobs/async-upload/tests/conftest.py (2 hunks)
  • jobs/async-upload/tests/integration/README.md (1 hunks)
  • jobs/async-upload/tests/integration/__init__.py (1 hunks)
  • jobs/async-upload/tests/integration/test_integration_async_upload.py (1 hunks)
  • manifests/kustomize/base/model-registry-deployment.yaml (1 hunks)
💤 Files with no reviewable changes (1)
  • clients/python/.gitignore
🧰 Additional context used
🪛 Ruff (0.12.2)
jobs/async-upload/tests/integration/test_integration_async_upload.py

3-3: json imported but unused

Remove unused import: json

(F401)


9-9: typing.Dict imported but unused

Remove unused import

(F401)


9-9: typing.Any imported but unused

Remove unused import

(F401)


13-13: yaml imported but unused

Remove unused import: yaml

(F401)


224-224: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


245-245: Local variable mr_host_url is assigned to but never used

Remove assignment to unused variable mr_host_url

(F841)


296-296: f-string without any placeholders

Remove extraneous f prefix

(F541)

🔇 Additional comments (18)
jobs/async-upload/tests/integration/__init__.py (1)

1-1: Docstring-only module looks fine

No further action required.

jobs/async-upload/.dockerignore (1)

1-2: Good call excluding the runtime PID file

Excluding .port-forwards.pid prevents unnecessary cache invalidation during image builds.

manifests/kustomize/base/model-registry-deployment.yaml (1)

68-71: Verify increased readiness-probe frequency

periodSeconds was reduced from the default 60 s to 20 s.
If /readyz/isDirty performs I/O-heavy checks, a 3× increase in probe rate could create avoidable load spikes or DOS the service under stress.

Consider benchmarking the endpoint latency and, if needed, tune failureThreshold/timeoutSeconds accordingly.

.gitignore (1)

27-29: Consistent ignore rule

Adding .port-forwards.pid at repo root keeps the ignore logic in one place; 👍

Makefile (1)

34-40: LGTM! Well-structured image variable logic.

The conditional logic properly prioritizes explicitly set IMG values while providing sensible fallbacks. This improves flexibility for different deployment scenarios and CI/CD workflows.

jobs/async-upload/pyproject.toml (2)

30-31: LGTM! Proper pytest marker configuration.

The integration marker addition is correctly formatted and aligns with the test configuration changes in conftest.py.


18-20: Dependency versions are up to date and secure

  • kubernetes: latest 33.1.0 falls within “^33.0.0”
  • requests: latest 2.32.4 falls within “^2.31.0”
  • No known vulnerabilities reported for these versions (consider adding a safety check step in CI)

No changes required.

.github/workflows/async-upload-test.yml (1)

20-29: LGTM! Well-organized environment variables.

The separation of async job image variables (JOB_IMG_*) from model registry server variables (IMG_*) provides clear distinction and aligns with the new image management approach.

jobs/async-upload/job/config.py (1)

22-23: LGTM! Proper integration of new URI source type.

The addition of "uri" to the source type choices and the corresponding argument is well-integrated with the existing argument structure.

jobs/async-upload/tests/conftest.py (2)

11-33: LGTM! Well-structured integration test configuration.

The integration test support follows the same pattern as the existing e2e test configuration, providing consistent behavior for:

  • Selective test execution with --integration
  • Proper skip markers and reasons
  • Clean separation between test types

The conditional logic correctly handles all scenarios for running integration, e2e, or default tests.


43-48: LGTM! Consistent command-line option addition.

The --integration option follows the same pattern as the existing --e2e option with appropriate defaults and help text.
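For reference, the pattern described here usually looks like the following conftest.py sketch; these are generic pytest hooks, not the repository's exact file:

import pytest

def pytest_addoption(parser):
    parser.addoption("--integration", action="store_true", default=False,
                     help="run integration tests")

def pytest_collection_modifyitems(config, items):
    if config.getoption("--integration"):
        return  # integration tests were explicitly requested
    skip_integration = pytest.mark.skip(reason="need --integration option to run")
    for item in items:
        if "integration" in item.keywords:
            item.add_marker(skip_integration)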

jobs/async-upload/tests/integration/README.md (3)

16-16: Documentation inconsistency with implementation

The README states "No external CLI tools required - everything is pure Python!" but the actual test implementation uses subprocess calls to execute kubectl commands.

Likely an incorrect or invalid review comment.


32-39: Update documentation to reflect actual implementation

The "Pure Python Approach" section is misleading. The integration test actually uses:

  • Subprocess calls to execute kubectl commands
  • kubectl apply -k which invokes kustomize CLI
  • Shell commands for job management

Either update the documentation to accurately reflect the implementation or refactor the test to truly use a pure Python approach.

Likely an incorrect or invalid review comment.


73-73: Correct the misleading statement about kustomize

The documentation states "pure Python YAML patching (no kustomize CLI)" but the test actually uses kubectl apply -k which invokes the kustomize CLI.

Likely an incorrect or invalid review comment.

jobs/async-upload/job/download.py (1)

47-75: Well-implemented Hugging Face download function

The function has excellent validation logic and error handling. Good attribution to the kserve source.

jobs/async-upload/Makefile (2)

1-13: Good separation of job and server image variables

The clear distinction between JOB_IMG_* for the async-upload job and IMG for the model registry server improves maintainability.


70-83: Well-structured integration test targets

The new targets follow established patterns and correctly pass the job image URI to the tests.

jobs/async-upload/tests/integration/test_integration_async_upload.py (1)

167-200: Well-implemented job status polling

Good implementation with proper timeout handling and status checking.
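A minimal sketch of that polling pattern with the Kubernetes Python client, where the job name, namespace, and timeout are illustrative:

import time
from kubernetes import client, config

config.load_kube_config()  # assumes a local kubeconfig, as in the integration tests

def wait_for_job(name: str, namespace: str = "default", timeout: int = 300) -> bool:
    """Poll the Job until it succeeds, fails, or the timeout expires."""
    batch = client.BatchV1Api()
    deadline = time.time() + timeout
    while time.time() < deadline:
        status = batch.read_namespaced_job_status(name, namespace).status
        if status.succeeded:
            return True
        if status.failed:
            raise RuntimeError(f"Job {name} failed")
        time.sleep(5)
    raise TimeoutError(f"Job {name} did not complete within {timeout}s")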

Comment on lines +194 to +197
def _validate_uri_config(cfg: Dict[str, Any]) -> None:
    if not ("uri" in cfg and isinstance(cfg["uri"], str)):
        raise ValueError("uri must be set to a string")



🛠️ Refactor suggestion

Enhance URI validation for security.

The current validation only checks if the URI is a string, which is insufficient for security. Consider adding proper URL validation and scheme restrictions.

def _validate_uri_config(cfg: Dict[str, Any]) -> None:
+    import urllib.parse
    if not ("uri" in cfg and isinstance(cfg["uri"], str)):
        raise ValueError("uri must be set to a string")
+    
+    # Validate URI format and allowed schemes
+    try:
+        parsed = urllib.parse.urlparse(cfg["uri"])
+        if not parsed.scheme:
+            raise ValueError("URI must include a scheme (e.g., https://)")
+        if parsed.scheme not in ["http", "https", "hf"]:  # Add allowed schemes
+            raise ValueError(f"URI scheme '{parsed.scheme}' is not allowed")
+    except Exception as e:
+        raise ValueError(f"Invalid URI format: {e}")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def _validate_uri_config(cfg: Dict[str, Any]) -> None:
    if not ("uri" in cfg and isinstance(cfg["uri"], str)):
        raise ValueError("uri must be set to a string")

def _validate_uri_config(cfg: Dict[str, Any]) -> None:
    import urllib.parse
    if not ("uri" in cfg and isinstance(cfg["uri"], str)):
        raise ValueError("uri must be set to a string")

    # Validate URI format and allowed schemes
    try:
        parsed = urllib.parse.urlparse(cfg["uri"])
        if not parsed.scheme:
            raise ValueError("URI must include a scheme (e.g., https://)")
        if parsed.scheme not in ["http", "https", "hf"]:  # Add allowed schemes
            raise ValueError(f"URI scheme '{parsed.scheme}' is not allowed")
    except Exception as e:
        raise ValueError(f"Invalid URI format: {e}")
🤖 Prompt for AI Agents
In jobs/async-upload/job/config.py around lines 194 to 197, the
_validate_uri_config function only checks if the "uri" key exists and is a
string, which is insufficient for security. Enhance the validation by verifying
that the URI is a properly formatted URL and restrict allowed schemes (e.g.,
only allow "http" and "https"). Use a URL parsing library or regex to validate
the URI format and raise a ValueError if the URI is invalid or uses a disallowed
scheme.


# TODO: Maybe clean this up, its a little manual
# Override with command-line arguments if provided. configargparse will prioritize CLI > ENV
cfg["source"]["uri"] = args.source_uri


🛠️ Refactor suggestion

Consider URI sanitization for logging.

The URI is assigned directly without sanitization. If this contains sensitive information (like tokens), it should be sanitized in the logging function.

Update the _sanitize_config_for_logging function to handle URI sanitization:

def _sanitize_config_for_logging(cfg: Dict[str, Any]) -> Dict[str, Any]:
    """
    Create a sanitized copy of the config for logging purposes, masking sensitive values.
    """
    import copy

    sanitized = copy.deepcopy(cfg)

    # ... existing sanitization code ...

+    # Mask sensitive parts of URIs (e.g., tokens in query parameters)
+    if sanitized["source"].get("uri"):
+        import urllib.parse
+        try:
+            parsed = urllib.parse.urlparse(sanitized["source"]["uri"])
+            if parsed.query:
+                # Mask query parameters that might contain tokens
+                sanitized["source"]["uri"] = f"{parsed.scheme}://{parsed.netloc}{parsed.path}?***"
+        except:
+            sanitized["source"]["uri"] = "***"

    return sanitized
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
cfg["source"]["uri"] = args.source_uri

def _sanitize_config_for_logging(cfg: Dict[str, Any]) -> Dict[str, Any]:
    """
    Create a sanitized copy of the config for logging purposes, masking sensitive values.
    """
    import copy

    sanitized = copy.deepcopy(cfg)

    # ... existing sanitization code ...

    # Mask sensitive parts of URIs (e.g., tokens in query parameters)
    if sanitized["source"].get("uri"):
        import urllib.parse
        try:
            parsed = urllib.parse.urlparse(sanitized["source"]["uri"])
            if parsed.query:
                # Mask query parameters that might contain tokens
                sanitized["source"]["uri"] = f"{parsed.scheme}://{parsed.netloc}{parsed.path}?***"
        except:
            sanitized["source"]["uri"] = "***"

    return sanitized
🤖 Prompt for AI Agents
In jobs/async-upload/job/config.py at line 343, the source URI is assigned
directly without sanitization, which risks exposing sensitive information in
logs. Update the _sanitize_config_for_logging function to detect and sanitize
the URI field by removing or masking sensitive parts such as tokens or
credentials before logging. This ensures that any logged configuration data does
not leak sensitive information contained in the URI.

Comment on lines +84 to +86
uri = config["source"]["uri"]
if uri.startswith(HF_URI_PREFIX):
    download_from_hf(config["source"]["uri"], config["storage"]["path"])


🛠️ Refactor suggestion

⚠️ Potential issue

Remove redundant URI extraction

The URI is already extracted on line 84 but then accessed again from config on line 86.

Apply this diff:

-            download_from_hf(config["source"]["uri"], config["storage"]["path"])
+            download_from_hf(uri, config["storage"]["path"])
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
uri = config["source"]["uri"]
if uri.startswith(HF_URI_PREFIX):
    download_from_hf(config["source"]["uri"], config["storage"]["path"])

uri = config["source"]["uri"]
if uri.startswith(HF_URI_PREFIX):
    download_from_hf(uri, config["storage"]["path"])
🤖 Prompt for AI Agents
In jobs/async-upload/job/download.py around lines 84 to 86, the variable uri is
extracted from config["source"]["uri"] on line 84 but then the code redundantly
accesses config["source"]["uri"] again on line 86. To fix this, replace
config["source"]["uri"] on line 86 with the already extracted uri variable to
avoid redundant dictionary access.

@@ -0,0 +1,365 @@
"""Integration tests for async-upload job functionality."""

import json


⚠️ Potential issue

Remove unused imports

Static analysis correctly identified unused imports that should be removed.

Apply this diff:

-import json
 import os
 import tempfile
 import time
 import uuid
 from pathlib import Path
-from typing import Dict, Any

 import pytest
 import requests
-import yaml

Also applies to: 9-9, 13-13

🧰 Tools
🪛 Ruff (0.12.2)

3-3: json imported but unused

Remove unused import: json

(F401)

🤖 Prompt for AI Agents
In jobs/async-upload/tests/integration/test_integration_async_upload.py at lines
3, 9, and 13, there are unused import statements that should be removed to clean
up the code. Identify and delete these unused imports from the specified lines
to improve code quality and maintainability.

Comment on lines +70 to +164
def apply_job_with_strategic_merge(
    rm_id: str,
    mv_id: str,
    ma_id: str,
    job_name: str,
    container_image_uri: str,
    k8s_client
) -> str:
    """Apply job using Kustomize strategic merge patches."""
    import subprocess
    import tempfile
    import time

    # Strategic merge patch template - only patch the image and env vars, keep original job name
    patch_template = f"""apiVersion: batch/v1
kind: Job
metadata:
  name: my-async-upload-job
spec:
  template:
    spec:
      containers:
      - name: async-upload
        image: {container_image_uri}
        env:
        - name: MODEL_SYNC_MODEL_ID
          value: "{rm_id}"
        - name: MODEL_SYNC_MODEL_VERSION_ID
          value: "{mv_id}"
        - name: MODEL_SYNC_MODEL_ARTIFACT_ID
          value: "{ma_id}"
"""

    # Get the path to the sample job file
    base_job_path = Path(__file__).parent.parent.parent / "samples" / "sample_job_s3_to_oci.yaml"

    # Kustomization template using relative path and modern patches syntax
    kustomization_template = """apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- sample_job_s3_to_oci.yaml
patches:
- path: patch.yaml
  target:
    kind: Job
    name: my-async-upload-job
"""

    with tempfile.TemporaryDirectory() as temp_dir:
        temp_path = Path(temp_dir)

        # Copy the base job file into temp directory
        import shutil
        base_job_copy = temp_path / "sample_job_s3_to_oci.yaml"
        shutil.copy2(base_job_path, base_job_copy)

        # Write the patch file
        patch_file = temp_path / "patch.yaml"
        with open(patch_file, "w") as f:
            f.write(patch_template)

        # Write the kustomization file
        kustomize_file = temp_path / "kustomization.yaml"
        with open(kustomize_file, "w") as f:
            f.write(kustomization_template)

        # Delete existing job if it exists (Jobs are immutable)
        delete_result = subprocess.run(
            ["kubectl", "delete", "job", "my-async-upload-job", "-n", "default", "--ignore-not-found=true"],
            capture_output=True,
            text=True,
            check=False
        )
        if delete_result.returncode == 0:
            print(f"Deleted existing job: {delete_result.stdout.strip()}")
            # Wait a moment for deletion to complete
            time.sleep(3)

        # Apply resources using kubectl apply -k
        result = subprocess.run(
            ["kubectl", "apply", "-k", "."],
            capture_output=True,
            text=True,
            cwd=temp_path,
            check=False
        )

        if result.returncode != 0:
            raise Exception(f"kubectl apply failed: {result.stderr}")

        # Return the original job name since we're not changing it
        return "my-async-upload-job"



🛠️ Refactor suggestion

Consider using Kubernetes Python client instead of subprocess

This function uses subprocess calls to kubectl, which contradicts the README's claim of a "pure Python approach". Consider refactoring to use the Kubernetes Python client that's already available in the fixtures.

Would you like me to help refactor this to use the Kubernetes Python client for applying resources?

🤖 Prompt for AI Agents
In jobs/async-upload/tests/integration/test_integration_async_upload.py from
lines 70 to 164, the function apply_job_with_strategic_merge uses subprocess
calls to kubectl for deleting and applying Kubernetes resources, which conflicts
with the README's claim of a pure Python approach. Refactor this function to use
the Kubernetes Python client instead: load the YAML manifests, patch the Job
resource programmatically by modifying the container image and environment
variables, delete the existing Job if present using the client API, and create
or replace the Job resource via the client. This removes the need for subprocess
calls and external kubectl commands, aligning with the pure Python approach.
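If that refactor were pursued, a rough sketch with the Kubernetes Python client could look like the following; it is illustrative only, patching the sample manifest in memory rather than shelling out to kustomize/kubectl, and the container image value is hypothetical:

import yaml
from kubernetes import client, config, utils
from kubernetes.client.rest import ApiException

config.load_kube_config()

# Load the sample Job manifest and patch image/env in memory.
with open("samples/sample_job_s3_to_oci.yaml") as f:
    docs = list(yaml.safe_load_all(f))

for doc in docs:
    if doc and doc.get("kind") == "Job":
        container = doc["spec"]["template"]["spec"]["containers"][0]
        container["image"] = "quay.io/example/async-upload:dev"  # hypothetical image
        container.setdefault("env", []).extend([
            {"name": "MODEL_SYNC_MODEL_ID", "value": "1"},
            {"name": "MODEL_SYNC_MODEL_VERSION_ID", "value": "2"},
            {"name": "MODEL_SYNC_MODEL_ARTIFACT_ID", "value": "3"},
        ])

# Jobs are immutable, so delete any previous run before re-creating.
try:
    client.BatchV1Api().delete_namespaced_job(
        "my-async-upload-job", "default", propagation_policy="Background"
    )
except ApiException as e:
    if e.status != 404:
        raise

api_client = client.ApiClient()
for doc in docs:
    if doc:
        utils.create_from_dict(api_client, doc, namespace="default")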

    rm_id: str,
    mv_id: str,
    ma_id: str,
    job_name: str,


⚠️ Potential issue

Remove unused job_name parameter

The job_name parameter is never used - the function always returns the hardcoded "my-async-upload-job".

Either use the parameter or remove it:

 def apply_job_with_strategic_merge(
     rm_id: str,
     mv_id: str, 
     ma_id: str,
-    job_name: str,
     container_image_uri: str,
     k8s_client
 ) -> str:

Also applies to: 163-163

🤖 Prompt for AI Agents
In jobs/async-upload/tests/integration/test_integration_async_upload.py at lines
74 and 163, the job_name parameter is declared but never used since the function
returns a hardcoded string. Remove the unused job_name parameter from the
function signature to clean up the code and avoid confusion.

"""

# Configuration
mr_host_url = os.environ.get("MR_HOST_URL", "http://localhost:8080")


⚠️ Potential issue

Remove unused variable

The mr_host_url variable is assigned but never used.

Remove the unused variable:

-    mr_host_url = os.environ.get("MR_HOST_URL", "http://localhost:8080")
     container_image_uri = os.environ.get(
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
    mr_host_url = os.environ.get("MR_HOST_URL", "http://localhost:8080")
    container_image_uri = os.environ.get(
🧰 Tools
🪛 Ruff (0.12.2)

245-245: Local variable mr_host_url is assigned to but never used

Remove assignment to unused variable mr_host_url

(F841)

🤖 Prompt for AI Agents
In jobs/async-upload/tests/integration/test_integration_async_upload.py at line
245, the variable mr_host_url is assigned but never used. Remove the entire line
that assigns mr_host_url to clean up the code and avoid unused variable
warnings.

Comment on lines +301 to +308
actual_job_name = apply_job_with_strategic_merge(
    rm_id=rm.id,
    mv_id=mv.id,
    ma_id=ma.id,
    job_name=job_name,
    container_image_uri=container_image_uri,
    k8s_client=k8s_client
)


🛠️ Refactor suggestion

Fix function call after removing job_name parameter

If you accept the suggestion to remove the unused job_name parameter from apply_job_with_strategic_merge, update this call accordingly.

         actual_job_name = apply_job_with_strategic_merge(
             rm_id=rm.id,
             mv_id=mv.id,
             ma_id=ma.id,
-            job_name=job_name,
             container_image_uri=container_image_uri,
             k8s_client=k8s_client
         )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
actual_job_name = apply_job_with_strategic_merge(
    rm_id=rm.id,
    mv_id=mv.id,
    ma_id=ma.id,
    job_name=job_name,
    container_image_uri=container_image_uri,
    k8s_client=k8s_client
)

actual_job_name = apply_job_with_strategic_merge(
    rm_id=rm.id,
    mv_id=mv.id,
    ma_id=ma.id,
    container_image_uri=container_image_uri,
    k8s_client=k8s_client
)
🤖 Prompt for AI Agents
In jobs/async-upload/tests/integration/test_integration_async_upload.py around
lines 301 to 308, the function call to apply_job_with_strategic_merge still
includes the job_name parameter, which has been removed from the function
definition. Remove the job_name argument from this call to match the updated
function signature and avoid errors.

@openshift-merge-bot openshift-merge-bot bot merged commit f83d358 into opendatahub-io:main Jul 24, 2025
26 of 28 checks passed
npanpaliya pushed a commit to odh-on-pz/model-registry that referenced this pull request Dec 22, 2025
