Skip to content

Merge RHDS main to ODH main#74

Closed
hbelmiro wants to merge 49 commits into
opendatahub-io:mainfrom
hbelmiro:merge-rhds-main-to-odh-main
Closed

Merge RHDS main to ODH main#74
hbelmiro wants to merge 49 commits into
opendatahub-io:mainfrom
hbelmiro:merge-rhds-main-to-odh-main

Conversation

@hbelmiro
Copy link
Copy Markdown

@hbelmiro hbelmiro commented May 15, 2026

Description of your changes:

Checklist:

Pre-Submission Checklist

Additional Checklist Items for New or Updated Components/Pipelines

  • metadata.yaml includes fresh lastVerified timestamp
  • All required files
    are present and complete
  • OWNERS file lists appropriate maintainers
  • README provides clear documentation with usage examples
  • Component follows snake_case naming convention
  • No security vulnerabilities in dependencies
  • Containerfile included if using a custom base image

Summary by CodeRabbit

  • New Features

    • Added end-to-end AutoGluon tabular training pipeline with model selection and evaluation metrics generation.
    • Added AutoGluon timeseries training pipeline with temporal data splitting and per-series forecasting.
  • Documentation

    • Updated Tekton workflow instructions to clarify that pipeline updates should be performed in the central repository.
  • Chores

    • Added automated build pipeline configurations for pull request validation.
    • Configured Renovate for dependency management automation.
    • Updated project dependencies.

rhods-devops-app Bot and others added 30 commits April 16, 2026 15:12
Signed-off-by: Mateusz Switala <mswitala@redhat.com>
Assisted-by: Claude Code
Signed-off-by: Mateusz Switala <mswitala@redhat.com>
…_pipelines

chore: regenerate autox pipeline yaml files
…to 6aeddf1 (kubeflow#7)

Signed-off-by: konflux-internal-p02 <170854209+konflux-internal-p02[bot]@users.noreply.github.com>
Co-authored-by: konflux-internal-p02[bot] <170854209+konflux-internal-p02[bot]@users.noreply.github.com>
…to 2b34df3 (kubeflow#8)

Signed-off-by: konflux-internal-p02 <170854209+konflux-internal-p02[bot]@users.noreply.github.com>
Co-authored-by: konflux-internal-p02[bot] <170854209+konflux-internal-p02[bot]@users.noreply.github.com>
…er digest to e42e1d0 (kubeflow#10)

Signed-off-by: konflux-internal-p02 <170854209+konflux-internal-p02[bot]@users.noreply.github.com>
Co-authored-by: konflux-internal-p02[bot] <170854209+konflux-internal-p02[bot]@users.noreply.github.com>
…to 0cac2a0 (kubeflow#13)

Signed-off-by: konflux-internal-p02 <170854209+konflux-internal-p02[bot]@users.noreply.github.com>
Co-authored-by: konflux-internal-p02[bot] <170854209+konflux-internal-p02[bot]@users.noreply.github.com>
…to ad95b39 (kubeflow#17)

Signed-off-by: konflux-internal-p02 <170854209+konflux-internal-p02[bot]@users.noreply.github.com>
Co-authored-by: konflux-internal-p02[bot] <170854209+konflux-internal-p02[bot]@users.noreply.github.com>
…to ba0eda7 (kubeflow#20)

Signed-off-by: konflux-internal-p02 <170854209+konflux-internal-p02[bot]@users.noreply.github.com>
Co-authored-by: konflux-internal-p02[bot] <170854209+konflux-internal-p02[bot]@users.noreply.github.com>
…to a81d593 (kubeflow#22)

Signed-off-by: konflux-internal-p02 <170854209+konflux-internal-p02[bot]@users.noreply.github.com>
Co-authored-by: konflux-internal-p02[bot] <170854209+konflux-internal-p02[bot]@users.noreply.github.com>
…to f6e9796 (kubeflow#26)

Signed-off-by: konflux-internal-p02 <170854209+konflux-internal-p02[bot]@users.noreply.github.com>
Co-authored-by: konflux-internal-p02[bot] <170854209+konflux-internal-p02[bot]@users.noreply.github.com>
Signed-off-by: Filip Komarzyniec <fkomarzy@redhat.com>

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
Signed-off-by: Daniel Ryszka <dryszka@redhat.com>
Signed-off-by: Filip Komarzyniec <fkomarzy@redhat.com>

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
…-pipeline-yaml-changes

chore: Cherry pick autox pipeline yaml changes
…to 82857f4 (kubeflow#31)

Signed-off-by: konflux-internal-p02 <170854209+konflux-internal-p02[bot]@users.noreply.github.com>
Co-authored-by: konflux-internal-p02[bot] <170854209+konflux-internal-p02[bot]@users.noreply.github.com>
moulalis and others added 19 commits May 7, 2026 12:40
…to c0be006 (kubeflow#36)

Signed-off-by: konflux-internal-p02 <170854209+konflux-internal-p02[bot]@users.noreply.github.com>
Co-authored-by: konflux-internal-p02[bot] <170854209+konflux-internal-p02[bot]@users.noreply.github.com>
…ENG-59984

Signed-off-by: Nelesh Singla <117123879+nsingla@users.noreply.github.com>
Signed-off-by: Nelesh Singla <117123879+nsingla@users.noreply.github.com>
…to 239d0bd (kubeflow#44)

Signed-off-by: konflux-internal-p02 <170854209+konflux-internal-p02[bot]@users.noreply.github.com>
Co-authored-by: konflux-internal-p02[bot] <170854209+konflux-internal-p02[bot]@users.noreply.github.com>
…luded (kubeflow#41)

* recompiled autorag pipeline so that recent bug fixes are included

Signed-off-by: Filip Komarzyniec <fkomarzy@redhat.com>

Signed-off-by: Filip Komarzyniec <fkomarzy@redhat.com>

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED

Signed-off-by: Filip Komarzyniec <fkomarzy@redhat.com>

* recompiled autorag pipeline once again to include even newest changes

Signed-off-by: Filip Komarzyniec <fkomarzy@redhat.com>

Signed-off-by: Filip Komarzyniec <fkomarzy@redhat.com>

Signed-off-by: Filip Komarzyniec <fkomarzy@redhat.com>

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED

---------

Signed-off-by: Filip Komarzyniec <fkomarzy@redhat.com>
…to de8a313 (kubeflow#47)

Signed-off-by: konflux-internal-p02 <170854209+konflux-internal-p02[bot]@users.noreply.github.com>
Co-authored-by: konflux-internal-p02[bot] <170854209+konflux-internal-p02[bot]@users.noreply.github.com>
ci(cherry pick): RHOAIENG-59984 to main
Signed-off-by: Dorota Laczak <dlaczak@redhat.com>
…rhds-main-to-odh-main

# Conflicts:
#	.github/workflows/sync-requirements.yml
#	Dockerfile.konflux.pipelines-components
#	requirements-build.txt
#	requirements.txt
…dated hashes from `uv export`

Signed-off-by: Helber Belmiro <helber.belmiro@gmail.com>
@openshift-ci
Copy link
Copy Markdown

openshift-ci Bot commented May 15, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign mprahl for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 15, 2026

📝 Walkthrough

Walkthrough

This PR introduces two complete AutoGluon-based ML training pipelines as KFP v2 YAML definitions: one for tabular data and one for timeseries data. Both pipelines include embedded Python executor logic for S3 data loading (with sampling and temporal/stratified splitting), model training/selection via AutoGluon, per-model metric computation with NaN/Inf filtering, and HTML leaderboard generation. Supporting changes add two Konflux Dockerfiles for containerization, three Tekton PipelineRun manifests for PR-triggered multi-arch builds, Renovate configuration, documentation updates, YAML linting rules, and a refreshed requirements.txt via uv export with updated pinned versions and PyPy environment markers.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes


Security findings

CWE-295 (Improper Certificate Validation): Dockerfile.konflux.autorag line 14 and pipelines/training/automl/autogluon_tabular_training_pipeline/pipeline.yaml line 524 both disable SSL verification during S3 downloads (verify=False, verify: false in SSL context). While annotated as SSLError retry fallback, this widens the window for MITM attacks. Require explicit allowlist of trusted S3 endpoints or conditional verification based on environment.

CWE-94 (Improper Control of Generation of Code): Embedded Python in autogluon_models_training (lines 151–393) and timeseries variants execute model inference and hyperparameter access via user-supplied top_n and task_type without strict type or range bounds. A malicious task_type string could trigger unexpected AutoGluon code paths. Validate against a whitelist of known task types (e.g., {'binary', 'multiclass', 'regression'}).

CWE-434 (Unrestricted Upload of Dangerous File Type): Tekton manifests wire git-auth workspace secret references but do not validate Dockerfile paths. Dockerfile.konflux.autorag line 19 executes build_sqlite.sh from a repository path without integrity checks. Verify script hash or move to container image pre-build step.

CWE-200 (Exposure of Sensitive Information): requirements.txt now spans 369 lines with full --hash= entries for all packages, making package use patterns visible to any repo reader. If sensitive ML dependencies or internal tooling exists, consider publishing a minimal requirements.lock separately or using private indexes exclusively.


Code quality issues

  • Duplicate Tekton configuration: Three nearly-identical PipelineRun manifests differ only in namespace, Dockerfile path, and service account naming. Extract a common template or Kustomize base to reduce maintenance burden.

  • Magic numbers without explanation: autogluon_timeseries_models_selection (line 550) uses default_selection_train_size=0.3 and autogluon_tabular_training_pipeline assumes workspace byte limit without documenting rationale. Add inline constants with justification.

  • Inconsistent error handling: Data loaders handle S3 SSLError but not NoCredentialsError, PartialCredentialsError, or EndpointConnectionError. Add defensive bounds checking for CSV row counts before memory-intensive operations.

  • Leaderboard template injection: leaderboard_evaluation components (lines 693–804, 786–916) use string-based HTML template with {metric_name} placeholders. No escaping of model names or metric values. Use a templating library or explicit HTML entity encoding to prevent XSS if leaderboard is served.

  • Hardcoded artifact paths: Both pipelines hardcode /tmp/ and workspace paths. Use environment variables or parameterization to allow flexibility across different storage backends.

🚥 Pre-merge checks | ✅ 2 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Title check ⚠️ Warning The title 'Merge RHDS main to ODH main' is vague and does not clearly convey the specific technical changes being introduced. Provide a more descriptive title summarizing key changes, e.g., 'feat(pipelines): Add AutoML/AutoRAG training pipelines and Konflux integration' or follow the repository's title convention with a conventional prefix (feat/fix/chore).
Description check ⚠️ Warning The PR description is entirely empty except for the template structure; the 'Description of your changes' section contains no actual details about what was added, modified, or why. Complete the description section with a clear summary of changes, including the addition of two new training pipelines, Tekton/Konflux configuration files, updated Dockerfiles, and dependency updates, and explain the rationale for the merge.
✅ Passed checks (2 passed)
Check name Status Explanation
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 17

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
Dockerfile.konflux.automl (1)

1-22: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Verify non-root user in final image.

Dockerfile does not explicitly set USER. While the base image may run as non-root, best practice is to explicitly set USER before the final layer to ensure the container does not run as root (CWE-250). As per coding guidelines, Dockerfiles should run as non-root user.

🛡️ Proposed fix to explicitly set USER
 RUN pip install --index-url https://pypi.org/simple --no-cache-dir -r /tmp/requirements-pypi.txt
+
+USER 1001

 LABEL com.redhat.component="odh-automl-container" \
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Dockerfile.konflux.automl` around lines 1 - 22, The Dockerfile currently
doesn't set a non-root runtime user which can cause the final container to run
as root; update the Dockerfile (after the final RUN/PACKAGE/LABEL steps) to add
an explicit non-root USER, e.g., create or reference a non-root user/UID
available in the base image and add a USER instruction (ensure the chosen
username/UID exists or create it earlier), so the final image runs as that
non-root account instead of root.
🧹 Nitpick comments (6)
.tekton/odh-pipelines-components-pull-request.yaml (2)

40-45: ⚡ Quick win

Inconsistent JSON formatting.

This file uses compact single-line JSON for prefetch-input, while .tekton/odh-automl-pull-request.yaml (lines 52-63) and .tekton/odh-autorag-on-pull-request.yaml (lines 50-74) use multi-line array format. Inconsistent formatting reduces maintainability.

Standardize on the expanded multi-line array format used in the other two files for better readability and easier validation.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.tekton/odh-pipelines-components-pull-request.yaml around lines 40 - 45, The
prefetch-input entry uses compact single-line JSON; change its value to the
expanded multi-line array/object format to match the other manifests: locate the
prefetch-input block and replace the single-line JSON string with a formatted
object/array style (split "type", "path", "requirements_files",
"requirements_build_files", and "binary" onto separate lines) following the same
structure used in odh-automl-pull-request.yaml and
odh-autorag-on-pull-request.yaml so the keys (prefetch-input -> value) and
fields ("type", "path", "requirements_files", "requirements_build_files",
"binary") remain identical but are multiline and easier to read.

17-17: ⚡ Quick win

Inconsistent PipelineRun naming pattern.

This PipelineRun uses a dynamic name including {{pull_request_number}}, while the other two PR pipelines (odh-automl and odh-autorag) use static names. Inconsistent naming makes the three pipelines harder to manage and understand.

Standardize on either static names (if Tekton/Pipelines as Code handles uniqueness) or dynamic names (if uniqueness is required) across all three files.

♻️ Proposed fix for consistency
-  name: odh-pipelines-components-on-pull-request-{{pull_request_number}}
+  name: odh-pipelines-components-on-pull-request
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.tekton/odh-pipelines-components-pull-request.yaml at line 17, The
PipelineRun name
odh-pipelines-components-on-pull-request-{{pull_request_number}} is inconsistent
with the other PR pipelines (odh-automl and odh-autorag); pick one naming
convention and make this file match it: either replace the dynamic template name
with the static name used by the other two (e.g.,
odh-pipelines-components-on-pull-request) if Tekton/Pipelines-as-Code guarantees
uniqueness, or update the other two pipeline files to include the
{{pull_request_number}} template so all three use dynamic names—ensure you
update the name field consistently wherever
odh-pipelines-components-on-pull-request-{{pull_request_number}}, odh-automl, or
odh-autorag appear.
Dockerfile.konflux.autorag (2)

34-38: ⚡ Quick win

Complex conditional RUN reduces debuggability.

Multi-line shell conditional in RUN makes build failures harder to diagnose. If either branch of the Docling seeding fails, the error context will be limited.

♻️ Split into separate RUN commands or use a wrapper script

Option 1: Wrapper script for clarity

COPY seed_docling_wrapper.sh /tmp/
RUN bash /tmp/seed_docling_wrapper.sh

Option 2: Explicit RUN steps (better layer caching)

RUN test -d "${HERMETO_GENERIC_DEPS}" && test -n "$(ls -A "${HERMETO_GENERIC_DEPS}" 2>/dev/null)" && \
    python3 /tmp/seed_docling_models.py --dest /opt/app-root/docling-artifacts/models --hermeto-dir "${HERMETO_GENERIC_DEPS}" || \
    python3 /tmp/seed_docling_models.py --dest /opt/app-root/docling-artifacts/models --download
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Dockerfile.konflux.autorag` around lines 34 - 38, The multi-branch shell
conditional in the Dockerfile RUN makes build failures hard to debug; update the
Dockerfile to replace the single complex RUN invoking
/tmp/seed_docling_models.py with clearer steps: either add a small wrapper
script (e.g., /tmp/seed_docling_wrapper.sh) that checks the HERMETO_GENERIC_DEPS
directory and runs seed_docling_models.py with either --hermeto-dir or
--download while emitting explicit logs and exiting non‑zero on error, or split
into separate RUN commands that first test the directory presence (using
HERMETO_GENERIC_DEPS and ls -A) and then run the appropriate python3
/tmp/seed_docling_models.py invocation so failures show which branch failed and
improve layer caching and debuggability.

3-3: 💤 Low value

Remove commented-out code.

Commented-out base image should be removed before merge to avoid confusion about which base is current.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Dockerfile.konflux.autorag` at line 3, The Dockerfile contains a
commented-out ARG for BASE_IMAGE ("#ARG
BASE_IMAGE=\"quay.io/aipcc/base-images/cpu:3.3\"") which should be removed;
delete that commented line so only the active base image/config remains in
Dockerfile.konflux.autorag (ensure no leftover commented ARGs remain to avoid
confusion).
.tekton/odh-automl-pull-request.yaml (1)

52-63: ⚡ Quick win

Inconsistent JSON formatting in prefetch-input.

This file uses a JSON array with explicit formatting (one object per line with brackets), while .tekton/odh-pipelines-components-pull-request.yaml (line 40-45) uses a compact single-line JSON string. Inconsistent formatting makes the codebase harder to maintain.

Standardize on one format across all three PipelineRun manifests. The expanded format used here is more readable and easier to validate.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.tekton/odh-automl-pull-request.yaml around lines 52 - 63, The
prefetch-input JSON for the PipelineRun is formatted as a multi-line array but
other manifests use a compact single-line string; update the prefetch-input
"value" for this PipelineRun to match the expanded, multi-line JSON style used
across the repo (i.e., keep the array/object structure with one object per line
and explicit keys) so it is consistent with the more readable format; locate the
"prefetch-input" entry and replace the compact/string variant (the "value"
field) with the expanded JSON array containing the pip object, path
"pipelines/training/automl", requirements_files including
"autogluon_tabular_training_pipeline/requirements.txt", and the binary arch
entry to mirror the other PipelineRun manifests.
pipelines/training/automl/autogluon_timeseries_training_pipeline/pipeline.yaml (1)

1006-1013: 💤 Low value

parallelismLimit: 1 serializes all top-N refits.

The refit loop is forced sequential. For top_n=3 with DEFAULT_TIME_LIMIT=600s selection plus N refits, wall-clock grows linearly. If this is intentional resource pacing on a shared PVC (ReadWriteOnce), call it out in a comment; otherwise raise to the cluster's concurrency budget — refit jobs write to isolated output artifacts and don't contend on the predictor path.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@pipelines/training/automl/autogluon_timeseries_training_pipeline/pipeline.yaml`
around lines 1006 - 1013, The pipeline's iteratorPolicy currently sets
parallelismLimit: 1 which forces the refit loop (taskInfo name: for-loop-1,
parameterIterator using inputParameter
pipelinechannel--autogluon-timeseries-models-selection-top_models and itemInput
pipelinechannel--autogluon-timeseries-models-selection-top_models-loop-item) to
run serially; either raise parallelismLimit to the cluster's allowed concurrency
to run top-N refits in parallel (since refits write isolated artifacts) or add
an explicit comment above iteratorPolicy documenting that parallelismLimit: 1 is
intentional due to shared PVC/ReadWriteOnce constraints and should not be
changed. Ensure you update the iteratorPolicy block accordingly and reference
the same parameterIterator and taskInfo names so reviewers can verify the
intention.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.tekton/odh-autorag-on-pull-request.yaml:
- Around line 75-76: The CI step currently sets prefetch-log-level to "debug",
which can leak sensitive data in logs; update the configuration by either
removing the prefetch-log-level entry or changing its value from "debug" to
"info" (or a less verbose level) in the YAML so that the key prefetch-log-level
no longer enables debug logging in the pipeline.

In @.tekton/README.md:
- Around line 14-15: The README currently instructs contributors to push
directly to main (see the commands "git clone
git@github.com:red-hat-data-services/konflux-central.git" and "cd
konflux-central" and the later push-to-main steps); change the workflow to a
PR-based flow by instructing contributors to create a feature branch, push that
branch to their fork (e.g., git checkout -b my-feature; git push origin
my-feature), and then open a Pull Request against main in
red-hat-data-services/konflux-central instead of pushing directly to main; apply
the same update to the later occurrence of the push-to-main instructions (lines
referenced in the comment).

In `@Dockerfile.konflux.automl`:
- Around line 10-11: The two RUN pip install lines install packages from
different indexes sequentially (the first RUN using /tmp/requirements.txt with
the RHAI index and the second RUN using /tmp/requirements-pypi.txt with PyPI)
which can lead to dependency version conflicts because the second installation
can override packages from the first; to fix, consolidate dependency
installation so pip resolves both sources together (e.g., merge into a single
requirements file or use a single pip invocation that specifies package-specific
--index-url entries), or if packages are truly isolated, install the second set
with --no-deps to prevent overriding transitive dependencies—update the RUN pip
install commands and the /tmp/requirements*.txt files accordingly to ensure
deterministic resolution.

In `@Dockerfile.konflux.autorag`:
- Around line 22-23: The Dockerfile switches to USER root to run
/tmp/install_sqlite_from_source.sh with HERMETO_GENERIC_DEPS which grants
excessive privileges; refactor by moving the SQLite build into a separate
multi-stage builder image run as a non-root user (or pre-build and include
SQLite in the base image) so the install script is never executed as root in the
final image, and if running as root cannot be avoided, minimize and audit
/tmp/install_sqlite_from_source.sh and add cryptographic verification of any
downloaded artifacts before executing them.
- Around line 26-29: The three sequential pip install lines (the RUN pip install
... for /tmp/requirements.txt, /tmp/requirements-pypi.txt, and
/tmp/requirements-pypi-whl.txt) can cause dependency resolution conflicts;
consolidate dependency resolution into a single install step by producing one
locked requirements file (e.g., via pip-compile or pip-tools) that resolves
versions across all sources and/or annotates sources with --extra-index-url or
direct PEP 508 URLs, then replace the three RUN pip install lines with a single
RUN pip install --no-cache-dir -r /tmp/requirements-locked.txt so all packages
are resolved together and build isolation is ensured.

In `@pipelines/training/automl/autogluon_tabular_training_pipeline/pipeline.yaml`:
- Around line 537-599: The three helper functions (_sample_first_n_rows,
_sample_stratified, _sample_random) currently catch Exception and silently
return partial data; narrow the except handling to only known recoverable
I/O/parser errors (e.g., botocore.exceptions.ClientError, OSError,
pd.errors.ParserError) and handle them explicitly, but do not swallow ValueError
raised for preconditions (move the label_column existence check in
_sample_stratified outside the broad try or re-raise ValueError immediately);
also ensure that if an exception happens after some chunks have been accumulated
you either re-raise the exception (so callers know the read was truncated) or
only allow a logged, intentional fallback when the failure is a recoverable
first-chunk I/O/parser error—use logger.debug/logger.error with the exception
and keep the final return behavior (pd.DataFrame or subsampled_data) unchanged
otherwise.
- Around line 311-317: The fixed clone directory work_path =
predictor_path.parent / "refit_work" can collide across concurrent runs and
cause corrupted predictor state; change work_path to a unique per-run temp
directory (e.g., include run_id or use tempfile.mkdtemp with
dir=predictor_path.parent) before calling predictor.clone(path=work_path, ...),
so predictor_clone.refit_full operates on an isolated clone; ensure cleanup of
the temp dir after refit.
- Around line 791-794: The leaderboard fails if some per-model rows in results
lack the eval_metric; modify the code that constructs leaderboard_df so it
ensures the eval_metric column exists for every row (e.g., when creating
leaderboard_df from results, coerce missing metric keys to NaN or a sentinel:
use pd.DataFrame(results).assign(**{eval_metric:
pd.Series(pd.DataFrame(results).get(eval_metric))}) or explicitly fill missing
with NaN), then call sort_values(by=eval_metric, ascending=False,
na_position='last') so missing metrics don't raise KeyError and are placed at
the bottom; keep the rest of the flow (leaderboard_df.index and
_build_leaderboard_table) unchanged.
- Around line 326-366: The code runs _process_model concurrently using
ThreadPoolExecutor on a shared predictor_clone (used by predictor_clone.predict,
predictor_clone.evaluate_predictions, predictor_clone.feature_importance), which
is not thread-safe; replace ThreadPoolExecutor with ProcessPoolExecutor and
ensure each worker gets its own TabularPredictor instance (e.g., clone or reload
the predictor inside the worker function _process_model), or alternatively
serialize access to predictor_clone with a threading.Lock around those calls;
also replace the silent dict(f.result() for f in futures) collection with
concurrent.futures.as_completed() (or wait()) and collect results while
explicitly re-raising/logging exceptions so worker errors are surfaced instead
of being suppressed when assembling eval_results_by_model.
- Around line 607-613: The current SSLError handler in automl_data_loader
silently falls back to get_s3_client(verify=False); instead, change the except
SSLError block to check an explicit environment flag (e.g.
AWS_S3_INSECURE_SKIP_VERIFY) before disabling TLS: if the flag is not set,
re-raise or fail the download with a clear error; if the flag is set, log a
high-severity warning that TLS verification is being intentionally disabled
(including the env var name) before calling get_s3_client(verify=False). Update
references in the automl_data_loader error handling and the get_s3_client call
site so the insecure path is only taken when the env var is present and
documented.
- Around line 651-653: The sampled_test_dataset.uri is being reassigned after
Artifact.path has already been derived, causing path/URI divergence; update the
pipeline so you compute and set sampled_test_dataset.uri (including the ".csv"
suffix) before any access to sampled_test_dataset.path or other artifact fields,
ensuring the CSV write uses the final URI, and verify downstream reference names
like sampled_test_dataset.path and the consumer component
comp-autogluon-models-training receive the corrected URI.

In
`@pipelines/training/automl/autogluon_timeseries_training_pipeline/pipeline.yaml`:
- Around line 679-686: timeseries_data_loader.load_timeseries_data_truncate
contains an implicit SSLError fallback that retries with
get_s3_client(verify=False) (introducing CWE-295); remove the unconditional
verify=False retry and instead gate any insecure fallback behind a clearly named
explicit environment variable (e.g. S3_ALLOW_INSECURE_VERIFY) checked before
creating a no-verify client; additionally factor the S3 client creation into a
shared utility (replace inline get_s3_client usage with a single shared factory)
so the retry logic is not duplicated, and ensure any insecure branch logs the
env var name and error details when it is used.
- Around line 893-895: The leaderboard sort currently fails when some aggregated
metrics lack the eval_metric key: before calling
pd.DataFrame(results).sort_values(by=eval_metric, ascending=False) validate that
every entry in results (or every column in the created DataFrame) contains
eval_metric; filter out entries missing eval_metric with a warning (or raise a
clear error) and only then build leaderboard_df and set its index. Update the
code path that prepares results/metrics.json aggregation and the leaderboard_df
creation to perform this presence check and logging so sort_values never
receives a missing key.
- Around line 720-756: The per-series split can produce empty test_df or
extra_train_df (because _early_late_split keeps single-row series in train and
can move rows to selection), so add explicit non-empty validations after
computing test_df and extra_train_df (alongside the existing checks for train_df
and selection_train_df): if len(test_df) == 0 raise a ValueError explaining that
the temporal split produced no test rows (suggest increasing rows per series or
increasing test_size/adjusting DEFAULT_TEST_SIZE), and if len(extra_train_df) ==
0 raise a ValueError explaining that the secondary split produced no extra-train
rows (suggest increasing rows per series, reducing selection_train_size, or
reducing test_size), referencing the variables/functions _early_late_split,
train_df, test_df, extra_train_df, selection_train_df, DEFAULT_TEST_SIZE, and
selection_train_size so the checks are placed just before test_df.to_csv(...)
and before refit/data-loading steps.
- Around line 356-360: The code currently inspects predictor._trainer via
is_ensemble_model to detect ensemble models; replace this private access with
the public predictor.info() API: inside is_ensemble_model (or its inline usage)
call predictor.info(), extract info["model_info"][model_name]["model_type"] (or,
as a fallback, test model_name prefixes like
"WeightedEnsemble"/"GreedyEnsemble") and then use issubclass(...,
AbstractTimeSeriesEnsembleModel) to set is_ensemble; update references to
predictor._trainer.get_model_attribute(...) to use the info() lookup instead.

In `@requirements.txt`:
- Around line 1-4: Regenerate the autogenerated lockfile so it matches CI output
by running the project workflow that produces requirements.txt (use the
documented command: run the generator via the project's make target, e.g., "make
requirements" which internally runs "uv export --frozen --no-dev --no-editable
--no-annotate"), then update requirements.txt with the exact produced output and
ensure you remove the "." self-reference line as the project expects; commit the
regenerated file so CI and the checked-in requirements.txt are in sync.
- Around line 364-366: requirements.txt pins urllib3 to 2.6.3 which is
vulnerable; update the urllib3 entry (symbol: urllib3) to at least 2.7.0 and
then regenerate the lockfile by running the project's dependency build command
(e.g., run "make requirements" or the repo's equivalent) so the hashes and lock
entries are updated to reflect urllib3>=2.7.0 and resolve the two security
advisories.

---

Outside diff comments:
In `@Dockerfile.konflux.automl`:
- Around line 1-22: The Dockerfile currently doesn't set a non-root runtime user
which can cause the final container to run as root; update the Dockerfile (after
the final RUN/PACKAGE/LABEL steps) to add an explicit non-root USER, e.g.,
create or reference a non-root user/UID available in the base image and add a
USER instruction (ensure the chosen username/UID exists or create it earlier),
so the final image runs as that non-root account instead of root.

---

Nitpick comments:
In @.tekton/odh-automl-pull-request.yaml:
- Around line 52-63: The prefetch-input JSON for the PipelineRun is formatted as
a multi-line array but other manifests use a compact single-line string; update
the prefetch-input "value" for this PipelineRun to match the expanded,
multi-line JSON style used across the repo (i.e., keep the array/object
structure with one object per line and explicit keys) so it is consistent with
the more readable format; locate the "prefetch-input" entry and replace the
compact/string variant (the "value" field) with the expanded JSON array
containing the pip object, path "pipelines/training/automl", requirements_files
including "autogluon_tabular_training_pipeline/requirements.txt", and the binary
arch entry to mirror the other PipelineRun manifests.

In @.tekton/odh-pipelines-components-pull-request.yaml:
- Around line 40-45: The prefetch-input entry uses compact single-line JSON;
change its value to the expanded multi-line array/object format to match the
other manifests: locate the prefetch-input block and replace the single-line
JSON string with a formatted object/array style (split "type", "path",
"requirements_files", "requirements_build_files", and "binary" onto separate
lines) following the same structure used in odh-automl-pull-request.yaml and
odh-autorag-on-pull-request.yaml so the keys (prefetch-input -> value) and
fields ("type", "path", "requirements_files", "requirements_build_files",
"binary") remain identical but are multiline and easier to read.
- Line 17: The PipelineRun name
odh-pipelines-components-on-pull-request-{{pull_request_number}} is inconsistent
with the other PR pipelines (odh-automl and odh-autorag); pick one naming
convention and make this file match it: either replace the dynamic template name
with the static name used by the other two (e.g.,
odh-pipelines-components-on-pull-request) if Tekton/Pipelines-as-Code guarantees
uniqueness, or update the other two pipeline files to include the
{{pull_request_number}} template so all three use dynamic names—ensure you
update the name field consistently wherever
odh-pipelines-components-on-pull-request-{{pull_request_number}}, odh-automl, or
odh-autorag appear.

In `@Dockerfile.konflux.autorag`:
- Around line 34-38: The multi-branch shell conditional in the Dockerfile RUN
makes build failures hard to debug; update the Dockerfile to replace the single
complex RUN invoking /tmp/seed_docling_models.py with clearer steps: either add
a small wrapper script (e.g., /tmp/seed_docling_wrapper.sh) that checks the
HERMETO_GENERIC_DEPS directory and runs seed_docling_models.py with either
--hermeto-dir or --download while emitting explicit logs and exiting non‑zero on
error, or split into separate RUN commands that first test the directory
presence (using HERMETO_GENERIC_DEPS and ls -A) and then run the appropriate
python3 /tmp/seed_docling_models.py invocation so failures show which branch
failed and improve layer caching and debuggability.
- Line 3: The Dockerfile contains a commented-out ARG for BASE_IMAGE ("#ARG
BASE_IMAGE=\"quay.io/aipcc/base-images/cpu:3.3\"") which should be removed;
delete that commented line so only the active base image/config remains in
Dockerfile.konflux.autorag (ensure no leftover commented ARGs remain to avoid
confusion).

In
`@pipelines/training/automl/autogluon_timeseries_training_pipeline/pipeline.yaml`:
- Around line 1006-1013: The pipeline's iteratorPolicy currently sets
parallelismLimit: 1 which forces the refit loop (taskInfo name: for-loop-1,
parameterIterator using inputParameter
pipelinechannel--autogluon-timeseries-models-selection-top_models and itemInput
pipelinechannel--autogluon-timeseries-models-selection-top_models-loop-item) to
run serially; either raise parallelismLimit to the cluster's allowed concurrency
to run top-N refits in parallel (since refits write isolated artifacts) or add
an explicit comment above iteratorPolicy documenting that parallelismLimit: 1 is
intentional due to shared PVC/ReadWriteOnce constraints and should not be
changed. Ensure you update the iteratorPolicy block accordingly and reference
the same parameterIterator and taskInfo names so reviewers can verify the
intention.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 18b4557e-3ae1-482a-9df1-876d2efe9de9

📥 Commits

Reviewing files that changed from the base of the PR and between 66be6f9 and 47dc4d9.

📒 Files selected for processing (12)
  • .github/renovate.json
  • .tekton/README.md
  • .tekton/odh-automl-pull-request.yaml
  • .tekton/odh-autorag-on-pull-request.yaml
  • .tekton/odh-pipelines-components-pull-request.yaml
  • .yamllint.yml
  • Dockerfile.konflux.automl
  • Dockerfile.konflux.autorag
  • pipelines/training/automl/autogluon_tabular_training_pipeline/pipeline.yaml
  • pipelines/training/automl/autogluon_timeseries_training_pipeline/pipeline.yaml
  • pipelines/training/autorag/documents_rag_optimization_pipeline/pipeline.yaml
  • requirements.txt

Comment on lines +75 to +76
- name: prefetch-log-level
value: "debug"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Debug logging may leak sensitive information.

Setting prefetch-log-level: "debug" enables verbose logging that may expose internal URLs, authentication tokens, proprietary package names, or build secrets in CI logs. If this is for troubleshooting, it should be removed before production use.

Remove or set to "info" unless debug logging is explicitly required and CI logs are properly secured.

🔒 Proposed fix
-  - name: prefetch-log-level
-    value: "debug"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- name: prefetch-log-level
value: "debug"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.tekton/odh-autorag-on-pull-request.yaml around lines 75 - 76, The CI step
currently sets prefetch-log-level to "debug", which can leak sensitive data in
logs; update the configuration by either removing the prefetch-log-level entry
or changing its value from "debug" to "info" (or a less verbose level) in the
YAML so that the key prefetch-log-level no longer enables debug logging in the
pipeline.

Comment thread .tekton/README.md
Comment on lines +14 to 15
git clone git@github.com:red-hat-data-services/konflux-central.git
cd konflux-central
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Replace direct-to-main instructions with PR-based flow.

Current instructions tell contributors to push straight to main, which weakens change control and can bypass required reviews/checks if branch protection is permissive. This is a supply-chain/process security risk (CWE-285: Improper Authorization at workflow level).

Suggested doc fix
-git checkout main
+git checkout -b <your-branch-name>

 ...

-git push origin main
+git push origin <your-branch-name>

Add a final step instructing users to open a PR against main in red-hat-data-services/konflux-central.

As per coding guidelines, "REVIEW PRIORITIES: 1. Security vulnerabilities ... 2. Architectural issues and anti-patterns".

Also applies to: 35-37

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.tekton/README.md around lines 14 - 15, The README currently instructs
contributors to push directly to main (see the commands "git clone
git@github.com:red-hat-data-services/konflux-central.git" and "cd
konflux-central" and the later push-to-main steps); change the workflow to a
PR-based flow by instructing contributors to create a feature branch, push that
branch to their fork (e.g., git checkout -b my-feature; git push origin
my-feature), and then open a Pull Request against main in
red-hat-data-services/konflux-central instead of pushing directly to main; apply
the same update to the later occurrence of the push-to-main instructions (lines
referenced in the comment).

Comment thread Dockerfile.konflux.automl
Comment on lines +10 to +11
RUN pip install --index-url https://console.redhat.com/api/pypi/public-rhai/rhoai/3.4/cpu-ubi9/simple/ --no-cache-dir -r /tmp/requirements.txt
RUN pip install --index-url https://pypi.org/simple --no-cache-dir -r /tmp/requirements-pypi.txt
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Potential dependency conflict from sequential index installs.

Installing from two different package indexes sequentially (RHAI then PyPI) can cause dependency resolution conflicts if the same package exists in both indexes with different versions. The second install may override packages from the first, breaking the "build isolation" mentioned in line 9.

Consider consolidating requirements or using pip install with --no-deps for the second index if packages are truly isolated, or use a single requirements file with explicit --index-url per package.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Dockerfile.konflux.automl` around lines 10 - 11, The two RUN pip install
lines install packages from different indexes sequentially (the first RUN using
/tmp/requirements.txt with the RHAI index and the second RUN using
/tmp/requirements-pypi.txt with PyPI) which can lead to dependency version
conflicts because the second installation can override packages from the first;
to fix, consolidate dependency installation so pip resolves both sources
together (e.g., merge into a single requirements file or use a single pip
invocation that specifies package-specific --index-url entries), or if packages
are truly isolated, install the second set with --no-deps to prevent overriding
transitive dependencies—update the RUN pip install commands and the
/tmp/requirements*.txt files accordingly to ensure deterministic resolution.

Comment on lines +22 to +23
USER root
RUN HERMETO_GENERIC_DEPS="${HERMETO_GENERIC_DEPS}" bash /tmp/install_sqlite_from_source.sh
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Arbitrary script execution as root (CWE-250).

Switching to USER root to execute /tmp/install_sqlite_from_source.sh grants unnecessary privileges. If the script or any dependency it downloads is compromised, the entire build environment and potentially the base image layers can be compromised. This violates the principle of least privilege.

Refactor to:

  1. Build SQLite in a separate builder stage as non-root, or
  2. Pre-build SQLite and package it in the base image, or
  3. If root is absolutely required, audit and minimize the script's surface area and consider cryptographic verification of downloaded artifacts.

As per coding guidelines, Dockerfiles should run as non-root user.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Dockerfile.konflux.autorag` around lines 22 - 23, The Dockerfile switches to
USER root to run /tmp/install_sqlite_from_source.sh with HERMETO_GENERIC_DEPS
which grants excessive privileges; refactor by moving the SQLite build into a
separate multi-stage builder image run as a non-root user (or pre-build and
include SQLite in the base image) so the install script is never executed as
root in the final image, and if running as root cannot be avoided, minimize and
audit /tmp/install_sqlite_from_source.sh and add cryptographic verification of
any downloaded artifacts before executing them.

Comment on lines +26 to +29
## TODO: Ensure build isolation
RUN pip install --index-url https://console.redhat.com/api/pypi/public-rhai/rhoai/3.4/cpu-ubi9/simple/ --no-cache-dir -r /tmp/requirements.txt
RUN pip install --index-url https://pypi.org/simple --no-cache-dir -r /tmp/requirements-pypi.txt
RUN pip install --index-url https://pypi.org/simple --no-cache-dir -r /tmp/requirements-pypi-whl.txt
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Three sequential pip installs risk dependency conflicts.

Installing from three different package indexes sequentially (RHAI → PyPI → PyPI wheel) increases the risk of version conflicts and dependency resolution failures. The TODO comment on line 26 acknowledges that build isolation is not ensured.

Consolidate into a single requirements file with explicit source annotations, or use a lock file (e.g., pip-compile output) that resolves all dependencies together.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Dockerfile.konflux.autorag` around lines 26 - 29, The three sequential pip
install lines (the RUN pip install ... for /tmp/requirements.txt,
/tmp/requirements-pypi.txt, and /tmp/requirements-pypi-whl.txt) can cause
dependency resolution conflicts; consolidate dependency resolution into a single
install step by producing one locked requirements file (e.g., via pip-compile or
pip-tools) that resolves versions across all sources and/or annotates sources
with --extra-index-url or direct PEP 508 URLs, then replace the three RUN pip
install lines with a single RUN pip install --no-cache-dir -r
/tmp/requirements-locked.txt so all packages are resolved together and build
isolation is ensured.

Comment on lines +679 to +686
\n from botocore.exceptions import SSLError\n\n s3_client\
\ = get_s3_client()\n try:\n response = s3_client.get_object(Bucket=bucket_name,\
\ Key=file_key)\n except SSLError:\n logger.warning(\n\
\ \"SSL error when downloading s3://%s/%s, retrying with\
\ verify=False\",\n bucket_name,\n file_key,\n\
\ )\n no_verify_client = get_s3_client(verify=False)\n\
\ response = no_verify_client.get_object(Bucket=bucket_name,\
\ Key=file_key)\n text_stream = io.TextIOWrapper(response[\"Body\"\
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Same SSL verify=False fallback as tabular pipeline — CWE-295.

timeseries_data_loader.load_timeseries_data_truncate reproduces the implicit verify=False retry on SSLError. Same MITM exposure; apply the same fix (gate behind an explicit env var) and ideally factor the S3 client out into a shared utility so this isn't duplicated.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@pipelines/training/automl/autogluon_timeseries_training_pipeline/pipeline.yaml`
around lines 679 - 686, timeseries_data_loader.load_timeseries_data_truncate
contains an implicit SSLError fallback that retries with
get_s3_client(verify=False) (introducing CWE-295); remove the unconditional
verify=False retry and instead gate any insecure fallback behind a clearly named
explicit environment variable (e.g. S3_ALLOW_INSECURE_VERIFY) checked before
creating a no-verify client; additionally factor the S3 client creation into a
shared utility (replace inline get_s3_client usage with a single shared factory)
so the retry logic is not duplicated, and ensure any insecure branch logs the
env var name and error details when it is used.

Comment on lines +720 to +756
\ exist_ok=True)\n\n # Stable ordering for downstream I/O\n df = df.sort_values(by=[id_column,\
\ timestamp_column]).reset_index(drop=True)\n\n test_size = DEFAULT_TEST_SIZE\n\
\n def _early_late_split(group: pd.DataFrame, early_fraction: float)\
\ -> tuple[pd.DataFrame, pd.DataFrame]:\n \"\"\"Split one series\
\ by time: first ``early_fraction`` of rows (by time) vs remainder.\n\n\
\ Ensures at least one row in each side when len >= 2. A single-row\
\ series is kept\n entirely in the early (train/selection) part.\n\
\ \"\"\"\n g = group.sort_values(by=timestamp_column)\n \
\ n = len(g)\n if n == 0:\n return g.iloc[:0].copy(),\
\ g.iloc[:0].copy()\n if n == 1:\n return g.copy(), g.iloc[:0].copy()\n\
\ split_idx = int(n * early_fraction)\n split_idx = max(1,\
\ min(split_idx, n - 1))\n return g.iloc[:split_idx].copy(), g.iloc[split_idx:].copy()\n\
\n def _concat_sorted(parts: list, sort_by: list) -> pd.DataFrame:\n\
\ if not parts:\n return pd.DataFrame(columns=df.columns)\n\
\ out = pd.concat(parts, ignore_index=True)\n return out.sort_values(by=sort_by).reset_index(drop=True)\n\
\n train_parts: list = []\n test_parts: list = []\n for _, series_df\
\ in df.groupby(id_column, sort=False):\n tr, te = _early_late_split(series_df,\
\ 1.0 - test_size)\n train_parts.append(tr)\n test_parts.append(te)\n\
\n train_df = _concat_sorted(train_parts, [id_column, timestamp_column])\n\
\ test_df = _concat_sorted(test_parts, [id_column, timestamp_column])\n\
\n selection_parts: list = []\n extra_parts: list = []\n for _,\
\ series_train in train_df.groupby(id_column, sort=False):\n sel,\
\ ext = _early_late_split(series_train, selection_train_size)\n selection_parts.append(sel)\n\
\ extra_parts.append(ext)\n\n selection_train_df = _concat_sorted(selection_parts,\
\ [id_column, timestamp_column])\n extra_train_df = _concat_sorted(extra_parts,\
\ [id_column, timestamp_column])\n\n # Validate split outputs:\n if\
\ len(train_df) == 0:\n raise ValueError(\n \"Primary\
\ temporal split produced no train rows. The dataset may be too small for\
\ \"\n \"the configured splits. Add more rows per time series,\
\ or reduce test_size \"\n f\"(default is {DEFAULT_TEST_SIZE}).\"\
\n )\n if len(selection_train_df) == 0:\n raise ValueError(\n\
\ \"Secondary split produced an empty selection-train dataset;\
\ \"\n \"models_selection_train_dataset.csv would be empty and\
\ downstream training would fail. \"\n \"Increase rows per time\
\ series and/or selection_train_size, or reduce test_size so \"\n \
\ \"each series has enough train rows for the selection segment.\"\n\
\ )\n\n # Save test dataset to artifact\n test_df.to_csv(sampled_test_dataset.path,\
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Per-series temporal split doesn't validate test_df (and extra_train_df) are non-empty.

_early_late_split keeps single-row series entirely in the train side (lines 728-729). If every series has only 1 row, test_df is empty — the data loader still succeeds and writes an empty CSV; downstream predictor.evaluate(test_ts_df, ...) then dies with a cryptic shape/index error far from the root cause.

Same hazard for extra_train_df: when each series has exactly 2 rows, primary split gives 1 train row, secondary split puts it entirely in selection, extra_train_df is empty, and TimeSeriesDataFrame.from_path(extra_train_data_path) in the refit step will fail on a header-only CSV.

Add explicit non-empty checks alongside the existing train_df / selection_train_df validation, with actionable error messages.

🛠 Proposed fix
     if len(selection_train_df) == 0:
         raise ValueError(
             "Secondary split produced an empty selection-train dataset; "
             ...
         )
+    if len(test_df) == 0:
+        raise ValueError(
+            "Primary temporal split produced no test rows. Every series has <2 rows "
+            "so all rows were assigned to train. Provide at least 2 rows per series "
+            f"or reduce test_size (default is {DEFAULT_TEST_SIZE})."
+        )
+    if len(extra_train_df) == 0:
+        raise ValueError(
+            "Secondary split produced an empty extra-train dataset. "
+            "Increase rows per series or selection_train_size so each series has "
+            "rows left after the selection segment."
+        )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@pipelines/training/automl/autogluon_timeseries_training_pipeline/pipeline.yaml`
around lines 720 - 756, The per-series split can produce empty test_df or
extra_train_df (because _early_late_split keeps single-row series in train and
can move rows to selection), so add explicit non-empty validations after
computing test_df and extra_train_df (alongside the existing checks for train_df
and selection_train_df): if len(test_df) == 0 raise a ValueError explaining that
the temporal split produced no test rows (suggest increasing rows per series or
increasing test_size/adjusting DEFAULT_TEST_SIZE), and if len(extra_train_df) ==
0 raise a ValueError explaining that the secondary split produced no extra-train
rows (suggest increasing rows per series, reducing selection_train_size, or
reducing test_size), referencing the variables/functions _early_late_split,
train_df, test_df, extra_train_df, selection_train_df, DEFAULT_TEST_SIZE, and
selection_train_size so the checks are placed just before test_df.to_csv(...)
and before refit/data-loading steps.

Comment on lines +893 to +895
\ so that higher value = better model.\n leaderboard_df = pd.DataFrame(results).sort_values(by=eval_metric,\
\ ascending=False)\n n = len(leaderboard_df)\n leaderboard_df.index\
\ = pd.RangeIndex(start=1, stop=n + 1, name=\"rank\")\n\n html_table\
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Leaderboard sort fails hard if eval_metric is missing from any aggregated metrics.json — same as tabular.

pd.DataFrame(results).sort_values(by=eval_metric, ascending=False) will KeyError if even one refit artifact wrote a metrics.json without eval_metric. Validate presence before sorting; skip-with-warning or hard-fail, but don't crash the leaderboard with a stack trace.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@pipelines/training/automl/autogluon_timeseries_training_pipeline/pipeline.yaml`
around lines 893 - 895, The leaderboard sort currently fails when some
aggregated metrics lack the eval_metric key: before calling
pd.DataFrame(results).sort_values(by=eval_metric, ascending=False) validate that
every entry in results (or every column in the created DataFrame) contains
eval_metric; filter out entries missing eval_metric with a warning (or raise a
clear error) and only then build leaderboard_df and set its index. Update the
code path that prepares results/metrics.json aggregation and the leaderboard_df
creation to perform this presence check and logging so sort_values never
receives a missing key.

Comment thread requirements.txt
Comment on lines +1 to +4
# DO NOT EDIT MANUALLY
# This file gets autogenerated by uv via the following command:
# uv export --frozen --no-dev --no-editable --no-annotate
# Then the "." (self-reference) line was removed so Cachi2/Hermeto can prefetch only remote deps.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Regenerate lockfile to match CI’s expected output before merge.

CI is currently failing because this file is out of sync with the generator. Regenerate and commit the exact output from the project workflow (make requirements) to unblock the merge and avoid drift.

As per coding guidelines, "REVIEW PRIORITIES: ... 3. Bug-prone patterns and error handling gaps".

🧰 Tools
🪛 GitHub Actions: Check requirements.txt / 0_check.txt

[error] 1-1: requirements.txt is out of sync with the generated output. Step 'git diff --exit-code requirements.txt || { ... exit 1; }' failed; run 'make requirements' and commit the result.

🪛 GitHub Actions: Check requirements.txt / check

[error] 1-1: CI check failed: 'git diff --exit-code requirements.txt' indicates requirements.txt is out of sync. Run 'make requirements' and commit the result.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@requirements.txt` around lines 1 - 4, Regenerate the autogenerated lockfile
so it matches CI output by running the project workflow that produces
requirements.txt (use the documented command: run the generator via the
project's make target, e.g., "make requirements" which internally runs "uv
export --frozen --no-dev --no-editable --no-annotate"), then update
requirements.txt with the exact produced output and ensure you remove the "."
self-reference line as the project expects; commit the regenerated file so CI
and the checked-in requirements.txt are in sync.

Comment thread requirements.txt
Comment on lines 364 to +366
urllib3==2.6.3 \
--hash=sha256:e0260bcb8dc9a00766b1263632d4cea3d51ce498787c24062ef187f6ce351fdc
--hash=sha256:1b62b6884944a57dbe321509ab94fd4d3b307075e0c2eae991ac71ee15ad38ed \
--hash=sha256:bf272323e553dfb2e87d9bfd225ca7b0f467b919d7bbd355436d3fd37cb0acd4
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Current pinned urllib3 line:"
rg -n '^urllib3==' requirements.txt

echo "Query OSV for urllib3 advisories (includes affected/fixed ranges):"
curl -s https://api.osv.dev/v1/query -H 'Content-Type: application/json' \
  -d '{"package":{"name":"urllib3","ecosystem":"PyPI"},"version":"2.6.3"}' | jq .

Repository: opendatahub-io/pipelines-components

Length of output: 9561


🏁 Script executed:

#!/bin/bash
set -euo pipefail
echo "Current pinned urllib3 line:"
rg -n '^urllib3==' requirements.txt
echo "Query OSV for urllib3 advisories:"
curl -s https://api.osv.dev/v1/query -H 'Content-Type: application/json' \
  -d '{"package":{"name":"urllib3","ecosystem":"PyPI"},"version":"2.7.0"}' | jq .

Repository: opendatahub-io/pipelines-components

Length of output: 166


Upgrade urllib3==2.6.3 to 2.7.0 (two HIGH security advisories block merge).

urllib3 is pinned to a vulnerable version flagged for:

  • GHSA-mf9v-mfxr-j63j (CVE-2026-44432): Decompression-bomb safeguard bypass in streaming API → CWE-409: Uncontrolled Resource Consumption (excessive CPU/memory on chunked decompression of highly compressed untrusted responses, particularly with Brotli).
  • GHSA-qccp-gfcp-xxvc (CVE-2026-44431): Sensitive headers (Authorization, Cookie, Proxy-Authorization) forwarded across origins via low-level redirect flow → CWE-200: Exposure of Sensitive Information (credential leakage in proxied cross-origin redirects).

Fix: Bump urllib3 in the source dependency manifest to ≥2.7.0, then regenerate this lockfile (make requirements).

🧰 Tools
🪛 OSV Scanner (2.3.8)

[HIGH] 364-364: urllib3 2.6.3: urllib3: Decompression-bomb safeguards bypassed in parts of the streaming API

(GHSA-mf9v-mfxr-j63j)


[HIGH] 364-364: urllib3 2.6.3: urllib3: Sensitive headers forwarded across origins in proxied low-level redirects

(GHSA-qccp-gfcp-xxvc)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@requirements.txt` around lines 364 - 366, requirements.txt pins urllib3 to
2.6.3 which is vulnerable; update the urllib3 entry (symbol: urllib3) to at
least 2.7.0 and then regenerate the lockfile by running the project's dependency
build command (e.g., run "make requirements" or the repo's equivalent) so the
hashes and lock entries are updated to reflect urllib3>=2.7.0 and resolve the
two security advisories.

@hbelmiro hbelmiro marked this pull request as draft May 15, 2026 15:20
@hbelmiro hbelmiro closed this May 15, 2026
@hbelmiro hbelmiro deleted the merge-rhds-main-to-odh-main branch May 15, 2026 15:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants