Skip to content

feat(automl): time-series backtesting curve support#120

Merged
openshift-merge-bot[bot] merged 11 commits into
opendatahub-io:mainfrom
LukaszCmielowski:fix_timeseries_notebook_bug
Jun 3, 2026
Merged

feat(automl): time-series backtesting curve support#120
openshift-merge-bot[bot] merged 11 commits into
opendatahub-io:mainfrom
LukaszCmielowski:fix_timeseries_notebook_bug

Conversation

@LukaszCmielowski

@LukaszCmielowski LukaszCmielowski commented Jun 2, 2026

Copy link
Copy Markdown

Description of your changes:

  1. AutoML time-series back testing data generation.
  2. The fix for time-series notebook issue.
  3. Update pipelines descriptions.

Checklist:

Pre-Submission Checklist

Additional Checklist Items for New or Updated Components/Pipelines

  • metadata.yaml includes fresh lastVerified timestamp
  • All required files
    are present and complete
  • OWNERS file lists appropriate maintainers
  • README provides clear documentation with usage examples
  • Component follows snake_case naming convention
  • No security vulnerabilities in dependencies
  • Containerfile included if using a custom base image

Summary by CodeRabbit

  • New Features

    • Time-series training now writes back-testing artifacts with per-window metrics and best/worst series summaries; notebooks load and display these when present.
    • Back-testing generation failures are non-fatal (logged), so training still produces primary metrics.
  • Documentation

    • README and notebook guidance updated to document back-testing outputs and improved covariate forecasting guidance.
    • Pipeline descriptions updated to mention back-testing output.
  • Tests

    • Added comprehensive tests covering back-testing generation and failure handling.

https://redhat.atlassian.net/browse/RHOAIENG-65711

Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>
Assisted-by: Cursor
Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>
Assisted-by: Cursor
@coderabbitai

coderabbitai Bot commented Jun 2, 2026

Copy link
Copy Markdown

Warning

Review limit reached

@LukaszCmielowski, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 58 minutes and 21 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: cab8bf1b-2f22-4edb-90a7-80eb3b1d401c

📥 Commits

Reviewing files that changed from the base of the PR and between b954c4c and 23ce764.

📒 Files selected for processing (8)
  • components/training/automl/autogluon_timeseries_models_training/README.md
  • components/training/automl/autogluon_timeseries_models_training/component.py
  • components/training/automl/autogluon_timeseries_models_training/tests/test_component_unit.py
  • components/training/automl/shared/back_testing.py
  • components/training/automl/shared/notebook_templates/timeseries_notebook.ipynb
  • components/training/automl/shared/tests/test_back_testing.py
  • components/training/automl/shared/tests/test_timeseries_notebook_utils.py
  • components/training/automl/shared/timeseries_notebook_utils.py
📝 Walkthrough

Walkthrough

Adds a back_testing builder that produces a schema-aligned back_testing.json from predictor backtest APIs, integrates it into the training component (writes metrics/back_testing.json and conditionally records its path), expands the timeseries notebook with a Back-testing summary and improved known-covariates handling, adds tests for the builder and integration, and updates a CI import exception and several pipeline description strings.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes


Security & Code Quality Issues

  • back_testing.py: unbounded aggregation risk — large holdout/forecast point counts may cause memory exhaustion when building per-series forecast rows and aggregating metrics (CWE-400: Uncontrolled Resource Consumption). Recommend enforce MAX_FORECAST_POINTS_PER_WINDOW early and validate num_val_windows and input sizes before aggregating.

  • back_testing.py: silent swallow of evaluation errors — per-window evaluate() exceptions are logged and produce empty metrics; if all windows fail this returns an empty/partial payload without an explicit error (CWE-703: Improper Check or Handling of Exceptional Conditions). Recommend return an error flag or raise when all evaluate calls fail, or document empty-per_window_metrics as a valid terminal state.

  • back_testing.py: inconsistent NaN/inf handling — to_finite_float returns None but downstream averaging may propagate NaN or divide-by-zero in MAPE/RMSE/MAE computations (CWE-369: Division by Zero). Recommend strict masking of non-finite values at aggregation boundaries and explicit checks before averaging.

  • Notebook template: unsafe file loading pattern — loading model artifact JSON paths constructed in the notebook may allow unintended file access if artifact paths are untrusted (CWE-426: Untrusted Search Path). Recommend canonicalize/validate model_artifact_path and avoid loading arbitrary external paths.

  • Component integration: missing schema validation prior to persisting — build_back_testing_json output is written directly to disk without schema verification, risking malformed payloads reaching consumers. Recommend validate against an explicit JSON schema (jsonschema/Pydantic) before persisting.

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name Status Explanation Resolution
Description check ❓ Inconclusive PR description is vague and lacks specifics. High-level bullet points omit details on what backtesting generates, which notebook issue is fixed, and why pipeline descriptions changed. Expand description with concrete details: (1) what back_testing.json contains and why, (2) specific notebook issue fixed, (3) which pipelines and why descriptions changed.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title 'feat(automl): time-series backtesting curve support' clearly maps to the main change: adding backtesting artifact generation for AutoML time-series models.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
components/training/automl/autogluon_timeseries_models_training/component.py (1)

392-400: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

back_testing path written unconditionally even when generation failed.

The backtest block (lines 351-371) swallows failures and skips writing back_testing.json, but model.json always advertises metrics/back_testing.json. On the warning path the referenced file does not exist, so any consumer trusting model.json metadata will hit a missing file. Gate the metadata entry on actual write success.

🛠️ Track write success and conditionally add the path
                 try:
                     back_testing_payload = build_back_testing_json(
                         ...
                     )
                     with (metrics_path / "back_testing.json").open("w", encoding="utf-8") as f:
                         json.dump(back_testing_payload, f, indent=2)
+                    back_testing_written = True
                 except Exception as backtest_exc:
                     logger.warning(
                         "Could not generate back_testing.json for model %r: %s. Skipping backtest artifact.",
                         model_name_full,
                         backtest_exc,
                     )
+                    back_testing_written = False
                     "metrics": str(Path(model_name_full) / "metrics"),
-                    "back_testing": str(Path(model_name_full) / "metrics" / "back_testing.json"),
                 },

Then add back_testing to location only when back_testing_written is true.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@components/training/automl/autogluon_timeseries_models_training/component.py`
around lines 392 - 400, The model metadata always includes a back_testing path
even when the backtest write failed; update the code that builds model_metadata
so it only inserts the "back_testing": str(Path(model_name_full) / "metrics" /
"back_testing.json") entry into the "location" dict when the backtest write
actually succeeded (use the existing back_testing_written flag from the backtest
block); i.e., build the base location dict first, then if back_testing_written
is True add the back_testing key before assigning model_metadata.
🧹 Nitpick comments (6)
.github/scripts/check_imports/import_exceptions.yaml (1)

14-17: ⚡ Quick win

Narrow the numpy import exception scope to avoid policy overreach.

Line 14 currently grants numpy to every file matching components/**/shared/*, which weakens least-privilege controls for import governance and can hide unintended dependency spread. Scope this exception to the specific shared file(s) that require numpy.

Proposed policy tightening
 "components/**/shared/*":
   - pandas
-  - numpy
   - component
+"components/training/automl/shared/back_testing.py":
+  - numpy

As per coding guidelines: “REVIEW PRIORITIES: 1. Security vulnerabilities ... 2. Architectural issues and anti-patterns”.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/scripts/check_imports/import_exceptions.yaml around lines 14 - 17,
The current import_exceptions.yaml grants "numpy" to the broad pattern
"components/**/shared/*", which is too permissive; restrict the exception by
removing "numpy" from that wildcard entry and add explicit exception entries for
only the specific shared module file(s) that actually import numpy (e.g.,
replace the wildcard pattern with exact filenames under the shared directory
that need numpy or add new keys for
"components/.../shared/specific_shared_file.py"), updating the
"components/**/shared/*" block to keep only allowed libs and adding targeted
"numpy" entries for the exact shared file paths so the rule maintains
least-privilege.
components/training/automl/shared/tests/test_back_testing.py (2)

53-60: ⚡ Quick win

Cover the actual per-item branch here.

This fixture only has one series, so it never proves _holdout_frame() keeps the last prediction_length rows per item. A regression to tsdf.tail(prediction_length) would still pass this test. Add at least a two-item panel and assert each item retains its own final horizon.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@components/training/automl/shared/tests/test_back_testing.py` around lines 53
- 60, Update the test_holdout_frame_takes_last_prediction_length_per_item to
include multiple series so it verifies per-item behavior: create a panel with at
least two distinct items using _make_panel (e.g., item IDs "A" and "B") with
different time-series values, call _holdout_frame(panel, prediction_length=2),
then assert that the resulting holdout contains exactly the last
prediction_length rows for each item (check both length and that
holdout["target"] contains the expected tail values for each item individually,
and optionally verify holdout["item_id"] or equivalent grouping to ensure rows
belong to the correct series).

129-158: ⚡ Quick win

Make the ranking fixture discriminate between metrics.

This does not actually verify eval_metric dispatch because the same series wins under every common lower-is-better point metric with the current numbers. Use scale-skewed targets/predictions so MAPE and RMSE rank different items; otherwise an implementation that ignores eval_metric still passes.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@components/training/automl/shared/tests/test_back_testing.py` around lines
129 - 158, The test currently uses symmetric errors so every point metric ranks
the same series; change the synthetic targets/predictions in
test_ranks_by_point_metric_matching_eval_metric so RMSE and MAPE would choose
different winners (e.g., make one series have small absolute errors but large
relative errors by using very small target values for one series and larger
targets for the other, or scale predictions so one series has low RMSE but high
MAPE), then keep eval_metric="RMSE" and assert the expected best_performer
(item_id "good") still wins; this forces build_back_testing_json (and its use of
predictor.evaluate/predictor.backtest_predictions) to actually dispatch on the
eval_metric rather than relying on identical ranking under all metrics. Ensure
the modified predictions/targets are applied to window_targets, good_preds,
bad_preds (the DataFrames used to build predictions and window_targets) so the
test discriminates between metrics.
components/training/automl/shared/back_testing.py (2)

291-324: 💤 Low value

Redundant recomputation of per-window metrics and forecast rows.

_item_window_metrics (which itself calls _forecast_data_for_item) runs once per item during ranking, then _series_payload recomputes both again for the best/worst series. For the selected performers, forecast extraction is effectively done three times. Caching the per (item_id, window_id) result would remove the duplication. Low priority given the 500-point cap per window.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@components/training/automl/shared/back_testing.py` around lines 291 - 324,
The code recomputes per-window metrics and forecast rows multiple times: first
when building per_item_window_metrics using _item_window_metrics (which calls
_forecast_data_for_item), and again inside _series_payload for best/worst
series; this is wasteful. Fix by computing and caching results keyed by
(item_id, window_id) when you first iterate over zip(predictions_windows,
targets_windows) (the same loop that fills per_item_window_metrics), storing
both the metrics and forecast_data; then have _series_payload read from that
cache instead of calling _item_window_metrics or _forecast_data_for_item again.
Ensure the cache is populated before calling
_series_ranking_metric/_select_best_worst and used by _series_payload to build
windows_payload.

98-104: 💤 Low value

Numeric-column detection diverges from _quantile_bounds.

_mean_prediction_column uses to_finite_float(c), which rejects string column labels (AutoGluon emits quantile columns as strings like "0.1"), so numeric_cols stays empty and it falls back to columns[0]. _quantile_bounds instead uses float(col), which parses those strings. The two helpers will disagree about which columns are numeric. Harmless while "mean" is present, but brittle if it isn't.

♻️ Align parsing with _quantile_bounds
-    numeric_cols = [c for c in predictions.columns if to_finite_float(c) is not None]
-    if numeric_cols:
-        return str(numeric_cols[0])
+    def _as_float(col: Any) -> float | None:
+        try:
+            return float(col)
+        except (TypeError, ValueError):
+            return None
+    numeric_cols = [c for c in predictions.columns if _as_float(c) is not None]
+    if numeric_cols:
+        # Prefer the median (0.5) if present, else the first numeric column.
+        median = next((c for c in numeric_cols if _as_float(c) == 0.5), numeric_cols[0])
+        return str(median)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@components/training/automl/shared/back_testing.py` around lines 98 - 104, The
_mean_prediction_column helper currently uses to_finite_float(c) which rejects
numeric-looking string labels and therefore can disagree with _quantile_bounds;
change the numeric-column detection in _mean_prediction_column to mirror
_quantile_bounds by attempting to parse each column label with float(col)
(catching ValueError/TypeError) to consider it numeric, so string labels like
"0.1" are recognized; preserve the existing fallbacks (prefer "mean", then first
numeric-parsed column, then columns[0]) and keep references to the function name
_mean_prediction_column and the differing helper _quantile_bounds when making
this change.
components/training/automl/autogluon_timeseries_models_training/component.py (1)

349-349: 💤 Low value

Move the import out of the per-model loop.

from ... import build_back_testing_json re-runs the import machinery on every top_models iteration. Hoist it next to the other deferred imports (e.g., near line 255) so it executes once.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@components/training/automl/autogluon_timeseries_models_training/component.py`
at line 349, The import of build_back_testing_json is inside the per-model loop
(iterating over top_models); hoist it so it runs once by moving "from
kfp_components.components.training.automl.shared.back_testing import
build_back_testing_json" out of the loop and placing it with the other deferred
imports (near where other late imports are declared) and then use the imported
build_back_testing_json within the loop; update any local references to ensure
they still point to the same symbol.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@components/training/automl/shared/back_testing.py`:
- Around line 386-400: per_window_metrics currently stores window_scores
returned by predictor.evaluate() which uses AutoGluon’s "higher-is-better"
convention (so error metrics like MAPE/RMSE/MAE are negated) causing sign
mismatch with series_analysis/_point_errors() which emits positive error values;
fix by normalizing those metrics before serialization: after window_scores =
filter_finite_metrics(predictor.evaluate(...)) (and before appending to
per_window_metrics) detect the known error metric keys (e.g.,
"MAPE","RMSE","MAE") and multiply their values by -1 (or otherwise flip their
sign) so stored per_window_metrics["metrics"] contains positive error
magnitudes, or alternatively add a boolean flag (e.g., "higher_is_better":
true/false) alongside metrics to explicitly document the convention. Ensure
changes reference window_scores, predictor.evaluate, filter_finite_metrics,
per_window_metrics and are consistent with series_analysis/_point_errors().

In
`@components/training/automl/shared/notebook_templates/timeseries_notebook.ipynb`:
- Around line 159-163: The _download_model function uses model_name directly to
form base_dir and later targets, allowing path traversal; validate and sanitize
model_name before using it as a local extraction root by rejecting or
normalizing any input containing path separators or parent-traversal (e.g.,
'..') and only allowing a safe basename (use os.path.basename or a strict
whitelist/regex). Then construct base_dir from a fixed workspace directory
joined with that sanitized basename (not the raw model_name), and perform
canonicalization (os.path.abspath/os.path.realpath) on both base_dir and any
computed abs_target and enforce containment (abs_target.startswith(base_dir +
os.sep)) to ensure downloads cannot escape the intended workspace. Ensure checks
reference the existing symbols model_name, base_dir, target, relative, and
abs_target in _download_model.
- Around line 58-60: The global suppression using
warnings.filterwarnings("ignore") mutes important security and TLS warnings;
replace it with scoped or targeted suppression instead: remove the global call
and either wrap only the specific noisy code blocks in a
warnings.catch_warnings() context or call warnings.filterwarnings("ignore",
category=<SpecificWarningClass>) for the exact non-security category you want
silenced; search for the warnings.filterwarnings("ignore") call and update it so
that warnings.catch_warnings() is used around benign code sections or the filter
is restricted to a precise warning category/message rather than globally
silencing all warnings.
- Around line 184-188: The current exception handler for SSLError around
_download_model/_get_s3_bucket silently falls back to an insecure verify=False
retry; remove that insecure retry and instead fail closed: when catching
SSLError (from _download_model or _get_s3_bucket), log the error with details
and re-raise or surface the exception to the caller so the download fails; if an
insecure bypass is absolutely required, require an explicit, local-only opt-in
flag (e.g., a clearly named parameter) and document it, but do not perform an
automatic verify=False retry.

---

Outside diff comments:
In
`@components/training/automl/autogluon_timeseries_models_training/component.py`:
- Around line 392-400: The model metadata always includes a back_testing path
even when the backtest write failed; update the code that builds model_metadata
so it only inserts the "back_testing": str(Path(model_name_full) / "metrics" /
"back_testing.json") entry into the "location" dict when the backtest write
actually succeeded (use the existing back_testing_written flag from the backtest
block); i.e., build the base location dict first, then if back_testing_written
is True add the back_testing key before assigning model_metadata.

---

Nitpick comments:
In @.github/scripts/check_imports/import_exceptions.yaml:
- Around line 14-17: The current import_exceptions.yaml grants "numpy" to the
broad pattern "components/**/shared/*", which is too permissive; restrict the
exception by removing "numpy" from that wildcard entry and add explicit
exception entries for only the specific shared module file(s) that actually
import numpy (e.g., replace the wildcard pattern with exact filenames under the
shared directory that need numpy or add new keys for
"components/.../shared/specific_shared_file.py"), updating the
"components/**/shared/*" block to keep only allowed libs and adding targeted
"numpy" entries for the exact shared file paths so the rule maintains
least-privilege.

In
`@components/training/automl/autogluon_timeseries_models_training/component.py`:
- Line 349: The import of build_back_testing_json is inside the per-model loop
(iterating over top_models); hoist it so it runs once by moving "from
kfp_components.components.training.automl.shared.back_testing import
build_back_testing_json" out of the loop and placing it with the other deferred
imports (near where other late imports are declared) and then use the imported
build_back_testing_json within the loop; update any local references to ensure
they still point to the same symbol.

In `@components/training/automl/shared/back_testing.py`:
- Around line 291-324: The code recomputes per-window metrics and forecast rows
multiple times: first when building per_item_window_metrics using
_item_window_metrics (which calls _forecast_data_for_item), and again inside
_series_payload for best/worst series; this is wasteful. Fix by computing and
caching results keyed by (item_id, window_id) when you first iterate over
zip(predictions_windows, targets_windows) (the same loop that fills
per_item_window_metrics), storing both the metrics and forecast_data; then have
_series_payload read from that cache instead of calling _item_window_metrics or
_forecast_data_for_item again. Ensure the cache is populated before calling
_series_ranking_metric/_select_best_worst and used by _series_payload to build
windows_payload.
- Around line 98-104: The _mean_prediction_column helper currently uses
to_finite_float(c) which rejects numeric-looking string labels and therefore can
disagree with _quantile_bounds; change the numeric-column detection in
_mean_prediction_column to mirror _quantile_bounds by attempting to parse each
column label with float(col) (catching ValueError/TypeError) to consider it
numeric, so string labels like "0.1" are recognized; preserve the existing
fallbacks (prefer "mean", then first numeric-parsed column, then columns[0]) and
keep references to the function name _mean_prediction_column and the differing
helper _quantile_bounds when making this change.

In `@components/training/automl/shared/tests/test_back_testing.py`:
- Around line 53-60: Update the
test_holdout_frame_takes_last_prediction_length_per_item to include multiple
series so it verifies per-item behavior: create a panel with at least two
distinct items using _make_panel (e.g., item IDs "A" and "B") with different
time-series values, call _holdout_frame(panel, prediction_length=2), then assert
that the resulting holdout contains exactly the last prediction_length rows for
each item (check both length and that holdout["target"] contains the expected
tail values for each item individually, and optionally verify holdout["item_id"]
or equivalent grouping to ensure rows belong to the correct series).
- Around line 129-158: The test currently uses symmetric errors so every point
metric ranks the same series; change the synthetic targets/predictions in
test_ranks_by_point_metric_matching_eval_metric so RMSE and MAPE would choose
different winners (e.g., make one series have small absolute errors but large
relative errors by using very small target values for one series and larger
targets for the other, or scale predictions so one series has low RMSE but high
MAPE), then keep eval_metric="RMSE" and assert the expected best_performer
(item_id "good") still wins; this forces build_back_testing_json (and its use of
predictor.evaluate/predictor.backtest_predictions) to actually dispatch on the
eval_metric rather than relying on identical ranking under all metrics. Ensure
the modified predictions/targets are applied to window_targets, good_preds,
bad_preds (the DataFrames used to build predictions and window_targets) so the
test discriminates between metrics.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: f2b39a45-fdab-42d5-9006-f5f8c64ae2e1

📥 Commits

Reviewing files that changed from the base of the PR and between c651b59 and 3d0d30d.

📒 Files selected for processing (7)
  • .github/scripts/check_imports/import_exceptions.yaml
  • components/training/automl/autogluon_timeseries_models_training/README.md
  • components/training/automl/autogluon_timeseries_models_training/component.py
  • components/training/automl/shared/back_testing.py
  • components/training/automl/shared/notebook_templates/timeseries_notebook.ipynb
  • components/training/automl/shared/run_status_templates/pipelines/autogluon-timeseries-training-pipeline.json
  • components/training/automl/shared/tests/test_back_testing.py

Comment thread components/training/automl/shared/back_testing.py
Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>
Assisted-by: Cursor
Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>
Assisted-by: Cursor
@LukaszCmielowski

Copy link
Copy Markdown
Author

/ok-to-test

@LukaszCmielowski

Copy link
Copy Markdown
Author

@dryszka please take a look.
The integration tests passing:

image

Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>
Assisted-by: Cursor

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@components/training/automl/shared/tests/test_back_testing.py`:
- Around line 357-359: Update the stale comment above the assertions in the test
to reflect that for a single series, best_performer and worst_performer are both
set (non-None) — i.e., the single series is considered both best and worst per
_build_series_analysis — so change the comment to state that best and worst
should be non-None (the same series) to match the assertions
payload["series_analysis"]["best_performer"] and
payload["series_analysis"]["worst_performer"].
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 83e4819e-ad5f-447a-9a17-e1866ec19440

📥 Commits

Reviewing files that changed from the base of the PR and between 3d0d30d and b954c4c.

📒 Files selected for processing (8)
  • components/training/automl/autogluon_timeseries_models_training/component.py
  • components/training/automl/autogluon_timeseries_models_training/tests/test_component_unit.py
  • components/training/automl/shared/back_testing.py
  • components/training/automl/shared/tests/test_back_testing.py
  • pipelines/data_processing/autorag/documents_indexing_pipeline/pipeline.py
  • pipelines/training/automl/autogluon_tabular_training_pipeline/pipeline.py
  • pipelines/training/automl/autogluon_timeseries_training_pipeline/pipeline.py
  • pipelines/training/autorag/documents_rag_optimization_pipeline/pipeline.py
✅ Files skipped from review due to trivial changes (2)
  • pipelines/training/autorag/documents_rag_optimization_pipeline/pipeline.py
  • pipelines/training/automl/autogluon_timeseries_training_pipeline/pipeline.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • components/training/automl/autogluon_timeseries_models_training/component.py
  • components/training/automl/shared/back_testing.py

Comment on lines +357 to +359
# For single series, best and worst should be None (only one series)
assert payload["series_analysis"]["best_performer"] is not None
assert payload["series_analysis"]["worst_performer"] is not None

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Stale comment contradicts the assertion.

The comment claims best/worst "should be None (only one series)", but the asserts require both to be non-None — and per _build_series_analysis, a single series is both best and worst performer, so the assertions are right and the comment is wrong. Fix the comment to avoid misleading future maintainers into "correcting" the test.

📝 Proposed fix
-        # For single series, best and worst should be None (only one series)
+        # Single series is both best and worst performer, so both are populated.
         assert payload["series_analysis"]["best_performer"] is not None
         assert payload["series_analysis"]["worst_performer"] is not None
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# For single series, best and worst should be None (only one series)
assert payload["series_analysis"]["best_performer"] is not None
assert payload["series_analysis"]["worst_performer"] is not None
# Single series is both best and worst performer, so both are populated.
assert payload["series_analysis"]["best_performer"] is not None
assert payload["series_analysis"]["worst_performer"] is not None
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@components/training/automl/shared/tests/test_back_testing.py` around lines
357 - 359, Update the stale comment above the assertions in the test to reflect
that for a single series, best_performer and worst_performer are both set
(non-None) — i.e., the single series is considered both best and worst per
_build_series_analysis — so change the comment to state that best and worst
should be non-None (the same series) to match the assertions
payload["series_analysis"]["best_performer"] and
payload["series_analysis"]["worst_performer"].

Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>
Assisted-by: Cursor

@dryszka dryszka left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment thread components/training/automl/shared/notebook_templates/timeseries_notebook.ipynb Outdated
Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>
Assisted-by: Cursor
@openshift-ci openshift-ci Bot removed the lgtm label Jun 3, 2026
Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>
Assisted-by: Cursor
@LukaszCmielowski

Copy link
Copy Markdown
Author

/retest

Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>
Assisted-by: Cursor
Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>
Assisted-by: Cursor
Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>
Assisted-by: Cursor
@LukaszCmielowski

Copy link
Copy Markdown
Author

/retest

@dryszka

dryszka commented Jun 3, 2026

Copy link
Copy Markdown

/lgtm

@openshift-ci openshift-ci Bot added the lgtm label Jun 3, 2026

@Mateusz-Switala Mateusz-Switala left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci

openshift-ci Bot commented Jun 3, 2026

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dryszka, Mateusz-Switala

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@LukaszCmielowski

Copy link
Copy Markdown
Author

/retest

@openshift-merge-bot openshift-merge-bot Bot merged commit 85fee01 into opendatahub-io:main Jun 3, 2026
25 of 26 checks passed
@LukaszCmielowski LukaszCmielowski deleted the fix_timeseries_notebook_bug branch June 12, 2026 07:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants