
fix(automl): fix leaderboard ranking for negated error metrics#7258

Merged
openshift-merge-bot[bot] merged 3 commits into opendatahub-io:main from nickmazzi:nmazzite-RHOAIENG-58390
Apr 15, 2026

Conversation

nickmazzi (Contributor) commented Apr 15, 2026

https://issues.redhat.com/browse/RHOAIENG-58390

Description

Fix AutoML leaderboard ranking for metrics reported by AutoGluon.

AutoGluon negates all error/loss metrics (MASE, MAPE, MAE, MSE, RMSE, etc.) so they uniformly follow a "higher is better" convention. The previous implementation had a hardcoded errorMetrics array that attempted to sort these metrics ascending and used Math.abs() to convert negated values back to positive for display. This caused incorrect ranking — the worst values (most negative) were ranked first instead of the best values (closest to zero).
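
As a quick illustration (hypothetical score set; only the -0.09 value appears in this PR's test data), an ascending sort on negated MASE values puts the worst model first, while a descending sort puts the best model, the one closest to zero, first:

const negatedMase = [-0.54, -0.09, -0.31]; // AutoGluon-style negated MASE scores
[...negatedMase].sort((a, b) => a - b); // old ascending order: [-0.54, -0.31, -0.09], worst first
[...negatedMase].sort((a, b) => b - a); // new descending order: [-0.09, -0.31, -0.54], best first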

Changes:

  • AutomlLeaderboard.tsx: Removed the hardcoded errorMetrics array and conditional sort logic. Ranking now always sorts descending (higher is better), matching AutoGluon's negation convention.
  • utils.ts: Removed isErrorMetric helper, ERROR_METRICS set, and the Math.abs/conditional logic in computeRankMap. computeRankMap now sorts uniformly descending (sketched after this list).
  • AutomlModelDetailsModalHeader.tsx: Removed isErrorMetric import and Math.abs conversion — metric values display as-is.
  • ModelEvaluationTab.tsx: Removed isErrorMetric import and the wrapping formatMetricValue function — raw metric values pass through toNumericMetric and formatMetricValue directly.
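
A minimal sketch of the uniform descending ranking described above, assuming a plain model-name-to-score map (the actual computeRankMap signature and its 'N/A' handling in this repo may differ):

type MetricValue = number | 'N/A';

// Rank models by a metric that AutoGluon reports as uniformly "higher is better";
// error/loss metrics arrive already negated, so no per-metric branching is needed.
function computeRankMap(values: Record<string, MetricValue>): Map<string, number> {
  const sorted = Object.entries(values).toSorted(([, a], [, b]) => {
    if (a === 'N/A') return 1; // missing values always sort last
    if (b === 'N/A') return -1;
    return b - a; // descending: the negated error closest to zero wins
  });
  return new Map(sorted.map(([name], index) => [name, index + 1]));
}

// computeRankMap({ LSTM: -0.09, DeepAR: -0.31 }) ranks LSTM first.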

How Has This Been Tested?

  • Ran the full AutoML unit test suite — all tests pass (utils, leaderboard, modal header, evaluation tab).
  • Verified timeseries ranking: LSTM (MASE = -0.09, closest to 0) correctly ranks first.
  • Verified metric display: negated values render as-is (e.g., -0.082 for MASE).

Test Impact

Four test files updated:

  • AutomlLeaderboard.spec.tsx: Updated mockTimeseriesModels to use negated metric values matching real AutoGluon output. Updated timeseries ranking test description and comments.
  • utils.spec.ts: Updated computeRankMap timeseries tests to use negated values. Removed the isErrorMetric test block entirely.
  • AutomlModelDetailsModalHeader.spec.tsx: Updated test to expect raw negated MASE value (-0.082) instead of absolute value.
  • ModelEvaluationTab.spec.tsx: Updated test to expect raw negated MSE value (-12.450) instead of absolute value.
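
For context, the updated expectations look roughly like this Jest-style sketch, building on the computeRankMap sketch above (LSTM and its -0.09 MASE come from this PR's test data; the other model names and values are illustrative):

it('ranks the model whose negated MASE is closest to zero first', () => {
  const ranks = computeRankMap({ LSTM: -0.09, DeepAR: -0.31, Theta: -0.54 });
  expect(ranks.get('LSTM')).toBe(1); // -0.09 is the highest value, so LSTM is best
});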

Request review criteria:

Self checklist (all need to be checked):

  • The developer has manually tested the changes and verified that the changes work
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has added tests or explained why testing cannot be added (unit or cypress tests for related changes)
  • The code follows our Best Practices (React coding standards, PatternFly usage, performance considerations)

If you have UI changes:

  • Included any necessary screenshots or gifs if it was a UI change.
  • Included tags to the UX team if it was a UI/UX change.

After the PR is posted & before it merges:

  • The developer has tested their solution on a cluster by using the image produced by the PR to main

AutoGluon negates error/loss metrics so all metrics are uniformly
"higher is better". Remove the hardcoded errorMetrics array and
sort all metrics descending so the best values (closest to zero
for negated error metrics) are ranked first.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
openshift-ci Bot (Contributor) commented Apr 15, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

coderabbitai Bot (Contributor) commented Apr 15, 2026

📝 Walkthrough

The leaderboard component's ranking initialization was simplified by removing conditional logic that treated error metrics (mase, mse, mae, rmse, mape) differently from performance metrics. The change replaces this branching with a uniform descending sort for all numeric values, relying on the assumption that AutoGluon negates error and loss metrics internally, making higher values uniformly desirable across all metric types.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Actionable issues

Unvalidated external dependency assumption: The change assumes AutoGluon negates error/loss metrics without runtime validation. If this behavior is not guaranteed across versions or configurations, incorrect metric ordering could occur silently. Consider adding assertions or validation checks to confirm metric negation at runtime, or document the AutoGluon version/configuration this depends on.

Loss of explicit error metric handling: Removing the hardcoded error metric key list (mase, mse, mae, rmse, mape) means new error metrics added to the system will automatically be treated as performance metrics (assuming they're negated). If any error metric isn't negated by AutoGluon, this logic will produce inverted rankings for that metric. Verify the complete set of error metrics AutoGluon handles and confirm they're all negated.
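
If such a guard were wanted, it could look roughly like the following sketch (a hypothetical helper, not part of this PR; the metric-name set mirrors the removed ERROR_METRICS list):

const KNOWN_ERROR_METRICS = new Set(['mase', 'mape', 'mae', 'mse', 'rmse']);

// Warn when an error metric arrives non-negated, i.e. when the assumed
// AutoGluon "higher is better" convention appears to be violated.
function warnIfNotNegated(metric: string, value: number): void {
  if (KNOWN_ERROR_METRICS.has(metric.toLowerCase()) && value > 0) {
    console.warn(`Unexpected positive value ${value} for error metric "${metric}"`);
  }
}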

🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
  • Title check: ✅ Passed. The title accurately describes the main change: fixing leaderboard ranking logic for negated error metrics in AutoML.
  • Description check: ✅ Passed. PR description is mostly complete with clear problem statement, detailed changes, test evidence, and screenshots. However, the self-checklist items are not marked (all boxes remain unchecked).


nickmazzi (Contributor, Author) commented

@coderabbitai review

coderabbitai Bot (Contributor) commented Apr 15, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai Bot (Contributor) left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/automl/frontend/src/app/components/run-results/AutomlLeaderboard.tsx (1)

359-380: ⚠️ Potential issue | 🟠 Major

Ranking diverges from shared utility; silent failure if backend sign assumptions break

Lines 359–380 assume error metrics are always negated by the backend and always sort descending. This diverges from computeRankMap() (utils.ts:164), which detects error metrics with isErrorMetric() and applies conditional Math.abs() before sorting. If the backend fails to negate an error metric, sortedByMetric will invert rankings silently (worst models rank first), while other UI surfaces using computeRankMap remain correct.

Apply isErrorMetric() to the comparator:

Fix
 import {
   formatMetricName,
   formatMetricValue,
   getOptimizedMetricForTask,
+  isErrorMetric,
 } from '~/app/utilities/utils';
@@
-    // Initial ranking by optimized metric value (higher is better).
-    // AutoGluon negates error/loss metrics so all metrics are uniformly "higher is better".
+    // Initial ranking by optimized metric value.
+    // Normalize lower-is-better metrics so comparator remains correct
+    // even if sign conventions vary across providers/run versions.
     const sortedByMetric = entries.toSorted((a, b) => {
@@
-      // Both are numbers — descending (higher is better)
       const aNum = typeof aVal === 'number' ? aVal : 0;
       const bNum = typeof bVal === 'number' ? bVal : 0;
-      return bNum - aNum;
+      const aScore = isErrorMetric(optimizedMetric) ? -Math.abs(aNum) : aNum;
+      const bScore = isErrorMetric(optimizedMetric) ? -Math.abs(bNum) : bNum;
+      return bScore - aScore;
     });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/automl/frontend/src/app/components/run-results/AutomlLeaderboard.tsx`
around lines 359 - 380, sortedByMetric currently assumes optimizedMetricValue is
already signed for "higher is better" and sorts descending, which can invert
rankings if the backend didn't negate error metrics; update the comparator in
the entries.toSorted call (sortedByMetric) to mirror computeRankMap's behavior
by calling isErrorMetric() and, when aVal/bVal are numbers and the metric is an
error metric, use Math.abs(aVal) and Math.abs(bVal) (or otherwise the numeric
value) for comparison while still treating 'N/A' as last; ensure you reference
optimizedMetricValue and isErrorMetric() so the comparator normalizes values
consistently with computeRankMap().


📥 Commits

Reviewing files that changed from the base of the PR and between 3ef417a and 7112147.

📒 Files selected for processing (1)
  • packages/automl/frontend/src/app/components/run-results/AutomlLeaderboard.tsx

nickmazzi and others added 2 commits April 15, 2026 11:59
… data

Mock timeseries data was using positive values for error metrics, but
the ranking logic (updated in 7112147) now sorts descending uniformly
since AutoGluon negates error/loss metrics. Align test data with this
convention so LSTM (-0.09 MASE, closest to 0) correctly ranks first.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove the isErrorMetric helper and ERROR_METRICS set since AutoGluon
already negates error/loss metrics. Display raw metric values without
Math.abs conversion. Simplify computeRankMap to sort uniformly
descending. Update all affected tests to expect negated values.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
nickmazzi marked this pull request as ready for review April 15, 2026 16:15
openshift-ci Bot removed the do-not-merge/work-in-progress (This PR is in WIP state) label Apr 15, 2026
openshift-ci Bot requested review from GAUNSD and daniduong April 15, 2026 16:15
codecov Bot commented Apr 15, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 64.82%. Comparing base (3ef417a) to head (6f606c3).
⚠️ Report is 9 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #7258      +/-   ##
==========================================
+ Coverage   64.80%   64.82%   +0.01%     
==========================================
  Files        2441     2441              
  Lines       75996    75996              
  Branches    19158    19158              
==========================================
+ Hits        49253    49265      +12     
+ Misses      26743    26731      -12     

see 10 files with indirect coverage changes



GAUNSD (Contributor) left a comment

/approve
/lgtm

openshift-ci Bot (Contributor) commented Apr 15, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: GAUNSD

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-merge-bot Bot merged commit ea50e83 into opendatahub-io:main Apr 15, 2026
55 checks passed