
fix(automl): fix leaderboard ranking for negated error metrics#7258

Merged
openshift-merge-bot[bot] merged 3 commits into opendatahub-io:main from nickmazzi:nmazzite-RHOAIENG-58390
Apr 15, 2026

Conversation

nickmazzi (Contributor) commented Apr 15, 2026

https://issues.redhat.com/browse/RHOAIENG-58390

Description

Fix AutoML leaderboard ranking for metrics reported by AutoGluon.

AutoGluon negates all error/loss metrics (MASE, MAPE, MAE, MSE, RMSE, etc.) so they uniformly follow a "higher is better" convention. The previous implementation had a hardcoded errorMetrics array that attempted to sort these metrics ascending and used Math.abs() to convert negated values back to positive for display. This caused incorrect ranking — the worst values (most negative) were ranked first instead of the best values (closest to zero).
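
As a quick illustration (hypothetical score set; only the -0.09 value appears in this PR's test data), an ascending sort on negated MASE values puts the worst model first, while a descending sort puts the best model, the one closest to zero, first:

const negatedMase = [-0.54, -0.09, -0.31]; // AutoGluon-style negated MASE scores
[...negatedMase].sort((a, b) => a - b); // old ascending order: [-0.54, -0.31, -0.09], worst first
[...negatedMase].sort((a, b) => b - a); // new descending order: [-0.09, -0.31, -0.54], best first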

Changes:

  • AutomlLeaderboard.tsx: Removed the hardcoded errorMetrics array and conditional sort logic. Ranking now always sorts descending (higher is better), matching AutoGluon's negation convention.
  • utils.ts: Removed isErrorMetric helper, ERROR_METRICS set, and the Math.abs/conditional logic in computeRankMap. computeRankMap now sorts uniformly descending (sketched after this list).
  • AutomlModelDetailsModalHeader.tsx: Removed isErrorMetric import and Math.abs conversion — metric values display as-is.
  • ModelEvaluationTab.tsx: Removed isErrorMetric import and the wrapping formatMetricValue function — raw metric values pass through toNumericMetric and formatMetricValue directly.
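
A minimal sketch of the uniform descending ranking described above, assuming a plain model-name-to-score map (the actual computeRankMap signature and its 'N/A' handling in this repo may differ):

type MetricValue = number | 'N/A';

// Rank models by a metric that AutoGluon reports as uniformly "higher is better";
// error/loss metrics arrive already negated, so no per-metric branching is needed.
function computeRankMap(values: Record<string, MetricValue>): Map<string, number> {
  const sorted = Object.entries(values).toSorted(([, a], [, b]) => {
    if (a === 'N/A') return 1; // missing values always sort last
    if (b === 'N/A') return -1;
    return b - a; // descending: the negated error closest to zero wins
  });
  return new Map(sorted.map(([name], index) => [name, index + 1]));
}

// computeRankMap({ LSTM: -0.09, DeepAR: -0.31 }) ranks LSTM first.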

How Has This Been Tested?

  • Ran the full AutoML unit test suite — all tests pass (utils, leaderboard, modal header, evaluation tab).
  • Verified timeseries ranking: LSTM (MASE = -0.09, closest to 0) correctly ranks first.
  • Verified metric display: negated values render as-is (e.g., -0.082 for MASE).

Test Impact

Four test files updated:

  • AutomlLeaderboard.spec.tsx: Updated mockTimeseriesModels to use negated metric values matching real AutoGluon output. Updated timeseries ranking test description and comments.
  • utils.spec.ts: Updated computeRankMap timeseries tests to use negated values. Removed the isErrorMetric test block entirely.
  • AutomlModelDetailsModalHeader.spec.tsx: Updated test to expect raw negated MASE value (-0.082) instead of absolute value.
  • ModelEvaluationTab.spec.tsx: Updated test to expect raw negated MSE value (-12.450) instead of absolute value.
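
For context, the updated expectations look roughly like this Jest-style sketch, building on the computeRankMap sketch above (LSTM and its -0.09 MASE come from this PR's test data; the other model names and values are illustrative):

it('ranks the model whose negated MASE is closest to zero first', () => {
  const ranks = computeRankMap({ LSTM: -0.09, DeepAR: -0.31, Theta: -0.54 });
  expect(ranks.get('LSTM')).toBe(1); // -0.09 is the highest value, so LSTM is best
});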

Request review criteria:

Self checklist (all need to be checked):

  • The developer has manually tested the changes and verified that the changes work
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has added tests or explained why testing cannot be added (unit or cypress tests for related changes)
  • The code follows our Best Practices (React coding standards, PatternFly usage, performance considerations)

If you have UI changes:

  • Included any necessary screenshots or gifs if it was a UI change.
  • Included tags to the UX team if it was a UI/UX change.

After the PR is posted & before it merges:

  • The developer has tested their solution on a cluster by using the image produced by the PR to main

AutoGluon negates error/loss metrics so all metrics are uniformly
"higher is better". Remove the hardcoded errorMetrics array and
sort all metrics descending so the best values (closest to zero
for negated error metrics) are ranked first.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
openshift-ci Bot (Contributor) commented Apr 15, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

coderabbitai Bot (Contributor) commented Apr 15, 2026

📝 Walkthrough

The leaderboard component's ranking initialization was simplified by removing conditional logic that treated error metrics (mase, mse, mae, rmse, mape) differently from performance metrics. The change replaces this branching with a uniform descending sort for all numeric values, relying on the assumption that AutoGluon negates error and loss metrics internally, making higher values uniformly desirable across all metric types.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Actionable issues

Unvalidated external dependency assumption: The change assumes AutoGluon negates error/loss metrics without runtime validation. If this behavior is not guaranteed across versions or configurations, incorrect metric ordering could occur silently. Consider adding assertions or validation checks to confirm metric negation at runtime, or document the AutoGluon version/configuration this depends on.

Loss of explicit error metric handling: Removing the hardcoded error metric key list (mase, mse, mae, rmse, mape) means new error metrics added to the system will automatically be treated as performance metrics (assuming they're negated). If any error metric isn't negated by AutoGluon, this logic will produce inverted rankings for that metric. Verify the complete set of error metrics AutoGluon handles and confirm they're all negated.
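
If such a guard were wanted, it could look roughly like the following sketch (a hypothetical helper, not part of this PR; the metric-name set mirrors the removed ERROR_METRICS list):

const KNOWN_ERROR_METRICS = new Set(['mase', 'mape', 'mae', 'mse', 'rmse']);

// Warn when an error metric arrives non-negated, i.e. when the assumed
// AutoGluon "higher is better" convention appears to be violated.
function warnIfNotNegated(metric: string, value: number): void {
  if (KNOWN_ERROR_METRICS.has(metric.toLowerCase()) && value > 0) {
    console.warn(`Unexpected positive value ${value} for error metric "${metric}"`);
  }
}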

🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
  • Title check: ✅ Passed. The title accurately describes the main change: fixing leaderboard ranking logic for negated error metrics in AutoML.
  • Description check: ✅ Passed. PR description is mostly complete with clear problem statement, detailed changes, test evidence, and screenshots. However, the self-checklist items are not marked (all boxes remain unchecked).


nickmazzi (Contributor, Author) commented

@coderabbitai review

coderabbitai Bot (Contributor) commented Apr 15, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai Bot (Contributor) left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/automl/frontend/src/app/components/run-results/AutomlLeaderboard.tsx (1)

359-380: ⚠️ Potential issue | 🟠 Major

Ranking diverges from shared utility; silent failure if backend sign assumptions break

Lines 359–380 assume error metrics are always negated by the backend and always sort descending. This diverges from computeRankMap() (utils.ts:164), which detects error metrics with isErrorMetric() and applies conditional Math.abs() before sorting. If the backend fails to negate an error metric, sortedByMetric will invert rankings silently (worst models rank first), while other UI surfaces using computeRankMap remain correct.

Apply isErrorMetric() to the comparator:

Fix
 import {
   formatMetricName,
   formatMetricValue,
   getOptimizedMetricForTask,
+  isErrorMetric,
 } from '~/app/utilities/utils';
@@
-    // Initial ranking by optimized metric value (higher is better).
-    // AutoGluon negates error/loss metrics so all metrics are uniformly "higher is better".
+    // Initial ranking by optimized metric value.
+    // Normalize lower-is-better metrics so comparator remains correct
+    // even if sign conventions vary across providers/run versions.
     const sortedByMetric = entries.toSorted((a, b) => {
@@
-      // Both are numbers — descending (higher is better)
       const aNum = typeof aVal === 'number' ? aVal : 0;
       const bNum = typeof bVal === 'number' ? bVal : 0;
-      return bNum - aNum;
+      const aScore = isErrorMetric(optimizedMetric) ? -Math.abs(aNum) : aNum;
+      const bScore = isErrorMetric(optimizedMetric) ? -Math.abs(bNum) : bNum;
+      return bScore - aScore;
     });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/automl/frontend/src/app/components/run-results/AutomlLeaderboard.tsx`
around lines 359 - 380, sortedByMetric currently assumes optimizedMetricValue is
already signed for "higher is better" and sorts descending, which can invert
rankings if the backend didn't negate error metrics; update the comparator in
the entries.toSorted call (sortedByMetric) to mirror computeRankMap's behavior
by calling isErrorMetric() and, when aVal/bVal are numbers and the metric is an
error metric, use Math.abs(aVal) and Math.abs(bVal) (or otherwise the numeric
value) for comparison while still treating 'N/A' as last; ensure you reference
optimizedMetricValue and isErrorMetric() so the comparator normalizes values
consistently with computeRankMap().


📥 Commits

Reviewing files that changed from the base of the PR and between 3ef417a and 7112147.

📒 Files selected for processing (1)
  • packages/automl/frontend/src/app/components/run-results/AutomlLeaderboard.tsx

nickmazzi and others added 2 commits April 15, 2026 11:59
… data

Mock timeseries data was using positive values for error metrics, but
the ranking logic (updated in 7112147) now sorts descending uniformly
since AutoGluon negates error/loss metrics. Align test data with this
convention so LSTM (-0.09 MASE, closest to 0) correctly ranks first.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove the isErrorMetric helper and ERROR_METRICS set since AutoGluon
already negates error/loss metrics. Display raw metric values without
Math.abs conversion. Simplify computeRankMap to sort uniformly
descending. Update all affected tests to expect negated values.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
nickmazzi marked this pull request as ready for review April 15, 2026 16:15
openshift-ci Bot removed the do-not-merge/work-in-progress (This PR is in WIP state) label Apr 15, 2026
openshift-ci Bot requested review from GAUNSD and daniduong April 15, 2026 16:15
codecov Bot commented Apr 15, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 64.82%. Comparing base (3ef417a) to head (6f606c3).
⚠️ Report is 9 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #7258      +/-   ##
==========================================
+ Coverage   64.80%   64.82%   +0.01%     
==========================================
  Files        2441     2441              
  Lines       75996    75996              
  Branches    19158    19158              
==========================================
+ Hits        49253    49265      +12     
+ Misses      26743    26731      -12     

see 10 files with indirect coverage changes



GAUNSD (Contributor) left a comment

/approve
/lgtm

openshift-ci Bot (Contributor) commented Apr 15, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: GAUNSD

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-merge-bot Bot merged commit ea50e83 into opendatahub-io:main Apr 15, 2026
55 checks passed