fix(automl): fix leaderboard ranking for negated error metrics #7258
Conversation
AutoGluon negates error/loss metrics so all metrics are uniformly "higher is better". Remove the hardcoded errorMetrics array and sort all metrics descending so the best values (closest to zero for negated error metrics) are ranked first. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
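The descending-sort convention described in this change can be sketched as follows (a minimal standalone illustration, not the component's actual code; the entry shape and the 'N/A' sentinel are assumptions):

```typescript
type LeaderboardEntry = { model: string; optimizedMetricValue: number | 'N/A' };

// AutoGluon reports error metrics negated (e.g. a MASE of 0.09 arrives as
// -0.09), so one descending sort ranks the best model first for every metric.
function rankByMetric(entries: LeaderboardEntry[]): LeaderboardEntry[] {
  return [...entries].sort((a, b) => {
    // Non-numeric ('N/A') values always sort last.
    if (a.optimizedMetricValue === 'N/A') return 1;
    if (b.optimizedMetricValue === 'N/A') return -1;
    return b.optimizedMetricValue - a.optimizedMetricValue;
  });
}

const ranked = rankByMetric([
  { model: 'DeepAR', optimizedMetricValue: -0.5 },
  { model: 'Naive', optimizedMetricValue: 'N/A' },
  { model: 'LSTM', optimizedMetricValue: -0.09 },
]);
console.log(ranked.map((e) => e.model)); // [ 'LSTM', 'DeepAR', 'Naive' ]
```

With negated error metrics, -0.09 sorts above -0.5, so the model closest to zero error ranks first without any metric-specific branching.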
Skipping CI for Draft Pull Request.
📝 Walkthrough

The leaderboard component's ranking initialization was simplified by removing the conditional logic that treated error metrics as a special case.

Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes

Actionable issues:

- Unvalidated external dependency assumption: The change assumes AutoGluon negates error/loss metrics without runtime validation. If this behavior is not guaranteed across versions or configurations, incorrect metric ordering could occur silently. Consider adding assertions or validation checks to confirm metric negation at runtime, or document the AutoGluon version/configuration this depends on.
- Loss of explicit error metric handling: Removing the hardcoded error metric key list

🚥 Pre-merge checks | ✅ Passed checks (2 passed)
@coderabbitai review |
✅ Actions performed

Review triggered.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/automl/frontend/src/app/components/run-results/AutomlLeaderboard.tsx (1)
359-380: ⚠️ Potential issue | 🟠 Major

Ranking diverges from shared utility; silent failure if backend sign assumptions break

Lines 359–380 assume error metrics are always negated by the backend and always sort descending. This diverges from `computeRankMap()` (utils.ts:164), which detects error metrics with `isErrorMetric()` and applies a conditional `Math.abs()` before sorting. If the backend fails to negate an error metric, `sortedByMetric` will invert rankings silently (worst models rank first), while other UI surfaces using `computeRankMap` remain correct.

Apply `isErrorMetric()` to the comparator:

```diff
 import {
   formatMetricName,
   formatMetricValue,
   getOptimizedMetricForTask,
+  isErrorMetric,
 } from '~/app/utilities/utils';
@@
-  // Initial ranking by optimized metric value (higher is better).
-  // AutoGluon negates error/loss metrics so all metrics are uniformly "higher is better".
+  // Initial ranking by optimized metric value.
+  // Normalize lower-is-better metrics so the comparator remains correct
+  // even if sign conventions vary across providers/run versions.
   const sortedByMetric = entries.toSorted((a, b) => {
@@
-    // Both are numbers — descending (higher is better)
     const aNum = typeof aVal === 'number' ? aVal : 0;
     const bNum = typeof bVal === 'number' ? bVal : 0;
-    return bNum - aNum;
+    const aScore = isErrorMetric(optimizedMetric) ? -Math.abs(aNum) : aNum;
+    const bScore = isErrorMetric(optimizedMetric) ? -Math.abs(bNum) : bNum;
+    return bScore - aScore;
   });
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/automl/frontend/src/app/components/run-results/AutomlLeaderboard.tsx` around lines 359 - 380, sortedByMetric currently assumes optimizedMetricValue is already signed for "higher is better" and sorts descending, which can invert rankings if the backend didn't negate error metrics; update the comparator in the entries.toSorted call (sortedByMetric) to mirror computeRankMap's behavior by calling isErrorMetric() and, when aVal/bVal are numbers and the metric is an error metric, use Math.abs(aVal) and Math.abs(bVal) (or otherwise the numeric value) for comparison while still treating 'N/A' as last; ensure you reference optimizedMetricValue and isErrorMetric() so the comparator normalizes values consistently with computeRankMap().
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In
`@packages/automl/frontend/src/app/components/run-results/AutomlLeaderboard.tsx`:
- Around line 359-380: sortedByMetric currently assumes optimizedMetricValue is
already signed for "higher is better" and sorts descending, which can invert
rankings if the backend didn't negate error metrics; update the comparator in
the entries.toSorted call (sortedByMetric) to mirror computeRankMap's behavior
by calling isErrorMetric() and, when aVal/bVal are numbers and the metric is an
error metric, use Math.abs(aVal) and Math.abs(bVal) (or otherwise the numeric
value) for comparison while still treating 'N/A' as last; ensure you reference
optimizedMetricValue and isErrorMetric() so the comparator normalizes values
consistently with computeRankMap().
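The defensive comparator that the prompts above describe can be sketched as a standalone illustration (here `isErrorMetric`, `ERROR_METRICS`, and the entry shape are stand-ins for the real utilities in `utils.ts`, not the actual code):

```typescript
// Stand-in for the real isErrorMetric()/ERROR_METRICS in utils.ts.
const ERROR_METRICS = new Set(['MASE', 'MAPE', 'MAE', 'MSE', 'RMSE']);
const isErrorMetric = (metric: string): boolean => ERROR_METRICS.has(metric);

type Entry = { model: string; value: number | 'N/A' };

// Normalizing error metrics with -Math.abs() keeps the descending sort
// correct whether or not the backend negated the values; 'N/A' sorts last.
function sortByMetric(entries: Entry[], metric: string): Entry[] {
  const score = (v: number | 'N/A'): number => {
    if (v === 'N/A') return Number.NEGATIVE_INFINITY;
    return isErrorMetric(metric) ? -Math.abs(v) : v;
  };
  return [...entries].sort((a, b) => score(b.value) - score(a.value));
}

// A backend that forgot to negate (DeepAR) and one that did (LSTM):
const ranked = sortByMetric(
  [
    { model: 'DeepAR', value: 0.5 },
    { model: 'LSTM', value: -0.09 },
  ],
  'MASE',
);
console.log(ranked.map((e) => e.model)); // [ 'LSTM', 'DeepAR' ]
```

Because both 0.5 and -0.5 normalize to the same score, this variant tolerates either sign convention at the cost of assuming error metrics never have legitimately negative raw values.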
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited), Organization UI (inherited)
Review profile: CHILL
Plan: Pro Plus
Run ID: a8e0de9b-a377-4a5a-9ad1-d29d3b8a6361
📒 Files selected for processing (1)
packages/automl/frontend/src/app/components/run-results/AutomlLeaderboard.tsx
… data Mock timeseries data was using positive values for error metrics, but the ranking logic (updated in 7112147) now sorts descending uniformly since AutoGluon negates error/loss metrics. Align test data with this convention so LSTM (-0.09 MASE, closest to 0) correctly ranks first. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove the isErrorMetric helper and ERROR_METRICS set since AutoGluon already negates error/loss metrics. Display raw metric values without Math.abs conversion. Simplify computeRankMap to sort uniformly descending. Update all affected tests to expect negated values. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
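In the spirit of the simplified computeRankMap, a uniformly descending rank map can be sketched as follows (illustrative signature and names, not the actual utils.ts code):

```typescript
// Build a model → rank (1-based) map by sorting metric values descending.
// With AutoGluon's negation convention this is correct for error metrics
// (closest to zero wins) and score metrics (largest wins) alike.
function computeRankMapSketch(values: Record<string, number>): Record<string, number> {
  const sorted = Object.entries(values).sort(([, a], [, b]) => b - a);
  return Object.fromEntries(sorted.map(([model], i) => [model, i + 1]));
}

// Negated MASE values straight from AutoGluon:
const ranks = computeRankMapSketch({ DeepAR: -0.5, LSTM: -0.09, TFT: -0.31 });
console.log(ranks); // { LSTM: 1, TFT: 2, DeepAR: 3 }
```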
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```
@@           Coverage Diff            @@
##             main    #7258    +/-   ##
========================================
+ Coverage   64.80%   64.82%   +0.01%
========================================
  Files        2441     2441
  Lines       75996    75996
  Branches    19158    19158
========================================
+ Hits        49253    49265      +12
+ Misses      26743    26731      -12
```

see 10 files with indirect coverage changes

Continue to review full report in Codecov by Sentry.
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: GAUNSD

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Details: Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
ea50e83 into opendatahub-io:main
https://issues.redhat.com/browse/RHOAIENG-58390
Description
Fix AutoML leaderboard ranking for metrics reported by AutoGluon.
AutoGluon negates all error/loss metrics (MASE, MAPE, MAE, MSE, RMSE, etc.) so they uniformly follow a "higher is better" convention. The previous implementation had a hardcoded `errorMetrics` array that attempted to sort these metrics ascending and used `Math.abs()` to convert negated values back to positive for display. This caused incorrect ranking — the worst values (most negative) were ranked first instead of the best values (closest to zero).

Changes:

- `AutomlLeaderboard.tsx`: Removed the hardcoded `errorMetrics` array and conditional sort logic. Ranking now always sorts descending (higher is better), matching AutoGluon's negation convention.
- `utils.ts`: Removed the `isErrorMetric` helper, the `ERROR_METRICS` set, and the `Math.abs`/conditional logic in `computeRankMap`. `computeRankMap` now sorts uniformly descending.
- `AutomlModelDetailsModalHeader.tsx`: Removed the `isErrorMetric` import and `Math.abs` conversion — metric values display as-is.
- `ModelEvaluationTab.tsx`: Removed the `isErrorMetric` import and the wrapping `formatMetricValue` function — raw metric values pass through `toNumericMetric` and `formatMetricValue` directly.

How Has This Been Tested?
-0.082 for MASE).

Test Impact
Four test files updated:
- `AutomlLeaderboard.spec.tsx`: Updated `mockTimeseriesModels` to use negated metric values matching real AutoGluon output. Updated the timeseries ranking test description and comments.
- `utils.spec.ts`: Updated `computeRankMap` timeseries tests to use negated values. Removed the `isErrorMetric` test block entirely.
- `AutomlModelDetailsModalHeader.spec.tsx`: Updated test to expect the raw negated MASE value (-0.082) instead of the absolute value.
- `ModelEvaluationTab.spec.tsx`: Updated test to expect the raw negated MSE value (-12.450) instead of the absolute value.

Request review criteria:
Self checklist (all need to be checked):
If you have UI changes:
After the PR is posted & before it merges:
main