@hannesrudolph (Collaborator) commented Nov 25, 2025

Evals Dashboard UX Improvements

Summary

This PR enhances the web-evals dashboard with better visibility into tool usage metrics, improved exercise selection, and various quality-of-life improvements.

New Features

Run Details Page

  • Added aggregate statistics panel showing pass/fail counts, pass rate, total tokens, cost, and duration at a glance
  • Tool usage summary with color-coded success rates (green for 100%, yellow for ≥80%, red for <80%)
  • Hover tooltips to reveal full tool names from abbreviations
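
The color thresholds listed above can be sketched as a small pure function. This is a minimal illustration, not the PR's actual code: the name `successRateColor`, the `attempts`/`failures` inputs, and the `null` result for unused tools are all assumptions.

```typescript
// Hypothetical helper: maps a tool's success rate to a color bucket.
type RateColor = "green" | "yellow" | "red";

function successRateColor(attempts: number, failures: number): RateColor | null {
  // A tool that was never attempted has no meaningful rate to color.
  if (attempts <= 0) return null;
  const rate = (attempts - failures) / attempts;
  if (rate === 1) return "green"; // 100%
  if (rate >= 0.8) return "yellow"; // at least 80%
  return "red"; // below 80%
}
```

Keeping the thresholds in one function means the summary panel and any table cell that shows a rate would agree on the boundaries.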

New Run Configuration

  • API config selector when importing settings files with multiple configurations
  • Language toggle buttons (e.g., "go", "python", "javascript") to quickly select/deselect all exercises for a language
  • Concurrency and timeout settings now persist to localStorage across sessions
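
One way the language toggle could behave is sketched below. It assumes exercise identifiers have the form `language/name` (for example `go/two-fer`) and that clicking a language deselects its exercises only when all of them are already selected; both are assumptions for illustration, and `toggleLanguage` is a hypothetical name.

```typescript
// Hypothetical sketch: select or deselect every exercise of one language.
function toggleLanguage(
  selected: readonly string[],
  allExercises: readonly string[],
  language: string,
): string[] {
  const prefix = `${language}/`;
  const forLanguage = allExercises.filter((e) => e.startsWith(prefix));
  const selectedSet = new Set(selected);
  const allSelected =
    forLanguage.length > 0 && forLanguage.every((e) => selectedSet.has(e));

  if (allSelected) {
    // Everything for this language is selected: toggle it off.
    return selected.filter((e) => !e.startsWith(prefix));
  }
  // Otherwise add the missing exercises, keeping existing selections.
  forLanguage.forEach((e) => selectedSet.add(e));
  return Array.from(selectedSet);
}
```

A partially selected language is treated as "not selected", so one click always completes the selection before a second click clears it.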

Runs List

  • Clickable table rows for faster navigation to run details
  • Dynamic tool usage columns - columns are now generated based on what tools were actually used across runs, sorted by total usage (most-used tools appear first)
  • Each tool column displays attempt count and success rate
  • Abbreviation tooltips explain column headers (e.g., "RF" → "read_file")
  • "View Settings" option in the dropdown menu to inspect run configuration
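
The dynamic column generation described above could look roughly like the following. This is a sketch under assumptions: each run is taken to expose a map from tool name to `{ attempts, failures }`, tool names are plain strings here rather than the `ToolName` type, and `buildToolColumns` is a hypothetical name.

```typescript
// Assumed shape of per-run tool usage data.
type ToolUsage = Record<string, { attempts: number; failures: number }>;

interface ToolColumn {
  tool: string;
  attempts: number;
  failures: number;
  successRate: number; // fraction in [0, 1]
}

function buildToolColumns(runs: readonly ToolUsage[]): ToolColumn[] {
  // Sum attempts and failures per tool across all runs.
  const totals = new Map<string, { attempts: number; failures: number }>();
  for (const usage of runs) {
    for (const tool of Object.keys(usage)) {
      const entry = usage[tool];
      if (!entry) continue;
      const total = totals.get(tool) ?? { attempts: 0, failures: 0 };
      total.attempts += entry.attempts;
      total.failures += entry.failures;
      totals.set(tool, total);
    }
  }

  // Only tools that were actually used become columns.
  const columns: ToolColumn[] = [];
  totals.forEach((total, tool) => {
    if (total.attempts === 0) return;
    columns.push({
      tool,
      attempts: total.attempts,
      failures: total.failures,
      successRate: (total.attempts - total.failures) / total.attempts,
    });
  });

  // Most-used tools first; ties broken by name so the order is stable.
  columns.sort((a, b) => b.attempts - a.attempts || a.tool.localeCompare(b.tool));
  return columns;
}
```

Deriving the columns from the data means a newly introduced tool shows up without any change to the table definition.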

Improvements

  • MultiSelect component now supports controlled mode via value prop
  • Deprecated models are now filtered from the Roo Code Cloud model list
  • Removed hardcoded tool columns in favor of dynamic generation from run data
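
As an illustration of the deprecated-model filtering, here is a minimal sketch. The `ModelEntry` shape and its optional `deprecated` flag are assumptions; the actual model type used by the dashboard may differ.

```typescript
// Hypothetical model shape; only the `deprecated` flag matters here.
interface ModelEntry {
  id: string;
  deprecated?: boolean;
}

function withoutDeprecated<T extends ModelEntry>(models: readonly T[]): T[] {
  // Models with no flag are treated as current.
  return models.filter((model) => model.deprecated !== true);
}
```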

Technical Notes

  • No breaking changes to existing functionality
  • All changes are contained within the apps/web-evals package
  • Tool columns use proper ToolName typing from @roo-code/types

Important

Enhances web-evals dashboard with dynamic tool columns, improved exercise selection, and UX improvements.

  • Behavior:
    • Adds dynamic tool usage columns in run.tsx and runs.tsx, sorted by total usage.
    • Enhances run.tsx with aggregate statistics panel and tool usage summary.
    • Adds language toggle buttons in new-run.tsx for exercise selection.
    • Concurrency and timeout settings in new-run.tsx now persist to localStorage.
  • Components:
    • Updates MultiSelect in multi-select.tsx to support controlled mode via value prop.
    • Adds Dialog for viewing run settings in run.tsx.
  • Misc:
    • Filters deprecated models in use-roo-code-cloud-models.ts.
    • Adds formatDateTime to formatters.ts for date formatting.
    • Minor UI and UX improvements across components.
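
The localStorage persistence of concurrency and timeout noted above can be sketched with a small load/save pair. This is an assumption-laden illustration: the helper names, the fallback and clamping behavior, and any storage keys are hypothetical. The store is passed in so the code also runs outside a browser; in the dashboard it would be `window.localStorage`, which satisfies the same interface.

```typescript
// Minimal subset of the Web Storage interface used by the helpers.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

function loadNumber(
  store: KeyValueStore,
  key: string,
  fallback: number,
  min: number,
  max: number,
): number {
  const raw = store.getItem(key);
  if (raw === null) return fallback; // nothing persisted yet
  const parsed = Number(raw);
  if (!Number.isFinite(parsed)) return fallback; // corrupted entry
  // Clamp so a stale value cannot exceed the current form limits.
  return Math.min(max, Math.max(min, parsed));
}

function saveNumber(store: KeyValueStore, key: string, value: number): void {
  store.setItem(key, String(value));
}
```

Validating on read matters because stored values outlive the code that wrote them, so an old or hand-edited entry should fall back to a sane default rather than break the form.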

This description was created by Ellipsis for f0492aa.

…mprovements

- Add aggregate statistics panel on run details page
- Add dynamic tool usage columns sorted by total usage
- Add API config selector for multi-config imports
- Add language toggle buttons for exercise selection
- Persist concurrency/timeout settings to localStorage
- Make table rows clickable for faster navigation
- Add View Settings option in dropdown menu
- Support controlled mode for MultiSelect component
- Filter deprecated models from Roo Code Cloud list
@dosubot dosubot bot added the size:XL (this PR changes 500-999 lines, ignoring generated files) and UI/UX labels Nov 25, 2025
@roomote bot (Contributor) commented Nov 25, 2025

Re-review complete for the latest dashboard changes (runs list sorting plus provider and created columns). No new issues were identified and all previously flagged items remain resolved.

  • MultiSelect Backspace handler still calls onValueChange directly while other paths go through setSelectedValues, which can cause duplicate updates in controlled mode.

@dosubot dosubot bot added the lgtm label (this PR has been approved by a maintainer) Nov 25, 2025
@hannesrudolph hannesrudolph added the Issue/PR - Triage label (new issue, needs quick review to confirm validity and assign labels) Nov 25, 2025
- Update running task filter to check tokenUsage like TaskStatus component
- Add tokenUsage and usageUpdatedAt to stats useMemo dependencies
@hannesrudolph hannesrudolph moved this from Triage to PR [Needs Review] in Roo Code Roadmap Nov 26, 2025
@hannesrudolph hannesrudolph added the PR - Needs Review label and removed the Issue/PR - Triage label Nov 26, 2025
@roomote bot (Contributor) commented Nov 26, 2025

Working on the MultiSelect controlled-mode Backspace handler issue: ensuring Backspace updates selection state exclusively via setSelectedValues so onValueChange is not fired twice in controlled mode.
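
The single-update-path idea behind this fix can be modeled outside React as follows. This is a sketch, not the component's real code: `SelectionModel` is a hypothetical stand-in for the component state, and it assumes Backspace on an empty input removes the last selected value. The names `setSelectedValues` and `onValueChange` come from the review comment; everything else is illustrative.

```typescript
class SelectionModel {
  private internal: string[];
  private controlled: string[] | undefined;
  private readonly onValueChange: (next: string[]) => void;

  constructor(
    onValueChange: (next: string[]) => void,
    initial: string[] = [],
    controlled?: string[],
  ) {
    this.onValueChange = onValueChange;
    this.internal = initial;
    this.controlled = controlled;
  }

  // In controlled mode the parent's value is the source of truth.
  get selected(): string[] {
    return this.controlled ?? this.internal;
  }

  // Stands in for the parent re-rendering with a new `value` prop.
  setControlledValue(value: string[]): void {
    this.controlled = value;
  }

  // The only place that notifies the parent, so each change fires once.
  setSelectedValues(next: string[]): void {
    if (this.controlled === undefined) this.internal = next;
    this.onValueChange(next);
  }

  // Backspace goes through setSelectedValues instead of calling
  // onValueChange directly.
  handleBackspace(inputText: string): void {
    if (inputText !== "" || this.selected.length === 0) return;
    this.setSelectedValues(this.selected.slice(0, -1));
  }
}
```

With every handler routed through `setSelectedValues`, a controlled parent receives exactly one `onValueChange` call per user action and decides whether to pass the new value back.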

@mrubens mrubens merged commit 4442397 into main Nov 26, 2025
16 of 17 checks passed
@mrubens mrubens deleted the feat/evals-dashboard-ux-improvements branch November 26, 2025 05:15
@github-project-automation github-project-automation bot moved this from PR [Needs Review] to Done in Roo Code Roadmap Nov 26, 2025
@github-project-automation github-project-automation bot moved this from New to Done in Roo Code Roadmap Nov 26, 2025

Labels

lgtm, PR - Needs Review, size:XL, UI/UX

Projects

Status: Done

5 participants