Leaderboard

The WebArena-Verified leaderboard is the public results table for benchmark submissions. Each entry is produced from deterministic offline evaluation and linked to a submission record.

What You Can Inspect

Each row includes:

  • Rank
  • Name
  • Overall Score
  • Per-site scores: Shopping, Shopping Admin, Reddit, GitLab, Wikipedia, Map
  • Submission ID
  • Evaluator Version
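The row fields above map naturally onto a typed record. The sketch below shows one way to model a row, for example when loading an exported CSV. It is not an official schema: the class name `LeaderboardRow`, the `from_record` helper, the float score type, and the assumption that CSV headers match the UI column labels are all hypothetical.

```python
from dataclasses import dataclass

# Per-site score columns, as listed on the leaderboard page.
SITES = ("Shopping", "Shopping Admin", "Reddit", "GitLab", "Wikipedia", "Map")


@dataclass(frozen=True)
class LeaderboardRow:
    """One leaderboard entry (hypothetical shape, not an official schema)."""

    rank: int
    name: str
    overall_score: float
    site_scores: dict[str, float]
    submission_id: str
    evaluator_version: str

    @classmethod
    def from_record(cls, record: dict[str, str]) -> "LeaderboardRow":
        # Assumption: column headers match the labels shown in the UI.
        return cls(
            rank=int(record["Rank"]),
            name=record["Name"],
            overall_score=float(record["Overall Score"]),
            site_scores={site: float(record[site]) for site in SITES},
            submission_id=record["Submission ID"],
            evaluator_version=record["Evaluator Version"],
        )
```

Keeping `Evaluator Version` on the record makes it easy to avoid comparing rows scored by different evaluator releases.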

UI Features

  • Search and filter by submission name or submission ID
  • Toggle between Full and Hard boards
  • Export leaderboard rows as CSV
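After exporting, you can approximate the name/ID search offline. This is a minimal sketch assuming the CSV has `Name` and `Submission ID` header columns and that a case-insensitive substring match is acceptable. The real export's headers and the UI's exact matching rules may differ, and `filter_rows` is a hypothetical helper.

```python
import csv
import io


def filter_rows(csv_text: str, query: str) -> list[dict[str, str]]:
    """Return rows whose Name or Submission ID contains `query`.

    Matching is case-insensitive. Assumes "Name" and "Submission ID"
    header columns in the exported CSV (hypothetical, not confirmed).
    """
    needle = query.casefold()
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row
        for row in reader
        if needle in row["Name"].casefold()
        or needle in row["Submission ID"].casefold()
    ]
```

To run it against a downloaded file, read it with `open(path, newline="", encoding="utf-8").read()` and pass the text in. Export the Full and Hard boards separately if you need both.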

!!! info
    Evaluation is deterministic and offline. WebArena-Verified does not use LLM-as-judge scoring for leaderboard entries.

Next Step

To publish results, follow the submission walkthrough: Submitting Results.