cuga-appworld-output-v1

@lilacheden released this 24 May 13:43
· 6 commits to main since this release

CUGA AppWorld Results (v1)

Pre-computed CLEAR evaluation results for the CUGA agent on the AppWorld benchmark, judged by gpt-oss-120b.

Viewing in the Dashboard

1. Download the results

gh release download cuga-appworld-output-v1 --pattern "*.zip"

2. Launch the dashboard

run-clear-agentic-dashboard

3. Upload the downloaded ZIP file in the UI
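The three steps above can be sketched as a small script. This is an illustration rather than part of the release: the `RESULTS_DIR` variable and the function names `download_results` and `list_result_zips` are hypothetical, and it assumes the `gh` CLI is installed, authenticated, and run from a clone of the repository that hosts this release (otherwise `gh` needs `--repo OWNER/REPO`). The helper only confirms that a ZIP file arrived before the dashboard is launched; the upload itself still happens in the UI.

```shell
#!/usr/bin/env bash
# Sketch only: RESULTS_DIR and both function names are hypothetical helpers,
# not something shipped with the release.

RESULTS_DIR="${RESULTS_DIR:-clear-results}"

# Step 1: download the release ZIP(s) into RESULTS_DIR.
# Assumes gh is authenticated and the current directory is a clone of the
# repository hosting the cuga-appworld-output-v1 release.
download_results() {
  mkdir -p "$RESULTS_DIR" || return 1
  gh release download cuga-appworld-output-v1 --pattern "*.zip" --dir "$RESULTS_DIR"
}

# Print the path of every ZIP in the given directory, one per line.
# Returns non-zero when no ZIP is present, so a failed download is noticed early.
list_result_zips() {
  local dir="${1:-$RESULTS_DIR}"
  local status=1
  local f
  for f in "$dir"/*.zip; do
    [ -e "$f" ] || continue   # the glob stays literal when nothing matches
    printf '%s\n' "$f"
    status=0
  done
  return "$status"
}

# Steps 1 to 3, left commented out so that sourcing this file has no side effects:
# download_results && list_result_zips && run-clear-agentic-dashboard
# Then upload the listed ZIP file in the dashboard UI.
```

Keeping the download in its own directory makes it easy to find the file again when the dashboard asks for it.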

Details

  • Agent: CUGA
  • Benchmark: AppWorld
  • Judge model: gpt-oss-120b
  • Analyses: Step-by-step + Full trajectory