CUGA AppWorld Results (v1)
Pre-computed CLEAR evaluation results for the CUGA agent on the AppWorld benchmark, judged by gpt-oss-120b.
Viewing in the Dashboard
1. Download the results
gh release download cuga-appworld-output-v1 --pattern "*.zip"
2. Launch the dashboard
run-clear-agentic-dashboard
3. Upload the downloaded ZIP file in the UI
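The three steps above can be sketched as one script. This is a minimal sketch under stated assumptions, not part of the release: `find_results_zip`, `RESULTS_DIR`, and `RUN_STEPS` are hypothetical names introduced here for illustration. It assumes `gh` is run from inside a clone of the repository that hosts the release (as the command in step 1 does) and that `run-clear-agentic-dashboard` is on your `PATH`. The upload itself remains a manual step in the UI.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the path of the first ZIP archive in a
# directory, or return non-zero if the directory contains none.
find_results_zip() {
  local dir="$1" f
  for f in "$dir"/*.zip; do
    if [ -f "$f" ]; then
      printf '%s\n' "$f"
      return 0
    fi
  done
  echo "no ZIP archive found in $dir" >&2
  return 1
}

# Hypothetical download location; override by exporting RESULTS_DIR.
RESULTS_DIR="${RESULTS_DIR:-./cuga-appworld-results}"

# The steps only run when RUN_STEPS=1, so the helper can be reused
# without triggering a download.
if [ "${RUN_STEPS:-0}" = "1" ]; then
  mkdir -p "$RESULTS_DIR"

  # Step 1: download the release archive into RESULTS_DIR.
  gh release download cuga-appworld-output-v1 --pattern "*.zip" --dir "$RESULTS_DIR"

  # Report which file to upload in step 3 (done by hand in the UI).
  zip_path="$(find_results_zip "$RESULTS_DIR")"
  echo "Upload this file in the dashboard UI: $zip_path"

  # Step 2: launch the dashboard.
  run-clear-agentic-dashboard
fi
```

Saved under a name of your choice, the script would be run with `RUN_STEPS=1` set in the environment. Downloading into a dedicated directory keeps the archive easy to locate when the dashboard asks for the upload.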
Details
- Agent: CUGA
- Benchmark: AppWorld
- Judge model: gpt-oss-120b
- Analyses: Step-by-step + Full trajectory