Commit 599c8ba

Update README.md
1 parent 2fae29d commit 599c8ba

1 file changed, +5 -5 lines changed

README.md

Lines changed: 5 additions & 5 deletions
@@ -603,19 +603,19 @@ The web interface is built using Gradio and runs locally on your machine. No dat
 
 # 🧪 Experiments
 
-To reproduce OWL's GAIA benchmark score of 58.18:
-Furthermore, to ensure optimal performance on the GAIA benchmark, please note that our `gaia58.18` branch includes a customized version of the CAMEL framework in the `owl/camel` directory. This version contains enhanced toolkits with improved stability for gaia benchmark compared to the standard CAMEL installation.
+To reproduce OWL's GAIA benchmark score:
+Furthermore, to ensure optimal performance on the GAIA benchmark, please note that our `gaia69` branch includes a customized version of the CAMEL framework in the `owl/camel` directory. This version contains enhanced toolkits with improved stability for gaia benchmark compared to the standard CAMEL installation.
 
 When running the benchmark evaluation:
 
-1. Switch to the `gaia58.18` branch:
+1. Switch to the `gaia69` branch:
 ```bash
-git checkout gaia58.18
+git checkout gaia69
 ```
 
 2. Run the evaluation script:
 ```bash
-python run_gaia_roleplaying.py
+python run_gaia_workforce_claude.py
 ```
 
 This will execute the same configuration that achieved our top-ranking performance on the GAIA benchmark.
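
The two steps as they read after this commit can be combined into one script. The following is a minimal sketch, assuming it is run from the repository root with the project's dependencies already installed. The branch and script names come from the diff above; the `DRY_RUN` variable and the `run` and `repro_gaia` helpers are hypothetical names introduced for illustration and are not part of OWL.

```shell
#!/usr/bin/env bash
# Sketch only: chains the two README steps (checkout, then evaluation).
set -euo pipefail

BRANCH="${BRANCH:-gaia69}"                        # branch named in the updated README
SCRIPT="${SCRIPT:-run_gaia_workforce_claude.py}"  # script named in the updated README
DRY_RUN="${DRY_RUN:-1}"                           # hypothetical flag: 1 prints commands, 0 runs them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Step 1: switch branch. Step 2: run the evaluation script.
repro_gaia() {
  run git checkout "$BRANCH"
  run python "$SCRIPT"
}

repro_gaia
```

By default the script only prints the commands it would run; setting `DRY_RUN=0` executes them. Because of `set -e`, a failed checkout stops the script before the evaluation starts.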