Commit 358a94c

DavidePaglieri authored and arushi-agrawal committed
Update README.md (balrog-ai#36)
* Update README.md
* update readme
1 parent fca863b commit 358a94c

1 file changed, +21 -4 lines changed

README.md

Lines changed: 21 additions & 4 deletions
@@ -7,16 +7,20 @@
 ---
 
 # BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games
+
 BALROG is a novel benchmark evaluating agentic LLM and VLM capabilities on long-horizon interactive tasks using reinforcement learning environments. Check out how current models fare on our [leaderboard](https://balrogai.com). You can read more about BALROG in our [paper](https://arxiv.org/abs/2411.13543).
 
 ## Features
+
 - Comprehensive evaluation of agentic abilities
 - Support for both language and vision-language models
 - Integration with popular AI APIs and local deployment
 - Easy integration for custom agents, new environments and new models
 
 ## Installation
+
 We advise using conda for the installation
+
 ```bash
 conda create -n balrog python=3.10 -y
 conda activate balrog
@@ -26,12 +30,15 @@ cd BALROG
 pip install -e .
 balrog-post-install
 ```
+
 On Mac make sure you have `wget` installed for the `balrog-post-install`
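
As a quick pre-flight check before running `balrog-post-install` on a Mac, the sketch below tests whether `wget` is available on `PATH`. It relies only on Python's standard `shutil.which`; the helper name `missing_tools` is made up for this example and is not part of BALROG.

```python
import shutil


def missing_tools(tools=("wget",)):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Install these before running balrog-post-install:", ", ".join(missing))
    else:
        print("wget found on PATH.")
```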
 
 ## Docker
+
 We have provided some docker images. Please see the [relevant README](docker/README.md).
 
 ## ⚡️ Evaluate using vLLM locally
+
 We support running LLMs/VLMs locally using [vLLM](https://github.com/vllm-project/vllm). You can spin up a vLLM client and evaluate your agent on BALROG in the following way:
 
 ```bash
@@ -49,43 +56,53 @@ python eval.py \
 ```
 
 On Mac you might have to first export the following to suppress some fork() errors:
+
 ```
 export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES
 ```
 
 Check out [vLLM](https://github.com/vllm-project/vllm) for more options on how to serve your models fast and efficiently.
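
If you launch the evaluation from a Python script rather than a shell, the macOS workaround above can be applied to the child process environment instead of being exported globally. This is a minimal sketch under that assumption; `eval_env` is a hypothetical helper, not part of BALROG or vLLM.

```python
import os
import platform


def eval_env(base_env=None, system=None):
    """Copy the environment and add the fork-safety override on macOS only."""
    env = dict(os.environ if base_env is None else base_env)
    system = platform.system() if system is None else system
    if system == "Darwin":
        # Respect a value the user has already chosen.
        env.setdefault("OBJC_DISABLE_INITIALIZE_FORK_SAFETY", "YES")
    return env
```

The returned dictionary can be passed as `env=eval_env()` to `subprocess.run` when starting `eval.py`.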
 
-## 🛜 Evaluate using popular APIs
-We support out of the box clients for OpenAI, Anthropic and Google Gemini APIs. First set up your API key:
+## 🛜 Evaluate using API
+
+We support out of the box clients for OpenAI, Anthropic and Google Gemini APIs. If you want to evaluate an agent using one of these APIs, you first have to set up your API key in one of two ways:
+
+You can either directly export it:
 
 ```bash
 export OPENAI_API_KEY=<KEY>
 export ANTHROPIC_API_KEY=<KEY>
 export GEMINI_API_KEY=<KEY>
 ```
 
-Then run the evaluation with:
+Or you can modify the `SECRETS` file, adding your API keys.
+
+You can then run the evaluation with:
 
 ```bash
 python eval.py \
 agent.type=naive \
 agent.max_image_history=0 \
-eval.num_workers=64 \
+agent.max_history=16 \
+eval.num_workers=16 \
 client.client_name=openai \
 client.model_id=gpt-4o-mini-2024-07-18
 ```
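
The key setup and the command above can also be assembled from a script. The sketch below checks whether the key for a client is exported and builds the same `key=value` overrides as an argument list. The environment variable names and default overrides are taken from the README; the client names `anthropic` and `gemini` are assumptions (only `openai` appears above), and both helper functions are hypothetical rather than part of BALROG.

```python
import os
import shlex

# Variable names come from the README. The "anthropic" and "gemini" client
# names are assumptions; only "openai" is shown in the README.
API_KEY_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "gemini": "GEMINI_API_KEY",
}

# Overrides used in the README's example command.
DEFAULT_OVERRIDES = {
    "agent.type": "naive",
    "agent.max_image_history": 0,
    "agent.max_history": 16,
    "eval.num_workers": 16,
}


def key_is_exported(client_name, env=None):
    """Return True if the API key variable for `client_name` is set and non-empty."""
    env = os.environ if env is None else env
    return bool(env.get(API_KEY_VARS[client_name]))


def build_eval_command(client_name, model_id, overrides=None):
    """Build the eval.py argument list from the defaults plus any overrides."""
    settings = dict(DEFAULT_OVERRIDES)
    settings.update(overrides or {})
    settings["client.client_name"] = client_name
    settings["client.model_id"] = model_id
    return ["python", "eval.py"] + [f"{key}={value}" for key, value in settings.items()]


if __name__ == "__main__":
    client = "openai"
    if not key_is_exported(client):
        # The key may still be provided through the SECRETS file.
        print(f"Note: {API_KEY_VARS[client]} is not exported; check the SECRETS file.")
    print(shlex.join(build_eval_command(client, "gpt-4o-mini-2024-07-18")))
```

A missing environment variable is reported as a note rather than an error because the key may instead live in the `SECRETS` file.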
 
 ## Documentation
+
 - [Evaluation Guide](https://github.com/balrog-ai/BALROG/blob/main/docs/evaluation.md) - Detailed instructions for various evaluation scenarios
 - [Agent Development](https://github.com/balrog-ai/BALROG/blob/main/docs/agents.md) - Tutorial on creating custom agents
 - [Few Shot Learning](https://github.com/balrog-ai/BALROG/blob/main/docs/few_shot_learning.md) - Instructions on how to run Few Shot Learning
 
 We welcome contributions! Please see our [Contributing Guidelines](https://github.com/balrog-ai/BALROG/blob/main/docs/contribution.md) for details.
 
 ## License
+
 This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
 
 ## Citation
+
 If you use BALROG in any of your work, please cite:
 
 ```
