12 changes: 11 additions & 1 deletion docs/evaluation.md
@@ -92,4 +92,14 @@ python eval.py \
| **client.base_url** | Base URL of the model server for API requests with vllm. | `http://localhost:8080/v1` |
| **client.is_chat_model** | Indicates if the model follows a chat-based interface. | `True` |
| **client.generate_kwargs.temperature** | Temperature for model response randomness. | `0.0` |
| **client.alternate_roles** | If `True`, the instruction prompt is fused with the first observation. Required by some LLMs. | `False` |
| **envs.names** | Dash-separated list of environments to evaluate, e.g., `nle-minihack`. | `babyai-babaisai-textworld-crafter-nle-minihack` |
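
As a minimal sketch of how these options are passed on the command line (the dotted option names above suggest Hydra-style `key=value` overrides; the option names are taken from the table, while the values here are illustrative):

```bash
python eval.py \
  envs.names=nle-minihack \
  client.base_url=http://localhost:8080/v1 \
  client.is_chat_model=True \
  client.generate_kwargs.temperature=0.0
```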



## FAQ

- Mac fork error:
  macOS may raise a fork-safety error when evaluating in multiprocessing mode (`eval.num_workers > 1`). To fix this, export the following before running eval: `export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES`. See the combined sketch after this list.
- Alternate roles:
  Some LLMs/VLMs require strictly alternating user/assistant roles. To comply, fuse the instruction prompt with the first observation by setting `client.alternate_roles=True` (also shown in the sketch below).
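
Putting both items together, a sketch of a multiprocessing run on macOS against a model that requires alternating roles (the override syntax is assumed to match the examples above; the worker count is illustrative):

```bash
# macOS only: disable the Objective-C fork-safety check that otherwise
# breaks multiprocessing workers
export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES

python eval.py \
  eval.num_workers=4 \
  client.alternate_roles=True
```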