|**client.base_url**| Base URL of the model server for API requests with vllm. |`http://localhost:8080/v1`|
|**client.is_chat_model**| Indicates if the model follows a chat-based interface. |`True`|
|**client.generate_kwargs.temperature**| Temperature for model response randomness. |`0.0`|
|**client.alternate_roles**| If `True`, the instruction prompt is fused with the first observation, as required by some LLMs. |`False`|
|**envs.names**| Dash-separated list of environments to evaluate, e.g., `nle-minihack`. |`babyai-babaisai-textworld-crafter-nle-minihack`|
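The options above are supplied as `key=value` overrides to `eval.py` (the override style matches the `client.alternate_roles=True` example in the FAQ). A hypothetical invocation using the table's default values might look like:

```shell
# Sketch of an eval run with explicit overrides; values are the
# documented defaults, so this is equivalent to a plain `python eval.py`.
python eval.py \
  client.base_url=http://localhost:8080/v1 \
  client.is_chat_model=True \
  client.generate_kwargs.temperature=0.0 \
  client.alternate_roles=False \
  envs.names=babyai-babaisai-textworld-crafter-nle-minihack
```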
## FAQ:
- Mac fork error:
Mac systems might complain about `fork` when evaluating in multiprocessing mode (`eval.num_workers > 1`). To fix this, export the following before running eval: `export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES`
- Alternate roles:
Some LLMs/VLMs require strictly alternating roles. You can comply by fusing the instruction prompt with the first observation via `client.alternate_roles=True`.
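To illustrate why this option exists, here is a minimal Python sketch of message-list construction. The function name `build_messages` and its arguments are hypothetical, not the project's actual API: some chat endpoints reject two consecutive `user` messages, so fusing the instruction with the first observation restores a strict user/assistant alternation.

```python
def build_messages(instruction, observations, actions, alternate_roles=False):
    """Build a chat message list from an episode (illustrative sketch).

    With alternate_roles=True, the instruction prompt is fused with the
    first observation so that roles strictly alternate user/assistant.
    """
    if alternate_roles:
        # Fuse instruction + first observation into a single user turn.
        messages = [{"role": "user",
                     "content": instruction + "\n\n" + observations[0]}]
    else:
        # Two consecutive user messages; some APIs reject this shape.
        messages = [{"role": "user", "content": instruction},
                    {"role": "user", "content": observations[0]}]
    for action, obs in zip(actions, observations[1:]):
        messages.append({"role": "assistant", "content": action})
        messages.append({"role": "user", "content": obs})
    return messages

msgs = build_messages("You are an agent.", ["obs 1", "obs 2"], ["go north"],
                      alternate_roles=True)
print([m["role"] for m in msgs])  # ['user', 'assistant', 'user']
```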