docs/evaluation.md (+3 −1: 3 additions, 1 deletion)
@@ -93,7 +93,7 @@ python eval.py \
|**client.is_chat_model**| Indicates if the model follows a chat-based interface. |`True`|
|**client.generate_kwargs.temperature**| Temperature for model response randomness. |`0.0`|
|**client.alternate_roles**| If True, the instruction prompt is fused with the first observation. Required by some LLMs. |`False`|
-|**client.temperature**| If set to null will default to the API default temperature. Use a float from 0.0 to 1.0. otherwise.|`null`|
+|**client.temperature**| If set to `null`, defaults to the API's default temperature; otherwise use a float from 0.0 to 1.0. |`null`|
|**envs.names**| Dash-separated list of environments to evaluate, e.g., `nle-minihack`. |`babyai-babaisai-textworld-crafter-nle-minihack`|
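For reference, a full invocation combining these options might look like the sketch below. This is illustrative only: it assumes Hydra-style `key=value` overrides (consistent with the `client.alternate_roles=True` syntax shown in the notes further down), and the chosen values are arbitrary examples.

```bash
# Illustrative sketch, assuming Hydra-style overrides; values are examples.
python eval.py \
  client.is_chat_model=True \
  client.temperature=0.7 \
  client.alternate_roles=False \
  envs.names=nle-minihack
```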
@@ -104,3 +104,5 @@ python eval.py \
Mac systems might complain about fork when evaluating in multiprocessing mode (`eval.num_workers > 1`). To fix this, export the following before running eval: `export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES`. A combined sketch follows these notes.
- Alternate roles:
Some LLMs/VLMs require alternating roles. To comply, you can fuse the instruction prompt with the first observation by setting `client.alternate_roles=True`.
+- Temperature:
+We recommend running models with temperatures around 0.5-0.7, or using the default temperature of the model's API. Temperatures that are too low can cause some of the more brittle models to endlessly repeat actions or produce incoherent outputs.
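As a concrete sketch of the workarounds in the notes above, assuming the same Hydra-style override syntax: the macOS fork-safety export for multiprocessing runs, plus an explicit temperature in the recommended range. The worker count and temperature are arbitrary example values.

```bash
# Sketch only: macOS fork-safety fix plus an explicit temperature override.
export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES  # needed when eval.num_workers > 1 on macOS
python eval.py \
  eval.num_workers=4 \
  client.temperature=0.6
```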