docs/evaluation.md (2 additions, 2 deletions)
```diff
@@ -93,7 +93,7 @@ python eval.py \
 |**client.is_chat_model**| Indicates if the model follows a chat-based interface. |`True`|
 |**client.generate_kwargs.temperature**| Temperature for model response randomness. |`0.0`|
 |**client.alternate_roles**| If True, the instruction prompt is fused with the first observation. Required by some LLMs. |`False`|
-|**client.temperature**| If set to null, defaults to the API's default temperature; otherwise use a float from 0.0 to 1.0. |`null`|
+|**client.temperature**| If set to null, defaults to the API's default temperature; otherwise use a float from 0.0 to 2.0. |`1.0`|
 |**envs.names**| Dash-separated list of environments to evaluate, e.g., `nle-minihack`. |`babyai-babaisai-textworld-crafter-nle-minihack`|


```
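The table rows above are Hydra-style overrides passed to `eval.py`, the script named in the hunk context. Below is a minimal sketch of an invocation using the updated `client.temperature` default; the exact CLI and any agent or model settings beyond the documented keys are assumptions, not the repository's confirmed interface.

```bash
# Sketch only: override names come from the table above; other flags may exist.
python eval.py \
  client.is_chat_model=True \
  client.temperature=1.0 \
  envs.names=babyai-babaisai-textworld-crafter-nle-minihack
```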
```diff
@@ -105,4 +105,4 @@ python eval.py \
 - Alternate roles:
 Some LLMs/VLMs require alternating roles. You can comply by fusing the instruction prompt with the first observation: `client.alternate_roles=True`
 - Temperature:
-We recommend running models with temperatures around 0.5-0.7, or using the model API's default temperature. Very low temperatures can cause some of the more brittle models to endlessly repeat actions or produce incoherent outputs.
+We recommend running models with temperatures around 0.7-1.0, or using the model API's default temperature. Very low temperatures can cause some of the more brittle models to endlessly repeat actions or produce incoherent outputs.
```
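Both notes translate directly into overrides on the same command. A hedged example combining them, with the temperature picked from the recommended 0.7-1.0 band (values are illustrative, not prescribed defaults):

```bash
# Sketch: fuse the instruction prompt with the first observation for models
# that require alternating roles, and choose a temperature in the 0.7-1.0 band.
python eval.py \
  client.alternate_roles=True \
  client.temperature=0.7
```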