@@ -10,7 +10,7 @@ vllm serve meta-llama/Llama-3.2-1B-Instruct --port 8080
 python eval.py \
     agent.type=naive \
     agent.max_image_history=0 \
-    agent.max_history=16 \
+    agent.max_text_history=16 \
     eval.num_workers=16 \
     client.client_name=vllm \
     client.model_id=meta-llama/Llama-3.2-1B-Instruct \
@@ -39,7 +39,7 @@ You can then run the evaluation with:
 python eval.py \
     agent.type=naive \
     agent.max_image_history=0 \
-    agent.max_history=16 \
+    agent.max_text_history=16 \
     eval.num_workers=16 \
     client.client_name=openai \
     client.model_id=gpt-4o-mini-2024-07-18
@@ -52,7 +52,7 @@ You can activate the VLM mode by increasing the `max_image_history` argument, fo
 ```
 python eval.py \
     agent.type=naive \
-    agent.max_history=16 \
+    agent.max_text_history=16 \
     agent.max_image_history=1 \
     eval.num_workers=16 \
     client.client_name=openai \
@@ -66,7 +66,7 @@ To resume an incomplete evaluation, use eval.resume_from. For example, if an eva
 python eval.py \
     agent.type=naive \
     agent.max_image_history=0 \
-    agent.max_history=16 \
+    agent.max_text_history=16 \
     eval.num_workers=16 \
     client.client_name=openai \
     client.model_id=gpt-4o-mini-2024-07-18 \
@@ -81,7 +81,7 @@ python eval.py \
 | ---------------------------| ---------------------------------------------------------------------------------------------------| -------------------------------------------|
 | **agent.type** | Type of agent used. | `naive` |
 | **agent.remember_cot** | Whether the agent should remember chain-of-thought (CoT) during episodes. | `True` |
-| **agent.max_history** | Maximum number of dialogue history entries to retain. | `16` |
+| **agent.max_text_history** | Maximum number of dialogue history entries to retain. | `16` |
 | **agent.max_image_history** | Maximum number of images included in the history. Use >= 1 to enable VLM mode. | `0` |
 | **eval.num_workers** | Number of parallel environment workers for evaluation. | `1` |
 | **eval.num_episodes** | Number of episodes per environment for evaluation. | `{nle: 5, minihack: 5, babyai: 25, ...}` |