Dear Authors,
First, thank you for sharing this excellent paper and the accompanying code. I am currently working on reproducing the experiments and have some detailed questions regarding the results in Table 2 (comparing Win Rates across different Map Scenarios).
To ensure I am correctly replicating the experimental setup, I would be grateful if you could clarify the following details about the evaluation protocol:
- Evaluation Scripts & Repetition: Could you specify which exact evaluation scripts were used to generate the results for Table 2? Furthermore, what was the number of runs for each script/scenario? (For example, was each map scenario evaluated by running `multiprocess_run_env.py` 10 times and then aggregating the results?)
- Win Rate Calculation: What is the precise formula used to calculate the Win Rate? For instance, is it:
  `Win Rate = (Number of Wins / Total Number of Episodes) * 100%`
  Or does it involve a different approach, such as calculating the rate against multiple specific opponents or under certain conditions?
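
To make the question concrete, here is a minimal sketch of how I would compute the figure under the simple formula above. The function names and the one-boolean-per-episode input are my own assumptions for illustration and are not taken from the repository. It also shows that pooling all episodes and averaging per-run win rates can give different numbers when runs have different episode counts, which is part of why I am asking about the aggregation.

```python
from statistics import mean, stdev


def win_rate(outcomes):
    """Return the percentage of episodes won.

    `outcomes` is a list of booleans, one per episode (True = win).
    This input format is an assumption, not the repository's log format.
    """
    if not outcomes:
        raise ValueError("need at least one episode")
    return 100.0 * sum(outcomes) / len(outcomes)


def aggregate(runs):
    """Summarize several runs (each a list of episode outcomes).

    Reports both the pooled win rate over all episodes and the
    mean/standard deviation of the per-run win rates.
    """
    per_run = [win_rate(run) for run in runs]
    pooled = win_rate([outcome for run in runs for outcome in run])
    spread = stdev(per_run) if len(per_run) > 1 else 0.0
    return {"pooled": pooled, "mean_of_runs": mean(per_run), "std_of_runs": spread}
```

For example, with one run of 2 episodes (1 win) and another of 4 episodes (4 wins), the pooled win rate is 5/6, about 83.3%, while the mean of the per-run rates is 75%. Knowing which of these (if either) matches Table 2 would resolve my question.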
I am using the Qwen model in my reproduction efforts and want to make sure my methodology aligns with the paper's. These specifics would greatly help me in matching the experimental conditions and understanding the results.
Thank you for your time and consideration.
Best regards,
Zhicheng LI