Are there any important considerations when evaluating the Llama-3-8B-Instruct model on GSM8K? My evaluation result is only 0.44.
I used the command:
python eval_math.py --model /xxx/pretrain/Meta-Llama-3-8B-Instruct --data_file /xxx/eval_math/GSM8K_test_data.jsonl --save_path /xxx/Meta-Llama-3-8B-Instruct.json --tensor_parallel_size 8 --seed 42
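One thing I am unsure about is whether eval_math.py applies the Llama-3 chat template before generation; instruct models usually score much lower when fed raw questions without it. Below is a minimal sketch of the prompt format I believe Llama-3-Instruct expects (`llama3_chat_prompt` is my own illustrative helper, not part of eval_math.py; in practice `tokenizer.apply_chat_template` would build this string):

```python
def llama3_chat_prompt(question: str) -> str:
    """Hand-build the Llama-3-Instruct chat format for a single user turn.

    Illustrative only: normally the tokenizer's apply_chat_template
    produces this, but writing it out makes the special tokens visible.
    """
    return (
        "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
        f"{question}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

# Example: wrap one GSM8K-style question before sending it to the model.
prompt = llama3_chat_prompt("Natalia sold 48 clips in April and half as many in May. How many clips did she sell in total?")
print(prompt)
```

If the eval script only concatenates a few-shot prefix with the question and skips these special tokens, that alone could explain a depressed score.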
