
Evaluation script fails for Qwen-Math models #174

Open
@ChenDRAG

Description


I tried running the Qwen-Math models instead of the Qwen models, but found that the evaluation scripts don't work for Qwen-Math:

ValueError: User-specified max_model_len (32768) is greater than the derived max_model_len (max_position_embeddings=4096 or model_max_length=None in model's config.json). This may lead to incorrect model outputs or CUDA errors. To allow overriding this maximum, set the env var VLLM_ALLOW_LONG_MAX_MODEL_LEN=1

# tried both the instruct and base variants:
MODEL=Qwen/Qwen2.5-Math-1.5B-Instruct
MODEL=Qwen/Qwen2.5-Math-1.5B

NUM_GPUS=1
MODEL_ARGS="pretrained=$MODEL,dtype=float16,data_parallel_size=$NUM_GPUS,max_model_length=32768,gpu_memory_utilisation=0.8"
TASK=math_500
OUTPUT_DIR=data/evals/$MODEL
CUDA_VISIBLE_DEVICES=1 lighteval vllm $MODEL_ARGS "custom|$TASK|0|0" \
    --custom-tasks src/open_r1/evaluate.py \
    --use-chat-template \
    --system-prompt="Please reason step by step, and put your final answer within \boxed{}." \
    --output-dir $OUTPUT_DIR 
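
For what it's worth, the error message itself points at two possible workarounds: either lower max_model_length to the limit the model's config advertises (max_position_embeddings=4096), or set VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 to override the check. Roughly what I would try (untested, and I'm not sure which is the intended fix for Qwen-Math):

# option 1: cap the context at the 4096 tokens the model config advertises
MODEL_ARGS="pretrained=$MODEL,dtype=float16,data_parallel_size=$NUM_GPUS,max_model_length=4096,gpu_memory_utilisation=0.8"

# option 2: keep 32768 and let vLLM override its derived limit, as the error suggests
export VLLM_ALLOW_LONG_MAX_MODEL_LEN=1

I'm not sure option 2 gives sensible outputs though, since the error warns it "may lead to incorrect model outputs or CUDA errors".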

I hope you can help me out.
