
Bump LightEval to enable DP>1 #629

Merged
lewtun merged 4 commits into main from bump-light on Apr 30, 2025
Conversation

@lewtun (Member) commented Apr 28, 2025

This PR bumps lighteval to re-enable DP>1, since it is now compatible with vllm v0.8.4. See: huggingface/lighteval#670 (comment)

Waiting for evals to run, but code can be reviewed as is.

Update: evals finished and the scores differ by up to a few percentage points, mostly on AIME24, which is rather noisy due to its small sample size:

[image: output-46 — eval score comparison]

Code snippet to run all evals:

#!/bin/bash

MODELS=(
"DeepSeek-R1-Distill-Qwen-1.5B"
"DeepSeek-R1-Distill-Qwen-7B"
"DeepSeek-R1-Distill-Qwen-14B"
"DeepSeek-R1-Distill-Qwen-32B"
"DeepSeek-R1-Distill-Llama-8B"
"DeepSeek-R1-Distill-Llama-70B"
)

for M in "${MODELS[@]}"; do
  echo "Running benchmark for model: $M"
  python scripts/run_benchmarks.py --model-id "deepseek-ai/$M" --benchmarks aime24 math_500 gpqa lcb
done
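Since the point of the bump is restoring data parallelism, here is a minimal sketch of how a DP>1 MODEL_ARGS string could be assembled for lighteval's vllm backend. The `data_parallel_size` key and the GPU count are assumptions for illustration; verify the accepted keys against the pinned lighteval version.

```shell
#!/bin/bash
# Sketch: build MODEL_ARGS for a data-parallel eval (DP=8, TP=1).
# Assumption: the pinned lighteval accepts data_parallel_size in its vllm model args.
NUM_GPUS=8
MODEL="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"

MODEL_ARGS="pretrained=$MODEL,dtype=bfloat16,data_parallel_size=$NUM_GPUS,max_model_length=32768,gpu_memory_utilization=0.8,generation_parameters={max_new_tokens:32768,temperature:0.6,top_p:0.95}"

# Print the assembled string so it can be inspected before launching.
echo "$MODEL_ARGS"
```

The generation parameters (temperature 0.6, top_p 0.95, 32k max tokens) mirror the values in the Makefile diff below, so only the parallelism knob changes between DP and TP runs.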

TODO

  • Re-run evals to check score variance

@lewtun lewtun requested a review from edbeeching April 28, 2025 13:51

Review comment on the eval Makefile target (diff context):
fi \
),))
$(if $(filter tensor,$(PARALLEL)),export VLLM_WORKER_MULTIPROC_METHOD=spawn &&,) \
MODEL_ARGS="pretrained=$(MODEL),dtype=bfloat16,$(PARALLEL_ARGS),max_model_length=32768,max_num_batched_tokens=32768,gpu_memory_utilization=0.8,generation_parameters={max_new_tokens:32768,temperature:0.6,top_p:0.95}" && \
@lewtun (Member, Author) replied:

max_num_batched_tokens no longer needs to be included, so I've removed it for simplicity

@lewtun lewtun merged commit 75c3999 into main Apr 30, 2025
1 check passed
@lewtun lewtun deleted the bump-light branch April 30, 2025 20:02
