
Bump LightEval to enable DP>1 #629

Merged
lewtun merged 4 commits into main from bump-light on Apr 30, 2025
Conversation

@lewtun (Member) commented Apr 28, 2025

This PR bumps lighteval to re-enable DP>1, since it is now compatible with vllm v0.8.4. See: huggingface/lighteval#670 (comment)

Waiting for evals to run, but code can be reviewed as is.

Update: evals finished and the scores differ by up to a few percentage points, mostly on AIME24, which is rather noisy due to its small sample size:

[image: output-46 — eval score comparison]

Code snippet to run all evals:

#!/bin/bash

MODELS=(
"DeepSeek-R1-Distill-Qwen-1.5B"
"DeepSeek-R1-Distill-Qwen-7B"
"DeepSeek-R1-Distill-Qwen-14B"
"DeepSeek-R1-Distill-Qwen-32B"
"DeepSeek-R1-Distill-Llama-8B"
"DeepSeek-R1-Distill-Llama-70B"
)

for M in "${MODELS[@]}"; do
  echo "Running benchmark for model: $M"
  python scripts/run_benchmarks.py --model-id "deepseek-ai/$M" --benchmarks aime24 math_500 gpqa lcb
done
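Since the point of the bump is restoring data parallelism, here is a minimal sketch of how a DP>1 MODEL_ARGS string could be assembled for lighteval's vllm backend. The `data_parallel_size` key and the GPU count are assumptions for illustration; verify the accepted keys against the pinned lighteval version.

```shell
#!/bin/bash
# Sketch: build MODEL_ARGS for a data-parallel eval (DP=8, TP=1).
# Assumption: the pinned lighteval accepts data_parallel_size in its vllm model args.
NUM_GPUS=8
MODEL="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"

MODEL_ARGS="pretrained=$MODEL,dtype=bfloat16,data_parallel_size=$NUM_GPUS,max_model_length=32768,gpu_memory_utilization=0.8,generation_parameters={max_new_tokens:32768,temperature:0.6,top_p:0.95}"

# Print the assembled string so it can be inspected before launching.
echo "$MODEL_ARGS"
```

The generation parameters (temperature 0.6, top_p 0.95, 32k max tokens) mirror the values in the Makefile diff below, so only the parallelism knob changes between DP and TP runs.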

TODO

  • Re-run evals to check score variance

@lewtun lewtun requested a review from edbeeching April 28, 2025 13:51

Review comment on the eval Makefile target (diff context):
fi \
),))
$(if $(filter tensor,$(PARALLEL)),export VLLM_WORKER_MULTIPROC_METHOD=spawn &&,) \
MODEL_ARGS="pretrained=$(MODEL),dtype=bfloat16,$(PARALLEL_ARGS),max_model_length=32768,max_num_batched_tokens=32768,gpu_memory_utilization=0.8,generation_parameters={max_new_tokens:32768,temperature:0.6,top_p:0.95}" && \
@lewtun (Member, Author) replied:

max_num_batched_tokens no longer needs to be included, so I've removed it for simplicity

@lewtun lewtun merged commit 75c3999 into main Apr 30, 2025
1 check passed
@lewtun lewtun deleted the bump-light branch April 30, 2025 20:02
