eval_plus

data
convert_data.py
exclude_patterns.txt
generate.py
model.py
readme.md
requirements.txt
test.sh

Sourced from the Qwen2.5-Coder repository, with updated dependencies for better reproducibility.

Evaluation for HumanEval(+) and MBPP(+)

This folder contains the code and scripts to evaluate the Qwen2.5-Coder series models on the EvalPlus benchmark, which covers the HumanEval(+) and MBPP(+) datasets. The (+) variants extend the original HumanEval and MBPP test suites with many additional test cases, so they check the correctness of generated code more strictly.

1. Setup

Please refer to EvalPlus for detailed setup instructions. Install the required packages using:

pip install evalplus --upgrade
pip install -r requirements.txt
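
To confirm that the install step succeeded before starting a long run, a quick check like the following may help. This is a minimal sketch using only the Python standard library; the helper name `installed_version` is hypothetical and not part of this repository.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "evalplus" is the package installed in the setup step above.
    found = installed_version("evalplus")
    print(f"evalplus: {found or 'not installed'}")
```

If the check reports that the package is not installed, rerun the pip commands above in the same environment that will run test.sh.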

2. Inference and Evaluation

We use 8xA100 GPUs for this benchmark. Run inference and evaluation with the following script:

bash test.sh {path_to_your_local_model_checkpoint} {tensor_parallel_size} {output_dir}
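
As an illustration of how the three placeholders are filled in, the sketch below assembles the command line from Python. The model path, the output directory, and the tensor-parallel size of 8 (one shard per GPU on an 8-GPU node) are assumptions chosen for the example, and `build_test_command` is a hypothetical helper, not part of this repository.

```python
import shlex


def build_test_command(model_path, tensor_parallel_size, output_dir):
    """Build the argument list for test.sh in the order the script expects."""
    if tensor_parallel_size < 1:
        raise ValueError("tensor_parallel_size must be a positive integer")
    return ["bash", "test.sh", model_path, str(tensor_parallel_size), output_dir]


if __name__ == "__main__":
    # Placeholder values (assumptions); substitute your own checkpoint and paths.
    cmd = build_test_command("/path/to/model_checkpoint", 8, "./results")
    print(shlex.join(cmd))
```

With these placeholder values the script prints `bash test.sh /path/to/model_checkpoint 8 ./results`. The same list can be passed to `subprocess.run(cmd, check=True)` to launch the run from Python.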
