Un Ministral, des grands singes de langage


This repo demonstrates how to replicate the results of the Large Language Monkeys paper using a different model, Ministral 8B, and a different dataset, HumanEval.

It runs both the code generation model and the sandboxed code evaluation on Modal, massively in parallel: on Modal's free tier, that means code generation across 10 H100 GPUs, each running at an aggregate throughput of 5 - 10k tok/s, and code evaluation across over 100 Sandboxes.

For more on using Modal Sandboxes, see our product launch post.
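Sandboxes are what make it safe to run arbitrary model-generated code: each one is an isolated container you can boot, exec commands in, and tear down from Python. A minimal example (the app name below is just for illustration):

```python
import modal

# Sandboxes need an App to belong to; look one up or create it on the fly
app = modal.App.lookup("sandbox-hello", create_if_missing=True)

sb = modal.Sandbox.create(app=app)            # boot an isolated container
p = sb.exec("python", "-c", "print(1 + 1)")   # run a command inside it
print(p.stdout.read())                        # prints "2"
sb.terminate()                                # shut the container down
```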

How-To

Setup Modal

```bash
pip install modal  # that's it :)
modal setup  # if you're new to Modal
```

Test and deploy inference on Modal

```bash
# test
modal run le_inference.py
# deploy
modal deploy le_inference.py
```
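
For a sense of what `le_inference.py` contains, here is a rough sketch of a Modal inference app: a container image with vLLM installed and a GPU-backed function that returns completions. The model name, app name, and sampling parameters below are illustrative assumptions; the real app serves the quantized Ministral 8B and is tuned for throughput.

```python
import modal

MODEL = "mistralai/Ministral-8B-Instruct-2410"  # assumption: the repo defaults to a quantized variant

image = modal.Image.debian_slim(python_version="3.11").pip_install("vllm")
app = modal.App("le-inference-sketch", image=image)

@app.function(gpu="H100", timeout=60 * 60)
def generate(prompts: list[str]) -> list[str]:
    """Generate one completion per prompt with vLLM."""
    # the real app keeps the engine warm across calls (e.g. via @app.cls);
    # this sketch reloads it every time for brevity
    from vllm import LLM, SamplingParams

    llm = LLM(model=MODEL)
    params = SamplingParams(temperature=0.8, max_tokens=512)
    return [out.outputs[0].text for out in llm.generate(prompts, params)]

@app.local_entrypoint()
def main():
    print(generate.remote(["def add(a: int, b: int) -> int:"])[0])
```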

Test and run the benchmark in parallel

```bash
# test
modal run le_client.py --dry-run --n 1 --subsample 1
# test and save results
modal run le_client.py --no-dry-run --n 1 --subsample 1
# run the full dataset, 1000 attempts per problem
modal run le_client.py --no-dry-run --n 1000 --subsample 100  # subsample is a percentage
```
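
Roughly, `le_client.py` loads the HumanEval problems, optionally subsamples them, and fans them out to the deployed inference app so every problem's `n` samples are generated in parallel. The sketch below is illustrative; the deployed app and function names, and the use of the `datasets` library, are assumptions.

```python
import modal

app = modal.App("le-client-sketch")

@app.local_entrypoint()
def main(n: int = 1, subsample: int = 100):
    from datasets import load_dataset  # assumes `datasets` is installed locally

    # handle to the function deployed by `modal deploy le_inference.py` (names are assumptions)
    generate = modal.Function.from_name("le-inference-sketch", "generate")

    problems = load_dataset("openai_humaneval", split="test")
    prompts = [row["prompt"] for row in problems]
    prompts = prompts[: max(1, len(prompts) * subsample // 100)]

    # one map item per problem; each asks the inference app for n samples of that prompt
    for prompt, completions in zip(prompts, generate.map([[p] * n for p in prompts])):
        print(f"{len(completions)} samples for {prompt.splitlines()[0]!r}")
```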

Calculate results in parallel in sandboxes

```bash
# run concurrently with the client, or after it finishes
modal run les_evals.py
```
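
The core of the evaluation is small: concatenate a candidate solution with the problem's test code, run it in an isolated Sandbox, and count a zero exit code as a pass. A minimal sketch, with the app name and timeout as assumptions:

```python
import modal

app = modal.App.lookup("le-evals-sketch", create_if_missing=True)  # name is an assumption

def passes(candidate: str, test: str, entry_point: str, timeout: int = 60) -> bool:
    """Run one candidate solution against its HumanEval tests in an isolated Sandbox."""
    # each HumanEval problem ships a `check(candidate)` test function and an entry point name
    program = f"{candidate}\n\n{test}\n\ncheck({entry_point})\n"
    sb = modal.Sandbox.create("python", "-c", program, app=app, timeout=timeout)
    sb.wait()
    return sb.returncode == 0
```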

Analyze results

```bash
modal launch jupyter --volume mistral-humaneval --mount analysis
# run the notebook in `mount/`
```
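
The headline metric in the notebook is pass@k. For reference, the standard unbiased estimator from the HumanEval paper (Chen et al., 2021), also used in the Large Language Monkeys paper, takes the number of samples `n`, the number that passed `c`, and the budget `k`:

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of the probability that at least one of k samples passes."""
    if n - c < k:
        return 1.0  # too few failures to fill k draws, so some draw must pass
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))
```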

Other files

The `le_quant` and `le_quant_wrapper` scripts demonstrate language model quantization with llm-compressor, run on Modal.

We ran those already to generate the model used by default in the example, so you don't need to run them, but they are included for completeness.
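
For orientation, FP8 dynamic quantization with llm-compressor looks roughly like the sketch below; the exact recipe, model revision, and import paths (which vary across llm-compressor versions) live in `le_quant.py` and may differ.

```python
from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.transformers import oneshot
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "mistralai/Ministral-8B-Instruct-2410"  # assumption
SAVE_DIR = "Ministral-8B-Instruct-2410-FP8-Dynamic"

model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL)

# quantize all Linear layers to dynamic FP8, leaving the output head in full precision
recipe = QuantizationModifier(targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"])
oneshot(model=model, recipe=recipe)

model.save_pretrained(SAVE_DIR, save_compressed=True)
tokenizer.save_pretrained(SAVE_DIR)
```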
