
llm_evals

This repository contains a collection of Colab notebooks used to evaluate in-context learning with various Large Language Models (LLMs) on the MMLU benchmark, among other benchmarks. Notebook outputs are cleared for clarity.

These evaluations use the EleutherAI lm-evaluation-harness library.
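A minimal sketch of the kind of harness invocation these notebooks run. The model id and flag values here are illustrative (taken from the results below), not a verbatim copy of any notebook cell; adjust them for your own runs.

```shell
# Install the evaluation harness (pin a version as needed)
pip install lm-eval

# 5-shot MMLU run for an example model; set --num_fewshot 0 for the 0-shot baseline
lm_eval --model hf \
  --model_args pretrained=google/gemma-3-4b-it \
  --tasks mmlu \
  --num_fewshot 5 \
  --batch_size auto
```

Sweeping `--num_fewshot` (e.g. 0, 1, 5) reproduces the 0-shot vs. few-shot comparisons reported below.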


Notable Results:

(MMLU)

Google Gemma-3-4b-it: Demonstrated the highest sensitivity to in-context learning, jumping from a 51% score (0-shot) to 65% (5-shot).

Llama-3.2-3B-Instruct: Showed rapid adaptation, improving from 59% (0-shot) to approximately 65% with just one in-context example (1-shot).
