This repository contains a collection of Colab notebooks used to evaluate in-context learning (ICL) with various Large Language Models (LLMs) on the MMLU benchmark and several other benchmarks. Notebook outputs have been cleared for readability.
These evaluations leverage the EleutherAI lm-evaluation-harness library.
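A typical harness invocation looks like the sketch below. The model identifier, shot count, and batch size are illustrative assumptions, not the exact settings used in these notebooks:

```shell
# Sketch of an lm-evaluation-harness run on MMLU.
# Model name and --num_fewshot value are examples only.
lm_eval --model hf \
    --model_args pretrained=google/gemma-3-4b-it \
    --tasks mmlu \
    --num_fewshot 5 \
    --batch_size auto
```

Varying `--num_fewshot` (e.g. 0, 1, 5) is how the 0-shot vs. few-shot comparisons below are produced.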
Notable achievements (MMLU):
Google Gemma-3-4b-it: showed the highest sensitivity to in-context learning, jumping from 51% (0-shot) to 65% (5-shot).
Llama-3.2-3B-Instruct: showed rapid adaptation, improving from 59% (0-shot) to approximately 65% with just a single in-context example (1-shot).