This repository contains a collection of Colab notebooks used to evaluate in-context learning (ICL) with various Large Language Models (LLMs) on the MMLU benchmark and several other benchmarks. Notebook outputs have been cleared for readability.
These evaluations leverage the EleutherAI lm-evaluation-harness library.
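A typical harness invocation looks like the sketch below. The model identifier, shot count, and batch size are illustrative assumptions, not the exact settings used in these notebooks:

```shell
# Sketch of an lm-evaluation-harness run on MMLU.
# Model name and --num_fewshot value are examples only.
lm_eval --model hf \
    --model_args pretrained=google/gemma-3-4b-it \
    --tasks mmlu \
    --num_fewshot 5 \
    --batch_size auto
```

Varying `--num_fewshot` (e.g. 0, 1, 5) is how the 0-shot vs. few-shot comparisons below are produced.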
Notable achievements (MMLU):
Google Gemma-3-4b-it: showed the highest sensitivity to in-context learning, jumping from 51% (0-shot) to 65% (5-shot).
Llama-3.2-3B-Instruct: showed rapid adaptation, improving from 59% (0-shot) to approximately 65% with just a single in-context example (1-shot).