This project was completed for the summer 2024 cohort of the BlueDot AI Safety Fundamentals Alignment Course.
For my project, I followed Neel Nanda's walkthrough of his paper "Progress Measures for Grokking via Mechanistic Interpretability".[1]
Footnotes
1. N. Nanda, L. Chan, T. Lieberum, J. Smith, and J. Steinhardt, ‘Progress measures for grokking via mechanistic interpretability’. arXiv, Oct. 19, 2023. doi: 10.48550/arXiv.2301.05217.