JonathanRaines/bluedot-intro-to-mech-interp

BlueDot AI Safety Fundamentals Alignment Course Project

This project was completed for the summer 2024 cohort of the BlueDot AI Safety Fundamentals Alignment Course.

Introduction to Mechanistic Interpretability

For my project, I followed Neel Nanda's walkthrough of his paper "Progress Measures for Grokking via Mechanistic Interpretability" [1].

Footnotes

  1. N. Nanda, L. Chan, T. Lieberum, J. Smith, and J. Steinhardt, ‘Progress measures for grokking via mechanistic interpretability’. arXiv, Oct. 19, 2023. doi: 10.48550/arXiv.2301.05217.
