This repository contains supplementary details and artifacts for the blog post on investigating model motives. It provides experiment-specific information (including prompt templates, workspace files, output variants, and plots) for each of the four environments in agent-interp-envs.
Each folder corresponds to one environment:
- funding_email_details/: Funding email environment
- sandbagging_details/: Sandbagging environment
- eval_tampering_details/: Eval tampering environment
- secret_number_details/: Secret number environment
Within each folder you'll find a README describing the experiment setup, prompt templates used, workspace files provided to the agent, output variants across runs, and plots summarizing results.