Investigating Model Motives — Blog Companion

This repository contains supplementary details and artifacts for the blog post on investigating model motives. For each of the four environments in agent-interp-envs, it provides experiment-specific material: prompt templates, workspace files, output variants, and plots.

Structure

Each folder corresponds to one environment:

funding_email_details/ — Funding email environment
sandbagging_details/ — Sandbagging environment
eval_tampering_details/ — Eval tampering environment
secret_number_details/ — Secret number environment

Each folder contains a README describing the experiment setup, the prompt templates used, the workspace files provided to the agent, the output variants observed across runs, and the plots summarizing the results.
