Add NNsight skills benchmark suite #1

gsarti · 2026-01-07T23:54:55Z

Summary

This PR adds a comprehensive benchmark suite for evaluating NNsight skills across different difficulty levels and interpretability techniques.

What's included

17 query files organized by difficulty (6 easy, 6 medium, 5 hard) covering:
- nnsight-basics: Core tracing, saving, interventions
- logit-lens: Layer-wise prediction decoding
- activation-patching: Causal intervention via swapping
- attribution-patching: Gradient-based approximation
- causal-tracing: Mediation analysis
- model-steering: Steering vectors, persistent edits
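
The taxonomy above could be encoded roughly as follows. This is a minimal sketch and not the contents of schema.py: the `Query` class, its `name` field, and the `difficulty_counts` helper are hypothetical names chosen for illustration, while the difficulty levels and technique labels are taken from the list above.

```python
from collections import Counter
from dataclasses import dataclass

# Labels copied from the PR description.
DIFFICULTIES = ("easy", "medium", "hard")
TECHNIQUES = (
    "nnsight-basics",
    "logit-lens",
    "activation-patching",
    "attribution-patching",
    "causal-tracing",
    "model-steering",
)


@dataclass(frozen=True)
class Query:
    """Hypothetical query record; the real schema may carry more fields."""

    name: str
    difficulty: str
    technique: str

    def __post_init__(self):
        # Reject labels outside the taxonomy so typos fail early.
        if self.difficulty not in DIFFICULTIES:
            raise ValueError(f"unknown difficulty: {self.difficulty!r}")
        if self.technique not in TECHNIQUES:
            raise ValueError(f"unknown technique: {self.technique!r}")


def difficulty_counts(queries):
    """Count queries per difficulty level, including levels with zero queries."""
    counts = Counter(q.difficulty for q in queries)
    return {level: counts.get(level, 0) for level in DIFFICULTIES}
```

A check like `difficulty_counts(all_queries) == {"easy": 6, "medium": 6, "hard": 5}` would then confirm the 6/6/5 split stated above.
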
Infrastructure:
- Schema definitions for queries and results (schema.py)
- Structural validator for API correctness (validators/structural.py)
- Deprecated pattern checker for pre-0.5 NNsight detection (validators/deprecated.py)
- Mock, Claude Code CLI, and Claude API runners (runners/)
- Results analysis and comparison tools (analyze.py)
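
As an illustration of how a deprecated pattern checker might work, here is a minimal regex-based sketch. It is not the code in validators/deprecated.py: the `Finding` class, the `find_deprecated` function, and both patterns are placeholders invented for this example, and they do not correspond to actual pre-0.5 NNsight APIs.

```python
import re
from dataclasses import dataclass

# Placeholder patterns for illustration only. The real checker would list
# the actual pre-0.5 NNsight idioms, which this PR description does not name.
PLACEHOLDER_PATTERNS = {
    "legacy-call": re.compile(r"\blegacy_call\("),
    "legacy-attr": re.compile(r"\.legacy_attr\b"),
}


@dataclass(frozen=True)
class Finding:
    pattern_name: str
    line_number: int
    line: str


def find_deprecated(source, patterns=PLACEHOLDER_PATTERNS):
    """Return one Finding per (line, pattern) match in generated code."""
    findings = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # ignore full-line comments
        for name, regex in patterns.items():
            if regex.search(line):
                findings.append(Finding(name, line_number, stripped))
    return findings
```

A line-based scan like this is cheap and easy to extend, but it can miss patterns split across lines; an AST-based check would be a stricter alternative.
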
Documentation:
- Comprehensive README with query taxonomy
- Evaluation metrics (execution, API correctness, functional correctness, efficiency)
- A/B testing protocol for skill effectiveness
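
The PR description does not spell out the A/B protocol, so the following is only one way such a comparison could be computed. It assumes each result is a dict with hypothetical `condition`, `difficulty`, and `passed` keys, and that the two conditions are labeled `with_skill` and `without_skill`; none of these names come from the PR.

```python
from collections import defaultdict


def pass_rates(results):
    """Return {(condition, difficulty): fraction of runs that passed}."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for result in results:
        key = (result["condition"], result["difficulty"])
        totals[key] += 1
        passes[key] += int(result["passed"])
    return {key: passes[key] / totals[key] for key in totals}


def skill_uplift(results, treatment="with_skill", control="without_skill"):
    """Pass-rate difference (treatment minus control) per difficulty level.

    Difficulty levels missing from either condition are skipped.
    """
    rates = pass_rates(results)
    difficulties = sorted({difficulty for (_, difficulty) in rates})
    return {
        difficulty: rates[(treatment, difficulty)] - rates[(control, difficulty)]
        for difficulty in difficulties
        if (treatment, difficulty) in rates and (control, difficulty) in rates
    }
```

With only a handful of queries per difficulty level, differences computed this way are noisy, so repeated runs per query would be needed before drawing conclusions.
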

Filename changes

Query files now use descriptive hyphen-separated titles based on their content (e.g., extract-hidden-states.yaml, position-specific-head-patching.yaml) instead of generic numeric IDs.
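
For illustration, a naming convention like this could be produced by a small helper. The `query_filename` function below is hypothetical and is not part of the PR; it simply lowercases a title and joins its words with hyphens.

```python
import re


def query_filename(title):
    """Turn a query title into a hyphen-separated .yaml filename."""
    # Collapse every run of non-alphanumeric characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    if not slug:
        raise ValueError("title must contain at least one letter or digit")
    return f"{slug}.yaml"
```

For example, the title "Extract hidden states" maps to extract-hidden-states.yaml.
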

Test plan

- Verify validators work against reference solutions
- Run mock benchmark: python skills_benchmark/runners/base.py --num-runs 1
- Test API runner with Claude (requires API key)

Add a comprehensive benchmark suite for evaluating NNsight skills across different difficulty levels (easy, medium, hard) and techniques (basics, logit-lens, activation-patching, attribution-patching, causal-tracing, model-steering).

The benchmark includes:
- 17 query files testing specific interpretability techniques
- Schema definitions for queries and results
- Structural validation for API correctness
- Deprecated pattern detection (pre-0.5 NNsight)
- Claude Code and API runners
- Results analysis and comparison tools

Filenames now use descriptive hyphen-separated titles based on the query content rather than generic numeric IDs.
