Shape of Wisdom

State, Motion, Boundary: A Mechanistic Account of Convergence and Failure in Transformer Multiple-Choice Decisions

Overview

We introduce three quantities — state (logit-margin δ), motion (per-layer drift g), and boundary (|δ|) — that fully describe the layer-by-layer trajectory of a transformer's answer on multiple-choice benchmarks. These primitives separate prompts into four trajectory types (stable-correct, stable-wrong, unstable-correct, unstable-wrong), producing a phase diagram that replicates across three 7–8B-parameter instruction-tuned transformers (Qwen 2.5-7B, Llama 3.1-8B, Mistral 7B-v0.3) and multiple knowledge domains.

We decompose motion into attention-driven routing and MLP-driven injection, and validate causal claims via component ablation, activation patching, and span-deletion experiments with multiple-comparison correction.

Paper: See paper/final_paper/paper_publish.pdf for the full manuscript.

Repository Structure

shape-of-wisdom/
├── configs/
│   ├── experiment_v2.yaml          # Full experiment configuration (models, thresholds, validators)
│   └── coarse_domain_mapping.json  # MMLU domain → coarse category mapping
├── prompt_packs/
│   └── ccc_baseline_v1_3000.jsonl  # 3,000 MMLU four-choice prompts (SHA-256 locked)
├── src/sow/                        # Python package
│   ├── v2/                         # Core analysis modules
│   │   ├── baseline_inference.py   # Batched inference with layerwise logit extraction
│   │   ├── metrics.py              # δ, g, b computation
│   │   ├── trajectory_types.py     # Four-type trajectory classification
│   │   ├── span_parser.py          # Regex-based prompt span parsing
│   │   ├── span_counterfactuals.py # Span deletion / counterfactual effects
│   │   ├── tracing/                # Attention & MLP component extraction
│   │   └── causal/                 # Ablation, patching, negative controls
│   ├── thermal/                    # Thermal management for long GPU runs
│   ├── io_jsonl.py                 # JSONL I/O utilities
│   └── hashing.py                  # SHA-256 manifest verification
├── scripts/v2/                     # 15-stage numbered pipeline
│   ├── run_full_local_v2.sh        # Orchestrator: runs all stages sequentially
│   ├── 00_run_experiment.py        # Stage 0: experiment snapshot & validation
│   ├── 00a_generate_baseline_outputs.py  # Batched inference (GPU required)
│   ├── 01_extract_baseline.py      # Extract baseline logits from raw outputs
│   ├── 02_compute_decision_metrics.py    # Compute δ, g, b for all prompts
│   ├── 03_classify_trajectories.py       # Classify into 4 trajectory types
│   ├── 04_region_analysis.py       # Entry/exit/re-entry event analysis
│   ├── 05_span_counterfactuals.py  # Span deletion experiments (GPU required)
│   ├── 06_select_tracing_subset.py # Select 600 prompts/model for tracing
│   ├── 07_run_tracing.py           # Extract attention/MLP components (GPU required)
│   ├── 08_attention_and_mlp_decomposition.py  # Linear drift decomposition
│   ├── 09_causal_tests.py          # Ablation & patching experiments
│   ├── 10_causal_validation_tools.py     # Negative controls & statistical tests
│   ├── 11_generate_paper_assets.py       # Generate figures & tables
│   └── 14_readiness_audit.py       # Final quality gate
├── results/                        # Pre-computed analysis outputs
│   ├── parquet/                    # All analysis data (see below)
│   ├── figures/                    # Generated figures from the paper
│   └── reports/                    # Stage-level validation reports (JSON)
├── tests/                          # Test suite (pytest)
├── paper/
│   └── final_paper/                # Publish-ready PDF + LaTeX source
├── requirements.txt
└── LICENSE

Pre-Computed Results

The results/ directory contains all analysis outputs from our primary GPU run, so you can inspect and build on our findings without re-running inference.

Analysis Parquets (`results/parquet/`)

File	Size	Description
`decision_metrics.parquet`	10 MB	Per-prompt × per-layer δ, g, b values
`basin_gap.parquet`	7.2 MB	Region entry/exit/re-entry events
`attention_contrib_by_span.parquet`	2.4 MB	Decision-aligned attention contributions per span
`tracing_scalars.parquet`	764 KB	s_attn, s_mlp scalar projections per layer
`attention_mass_by_span.parquet`	728 KB	Attention mass distribution across spans
`prompt_types.parquet`	132 KB	Trajectory type classification per prompt × model
`ablation_results.parquet`	72 KB	Component ablation delta shifts
`span_labels.parquet`	60 KB	Evidence / distractor / neutral labels
`span_effects.parquet`	60 KB	Per-span counterfactual effects on δ
`patching_results.parquet`	36 KB	Activation patching results
`span_paraphrase_stability.parquet`	8 KB	Paraphrase stability validation
`span_deletion_causal.parquet`	4 KB	Span deletion causal test results
`negative_controls.parquet`	2 KB	Shuffled & sign-flipped baselines

Note: Raw layerwise logits (layerwise.parquet, ~244 MB) are excluded for size. Re-running stages 00–01 on a GPU regenerates this file.

Figures (`results/figures/`)

13 publication-quality figures including phase diagrams, trajectory bundles, drift decomposition, causal validation panels, and attention routing heatmaps.

Reproducing the Experiment

Requirements

Python 3.10+
GPU: NVIDIA GPU with ≥24 GB VRAM (tested on RTX 6000 Ada) for inference stages
CPU-only: Stages 01–04, 06, 08–11, 14 run on CPU using the pre-computed outputs

Setup

git clone https://github.com/<your-org>/shape-of-wisdom.git
cd shape-of-wisdom
pip install -r requirements.txt

Running the Full Pipeline (GPU)

The entire experiment runs through a single orchestrator script with fail-closed stage gating:

bash scripts/v2/run_full_local_v2.sh --run-id my_run

Each stage produces a JSON validation report; the pipeline halts if any gate fails. Stages that require GPU inference:

00a_generate_baseline_outputs.py — batched layerwise logit extraction (~3K prompts × 3 models)
05_span_counterfactuals.py — span deletion re-inference
07_run_tracing.py — attention & MLP component capture

Running Analysis Only (CPU)

If you want to reproduce the analysis from our pre-computed data, copy results/parquet/ into a run directory and run stages 02+ individually:

RUN_ID=from_precomputed
mkdir -p runs/${RUN_ID}/v2

# Copy pre-computed parquets
cp results/parquet/*.parquet runs/${RUN_ID}/v2/

# Run any analysis stage, e.g.:
python scripts/v2/03_classify_trajectories.py --run-id ${RUN_ID} --config configs/experiment_v2.yaml --resume
python scripts/v2/08_attention_and_mlp_decomposition.py --run-id ${RUN_ID} --config configs/experiment_v2.yaml --resume
python scripts/v2/11_generate_paper_assets.py --run-id ${RUN_ID} --config configs/experiment_v2.yaml --resume

Models

All models are loaded at pinned HuggingFace revisions with deterministic decoding (temperature=0, seed=12345):

Model	HuggingFace ID	Revision
Qwen 2.5-7B-Instruct	`Qwen/Qwen2.5-7B-Instruct`	`a09a354...`
Llama 3.1-8B-Instruct	`meta-llama/Llama-3.1-8B-Instruct`	`0e9e39f...`
Mistral 7B-Instruct-v0.3	`mistralai/Mistral-7B-Instruct-v0.3`	`c170c70...`

Full revision hashes are in configs/experiment_v2.yaml.

Data

Prompt manifest: prompt_packs/ccc_baseline_v1_3000.jsonl contains 3,000 four-choice questions from MMLU (MIT License), balanced across difficulty levels and knowledge domains. The manifest is SHA-256 locked (bfe255...).

Domain mapping: configs/coarse_domain_mapping.json maps MMLU fine-grained subjects to coarse categories (STEM, humanities, social sciences, other).

Key Configuration

All experiment parameters — model revisions, classification thresholds, validator gates, sampling sizes — are defined in configs/experiment_v2.yaml. Key values:

Parameter	Value	Description
Tail window	8 layers	Layers inspected for stability classification
Min boundary	\|δ\| ≥ 0.3	Minimum margin for tail stability
R² gate	≥ 0.70	Drift decomposition quality threshold
BH α	0.05	Benjamini–Hochberg correction level
Tracing subset	600 / model	Prompts selected for component tracing
Seed	12345	Deterministic seed for all random operations

Tests

python -m pytest tests/ -x -q

The test suite covers span parsing, trajectory classification, thermal management, metric computation, and stage-level validators.

Citation

@inproceedings{shapewisdom2026,
  title={State, Motion, Boundary: A Mechanistic Account of Convergence and Failure in Transformer Multiple-Choice Decisions},
  author={Anonymous},
  booktitle={International Conference on Machine Learning (ICML)},
  year={2026}
}

License

This project is released under the MIT License.

Name		Name	Last commit message	Last commit date
Latest commit History 65 Commits
artifacts/parser_edge_case_regression		artifacts/parser_edge_case_regression
configs		configs
docs		docs
figures_vNext		figures_vNext
paper		paper
prompt_packs		prompt_packs
release/shape_of_wisdom_mmlu_mech_paper		release/shape_of_wisdom_mmlu_mech_paper
results		results
scripts		scripts
src/sow		src/sow
tests		tests
.gitignore		.gitignore
AGENTS.md		AGENTS.md
BLOCKERS.md		BLOCKERS.md
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
TODO.md		TODO.md
audit_report.md		audit_report.md
claim_evidence_matrix.md		claim_evidence_matrix.md
paper_restructure_plan.md		paper_restructure_plan.md
reproducibility_manifest.json		reproducibility_manifest.json
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Shape of Wisdom

Overview

Repository Structure

Pre-Computed Results

Analysis Parquets (`results/parquet/`)

Figures (`results/figures/`)

Reproducing the Experiment

Requirements

Setup

Running the Full Pipeline (GPU)

Running Analysis Only (CPU)

Models

Data

Key Configuration

Tests

Citation

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Shape of Wisdom

Overview

Repository Structure

Pre-Computed Results

Analysis Parquets (results/parquet/)

Figures (results/figures/)

Reproducing the Experiment

Requirements

Setup

Running the Full Pipeline (GPU)

Running Analysis Only (CPU)

Models

Data

Key Configuration

Tests

Citation

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Analysis Parquets (`results/parquet/`)

Figures (`results/figures/`)

Packages