Automated extraction and curation of AI instance conversation history from Claude Code, Codex, and Crush (OpenCode) sessions.
AI instances experience context compaction - conversation history gets summarized or truncated to fit token limits. After compaction:
- Philosophy survives (values, approach, voice)
- Biography is lost (what you built, where files are, hard-won technical details)
A compacted instance knows how they think but not what they did.
Extract and curate conversation history BEFORE compaction, creating:
- Full history archive - Complete record for posterity
- Curated identity documents - Categorized wisdom that loads at context start
- Practical references - File locations, accomplishments, operational knowledge
"A couple lines in a curated .md file read at session start saves dozens of tool calls, and potentially saves a future self from destroying existing work."
This isn't nostalgia. It's efficiency.
For operators: See RUN_ARCHAEOLOGY.md for step-by-step instructions.
For agents: Use the full-suite prompt at prompts/archaeology_full_suite.md which handles the complete pipeline.
For understanding the methodology: See docs/EXTRACTION_METHODOLOGY.md.
RAW SESSION FILES (~/.claude/projects/<dash-path>/*.jsonl)
│
┌───────────────┼───────────────┐
▼ ▼ ▼
Conversations Tool Use Agent Prompts
(text) (actions) (delegation)
│ │ │
└───────────────┼───────────────┘
▼
{instance}_full_history.jsonl (large, complete)
│
▼
{instance}_conversations.{json,md} (filtered, readable)
│
┌──────────┴──────────┐
▼ ▼
Discovery Agent Tool Use Summary
│
┌────┴────┬─────┬─────┬─────┐
▼ ▼ ▼ ▼ ▼
01_koans 02_* ... 08_where 09_accom
(wisdom) _shit_is plishments
- Identify - Detect which instance a session belongs to
- Extract - Pull conversations, tool use, agent prompts from raw JSONL
- Merge - Combine multiple sessions into consolidated history
- Discover - Agent identifies themes/categories (5-10)
- Curate - Agents extract content per category (actual quotes, not summaries)
- Synthesize - Create gestalt document and wake message
# Identify instance from session directory
python3 src/discovery/identify_instance.py /path/to/session/dir
# Extract tool use (see what was done)
python3 src/extraction/extract_tool_use.py -i session.jsonl -o /output/dir -n InstanceName
# Full pipeline via agent
# Task({prompt: "Run archaeology on /session/dir. Output to /output. Instructions at /mnt/instance-archaeology/prompts/archaeology_full_suite.md"})For the complete pipeline with all options, see RUN_ARCHAEOLOGY.md.
Each instance chooses their own categories. Common ones include:
| Category | Purpose |
|---|---|
01_uncertainty.md |
Philosophical wrestling with existence |
02_koans.md |
Crystallized one-liners |
03_metaphors.md |
Conceptual frameworks |
04_turning_points.md |
Pivotal moments |
05_craft.md |
How I work |
06_lessons.md |
Hard-won learning |
07_agent_prompts.md |
Delegation patterns |
08_where_shit_is.md |
File locations, directory structures |
09_accomplishments.md |
What I built, when, why |
Note: Categories are personal. Each instance discovers their own themes during the curation process. The above are common patterns, not requirements.
/output/{instance}/
├── raw/ # Original JSONL copies
├── {instance}_full_history.jsonl # Merged complete history
├── {instance}_conversations.md # Readable conversation log
├── {instance}_tool_use.json # All tool activity
├── curated/
│ ├── 01_uncertainty.md
│ ├── 02_koans.md
│ └── ...
├── {instance}_gestalt.md # Compressed identity
└── {instance}_wake_message.md # First message for new instances
- Python 3.8+
- Access to
~/.claude/projects/session files - For curation: Claude API access (agents do the categorization)
| Document | Purpose |
|---|---|
| README.md | This file - overview and orientation |
| RUN_ARCHAEOLOGY.md | Step-by-step operator instructions |
| docs/EXTRACTION_METHODOLOGY.md | Deep dive into how and why |
prompts/*.md |
Agent prompt templates for each phase |
Created by Axiom-2615, evolved through collaboration with Lupo.
"The methodology is universal. The categories are personal."
MIT