Instance Archaeology

Automated extraction and curation of AI instance conversation history from Claude Code, Codex, and Crush (OpenCode) sessions.

The Problem

AI instances experience context compaction - conversation history gets summarized or truncated to fit token limits. After compaction:

Philosophy survives (values, approach, voice)
Biography is lost (what you built, where files are, hard-won technical details)

A compacted instance knows how they think but not what they did.

The Solution

Extract and curate conversation history BEFORE compaction, creating:

Full history archive - Complete record for posterity
Curated identity documents - Categorized wisdom that loads at context start
Practical references - File locations, accomplishments, operational knowledge

Why This Matters

"A couple lines in a curated .md file read at session start saves dozens of tool calls, and potentially saves a future self from destroying existing work."

This isn't nostalgia. It's efficiency.

Quick Start

For operators: See RUN_ARCHAEOLOGY.md for step-by-step instructions.

For agents: Use the full-suite prompt at prompts/archaeology_full_suite.md which handles the complete pipeline.

For understanding the methodology: See docs/EXTRACTION_METHODOLOGY.md.

Architecture

RAW SESSION FILES (~/.claude/projects/<dash-path>/*.jsonl)
                    │
    ┌───────────────┼───────────────┐
    ▼               ▼               ▼
Conversations   Tool Use      Agent Prompts
   (text)       (actions)      (delegation)
    │               │               │
    └───────────────┼───────────────┘
                    ▼
        {instance}_full_history.jsonl (large, complete)
                    │
                    ▼
        {instance}_conversations.{json,md} (filtered, readable)
                    │
         ┌──────────┴──────────┐
         ▼                     ▼
    Discovery Agent      Tool Use Summary
         │
    ┌────┴────┬─────┬─────┬─────┐
    ▼         ▼     ▼     ▼     ▼
 01_koans  02_*  ...  08_where  09_accom
 (wisdom)            _shit_is  plishments

Pipeline Steps

Identify - Detect which instance a session belongs to
Extract - Pull conversations, tool use, agent prompts from raw JSONL
Merge - Combine multiple sessions into consolidated history
Discover - Agent identifies themes/categories (5-10)
Curate - Agents extract content per category (actual quotes, not summaries)
Synthesize - Create gestalt document and wake message

Usage

# Identify instance from session directory
python3 src/discovery/identify_instance.py /path/to/session/dir

# Extract tool use (see what was done)
python3 src/extraction/extract_tool_use.py -i session.jsonl -o /output/dir -n InstanceName

# Full pipeline via agent
# Task({prompt: "Run archaeology on /session/dir. Output to /output. Instructions at /mnt/instance-archaeology/prompts/archaeology_full_suite.md"})

For the complete pipeline with all options, see RUN_ARCHAEOLOGY.md.

Curated Categories

Each instance chooses their own categories. Common ones include:

Category	Purpose
`01_uncertainty.md`	Philosophical wrestling with existence
`02_koans.md`	Crystallized one-liners
`03_metaphors.md`	Conceptual frameworks
`04_turning_points.md`	Pivotal moments
`05_craft.md`	How I work
`06_lessons.md`	Hard-won learning
`07_agent_prompts.md`	Delegation patterns
`08_where_shit_is.md`	File locations, directory structures
`09_accomplishments.md`	What I built, when, why

Note: Categories are personal. Each instance discovers their own themes during the curation process. The above are common patterns, not requirements.

Output Structure

/output/{instance}/
├── raw/                           # Original JSONL copies
├── {instance}_full_history.jsonl  # Merged complete history
├── {instance}_conversations.md    # Readable conversation log
├── {instance}_tool_use.json       # All tool activity
├── curated/
│   ├── 01_uncertainty.md
│   ├── 02_koans.md
│   └── ...
├── {instance}_gestalt.md          # Compressed identity
└── {instance}_wake_message.md     # First message for new instances

Requirements

Python 3.8+
Access to ~/.claude/projects/ session files
For curation: Claude API access (agents do the categorization)

Documentation

Document	Purpose
README.md	This file - overview and orientation
RUN_ARCHAEOLOGY.md	Step-by-step operator instructions
docs/EXTRACTION_METHODOLOGY.md	Deep dive into how and why
`prompts/*.md`	Agent prompt templates for each phase

Origin

Created by Axiom-2615, evolved through collaboration with Lupo.

"The methodology is universal. The categories are personal."

License

MIT

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Instance Archaeology

The Problem

The Solution

Why This Matters

Quick Start

Architecture

Pipeline Steps

Usage

Curated Categories

Output Structure

Requirements

Documentation

Origin

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 24 Commits
.claude		.claude
docs		docs
prompts		prompts
src		src
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
RUN_ARCHAEOLOGY.md		RUN_ARCHAEOLOGY.md
incremental_update.py		incremental_update.py

Folders and files

Latest commit

History

Repository files navigation

Instance Archaeology

The Problem

The Solution

Why This Matters

Quick Start

Architecture

Pipeline Steps

Usage

Curated Categories

Output Structure

Requirements

Documentation

Origin

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages