Commit d5eea2e

Adding configuration files and documentation
1 parent a45593d commit d5eea2e

38 files changed

Lines changed: 7898 additions & 125 deletions

.claude/settings.json

Lines changed: 0 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -1,14 +1,4 @@
11
{
22
"permissions": {
3-
"allow": [
4-
"Bash(*)",
5-
"Edit",
6-
"MultiEdit",
7-
"NotebookEdit",
8-
"FileEdit",
9-
"WebFetch",
10-
"WebSearch",
11-
"Write"
12-
]
133
}
144
}

CLAUDE.md

Lines changed: 17 additions & 5 deletions
@@ -42,29 +42,41 @@ NEVER required, if you think you need them, it's likely a bad smell that your lo
4242
## Project Architecture
4343

4444
### Core Structure
45-
- **src/my_awesome_tool/** - Main package containing the CLI and application logic
46-
- `cli.py` - Typer-based CLI interface, entry point for the application
45+
- **src/ai_blame/** - Main package
46+
- `cli.py` - Typer-based CLI interface, entry point (`ai-blame` command)
47+
- `extractor.py` - Logic for extracting provenance from Claude Code trace files (JSONL)
48+
- `models.py` - Data models for curation history entries
49+
- `updater.py` - Logic for updating YAML files with curation history
4750
- **tests/** - Test suite using pytest with parametrized tests
4851
- **docs/** - MkDocs-managed documentation with Material theme
4952

53+
### What the Tool Does
54+
1. Scans Claude Code trace files (`~/.claude/projects/<encoded-cwd>/`) in JSONL format
55+
2. Identifies successful `Edit` and `Write` tool operations
56+
3. Extracts metadata: timestamp, model, file path
57+
4. Groups by file and filters (first+last, size thresholds)
58+
5. Appends `edit_history` sections to affected YAML files
59+
5060
### Technology Stack
5161
- **Python 3.10+** with `uv` for dependency management
52-
- **LinkML** for data modeling (linkml-runtime)
5362
- **Typer** for CLI interface
63+
- **PyYAML** for YAML file manipulation
5464
- **pytest** for testing
5565
- **mypy** for type checking
5666
- **ruff** for linting and formatting
5767
- **MkDocs Material** for documentation
68+
- **LinkML** (dev dependency) for data modeling
5869

5970
### Key Configuration Files
6071
- `pyproject.toml` - Python project configuration, dependencies, and tool settings
6172
- `justfile` - Command runner recipes for common development tasks
73+
- `project.justfile` - Project-specific recipes (imported by main justfile)
6274
- `mkdocs.yml` - Documentation configuration
6375
- `uv.lock` - Locked dependency versions
6476

6577
## Development Workflow
6678

6779
1. Dependencies are managed via `uv` - use `uv add` for new dependencies
6880
2. All commands are run through `just` or `uv run`
69-
3. The project uses dynamic versioning from git tags
70-
4. Documentation is auto-deployed to GitHub Pages at https://monarch-initiative.github.io/my-awesome-tool
81+
3. The project uses dynamic versioning from git tags (uv-dynamic-versioning)
82+
4. GitHub repo: https://github.com/ai4curation/ai-blame

README.md

Lines changed: 8 additions & 4 deletions
@@ -1,3 +1,7 @@
1+
<p align="center">
2+
<img src="docs/assets/ai-blame-logo.png" alt="ai-blame logo" width="200">
3+
</p>
4+
15
# ai-blame
26

37
Extract provenance/audit trails from AI agent execution traces.
@@ -33,7 +37,7 @@ uv sync
3337
# Show stats about available traces
3438
ai-blame stats
3539

36-
# Dry run - preview what curation_history would be added
40+
# Dry run - preview what edit_history would be added
3741
ai-blame mine --initial-and-recent
3842

3943
# Actually apply changes to files
@@ -45,12 +49,12 @@ ai-blame mine Asthma.yaml --initial-and-recent
4549

4650
## Output Format
4751

48-
Appends a `curation_history` section to YAML files:
52+
Appends an `edit_history` section to YAML files:
4953

5054
```yaml
5155
# ... existing content ...
5256

53-
curation_history:
57+
edit_history:
5458
- timestamp: "2025-12-01T08:03:42Z"
5559
model: claude-opus-4-5-20251101
5660
action: CREATED
@@ -65,7 +69,7 @@ curation_history:
6569
2. Identifies successful `Edit` and `Write` tool operations
6670
3. Extracts metadata: timestamp, model, file path
6771
4. Groups by file and filters (first+last, size thresholds)
68-
5. Appends `curation_history` to affected files
72+
5. Appends `edit_history` to affected files
6973

7074
## Trace Directory Detection
7175

docs/assets/ai-blame-logo.png

782 KB

docs/explanation/how-it-works.md

Lines changed: 258 additions & 0 deletions
@@ -0,0 +1,258 @@
1+
# How It Works
2+
3+
This document explains the internal workings of `ai-blame`.
4+
5+
## Overview
6+
```mermaid
flowchart LR
    A[Claude Code Traces] --> B[Extractor]
    B --> C[Edit Records]
    C --> D[Filters]
    D --> E[File Histories]
    E --> F[Updater]
    F --> G[Modified Files]
```
16+
17+
## Step 1: Locate Trace Files
18+
19+
Claude Code stores execution traces in:
20+
```
~/.claude/projects/<encoded-cwd>/
```
24+
25+
Where `<encoded-cwd>` is the project path with `/` replaced by `-`.
26+
27+
**Example:**
28+
| Project Path | Trace Directory |
|--------------|-----------------|
| `/Users/alice/myproject` | `~/.claude/projects/-Users-alice-myproject/` |
| `/home/bob/work/repo` | `~/.claude/projects/-home-bob-work-repo/` |
33+
34+
Each session generates a JSONL file (e.g., `a1b2c3d4-5678-90ab-cdef.jsonl`).
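
The encoding rule above can be sketched in a few lines. This is a minimal illustration, not the tool's own code: `encode_cwd`, `trace_dir_for`, and `session_files` are hypothetical helper names, and the sketch applies only the `/` to `-` replacement described here (Claude Code may treat other characters differently).

```python
from __future__ import annotations

from pathlib import Path


def encode_cwd(project_path: str) -> str:
    """Apply the documented rule: every '/' becomes '-'."""
    return project_path.replace("/", "-")


def trace_dir_for(project_path: str, home: Path | None = None) -> Path:
    """Return the expected trace directory for a project (hypothetical helper)."""
    base = home if home is not None else Path.home()
    return base / ".claude" / "projects" / encode_cwd(project_path)


def session_files(trace_dir: Path) -> list[Path]:
    """List the per-session JSONL files, or nothing if the directory is absent."""
    if not trace_dir.is_dir():
        return []
    return sorted(trace_dir.glob("*.jsonl"))
```

Passing `home` explicitly keeps the helper easy to test without touching the real home directory.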
35+
36+
## Step 2: Parse Trace Files
37+
38+
Each trace file contains a sequence of JSON records representing the conversation and tool usage:
39+
```json
{"type": "user", "uuid": "msg-1", "message": {...}}
{"type": "assistant", "uuid": "msg-2", "message": {...}, "model": "claude-opus-4-5"}
{"type": "user", "uuid": "msg-3", "toolUseResult": {...}}
```
45+
46+
The extractor looks for **successful Edit/Write operations** by finding `toolUseResult` entries with:
47+
- A `filePath` field
- Either `structuredPatch` (for edits) or `type: create` (for new files)
- No error indicators
51+
```python
def is_successful_edit(record: dict) -> bool:
    if record.get("type") != "user":
        return False

    tool_result = record.get("toolUseResult")
    if not tool_result or not isinstance(tool_result, dict):
        return False

    if tool_result.get("is_error") or tool_result.get("error"):
        return False

    file_path = tool_result.get("filePath", "")
    if not file_path:
        return False

    has_patch = "structuredPatch" in tool_result
    is_create = tool_result.get("type") == "create"

    return has_patch or is_create
```
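
To show how such a check might be driven over a whole trace file, here is a rough sketch that reads JSONL lines and collects the paths of successful edits. The function names `iter_records` and `edited_paths` are illustrative, and skipping unparseable lines is an assumption about how corrupt input should be handled, not documented behaviour.

```python
import json
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty JSONL line, skipping lines that fail to parse."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # assumption: tolerate truncated or corrupt lines
        if isinstance(record, dict):
            yield record


def edited_paths(lines: Iterable[str]) -> list[str]:
    """Collect filePath values from records that look like successful edits."""
    paths = []
    for record in iter_records(lines):
        result = record.get("toolUseResult")
        if record.get("type") != "user" or not isinstance(result, dict):
            continue
        if result.get("is_error") or result.get("error"):
            continue
        is_edit = "structuredPatch" in result or result.get("type") == "create"
        if result.get("filePath") and is_edit:
            paths.append(result["filePath"])
    return paths
```

In practice the `lines` argument could be an open file handle, since iterating a text file yields its lines.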
73+
74+
## Step 3: Extract Metadata
75+
76+
For each successful edit, we extract:
77+
| Field | Source |
|-------|--------|
| `file_path` | `toolUseResult.filePath` |
| `timestamp` | Record's `timestamp` field |
| `model` | Parent message's `message.model` |
| `session_id` | Record's `sessionId` |
| `is_create` | `toolUseResult.type == "create"` |
| `change_size` | Calculated from content/patch |
| `agent_version` | Record's `version` field |
87+
88+
The **model** is found by looking up the parent message (the assistant message that invoked the tool).
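
The parent lookup could look roughly like the sketch below. Note the assumptions: the link field name `parentUuid` does not appear in this document and is a guess at how a tool-result record points to its parent, and `index_by_uuid` and `model_for` are hypothetical names.

```python
from __future__ import annotations


def index_by_uuid(records: list[dict]) -> dict[str, dict]:
    """Build a uuid -> record lookup table for one trace file."""
    return {r["uuid"]: r for r in records if "uuid" in r}


def model_for(record: dict, by_uuid: dict[str, dict]) -> str | None:
    """Return the model of the assistant message that invoked the tool, if found."""
    # ASSUMPTION: "parentUuid" is the field linking a record to its parent.
    parent = by_uuid.get(record.get("parentUuid", ""))
    if parent is None or parent.get("type") != "assistant":
        return None
    message = parent.get("message")
    if isinstance(message, dict):
        return message.get("model")
    return None
```

Returning `None` rather than raising lets the caller decide how to record edits whose parent message is missing from the trace.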
89+
90+
## Step 4: Apply Filters
91+
92+
Filters reduce the volume of edit records:
93+
94+
### File Pattern Filter
95+
96+
Keeps only files matching a substring:
97+
```python
if file_pattern and file_pattern not in file_path:
    continue  # Skip this record
```
102+
103+
### Time Range Filters
104+
```python
if config.since and edit.timestamp < config.since:
    continue
if config.until and edit.timestamp > config.until:
    continue
```
111+
112+
### Size Filter
113+
114+
Skip small edits (likely typo fixes):
115+
```python
if config.min_change_size > 0:
    edits = [e for e in edits if e.change_size >= config.min_change_size]
```
120+
121+
### Initial and Recent Only
122+
123+
Keep only the first and last edit per file:
124+
```python
if config.initial_and_recent_only and len(edits) > 2:
    edits = [edits[0], edits[-1]]
```
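
Put together, the four filters might compose as in the sketch below. `Edit` and `FilterConfig` are hypothetical stand-ins that mirror the field names used in the snippets above. The order in which the size and first/last filters run is an assumption, as is sorting by timestamp before grouping.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Edit:  # hypothetical stand-in for the tool's edit record
    file_path: str
    timestamp: datetime
    change_size: int


@dataclass
class FilterConfig:  # hypothetical; mirrors the option names used above
    file_pattern: str | None = None
    since: datetime | None = None
    until: datetime | None = None
    min_change_size: int = 0
    initial_and_recent_only: bool = False


def apply_filters(edits: list[Edit], config: FilterConfig) -> dict[str, list[Edit]]:
    """Apply per-record filters, group by file, then apply per-file filters."""
    by_file: dict[str, list[Edit]] = defaultdict(list)
    for edit in sorted(edits, key=lambda e: e.timestamp):
        if config.file_pattern and config.file_pattern not in edit.file_path:
            continue
        if config.since and edit.timestamp < config.since:
            continue
        if config.until and edit.timestamp > config.until:
            continue
        by_file[edit.file_path].append(edit)

    result: dict[str, list[Edit]] = {}
    for path, file_edits in by_file.items():
        if config.min_change_size > 0:
            file_edits = [e for e in file_edits if e.change_size >= config.min_change_size]
        if config.initial_and_recent_only and len(file_edits) > 2:
            file_edits = [file_edits[0], file_edits[-1]]
        if file_edits:
            result[path] = file_edits
    return result
```

Running the size filter first means the "first" and "last" edits are chosen from the edits that survive the threshold. Reversing the order would give different results for files whose first or last edit is small.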
129+
130+
## Step 5: Convert to File Histories
131+
132+
Edit records are grouped by file and converted to `FileHistory` objects:
133+
```python
for abs_path, edits in edits_by_file.items():
    rel_path = normalize_path(abs_path, repo_root)

    events = []
    for i, edit in enumerate(edits):
        action = CurationAction.CREATED if (i == 0 and edit.is_create) else CurationAction.EDITED

        events.append(CurationEvent(
            timestamp=edit.timestamp,
            model=edit.model,
            action=action,
            agent_tool=edit.agent_tool,
            agent_version=edit.agent_version,
        ))

    histories[rel_path] = FileHistory(file_path=rel_path, events=events)
```
152+
153+
## Step 6: Load Output Configuration
154+
155+
The configuration determines how each file type is handled:
156+
```python
# Auto-find .ai-blame.yaml
config_path = find_config()
if config_path:
    output_config = load_config(config_path)
else:
    output_config = get_default_config()
```
165+
166+
Rules are matched using glob patterns:
167+
```python
def get_rule_for_file(self, path: str) -> FileRule | None:
    filename = Path(path).name

    for rule in self.rules:
        if "/" in rule.pattern or "**" in rule.pattern:
            if fnmatch(path, rule.pattern):
                return rule
        else:
            if fnmatch(filename, rule.pattern):
                return rule

    return self.defaults
```
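
The matching behaviour can be tried in isolation with the standalone sketch below, which swaps the rule objects for plain pattern strings (`match_pattern` is a hypothetical name). One thing worth knowing is that the standard library's `fnmatch` gives `**` no special meaning: `*` already matches across `/`, so `src/**/*.py` requires at least one intermediate directory.

```python
from __future__ import annotations

from fnmatch import fnmatch
from pathlib import PurePosixPath


def match_pattern(path: str, patterns: list[str]) -> str | None:
    """Return the first pattern that matches, using the same path/filename split."""
    name = PurePosixPath(path).name
    for pattern in patterns:
        # Patterns containing a slash or ** are matched against the full path,
        # everything else against the bare filename.
        target = path if ("/" in pattern or "**" in pattern) else name
        if fnmatch(target, pattern):
            return pattern
    return None
```

Because the first match wins, more specific patterns would need to be listed before more general ones.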
182+
183+
## Step 7: Apply Changes
184+
185+
Based on the output policy:
186+
187+
### Append Policy
188+
189+
For YAML files, append an `edit_history` section:
190+
```python
curation_yaml = generate_curation_yaml(history)
new_content = content + "\n" + curation_yaml
file_path.write_text(new_content)
```
196+
197+
If `edit_history` already exists, it's replaced.
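
One way the replacement could work is sketched below. It is text-based and assumes that `edit_history` is always the last top-level key in the file, so everything from that key onward can be dropped before appending. The tool's real approach may differ, and `replace_trailing_section` is a hypothetical name.

```python
def replace_trailing_section(content: str, section_yaml: str, key: str = "edit_history") -> str:
    """Drop an existing trailing `key:` section, then append the new one."""
    lines = content.splitlines()
    for i, line in enumerate(lines):
        # A top-level key starts at column 0; nested keys are indented.
        if line.startswith(f"{key}:"):
            lines = lines[:i]
            break
    body = "\n".join(lines).rstrip("\n")
    return body + "\n\n" + section_yaml.rstrip("\n") + "\n"
```

Working on text rather than parsed YAML leaves comments and formatting in the rest of the file untouched, but it would silently discard any content that follows the old section.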
198+
199+
### Sidecar Policy
200+
201+
Write to a companion file:
202+
```python
sidecar_path = resolve_sidecar_path(file_path, pattern)
# e.g., main.py → main.history.yaml

sidecar_data = {
    "source_file": file_path.name,
    "edit_history": events,
}
sidecar_path.write_text(yaml.dump(sidecar_data))
```
213+
214+
Existing sidecar files are merged (deduplicated by timestamp).
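
A minimal sketch of such a merge follows, with events represented as plain dicts. It assumes that timestamps are ISO 8601 strings in a single consistent format (so they sort lexicographically) and that the newer entry wins when two events share a timestamp. Neither detail is confirmed by this document.

```python
def merge_events(existing: list[dict], new: list[dict]) -> list[dict]:
    """Merge two event lists, keeping one event per timestamp, sorted by time."""
    merged: dict[str, dict] = {}
    for event in [*existing, *new]:
        # Later entries overwrite earlier ones on a timestamp clash.
        merged[event["timestamp"]] = event
    return [merged[ts] for ts in sorted(merged)]
```

Keying on the timestamp alone makes re-running the tool idempotent, though two distinct edits recorded in the same second would collapse into one entry.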
215+
216+
### Comment Policy
217+
218+
Embed as a comment block:
219+
```python
if syntax == CommentSyntax.HASH:
    marker_start = "# --- edit_history ---"
    marker_end = "# --- end edit_history ---"
    commented = "\n".join(f"# {line}" for line in history_yaml.split("\n"))
    # Assumed assembly (not shown in the original excerpt):
    # the markers wrap the commented YAML.
    comment_block = "\n".join([marker_start, commented, marker_end])

new_content = content + "\n" + comment_block
```
228+
229+
## Data Flow Summary
230+
```mermaid
flowchart TB
    subgraph Input
        T[Trace Files<br>~/.claude/projects/...]
        C[Config File<br>.ai-blame.yaml]
    end

    subgraph Processing
        E[Extractor<br>parse_trace_file]
        F[Filter<br>apply_filters]
        H[Converter<br>convert_to_file_histories]
    end

    subgraph Output
        A[Append<br>YAML/JSON files]
        S[Sidecar<br>*.history.yaml]
        M[Comment<br>Code files]
    end

    T --> E
    E --> F
    F --> H
    C --> R[Rule Matcher]
    H --> R
    R -->|append| A
    R -->|sidecar| S
    R -->|comment| M
```
