ai4curation
diff --git a/.claude/settings.json b/.claude/settings.json
Lines changed: 0 additions & 10 deletions
diff --git a/.github/workflows/main.yaml b/.github/workflows/main.yaml
Lines changed: 1 addition & 1 deletion
diff --git a/CLAUDE.md b/CLAUDE.md
Lines changed: 17 additions & 5 deletions
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
Lines changed: 78 additions & 88 deletions
@@ -1,14 +1,4 @@
 {
   "permissions": {
-    "allow": [
-        "Bash(*)",
-        "Edit",
-        "MultiEdit",
-        "NotebookEdit",
-        "FileEdit",
-        "WebFetch",
-        "WebSearch",
-        "Write"
-    ]
   }
 }
@@ -19,7 +19,7 @@ jobs:
     runs-on: ubuntu-latest
     strategy:
       matrix:
-        python-version: ["3.10", "3.11", "3.12", "3.13"]
+        python-version: ["3.10", "3.13"]
       fail-fast: false
 
     steps:
 
@@ -42,29 +42,41 @@ NEVER required, if you think you need them, it's likely a bad smell that your lo
 ## Project Architecture
 
 ### Core Structure
-- **src/my_awesome_tool/** - Main package containing the CLI and application logic
-  - `cli.py` - Typer-based CLI interface, entry point for the application
+- **src/ai_blame/** - Main package
+  - `cli.py` - Typer-based CLI interface, entry point (`ai-blame` command)
+  - `extractor.py` - Logic for extracting provenance from Claude Code trace files (JSONL)
+  - `models.py` - Data models for curation history entries
+  - `updater.py` - Logic for updating YAML files with curation history
 - **tests/** - Test suite using pytest with parametrized tests
 - **docs/** - MkDocs-managed documentation with Material theme
 
+### What the Tool Does
+1. Scans Claude Code trace files (`~/.claude/projects/<encoded-cwd>/`) in JSONL format
+2. Identifies successful `Edit` and `Write` tool operations
+3. Extracts metadata: timestamp, model, file path
+4. Groups by file and filters (first+last, size thresholds)
+5. Appends `edit_history` sections to affected YAML files
+
 ### Technology Stack
 - **Python 3.10+** with `uv` for dependency management
-- **LinkML** for data modeling (linkml-runtime)
 - **Typer** for CLI interface
+- **PyYAML** for YAML file manipulation
 - **pytest** for testing
 - **mypy** for type checking
 - **ruff** for linting and formatting
 - **MkDocs Material** for documentation
+- **LinkML** (dev dependency) for data modeling
 
 ### Key Configuration Files
 - `pyproject.toml` - Python project configuration, dependencies, and tool settings
 - `justfile` - Command runner recipes for common development tasks
+- `project.justfile` - Project-specific recipes (imported by main justfile)
 - `mkdocs.yml` - Documentation configuration
 - `uv.lock` - Locked dependency versions
 
 ## Development Workflow
 
 1. Dependencies are managed via `uv` - use `uv add` for new dependencies
 2. All commands are run through `just` or `uv run`
-3. The project uses dynamic versioning from git tags
-4. Documentation is auto-deployed to GitHub Pages at https://monarch-initiative.github.io/my-awesome-tool
+3. The project uses dynamic versioning from git tags (uv-dynamic-versioning)
+4. GitHub repo: https://github.com/ai4curation/ai-blame
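
Review note on the "What the Tool Does" steps added to CLAUDE.md: steps 1-4 can be sketched in a few lines of Python. This is a minimal illustration, not the code in `extractor.py`. The flat record layout (`tool`, `success`, `timestamp`, `model`, `file_path`) is an assumption made for the example, and real Claude Code traces may be structured differently. The names `EditRecord`, `extract_edits`, and `first_and_last` are hypothetical.

```python
import json
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EditRecord:
    """Metadata kept for one successful edit (step 3)."""
    timestamp: str
    model: str
    file_path: str


def extract_edits(jsonl_text: str) -> list[EditRecord]:
    """Steps 1-3: read JSONL text and keep successful Edit/Write operations.

    Assumes a simplified, flat record layout; this is NOT the real trace schema.
    """
    records = []
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the trace
        entry = json.loads(line)
        if entry.get("tool") in {"Edit", "Write"} and entry.get("success"):
            records.append(
                EditRecord(entry["timestamp"], entry["model"], entry["file_path"])
            )
    return records


def first_and_last(records: list[EditRecord]) -> dict[str, list[EditRecord]]:
    """Step 4: group by file and keep only the first and last edit per file.

    Sorting on the raw string assumes every timestamp uses the same
    ISO 8601 format and timezone.
    """
    by_file: dict[str, list[EditRecord]] = defaultdict(list)
    for record in records:
        by_file[record.file_path].append(record)

    result = {}
    for path, recs in by_file.items():
        ordered = sorted(recs, key=lambda r: r.timestamp)
        result[path] = ordered if len(ordered) <= 2 else [ordered[0], ordered[-1]]
    return result
```

The size-threshold filter mentioned in step 4 is left out because the diff does not say how the thresholds are defined.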
@@ -1,121 +1,111 @@
 # Contributing to ai-blame
 
-:+1: First of all: Thank you for taking the time to contribute!
-
-The following is a set of guidelines for contributing to
-ai-blame. These guidelines are not strict rules.
-Use your best judgment, and feel free to propose changes to this document
-in a pull request.
+:+1: Thank you for taking the time to contribute!
 
 ## Table Of Contents
 
 * [Code of Conduct](#code-of-conduct)
-* [Guidelines for Contributions and Requests](#contributions)
-  * [Reporting issues and making requests](#reporting-issues)
-  * [Questions and Discussion](#questions-and-discussion)
-  * [Adding new elements yourself](#adding-elements)
-* [Best Practices](#best-practices)
-  * [How to write a great issue](#great-issues)
-  * [How to create a great pull/merge request](#great-pulls)
-
-<a id="code-of-conduct"></a>
+* [How to Contribute](#how-to-contribute)
+  * [Reporting Issues](#reporting-issues)
+  * [Adding New Agent Support](#adding-new-agent-support)
+  * [Pull Requests](#pull-requests)
+* [Development Setup](#development-setup)
 
 ## Code of Conduct
 
-The ai-blame team strives to create a
-welcoming environment for editors, users and other contributors.
-Please carefully read our [Code of Conduct](CODE_OF_CONDUCT.md).
+The ai-blame team strives to create a welcoming environment for all contributors.
+Please be respectful and constructive in all interactions.
+
+## How to Contribute
+
+### Reporting Issues
+
+Use the [Issue Tracker](https://github.com/ai4curation/ai-blame/issues) for:
+
+- Bug reports
+- Feature requests
+- Questions about usage
 
-<a id="contributions"></a>
+### Adding New Agent Support
 
-## Guidelines for Contributions and Requests
+We welcome PRs to add support for additional AI coding agents! Currently planned:
 
-<a id="reporting-issues"></a>
+- **OpenAI Codex** — Planned by maintainers
 
-### Reporting problems and suggesting changes to with the data model
+PRs welcome for:
 
-Please use our [Issue Tracker][issues] for any of the following:
+- Cursor
+- Aider
+- GitHub Copilot
+- Windsurf
+- Other AI coding assistants
 
-- Reporting problems
-- Requesting new schema elements
+To add support for a new agent:
 
-<a id="questions-and-discussions"></a>
+1. Study the trace format of the agent (where are traces stored? what format?)
+2. Add a new parser in `src/ai_blame/extractor.py` or create a new module
+3. Add test data in `tests/data/` with sample traces
+4. Write tests that verify extraction works correctly
+5. Update documentation
 
-### Questions and Discussions
+### Pull Requests
 
-Please use our [Discussions forum][discussions] to ask general questions or contribute to discussions.
+- PRs should be atomic and address a single issue
+- Reference issues using standard conventions (e.g., "fixes #123")
+- Ensure all tests pass: `just test`
+- Follow the existing code style (enforced by `ruff`)
 
-<a id="adding-elements"></a>
+## Development Setup
 
-### Adding new elements yourself
+```bash
+# Clone the repository
+git clone https://github.com/ai4curation/ai-blame
+cd ai-blame
 
-Please submit a [Pull Request][pulls] to submit a new term for consideration.
+# Install dependencies
+uv sync
 
-<a id="best-practices"></a>
+# Run tests
+just test
 
-## Best Practices
+# Run specific test file
+uv run pytest tests/test_cli.py -v
 
-<a id="great-issues"></a>
+# Build docs locally
+just docs
+```
 
-### GitHub Best Practice
+### Project Structure
 
-- Creating and curating issues
-    - Read ["About Issues"][[about-issues]]
-    - Issues should be focused and actionable
-    - Complex issues should be broken down into simpler issues where possible
-- Pull Requests
-    - Read ["About Pull Requests"][about-pulls]
-    - Read [GitHub Pull Requests: 10 Tips to Know](https://blog.mergify.com/github-pull-requests-10-tips-to-know/)
-    - Pull Requests (PRs) should be atomic and aim to close a single issue
-    - Long running PRs should be avoided where possible
-    - PRs should reference issues following standard conventions (e.g. “fixes #123”)
-    - Schema developers should always be working on a single issue at any one time
-    - Never work on the main branch, always work on an issue/feature branch
-    - Core developers can work on branches off origin rather than forks
-    - Always create a PR on a branch to maximize transparency of what you are doing
-    - PRs should be reviewed and merged in a timely fashion by the ai-blame technical leads
-    - PRs that do not pass GitHub actions should never be merged
-    - In the case of git conflicts, the contributor should try and resolve the conflict
-    - If a PR fails a GitHub action check, the contributor should try and resolve the issue in a timely fashion
+```
+src/ai_blame/
+├── cli.py          # Typer CLI commands
+├── config.py       # Configuration loading (.ai-blame.yaml)
+├── extractor.py    # Trace parsing and edit extraction
+├── models.py       # Pydantic data models
+└── updater.py      # File update logic (append, sidecar, comment)
 
-### Understanding LinkML
+tests/
+├── data/           # Test trace data
+├── test_cli.py     # CLI integration tests
+├── test_extractor.py
+└── test_updater.py
+```
 
-Core developers should read the material on the [LinkML site](https://linkml.io/linkml), in particular:
+### Testing with Real Traces
 
-- [Overview](https://linkml.io/linkml/intro/overview.html)
-- [Tutorial](https://linkml.io/linkml/intro/tutorial.html)
-- [Schemas](https://linkml.io/linkml/schemas/index.html)
-- [FAQ](https://linkml.io/linkml/faq/index.html)
+The test suite includes real Claude Code trace data in `tests/data/`. To test with your own traces:
 
-### Modeling Best Practice
+```bash
+ai-blame stats --dir /path/to/project --home /path/to/home
+```
 
-- Follow Naming conventions
-    - Standard LinkML naming conventions should be followed (UpperCamelCase for classes and enums, snake_case for slots)
-    - Know how to use the LinkML linter to check style and conventions
-    - The names for classes should be nouns or noun-phrases: Person, GenomeAnnotation, Address, Sample
-    - Spell out abbreviations and short forms, except where this goes against convention (e.g. do not spell out DNA)
-    - Elements that are imported from outside (e.g. schema.org) need not follow the same naming conventions
-    - Multivalued slots should be named as plurals
-- Document model elements
-    - All model elements should have documentation (descriptions) and other textual annotations (e.g. comments, notes)
-    - Textual annotations on classes, slots and enumerations should be written with minimal jargon, clear grammar and no misspellings
-- Include examples and counter-examples (intentionally invalid examples)
-    - Rationale: these serve as documentation and unit tests
-    - These will be used by the automated test suite
-    - All elements of the schema must be illustrated with valid and invalid data examples in src/data. New schema elements will not be merged into the main branch until examples are provided
-    - Invalid example data files should be invalid for one single reason, which should be reflected in the filename. It should be possible to render the invalid example files valid by addressing that single fault.
-- Use enums for categorical values
-    - Rationale: Open-ended string ranges encourage multiple values to represent the same entity, like “water”, “H2O” and “HOH”
-    - Any slot whose values could be constrained to a finite set should use an Enum
-    - Non-categorical values, e.g. descriptive fields like `name` or `description` fall outside of this.
-- Reuse
-    - Existing scheme elements should be reused where appropriate, rather than making duplicative elements
-    - More specific classes can be created by refinining classes using inheritance (`is_a`)
+### Code Style
 
-[about-branches]: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/about-branches
-[about-issues]: https://docs.github.com/en/issues/tracking-your-work-with-issues/about-issues
-[about-pulls]: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/about-pull-requests
-[issues]: https://github.com/bbop-skills/ai-blame/issues/
-[pulls]: https://github.com/bbop-skills/ai-blame/pulls/
+- Use type hints
+- Write docstrings with doctests where appropriate
+- Follow existing patterns in the codebase
+- Run `just format` before committing
 
-We recommend also reading [GitHub Pull Requests: 10 Tips to Know](https://blog.mergify.com/github-pull-requests-10-tips-to-know/)
+[issues]: https://github.com/ai4curation/ai-blame/issues/
+[pulls]: https://github.com/ai4curation/ai-blame/pulls/
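
Review note on "Adding New Agent Support" in CONTRIBUTING.md: one way to keep steps 2 and 4 manageable is to give every agent parser the same small interface, so extraction tests can be written once and reused per agent. The sketch below is hypothetical. `TraceParser`, `ToyAgentParser`, `PARSERS`, `extract`, and the one-line `path|model|timestamp` trace format are invented for illustration and are not part of ai-blame.

```python
from typing import Iterable, NamedTuple, Protocol


class Edit(NamedTuple):
    """One extracted edit: which file, which model, and when."""
    file_path: str
    model: str
    timestamp: str


class TraceParser(Protocol):
    """Hypothetical interface each agent-specific parser would satisfy."""

    def parse(self, lines: Iterable[str]) -> list[Edit]: ...


class ToyAgentParser:
    """Parser for an invented 'path|model|timestamp' line format."""

    def parse(self, lines: Iterable[str]) -> list[Edit]:
        edits = []
        for line in lines:
            parts = line.strip().split("|")
            if len(parts) != 3:
                continue  # skip blank or malformed lines rather than failing
            edits.append(Edit(*parts))
        return edits


# Hypothetical registry mapping an agent name to its parser.
PARSERS: dict[str, TraceParser] = {"toy-agent": ToyAgentParser()}


def extract(agent: str, lines: Iterable[str]) -> list[Edit]:
    """Look up the parser registered for `agent` and run it over the trace lines."""
    return PARSERS[agent].parse(lines)
```

With a shared interface like this, a pytest test could be parametrized over the registered agents and fed the sample traces from `tests/data/`.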