This document describes the high-level architecture of contextgit, a CLI tool for managing requirements and context traceability in LLM-assisted software projects. The architecture follows a layered design with clear separation of concerns.
- Local-first: All operations happen on the local filesystem; no network dependencies.
- Text-based: All persistent state is stored in human-readable YAML and Markdown.
- Git-friendly: Deterministic output and clean diffs for version control.
- Fast feedback: Most commands complete in under 500ms.
- LLM-optimized: Designed for consumption by Claude Code and similar tools.
- Fail-safe: Never corrupt the index; use atomic operations.
┌─────────────────────────────────────────────────────────────┐
│ Developer Environment │
│ │
│ ┌─────────────┐ │
│ │ Human │ │
│ │ Developer │───────┐ │
│ └─────────────┘ │ │
│ │ │
│ ┌─────────────┐ │ ┌──────────────┐ │
│ │ Claude Code │ ├─────▶│ contextgit │ │
│ │ (or other │ │ │ CLI Tool │ │
│ │ LLM CLI) │───────┘ └──────┬───────┘ │
│ └─────────────┘ │ │
│ │ │
│ ┌─────────▼─────────┐ │
│ │ Repository │ │
│ │ ├── .contextgit/ │ │
│ │ ├── docs/ │ │
│ │ └── src/ │ │
│ └───────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
Both humans and LLM CLIs invoke contextgit commands. The tool reads and writes to the local git repository, manipulating .contextgit/ state files and scanning docs/ for requirement metadata.
┌─────────────────────────────────────────────────────────────────┐
│ CLI Layer (Typer/Click) │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ init │ │ scan │ │ show │ │ extract │ ... │
│ └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘ │
└───────┼────────────┼────────────┼────────────┼─────────────────┘
│ │ │ │
│ │ │ │
┌───────▼────────────▼────────────▼────────────▼─────────────────┐
│ Application Layer │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Command Handlers / Controllers │ │
│ │ - InitHandler │ │
│ │ - ScanHandler │ │
│ │ - StatusHandler │ │
│ │ - ShowHandler │ │
│ │ - ExtractHandler │ │
│ │ - LinkHandler │ │
│ │ - ConfirmHandler │ │
│ │ - NextIdHandler │ │
│ │ - RelevanceHandler │ │
│ │ - FormatHandler │ │
│ │ - ValidateHandler (v1.2+) │ │
│ │ - ImpactHandler (v1.2+) │ │
│ │ - HooksHandler (v1.2+) │ │
│ │ - WatchHandler (v1.2+) │ │
│ │ - MCPServerHandler (v1.2+) │ │
│ └────────────────────┬────────────────────────────────────┘ │
└─────────────────────────┼──────────────────────────────────────┘
│
│
┌─────────────────────────▼──────────────────────────────────────┐
│ Core Domain Layer │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Index Manager │ │
│ │ - Load/save index YAML │ │
│ │ - Atomic writes (temp + rename) │ │
│ │ - Node CRUD operations │ │
│ │ - Link CRUD operations │ │
│ │ - Checksum management │ │
│ │ - Validation │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Metadata Parser │ │
│ │ - Parse YAML frontmatter │ │
│ │ - Parse inline HTML comment blocks │ │
│ │ - Extract node metadata │ │
│ │ - Normalize and validate fields │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Location Resolver & Snippet Extractor │ │
│ │ - Parse Markdown structure (headings) │ │
│ │ - Map heading paths to line ranges │ │
│ │ - Extract snippets by location │ │
│ │ - Handle line-based locations │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ File Scanner (v1.2+) │ │
│ │ - Abstract base scanner interface │ │
│ │ - MarkdownScanner (.md) │ │
│ │ - PythonScanner (.py, .pyw) │ │
│ │ - JavaScriptScanner (.js, .jsx, .ts, .tsx, .mjs) │ │
│ │ - Extensible for new file formats │ │
│ │ - Scanner registry for extension mapping │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Linking Engine │ │
│ │ - Build link graph from upstream/downstream │ │
│ │ - Detect checksum changes │ │
│ │ - Update sync status │ │
│ │ - Traverse graph (upstream/downstream queries) │ │
│ │ - Detect circular dependencies (v1.2+) │ │
│ │ - Validate links with file context (v1.2+) │ │
│ │ - Identify orphans │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Checksum Calculator │ │
│ │ - Normalize text (whitespace, line endings) │ │
│ │ - Compute SHA-256 hash │ │
│ │ - Compare checksums for drift detection │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ ID Generator │ │
│ │ - Read config for prefixes │ │
│ │ - Scan existing IDs │ │
│ │ - Generate next sequential ID │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Config Manager │ │
│ │ - Load/save .contextgit/config.yaml │ │
│ │ - Provide defaults │ │
│ │ - Validate config structure │ │
│ └─────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
│
┌─────────────────────────▼──────────────────────────────────────┐
│ Infrastructure Layer │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ File System Access │ │
│ │ - Read files (UTF-8) │ │
│ │ - Write files atomically │ │
│ │ - Walk directory trees │ │
│ │ - Detect repository root (.git, .contextgit) │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ YAML Serialization │ │
│ │ - Parse YAML safely │ │
│ │ - Dump YAML deterministically │ │
│ │ - Sort keys and lists for git-friendliness │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Output Formatter │ │
│ │ - Format output for terminal (colors, tables) │ │
│ │ - Format output as JSON │ │
│ │ - Handle --format flag │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ MCP Server (v1.2+) │ │
│ │ - Model Context Protocol implementation │ │
│ │ - 5 Tools: relevant_for_file, extract, status, │ │
│ │ impact_analysis, search │ │
│ │ - 2 Resources: index, llm-instructions │ │
│ │ - stdio transport for Claude Code │ │
│ │ - Pydantic response schemas │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ File System Watcher (v1.2+) │ │
│ │ - Watch directories for changes (watchdog) │ │
│ │ - Debounce rapid file changes │ │
│ │ - Trigger scan on file modifications │ │
│ │ - Graceful shutdown handling │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Git Hooks Manager (v1.2+) │ │
│ │ - Install/uninstall git hooks │ │
│ │ - Pre-commit: scan changed files │ │
│ │ - Post-merge: full project scan │ │
│ │ - Pre-push: staleness check (optional) │ │
│ │ - Preserve existing custom hooks │ │
│ └─────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
Purpose: Parse command-line arguments and route to appropriate handlers.
Technology: Typer (recommended) or Click for argument parsing and help text.
Responsibilities:
- Define all CLI commands and their arguments
- Parse flags (--format, --recursive, --dry-run, etc.)
- Provide --help documentation
- Handle top-level exceptions and format error messages
- Set exit codes
Key Commands:
- `contextgit init`
- `contextgit scan [PATH] [--recursive] [--dry-run] [--format json] [--files]`
- `contextgit status [--orphans] [--stale] [--file PATH] [--type TYPE] [--format json]`
- `contextgit show <ID> [--format json] [--graph]`
- `contextgit extract <ID> [--format json]`
- `contextgit link <FROM> <TO> --type <RELATION>`
- `contextgit confirm <ID>`
- `contextgit next-id <TYPE> [--format json]`
- `contextgit relevant-for-file <PATH> [--depth N] [--format json]`
- `contextgit fmt`
- `contextgit validate [PATH] [--strict] [--format json]` (v1.2+)
- `contextgit impact <ID> [--format tree|json|checklist]` (v1.2+)
- `contextgit hooks install|uninstall|status` (v1.2+)
- `contextgit watch [PATH] [--debounce MS]` (v1.2+)
- `contextgit mcp-server` (v1.2+)
Purpose: Implement business logic for each CLI command.
Responsibilities:
- Coordinate calls to core domain components
- Handle command-specific validation
- Format output (plain text or JSON)
- Manage transactions (read index → modify → write index)
- Report progress for long operations
Example: ScanHandler
- Detect repository root
- Load config to get directory patterns and prefixes
- Walk directory tree to find Markdown files
- For each file:
  - Parse metadata blocks using MetadataParser
  - Extract location using LocationResolver
  - Calculate checksum using ChecksumCalculator
- Load index using IndexManager
- Update/create nodes
- Create/update links based on upstream/downstream
- Update sync status using LinkingEngine
- Save index atomically
- Output summary
Purpose: Manage the central index file (.contextgit/requirements_index.yaml).
Responsibilities:
- Load index from disk (validate YAML structure)
- Provide CRUD operations for nodes and links
- Save index to disk atomically (write temp file, rename)
- Sort nodes and links deterministically
- Validate node and link structure
Key Operations:
- `load_index() -> Index`
- `save_index(index: Index) -> None`
- `get_node(id: str) -> Node | None`
- `add_node(node: Node) -> None`
- `update_node(id: str, updates: dict) -> None`
- `delete_node(id: str) -> None`
- `get_link(from_id: str, to_id: str) -> Link | None`
- `add_link(link: Link) -> None`
- `update_link(from_id: str, to_id: str, updates: dict) -> None`
- `delete_link(from_id: str, to_id: str) -> None`
Data Structures:
from dataclasses import dataclass, field

@dataclass
class Node:
    id: str
    type: str  # business | system | architecture | code | test | decision | other
    title: str
    file: str  # relative path
    location: Location  # see Location Types in the Location Resolver section
    status: str  # draft | active | deprecated | superseded
    last_updated: str  # ISO 8601
    checksum: str
    llm_generated: bool = False
    tags: list[str] = field(default_factory=list)

@dataclass
class Link:
    from_id: str
    to_id: str
    relation_type: str  # refines | implements | tests | derived_from | depends_on
    sync_status: str  # ok | upstream_changed | downstream_changed | broken
    last_checked: str  # ISO 8601

@dataclass
class Index:
    nodes: dict[str, Node]  # keyed by id
    links: list[Link]

Purpose: Extract contextgit metadata from Markdown files.
Responsibilities:
- Parse YAML frontmatter at the beginning of files
- Parse inline HTML comment blocks (`<!-- contextgit ... -->`)
- Normalize and validate metadata fields
- Handle `id: auto` placeholder
- Report errors for malformed metadata
Key Operations:
- `parse_file(file_path: str) -> list[RawMetadata]`
- `parse_frontmatter(content: str) -> RawMetadata | None`
- `parse_inline_blocks(content: str) -> list[RawMetadata]`
- `validate_metadata(raw: RawMetadata) -> Metadata`
Algorithm for Inline Blocks:
- Use regex to find `<!-- contextgit\n...\n-->` patterns
- Extract YAML content between delimiters
- Parse YAML
- Record line number of the comment block
- Return list of metadata objects with locations
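The inline-block steps above can be sketched with the standard library alone. This is a minimal illustration, not the project's parser: the function name `find_inline_blocks` and the exact regex are assumptions, and the YAML parsing step is left out so the example has no third-party dependency.

```python
import re

# Matches <!-- contextgit ... --> blocks. The body is captured lazily so that
# several blocks in the same file are matched separately.
INLINE_BLOCK = re.compile(r"<!--[ \t]*contextgit[ \t]*\n(.*?)\n[ \t]*-->", re.DOTALL)


def find_inline_blocks(content: str) -> list[tuple[int, str]]:
    """Return (line_number, raw_yaml) pairs, one per inline metadata block."""
    blocks = []
    for match in INLINE_BLOCK.finditer(content):
        # 1-based number of the line that holds the opening "<!--".
        line_number = content.count("\n", 0, match.start()) + 1
        blocks.append((line_number, match.group(1)))
    return blocks
```

Each returned YAML string would then go through the safe YAML loader, and the recorded line number is what a location resolver needs to find the heading that follows the block.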
Algorithm for Frontmatter:
- Check if file starts with `---`
- Extract content until closing `---`
- Parse YAML
- Check for `contextgit` key at top level
- Return metadata if found
Purpose: Map metadata blocks to precise locations in files and extract text snippets.
Responsibilities:
- Parse Markdown structure (identify headings and their hierarchy)
- Map metadata blocks to heading paths
- Extract snippets by heading path or line range
- Handle multi-level headings (`#`, `##`, `###`, etc.)
Key Operations:
- `resolve_location(file_path: str, metadata_line: int) -> Location`
- `extract_snippet(file_path: str, location: Location) -> str`
- `parse_markdown_structure(content: str) -> list[Heading]`
Location Types:
# Dataclass fields without defaults must come before fields with defaults,
# so the discriminator `kind` is declared last.
@dataclass
class HeadingLocation:
    path: list[str]  # e.g., ["Requirements", "Logging", "API Endpoint"]
    kind: str = "heading"

@dataclass
class LineLocation:
    start: int
    end: int
    kind: str = "lines"

Location = HeadingLocation | LineLocation

Algorithm for Heading Path:
- Parse file to identify all headings and their levels
- Find the heading immediately after the metadata block
- Build the heading path from root to that heading
- Return `HeadingLocation` with path
Algorithm for Snippet Extraction (Heading):
- Parse file to identify all headings
- Find the heading matching the path
- Extract from that heading line through all content until:
  - Next heading of same or higher level
  - End of file
- Return extracted text
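A rough sketch of heading-based snippet extraction as described above. It assumes ATX-style headings (`#` to `######`) found with a simple regex, and it takes the file content as a string rather than a path; both are simplifications for illustration. A regex like this would also misread `#` lines inside fenced code blocks, which a real Markdown parser avoids.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.+?)\s*$")


def parse_headings(lines: list[str]) -> list[tuple[int, int, list[str]]]:
    """Return (line_index, level, path) for every heading in the file."""
    headings = []
    stack: list[tuple[int, str]] = []  # (level, title) of the open ancestors
    for index, line in enumerate(lines):
        match = HEADING.match(line)
        if not match:
            continue
        level, title = len(match.group(1)), match.group(2)
        # Close ancestors at the same or a deeper level before pushing.
        while stack and stack[-1][0] >= level:
            stack.pop()
        stack.append((level, title))
        headings.append((index, level, [t for _, t in stack]))
    return headings


def extract_snippet(content: str, path: list[str]) -> str:
    """Return the heading line plus everything up to the next peer heading."""
    lines = content.splitlines()
    headings = parse_headings(lines)
    for position, (start, level, heading_path) in enumerate(headings):
        if heading_path != path:
            continue
        end = len(lines)  # default: run to end of file
        for next_start, next_level, _ in headings[position + 1:]:
            if next_level <= level:  # same or higher level ends the snippet
                end = next_start
                break
        return "\n".join(lines[start:end])
    raise KeyError(f"heading path not found: {path}")
```

Nested subsections stay inside the snippet because only a heading of the same or a higher level closes it.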
Algorithm for Snippet Extraction (Lines):
- Read file
- Extract lines from `start` to `end` (inclusive)
- Return extracted text
Purpose: Build and maintain the traceability graph; detect staleness and orphans.
Responsibilities:
- Build link graph from node upstream/downstream declarations
- Compare checksums to detect changes
- Update link sync status based on checksum changes
- Traverse graph to find upstream/downstream nodes
- Detect circular dependencies
- Identify orphan nodes
Key Operations:
- `build_links_from_metadata(nodes: dict[str, Node]) -> list[Link]`
- `update_sync_status(index: Index, changed_nodes: set[str]) -> None`
- `get_upstream_nodes(node_id: str, depth: int = 1) -> list[Node]`
- `get_downstream_nodes(node_id: str, depth: int = 1) -> list[Node]`
- `detect_orphans(index: Index) -> list[str]`
- `detect_circular_dependencies(index: Index) -> list[list[str]]`
Algorithm for Sync Status Update:
- For each node whose checksum changed:
  - Find all links where this node is `from_id` (outgoing)
    - Mark those links as `sync_status: downstream_changed`
  - Find all links where this node is `to_id` (incoming)
    - Mark those links as `sync_status: upstream_changed`
- User must manually review and run `contextgit confirm` to mark as `ok`
Algorithm for Orphan Detection:
- Identify top-level node types (business requirements) that don't need upstream
- For all other nodes, flag if no incoming links (no upstream)
- Identify leaf-level node types (code, test) that don't need downstream
- For all other nodes, flag if no outgoing links (no downstream)
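The orphan rules above could look roughly like this. The sketch follows the text's convention that an incoming link (the node is `to_id`) counts as an upstream. The set names, the trimmed-down `Link`, and the dictionary return shape are assumptions made for illustration; the documented `detect_orphans(index)` returns a flat list of IDs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    from_id: str  # upstream end of the link
    to_id: str    # downstream end of the link


TOP_LEVEL_TYPES = {"business"}  # assumed: these need no upstream
LEAF_TYPES = {"code", "test"}   # assumed: these need no downstream


def detect_orphans(node_types: dict[str, str], links: list[Link]) -> dict[str, list[str]]:
    """Report node IDs that lack a required upstream or downstream link."""
    has_upstream = {link.to_id for link in links}
    has_downstream = {link.from_id for link in links}
    no_upstream = sorted(
        node_id for node_id, node_type in node_types.items()
        if node_type not in TOP_LEVEL_TYPES and node_id not in has_upstream
    )
    no_downstream = sorted(
        node_id for node_id, node_type in node_types.items()
        if node_type not in LEAF_TYPES and node_id not in has_downstream
    )
    return {"no_upstream": no_upstream, "no_downstream": no_downstream}
```

Sorting the results keeps the output deterministic, in line with the git-friendly goal stated at the top of the document.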
Purpose: Compute and compare content checksums for change detection.
Responsibilities:
- Normalize text (strip leading/trailing whitespace, normalize line endings)
- Compute SHA-256 hash of normalized text
- Compare checksums to detect changes
Key Operations:
- `calculate_checksum(text: str) -> str`
- `compare_checksums(old: str, new: str) -> bool`
Algorithm:
- Normalize text:
  - Convert all line endings to `\n`
  - Strip leading/trailing whitespace from each line
  - Remove completely empty lines at start/end
- Encode to UTF-8 bytes
- Compute SHA-256 hash
- Return hex digest
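The checksum algorithm above maps directly onto `hashlib`. The helper name `normalize_text` is an assumption; `calculate_checksum` matches the operation listed earlier.

```python
import hashlib


def normalize_text(text: str) -> str:
    """Apply the normalization steps listed above."""
    # Convert CRLF and lone CR line endings to LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Strip leading/trailing whitespace from each line.
    lines = [line.strip() for line in text.split("\n")]
    # Drop empty lines at the start and end only; interior blanks stay.
    while lines and not lines[0]:
        lines.pop(0)
    while lines and not lines[-1]:
        lines.pop()
    return "\n".join(lines)


def calculate_checksum(text: str) -> str:
    """Return the SHA-256 hex digest of the normalized text."""
    return hashlib.sha256(normalize_text(text).encode("utf-8")).hexdigest()
```

One consequence of stripping each line is that indentation-only edits do not change the checksum, so they will not flag links as stale.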
Purpose: Generate unique sequential IDs for new requirements.
Responsibilities:
- Read config to get prefix for each node type
- Scan existing node IDs matching that prefix
- Return next sequential ID with zero-padding
Key Operations:
next_id(node_type: str, index: Index, config: Config) -> str
Algorithm:
- Load config to get prefix (e.g., "SR-" for system requirements)
- Get all node IDs from index
- Filter to those starting with the prefix
- Extract numeric portion and parse as integers
- Find max number
- Return `prefix + str(max + 1).zfill(3)`
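A small sketch of the ID algorithm above. For brevity it takes the prefix and the existing IDs directly instead of the `Index` and `Config` objects in the documented signature, and it assumes that an empty index starts numbering at 1 and that IDs with non-numeric suffixes are ignored.

```python
def next_id(prefix: str, existing_ids: list[str], width: int = 3) -> str:
    """Return the next sequential ID for `prefix`, zero-padded to `width`."""
    numbers = []
    for node_id in existing_ids:
        suffix = node_id[len(prefix):]
        # Only IDs of the form <prefix><digits> take part in the count.
        if node_id.startswith(prefix) and suffix.isdecimal():
            numbers.append(int(suffix))
    # default=0 makes the first generated ID end in 001 (an assumption).
    return prefix + str(max(numbers, default=0) + 1).zfill(width)
```

`zfill` only pads, so IDs keep growing past the padding width (`SR-999` is followed by `SR-1000`).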
Purpose: Load and manage .contextgit/config.yaml.
Responsibilities:
- Load config file
- Provide defaults if file doesn't exist or fields are missing
- Validate config structure
Key Operations:
- `load_config() -> Config`
- `save_config(config: Config) -> None`
- `get_default_config() -> Config`
Default Config:
tag_prefixes:
  business: "BR-"
  system: "SR-"
  architecture: "AR-"
  code: "C-"
  test: "T-"
  decision: "ADR-"
directories:
  business: "docs/01_business"
  system: "docs/02_system"
  architecture: "docs/03_architecture"
  code: "src"
  test: "tests"

Purpose: Provide extensible file scanning with format-specific metadata extraction.
Responsibilities:
- Define abstract scanner interface
- Provide concrete scanners for each file format
- Registry-based scanner selection by file extension
- Extract metadata blocks from various formats
Key Operations:
- `get_scanner_for_file(file_path: str) -> FileScanner | None`
- `scan_file(file_path: str) -> list[RawMetadata]`
- `get_supported_extensions() -> set[str]`
Scanner Types:
from abc import ABC, abstractmethod
from pathlib import Path

class FileScanner(ABC):
    """Abstract base for file scanners."""

    @abstractmethod
    def supported_extensions(self) -> set[str]: ...

    @abstractmethod
    def scan(self, file_path: Path) -> list[RawMetadata]: ...

# The concrete scanners below show only their extension sets;
# their scan() implementations are omitted here.
class MarkdownScanner(FileScanner):
    """Scans .md files for YAML frontmatter and inline comments."""

    def supported_extensions(self) -> set[str]:
        return {".md", ".markdown"}

class PythonScanner(FileScanner):
    """Scans Python files for docstring and comment metadata."""

    def supported_extensions(self) -> set[str]:
        return {".py", ".pyw"}

class JavaScriptScanner(FileScanner):
    """Scans JS/TS files for JSDoc-style metadata."""

    def supported_extensions(self) -> set[str]:
        return {".js", ".jsx", ".ts", ".tsx", ".mjs", ".cjs"}

Metadata Formats by File Type:
| File Type | Format | Example |
|---|---|---|
| Markdown | YAML frontmatter | ---\ncontextgit:\n id: SR-001\n--- |
| Markdown | Inline HTML comment | <!-- contextgit\nid: SR-001\n--> |
| Python | Docstring YAML | """contextgit:\n id: C-001\n""" |
| Python | Comment block | # contextgit:\n# id: C-001 |
| JavaScript | JSDoc block | /** @contextgit\n * id: C-001\n */ |
Purpose: Abstract file I/O operations.
Responsibilities:
- Read files as UTF-8 text
- Write files atomically (write to temp, rename)
- Walk directory trees
- Detect repository root by looking for `.git/` or `.contextgit/`
Key Operations:
- `read_file(path: str) -> str`
- `write_file_atomic(path: str, content: str) -> None`
- `walk_files(root: str, pattern: str = "*.md") -> Iterator[str]`
- `find_repo_root(start_path: str) -> str`
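The two less obvious operations in this component, atomic writes and repository root detection, might be implemented along these lines. This is a sketch under assumptions: the temp-file naming, the `fsync` call, and raising `FileNotFoundError` when no root is found are illustrative choices, not documented behavior.

```python
import os
import tempfile
from pathlib import Path


def write_file_atomic(path: str, content: str) -> None:
    """Write to a temp file in the same directory, then rename over the target."""
    target = Path(path)
    # Same directory keeps the rename on one filesystem, which is what makes
    # os.replace an atomic swap.
    fd, tmp_name = tempfile.mkstemp(
        dir=target.parent, prefix=target.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8", newline="\n") as handle:
            handle.write(content)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_name, target)
    except BaseException:
        # On any failure, discard the temp file; the original stays untouched.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise


def find_repo_root(start_path: str) -> str:
    """Walk upward until a directory containing .git or .contextgit is found."""
    current = Path(start_path).resolve()
    for candidate in (current, *current.parents):
        if (candidate / ".git").exists() or (candidate / ".contextgit").exists():
            return str(candidate)
    raise FileNotFoundError(f"no repository root found above {start_path}")
```

`os.replace` is used instead of `os.rename` because it overwrites an existing target on Windows as well as on POSIX systems.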
Purpose: Parse and serialize YAML with deterministic formatting.
Responsibilities:
- Parse YAML safely (for example `yaml.safe_load` in PyYAML, or `YAML(typ="safe")` in ruamel.yaml)
- Dump YAML with sorted keys and deterministic formatting
- Use 2-space indentation
- Use block style for readability
Key Operations:
- `load_yaml(content: str) -> dict`
- `dump_yaml(data: dict) -> str`
Configuration for ruamel.yaml:
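from ruamel.yaml import YAML

yaml = YAML()
yaml.default_flow_style = False  # block style for readability
yaml.indent(mapping=2, sequence=2, offset=0)  # 2-space indentation
yaml.width = 120  # wrap long lines at 120 columns

Purpose: Format command output for terminal or JSON.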
Responsibilities:
- Format plain-text output with colors and tables (using `rich` or similar)
- Format JSON output with proper structure
- Respect `--format json` flag
Key Operations:
- `format_status(index: Index, format: str) -> str`
- `format_node(node: Node, format: str) -> str`
- `format_links(links: list[Link], format: str) -> str`
Purpose: Provide native LLM integration via the Model Context Protocol.
Responsibilities:
- Implement MCP server with stdio transport
- Expose contextgit operations as MCP tools
- Provide index and instructions as MCP resources
- Use Pydantic schemas for response validation
Key Operations:
- `start_server() -> None` (runs stdio event loop)
- `register_tools() -> None`
- `register_resources() -> None`
MCP Tools:
| Tool | Description |
|---|---|
| `relevant_for_file` | Find requirements related to a source file |
| `extract` | Extract requirement snippet by ID |
| `status` | Get project health summary |
| `impact_analysis` | Analyze downstream impact of a node |
| `search` | Search nodes by query string |
MCP Resources:
| Resource | URI | Description |
|---|---|---|
| Index | `contextgit://index` | Full requirements index |
| Instructions | `contextgit://llm-instructions` | Usage guidelines for LLMs |
Technology: Requires mcp and pydantic packages (optional dependencies).
Purpose: Monitor file system for changes and trigger automatic scanning.
Responsibilities:
- Watch directories for file modifications
- Debounce rapid changes (default: 500ms)
- Trigger scan on relevant file changes
- Handle graceful shutdown (SIGINT, SIGTERM)
Key Operations:
- `start_watch(paths: list[str], debounce_ms: int) -> None`
- `stop_watch() -> None`
- `on_file_changed(event: FileSystemEvent) -> None`
Algorithm:
- Initialize watchdog observer for specified paths
- Filter events to supported file extensions
- Accumulate changes in debounce window
- After debounce period expires, trigger scan with `--files` option
- Report results to console
- On SIGINT/SIGTERM, stop observer and exit cleanly
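The debounce step above can be isolated from the watchdog wiring, which makes it easy to test. The class below is a hypothetical sketch: `ChangeDebouncer` and `flush_if_quiet` are invented names, and timestamps are passed in explicitly (in practice they would come from something like `time.monotonic()`).

```python
class ChangeDebouncer:
    """Collects changed paths and releases them after a quiet period."""

    def __init__(self, debounce_ms: int = 500) -> None:
        self.debounce_s = debounce_ms / 1000
        self._pending: set[str] = set()
        self._last_event: float | None = None

    def on_file_changed(self, path: str, now: float) -> None:
        # Every new event restarts the quiet period.
        self._pending.add(path)
        self._last_event = now

    def flush_if_quiet(self, now: float) -> list[str]:
        """Return the accumulated paths once no event arrived for debounce_s."""
        if self._last_event is None or now - self._last_event < self.debounce_s:
            return []
        batch = sorted(self._pending)
        self._pending.clear()
        self._last_event = None
        return batch
```

A watcher loop would call `on_file_changed` from the observer's event handler and poll `flush_if_quiet` periodically, passing any returned batch to a single scan so that repeated saves of the same file trigger one scan instead of many.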
Technology: Requires watchdog package (optional dependency).
Purpose: Automate contextgit operations via git hooks.
Responsibilities:
- Install git hooks while preserving existing hooks
- Provide pre-commit, post-merge, and pre-push hooks
- Generate hook scripts that run contextgit commands
- Support idempotent installation/uninstallation
Key Operations:
- `install_hooks(hooks: list[str]) -> None`
- `uninstall_hooks(hooks: list[str]) -> None`
- `get_hooks_status() -> dict[str, HookStatus]`
Hook Types:
| Hook | Trigger | Action |
|---|---|---|
| pre-commit | Before commit | Scan changed files, block if stale |
| post-merge | After merge/pull | Full scan of project |
| pre-push | Before push | Optional staleness check |
Hook Preservation:
- Existing hooks are backed up to `<hook>.backup`
- Installed hooks include markers: `# contextgit-hook-start` / `# contextgit-hook-end`
- Uninstall removes only contextgit portions, restores original if present
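Marker-based preservation can be shown on hook script text alone. The sketch below is an illustration under assumptions: the function names are invented, the command passed in is a placeholder, and the `.backup` file handling is left out.

```python
START = "# contextgit-hook-start"
END = "# contextgit-hook-end"


def install_block(script: str, command: str) -> str:
    """Append a marked contextgit block; leave the script alone if present."""
    if START in script:
        return script  # idempotent: already installed
    if not script:
        script = "#!/bin/sh\n"  # assumed default for a hook that did not exist
    if not script.endswith("\n"):
        script += "\n"
    return f"{script}{START}\n{command}\n{END}\n"


def uninstall_block(script: str) -> str:
    """Remove only the marked block, keeping any custom hook lines."""
    kept, inside = [], False
    for line in script.splitlines(keepends=True):
        if line.strip() == START:
            inside = True
        elif line.strip() == END:
            inside = False
        elif not inside:
            kept.append(line)
    return "".join(kept)
```

Because the block is delimited by the two markers, installing twice is a no-op and uninstalling returns a user's custom hook byte for byte.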
1. User runs: contextgit scan docs/ --recursive
2. CLI layer parses arguments
└─> ScanHandler invoked
3. ScanHandler:
├─> ConfigManager: load config
├─> FileSystem: walk docs/ to find *.md files
├─> For each file:
│ ├─> FileSystem: read file content
│ ├─> MetadataParser: parse metadata blocks
│ ├─> LocationResolver: resolve location for each block
│ ├─> ChecksumCalculator: calculate checksum of snippet
│ └─> Store parsed nodes
│
├─> IndexManager: load existing index
│
├─> For each parsed node:
│ ├─> Compare checksums with existing node
│ ├─> If different: mark as changed
│ ├─> IndexManager: update or add node
│ └─> LinkingEngine: update sync status
│
├─> IndexManager: save index atomically
└─> OutputFormatter: display summary
4. Output: "Scanned 10 files, added 3 nodes, updated 2 nodes, created 5 links"
1. User (or LLM) runs: contextgit extract SR-010 --format json
2. CLI layer parses arguments
└─> ExtractHandler invoked
3. ExtractHandler:
├─> IndexManager: load index
├─> IndexManager: get_node("SR-010")
├─> Check node exists
├─> LocationResolver: extract_snippet(node.file, node.location)
└─> OutputFormatter: format as JSON
4. Output: {"id": "SR-010", "file": "docs/...", "snippet": "..."}
LLM tools (Claude Code) detect contextgit-enabled projects by checking for .contextgit/config.yaml at the repository root.
- LLM receives task: "Implement the logging API (SR-010)"
- LLM extracts requirement:
  - Runs `contextgit extract SR-010 --format json`
  - Receives snippet with requirement details
- LLM finds related items:
  - Runs `contextgit show SR-010 --format json`
  - Sees upstream (BR-001) and downstream (AR-020, C-120)
  - Optionally extracts those as well for context
- LLM implements code:
  - Writes implementation to `src/logging/api.py`
- LLM updates traceability:
  - Creates metadata block for code item C-121
  - Adds `upstream: [SR-010]` to metadata
- LLM scans:
  - Runs `contextgit scan src/`
  - Index updated with new code node C-121
  - Link created: SR-010 → C-121

- LLM modifies `docs/01_business/observability.md` (contains BR-001)
- LLM scans:
  - Runs `contextgit scan docs/01_business/`
  - Checksum of BR-001 changes
  - All downstream links marked `upstream_changed`
- LLM notifies user:
  - Runs `contextgit status --stale --format json`
  - Reports: "Downstream items need review: SR-010, SR-011"
- User reviews and updates SR-010, SR-011
- User confirms sync:
  - Runs `contextgit confirm SR-010`
  - Runs `contextgit confirm SR-011`
  - Links marked `sync_status: ok`
- Python 3.11+: Modern Python with type hints, dataclasses, pattern matching
- Typer: Modern, type-hint-based CLI framework with great help text and validation
- Alternative: Click (more mature, slightly more verbose)
- ruamel.yaml: Preserves formatting, supports deterministic output, round-trip editing
- Alternative: PyYAML (simpler but less control over formatting)
- Python-Markdown or markdown-it-py: For parsing Markdown structure (headings)
- Alternative: Simple regex (lightweight but less robust)
- pathlib (stdlib): Modern path handling
- os and shutil (stdlib): File operations
- hashlib (stdlib): SHA-256 for checksums
- rich: Beautiful terminal output with colors, tables, progress bars
- Alternative: Plain text with ANSI codes
- pytest: Test framework
- pytest-cov: Coverage reporting
- mypy: Static type checker for type hints
- ruff: Fast linter and formatter (replaces flake8, black, isort)
- Alternative: black + flake8 + isort
These packages are optional and enable specific features:
| Package | Feature | Install Command |
|---|---|---|
| watchdog | Watch mode (`contextgit watch`) | `pip install contextgit[watch]` |
| mcp | MCP server (`contextgit mcp-server`) | `pip install contextgit[mcp]` |
| pydantic | MCP response schemas | `pip install contextgit[mcp]` |
Install all optional dependencies:
pip install contextgit[all]

- Malformed metadata blocks: Log warning, skip block, continue processing
- Missing files: Mark links as broken, continue processing
- Invalid YAML in index: Refuse to start, report specific error
- Always write to temp file first, then rename (POSIX atomic)
- If any error during write, temp file is discarded, original index unchanged
- Include file path and line number for parsing errors
- Include node IDs for link errors
- Suggest corrective actions
- 0: Success
- 1: General error
- 2: Invalid arguments
- 3: File not found
- 4: Invalid metadata
- 5: Index corrupted
- 6: Stale links exist (v1.2+)
- 7: Validation errors (v1.2+)
- 8: Self-referential error (v1.2+)
- 9: Circular dependency error (v1.2+)
- 10: Missing dependency (v1.2+)
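For scripts and hooks that need to branch on these codes, an `IntEnum` keeps the numbers and their meaning in one place. The member names are assumptions derived from the descriptions above; only the numeric values come from the list.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    SUCCESS = 0
    GENERAL_ERROR = 1
    INVALID_ARGUMENTS = 2
    FILE_NOT_FOUND = 3
    INVALID_METADATA = 4
    INDEX_CORRUPTED = 5
    STALE_LINKS = 6          # v1.2+
    VALIDATION_ERRORS = 7    # v1.2+
    SELF_REFERENTIAL = 8     # v1.2+
    CIRCULAR_DEPENDENCY = 9  # v1.2+
    MISSING_DEPENDENCY = 10  # v1.2+


def describe(returncode: int) -> str:
    """Map a process return code to a readable name, if it is a known code."""
    try:
        return ExitCode(returncode).name
    except ValueError:
        return f"UNKNOWN({returncode})"
```

A caller could pass `subprocess.run(...).returncode` to `describe` to report, for example, that a pre-push check failed because of stale links rather than a general error.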
- Load index once per command, cache in memory
- Use dict for O(1) node lookup by ID
- Build link adjacency maps for O(1) upstream/downstream queries
- Process files in parallel for large directories (using `concurrent.futures`)
- Stream file reading (don't load entire files into memory at once)
- Short-circuit checksum calculation if file timestamp unchanged (future optimization)
- Use SHA-256 (fast, standard, collision-resistant)
- Cache checksums in index (only recalculate on content change)
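- For JSON output, use `json.dumps` with `ensure_ascii=False` so non-ASCII text is emitted directly instead of as `\uXXXX` escapes, which keeps output smaller and more readable
- For large status outputs, paginate or limit results by default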
- No network access
- No data transmission
- Validate all user input (IDs, paths, flags)
- Sanitize file paths (prevent path traversal)
- Use safe YAML parsing (no arbitrary code execution)
- Only read/write within repository root
- Validate that file paths are under repo root (prevent escapes)
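The path checks above reduce to resolving the user-supplied path and confirming it still sits under the repository root. A minimal sketch, with a hypothetical helper name and `ValueError` chosen as the error type for illustration:

```python
from pathlib import Path


def resolve_inside_repo(repo_root: str, user_path: str) -> Path:
    """Resolve user_path against repo_root and reject anything outside it."""
    root = Path(repo_root).resolve()
    # resolve() collapses ".." segments and follows symlinks, so the check
    # below sees the real destination. An absolute user_path replaces root
    # in the join and is then rejected unless it points inside the repo.
    candidate = (root / user_path).resolve()
    if not candidate.is_relative_to(root):
        raise ValueError(f"path escapes repository root: {user_path}")
    return candidate
```

Comparing resolved paths rather than raw strings matters: a string prefix check would accept `repo/../secret` and would confuse `/repo` with `/repo-backup`.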
The architecture is designed to be extensible. Here's what's been implemented and what remains:
| Format | Status | Version |
|---|---|---|
| Markdown (.md) | ✅ Implemented | v1.0 |
| Python (.py) | ✅ Implemented | v1.2 |
| JavaScript/TypeScript | ✅ Implemented | v1.2 |
| ReStructuredText | ⏳ Planned | Phase 3+ |
| AsciiDoc | ⏳ Planned | Phase 3+ |
Adding New Formats: Create a new scanner class extending FileScanner and register it with the scanner registry.
- ⏳ Support weighted links (importance, confidence)
- ⏳ Support link attributes (date range, conditional)
| Algorithm | Status | Version |
|---|---|---|
| Impact analysis | ✅ Implemented | v1.2 |
| Circular dependency detection | ✅ Implemented | v1.2 |
| Shortest path | ⏳ Planned | Phase 3+ |
| Coverage analysis | ⏳ Planned | Phase 3+ |
| Integration | Status | Version |
|---|---|---|
| Git hooks | ✅ Implemented | v1.2 |
| MCP Server | ✅ Implemented | v1.2 |
| Watch mode | ✅ Implemented | v1.2 |
| VS Code extension | ⏳ Planned | Phase 2 |
| GitHub Action | ⏳ Planned | Phase 3+ |
| CI/CD plugins | ⏳ Planned | Phase 3+ |
The architecture of contextgit is layered, modular, and designed for:
- Simplicity: Text-based storage, no database
- Speed: Most operations under 500ms
- Reliability: Atomic writes, clear error handling
- LLM-friendliness: JSON output, precise context extraction, MCP server
- Git-friendliness: Deterministic output, clean diffs, git hooks integration
- Extensibility: Pluggable scanner architecture for new file formats
This design supports the MVP requirements while providing a foundation for future enhancements. The v1.2 release significantly expands capabilities with multi-format scanning, automation (watch mode, git hooks), and native LLM integration via MCP.