Local-first documentation drift detection and fixing tool
DocSentinel detects when documentation no longer matches code, explains why, and optionally proposes fixes using a locally-run or user-supplied LLM. Detection is still fairly basic; work is ongoing to make it more accurate, including when no LLM is available.
- Purpose: Detect semantic drift between code and documentation using AST-based extraction and vector similarity
- Key Features: Git-native workflow, multi-language support (Rust/Python), local-first operation, LLM-assisted analysis, TUI interface
- Status: Production-ready v0.1.0 | All CLI commands tested and functional | See Competitive Analysis for positioning
- Phase 1: TUI module removal
  - Removed `src/tui/` directory (`app.rs`, `mod.rs`, `ui.rs`, `widgets.rs`)
  - Updated `src/lib.rs` (removed `pub mod tui`)
  - Updated `src/cli/mod.rs` (removed `Tui(TuiArgs)` from the `Commands` enum)
  - Updated `src/main.rs` (removed the TUI command handler)
  - Removed TUI dependencies from `Cargo.toml` (ratatui, crossterm)
  - ✅ Code compiles and builds cleanly
- Phase 2: Variable naming refactoring
  - Renamed `similarities` → `similarity_scores` (`src/drift/detector.rs`)
  - Renamed `i` → `doc_index`, `code_index` (`src/drift/detector.rs`)
  - Renamed `added`, `removed` → `added_params`, `removed_params` (`src/drift/rules.rs`)
  - Renamed `a`, `b`, `c`, `d` → `vec1`, `vec2`, `vec3`, `vec4` (`src/drift/mod.rs` tests)
  - Renamed `byte` → `hash_byte` (`src/drift/embedding.rs`)
  - Renamed `arr` → `byte_array` (`src/storage/mod.rs`)
  - Removed unused `_idx` variables (`src/repo/mod.rs`, `repo/config.rs`)
  - Used `next_back()` instead of `.last()` (`src/cli/commands.rs`)
  - ✅ Code compiles cleanly
- Phase 3: Documentation generation overhaul
  - ✅ CLI arguments updated (`--human`, `--ai`, `--human-path`, `--ai-path`, `--architecture`, `--examples`)
  - ✅ `GenerateArgs` struct in `src/cli/mod.rs` updated
  - ❌ Issue: Could not add the `GenerateConfig` struct to `src/cli/commands.rs` due to file edit conflicts
  - ❌ Issue: Could not update the `generate()` function signature in `src/cli/commands.rs` to use the new `GenerateConfig`
  - ❌ Issue: Could not add helper functions (`generate_human_docs`, `generate_human_docs_with_llm`, `generate_ai_docs`)
  - ❌ Issue: Code compiles but still uses the old `generate()` function
  - Current state: CLI args pass to `main.rs`, which calls the old `generate()` with 6 positional params instead of `GenerateConfig`
- Phase 4: Code quality cleanup (remove dead code, fix unused warnings)
- Phase 5: Update Cargo.toml (remove TUI deps - already done in Phase 1)
- Phase 6: Testing & validation (add tests for new features)
- Phase 7: Documentation updates (README.md, CHANGELOG.md)
- Agent.md update: Add detailed plans for remaining phases
Problem: Phase 3 has file edit conflicts preventing clean application of changes.
Solution Steps:
- Restore `src/cli/commands.rs` to a clean state
- Add the `GenerateConfig` struct at the top of the file
- Update the `generate()` function signature to accept `GenerateConfig`
- Add helper functions at the end of the file:
  - `generate_human_docs()` - Generates human-readable `OnboardDocs.md`
  - `generate_human_docs_with_llm()` - LLM-enhanced version
  - `generate_ai_docs()` - Generates machine-readable `OnboardAIdocs.md`
- Update `src/main.rs` to create a `GenerateConfig` and pass it to `generate()`
- Test compilation: `cargo check`
- Test functionality: `docsentinel generate --help` should show the new flags
Expected Outcomes:
- Two separate documentation files: `OnboardDocs.md` (human) and `OnboardAIdocs.md` (AI)
- Human docs include architecture diagrams, examples, and module overviews
- AI docs include structured type definitions, function references, and a cross-index
- By default, both files are generated if neither `--human` nor `--ai` is specified
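The solution steps and expected outcomes could be tied together with a config struct along these lines. This is a minimal sketch, not the project's code: the field names are assumed from the Phase 3 flags (plus `--with-llm` from the existing `generate` command), and `Targets`/`targets()` are hypothetical helpers that encode the default of generating both files when neither `--human` nor `--ai` is given.

```rust
use std::path::PathBuf;

/// Hypothetical options struct mirroring the Phase 3 flags
/// (--human, --ai, --human-path, --ai-path, --architecture, --examples) plus --with-llm.
#[derive(Debug, Clone)]
pub struct GenerateConfig {
    pub human: bool,
    pub ai: bool,
    pub human_path: PathBuf,
    pub ai_path: PathBuf,
    pub architecture: bool,
    pub examples: bool,
    pub with_llm: bool,
}

impl Default for GenerateConfig {
    fn default() -> Self {
        Self {
            human: false,
            ai: false,
            // Default output names taken from the expected outcomes.
            human_path: PathBuf::from("OnboardDocs.md"),
            ai_path: PathBuf::from("OnboardAIdocs.md"),
            architecture: false,
            examples: false,
            with_llm: false,
        }
    }
}

/// Which output files a single `generate` run should write.
#[derive(Debug, PartialEq, Eq)]
pub struct Targets {
    pub human: bool,
    pub ai: bool,
}

impl GenerateConfig {
    /// Resolve the default: if neither --human nor --ai was passed, generate both.
    pub fn targets(&self) -> Targets {
        if !self.human && !self.ai {
            Targets { human: true, ai: true }
        } else {
            Targets { human: self.human, ai: self.ai }
        }
    }
}

fn main() {
    // Equivalent of `docsentinel generate` with no output flags.
    let both = GenerateConfig::default();
    println!("{:?}", both.targets()); // Targets { human: true, ai: true }

    // Equivalent of `docsentinel generate --ai`.
    let ai_only = GenerateConfig { ai: true, ..GenerateConfig::default() };
    println!("{:?} -> {}", ai_only.targets(), ai_only.ai_path.display());
}
```

In `main.rs`, the parsed `GenerateArgs` would be copied into such a struct and passed to `generate()`; keeping the default resolution in one method avoids repeating it in each helper.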
Notes:
- The current CLI still uses the old `generate()` signature and needs a careful manual update
- Consider starting fresh with a restored file to avoid accumulated conflicts
In real codebases, documentation does not fail loudly. It rots quietly. APIs change, function behavior shifts, flags are added, defaults change, and the docs continue to assert something that is no longer true. This causes onboarding friction, bugs, and operational mistakes.
The real problem is not writing documentation. It is detecting when documentation is wrong.
DocSentinel answers one question reliably:
Which parts of my documentation are now inconsistent with the code, and why?
- Local-first by default - Runs entirely on your machine with no network dependency unless explicitly enabled
- Explainability over automation - Every detected issue shows evidence. Silent fixes are forbidden
- Narrow scope - Does not manage documentation. Detects drift and proposes changes
- Open core - Free version is fully usable. Paid features provide automation, hosting, and convenience
git clone https://github.com/docsentinel/docsentinel
cd docsentinel
cargo build --release
The binary will be at `target/release/docsentinel`.
cargo install docsentinel
# Initialize DocSentinel in your repository
docsentinel init
# Scan for documentation drift
docsentinel scan
# View detected issues
docsentinel status
# Launch interactive TUI
docsentinel tui
Initialize DocSentinel in a repository.
docsentinel init [--force] [--no-scan]
Creates a `.docsentinel` directory with:
- SQLite database for storing chunks and drift events
- Configuration file (`config.toml`)
Scan the repository for documentation drift.
docsentinel scan [--full] [--range <RANGE>] [--uncommitted] [--with-llm]
Options:
- `--full`: Scan all files, not just changed ones
- `--range`: Commit range to scan (e.g., "HEAD~5..HEAD")
- `--uncommitted`: Include uncommitted changes
- `--with-llm`: Use LLM for analysis
Show detected drift issues.
docsentinel status [--all] [--severity <LEVEL>] [--detailed]
Launch the interactive terminal user interface.
docsentinel tui
The TUI provides:
- Dashboard with repository statistics (chunks, events, confidence scores)
- Issue list with navigation and filtering
- Detailed issue view with evidence display
- Fix editor with side-by-side diff preview
- Keyboard-driven workflow (see Keyboard Shortcuts)
Note: The TUI requires a terminal with cursor support and 256-color support. Windows Terminal may have limitations.
Apply a suggested fix to a drift issue.
docsentinel fix <ISSUE_ID> [--yes] [--content <TEXT>] [--commit]
Ignore a drift issue.
docsentinel ignore <ISSUE_ID> [--reason <TEXT>] [--permanent]
Install or manage git hooks.
docsentinel hooks [--install] [--uninstall] [--status]
Watch for changes and scan automatically.
docsentinel watch [--debounce <MS>] [--background]
Show or modify configuration.
docsentinel config [--show] [--set <KEY=VALUE>] [--get <KEY>] [--reset]
Analyze a specific file or symbol.
docsentinel analyze <TARGET> [--docs] [--similarity]
When `--docs` is provided, performs an embedding-based search to find related documentation sections:
- Shows the top 5 most similar doc chunks by cosine similarity
- Displays file paths and content previews
- Requires embeddings to be generated (use `--with-llm` or configure an LLM)
Generate documentation from code chunks.
docsentinel generate --readme # Generate README.md
docsentinel generate --docs # Generate full documentation
docsentinel generate --include-private # Include private symbols
docsentinel generate --with-llm # Use LLM for descriptions
Performance Notes:
- Initialization: ~1s for small repos, ~10s for large repos (first scan)
- Incremental scan: <1s for small changes
- LLM analysis: ~2-5s per drift event (depends on model speed)
- Database: SQLite (sufficient for repos up to ~50K chunks)
Configuration is stored in .docsentinel/config.toml:
# Patterns for documentation files
doc_patterns = ["*.md", "*.mdx", "*.rst", "docs/**/*"]
# Patterns for code files
code_patterns = ["*.rs", "*.py", "src/**/*.rs"]
# Patterns to ignore
ignore_patterns = ["target/**", "node_modules/**"]
# Languages to analyze
languages = ["rust", "python"]
# Similarity threshold for drift detection (0.0 - 1.0)
similarity_threshold = 0.7
# Number of nearest doc chunks to consider
top_k = 5
# LLM configuration
[llm]
endpoint = "http://localhost:11434"
model = "llama2"
max_tokens = 2048
temperature = 0.3
DocSentinel operates on Git repositories. On each scan, it:
- Identifies the commit range since the last scan (stored in SQLite)
- Extracts changed files using the `git2` library
- Categorizes changes into code and documentation via glob patterns
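Categorizing changed paths by glob pattern can be sketched with the standard library alone. This is a hypothetical, simplified matcher, not DocSentinel's implementation (which may rely on a glob crate): `*` matches within one path segment, `**` spans any number of segments, and, as an assumption, a pattern with no `/` is matched against the file name at any depth. Ignore patterns are checked first and doc patterns take precedence over code patterns; that ordering is also an assumption.

```rust
/// Match one path segment against a pattern segment; `*` matches any run of characters.
fn match_segment(pat: &[u8], text: &[u8]) -> bool {
    match (pat.first(), text.first()) {
        (None, None) => true,
        // `*` either matches nothing, or swallows one character and tries again.
        (Some(&b'*'), _) => {
            match_segment(&pat[1..], text) || (!text.is_empty() && match_segment(pat, &text[1..]))
        }
        (Some(p), Some(t)) if p == t => match_segment(&pat[1..], &text[1..]),
        _ => false,
    }
}

/// Match `/`-separated segments; a `**` segment spans zero or more path segments.
fn match_segments(pat: &[&str], path: &[&str]) -> bool {
    match pat.first() {
        None => path.is_empty(),
        Some(&"**") => {
            match_segments(&pat[1..], path) || (!path.is_empty() && match_segments(pat, &path[1..]))
        }
        Some(seg) => {
            !path.is_empty()
                && match_segment(seg.as_bytes(), path[0].as_bytes())
                && match_segments(&pat[1..], &path[1..])
        }
    }
}

/// Assumption: a pattern without `/` matches the file name at any depth (gitignore-style).
fn glob_match(pattern: &str, path: &str) -> bool {
    if !pattern.contains('/') {
        let name = path.rsplit('/').next().unwrap_or(path);
        return match_segment(pattern.as_bytes(), name.as_bytes());
    }
    let pat: Vec<&str> = pattern.split('/').collect();
    let segs: Vec<&str> = path.split('/').collect();
    match_segments(&pat, &segs)
}

fn matches_any(patterns: &[&str], path: &str) -> bool {
    patterns.iter().any(|p| glob_match(p, path))
}

#[derive(Debug, PartialEq, Eq)]
enum Category {
    Ignored,
    Doc,
    Code,
    Other,
}

/// Ignore patterns win; a path matching both doc and code patterns counts as Doc (assumed order).
fn categorize(path: &str, doc: &[&str], code: &[&str], ignore: &[&str]) -> Category {
    if matches_any(ignore, path) {
        Category::Ignored
    } else if matches_any(doc, path) {
        Category::Doc
    } else if matches_any(code, path) {
        Category::Code
    } else {
        Category::Other
    }
}

fn main() {
    // Patterns copied from the config.toml example above.
    let doc = ["*.md", "*.mdx", "*.rst", "docs/**/*"];
    let code = ["*.rs", "*.py", "src/**/*.rs"];
    let ignore = ["target/**", "node_modules/**"];
    for path in ["README.md", "src/drift/rules.rs", "target/debug/build.rs", "Cargo.toml"] {
        println!("{path}: {:?}", categorize(path, &doc, &code, &ignore));
    }
}
```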
Uses tree-sitter to parse source files into an AST (Abstract Syntax Tree) and extract semantically meaningful units:
- Public function definitions
- Method signatures and their parameters
- Structs / classes / traits
- Doc comments (Rustdoc / Python docstrings)
- Signature extraction for drift comparison
Supported languages (v1):
- Rust (via tree-sitter-rust)
- Python (via tree-sitter-python)
- (Extensible architecture for more languages)
Parses Markdown files using pulldown-cmark by heading hierarchy. Each section becomes a "Doc Chunk" with:
- File path and line range
- Heading path (e.g., `["API", "Functions", "user_create"]`)
- Section level (H1-H6)
- Raw content and SHA-256 hash
- Optional embedding vectors (384-dim for similarity search)
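A heading-path chunker can be approximated without pulldown-cmark to show how the heading hierarchy maps to chunk paths. This sketch is a simplified stand-in, not DocSentinel's parser: it only recognizes ATX (`#`) headings, skips lines inside fenced code blocks, drops any text before the first heading, and omits the SHA-256 hash and embeddings. `DocChunk` and `chunk_markdown` are hypothetical names.

```rust
#[derive(Debug, Clone, PartialEq)]
struct DocChunk {
    heading_path: Vec<String>,
    level: usize,
    start_line: usize, // 1-based, inclusive
    end_line: usize,   // 1-based, inclusive
    content: String,
}

/// Parse an ATX heading such as `## Functions` into (level, title).
fn parse_heading(line: &str) -> Option<(usize, String)> {
    let trimmed = line.trim_start();
    let level = trimmed.bytes().take_while(|&b| b == b'#').count();
    if level == 0 || level > 6 {
        return None;
    }
    let rest = &trimmed[level..];
    if !rest.is_empty() && !rest.starts_with(' ') {
        return None; // e.g. `#hashtag` is not a heading
    }
    Some((level, rest.trim().to_string()))
}

fn chunk_markdown(src: &str) -> Vec<DocChunk> {
    let mut chunks = Vec::new();
    let mut stack: Vec<(usize, String)> = Vec::new(); // currently open headings
    let mut current: Option<DocChunk> = None;
    let mut in_fence = false;

    for (idx, line) in src.lines().enumerate() {
        let line_no = idx + 1;
        let trimmed = line.trim_start();
        if trimmed.starts_with("```") || trimmed.starts_with("~~~") {
            in_fence = !in_fence;
        }
        let heading = if in_fence { None } else { parse_heading(line) };

        if let Some((level, title)) = heading {
            if let Some(done) = current.take() {
                chunks.push(done);
            }
            // Close headings at the same or deeper level, then open this one.
            while stack.last().map_or(false, |(l, _)| *l >= level) {
                stack.pop();
            }
            stack.push((level, title));
            current = Some(DocChunk {
                heading_path: stack.iter().map(|(_, t)| t.clone()).collect(),
                level,
                start_line: line_no,
                end_line: line_no,
                content: String::new(),
            });
        } else if let Some(chunk) = current.as_mut() {
            chunk.end_line = line_no;
            chunk.content.push_str(line);
            chunk.content.push('\n');
        }
    }
    chunks.extend(current);
    chunks
}

fn main() {
    let md = "# API\nOverview.\n## Functions\n### user_create\nCreates a user.\n~~~sh\n# not a heading\n~~~\n## Types\n";
    for c in chunk_markdown(md) {
        println!("{:?} (H{}, lines {}-{})", c.heading_path, c.level, c.start_line, c.end_line);
    }
}
```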
When an LLM is configured, DocSentinel generates embeddings:
- Code chunks: Symbol name + signature + content
- Doc chunks: Heading path + section content
- Stored as binary blobs in SQLite (f32 arrays)
- Enables semantic similarity search via cosine distance
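Storing embeddings as f32 blobs and comparing them by cosine similarity can be sketched as follows. The little-endian byte layout and the function names are assumptions; the README only states that vectors are stored as binary blobs of f32 arrays. Cosine distance is simply one minus the similarity computed here.

```rust
/// Encode an embedding as a little-endian f32 byte blob (assumed layout).
fn to_blob(v: &[f32]) -> Vec<u8> {
    v.iter().flat_map(|x| x.to_le_bytes()).collect()
}

/// Decode a blob produced by `to_blob`; None if the length is not a multiple of 4.
fn from_blob(bytes: &[u8]) -> Option<Vec<f32>> {
    if bytes.len() % 4 != 0 {
        return None;
    }
    Some(
        bytes
            .chunks_exact(4)
            .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}

/// Cosine similarity; returns 0.0 for mismatched lengths, empty input, or zero vectors.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() || a.is_empty() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

fn main() {
    let v = vec![0.5_f32, -1.25, 3.0];
    let blob = to_blob(&v); // 4 bytes per f32, ready for a SQLite BLOB column
    let back = from_blob(&blob).expect("blob length is a multiple of 4");
    println!("blob bytes = {}, round-trip equal = {}", blob.len(), back == v);
    println!("cos(v, v) = {:.3}", cosine_similarity(&v, &back));
    println!("cos(x, y) = {:.3}", cosine_similarity(&[1.0, 0.0], &[0.0, 1.0]));
}
```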
Embedding providers:
- Ollama (local, default: `http://localhost:11434`)
- OpenAI-compatible endpoints (customizable)
- Mock embeddings (for testing without LLM)
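Mock embeddings let the pipeline run without an LLM. One plausible approach, sketched here as an assumption rather than DocSentinel's actual mock, is to hash the text with the standard library's `DefaultHasher` into a deterministic 384-dimensional unit vector. The vectors carry no semantic meaning, so they only exercise storage and similarity code paths. `DefaultHasher`'s algorithm is not guaranteed to stay the same across Rust releases, so such vectors should not be persisted and compared across toolchain upgrades.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Dimension matching the 384-dim vectors mentioned above.
const DIM: usize = 384;

/// Hypothetical mock: deterministic pseudo-random unit vector derived from the text.
fn mock_embedding(text: &str) -> Vec<f32> {
    let mut v: Vec<f32> = (0..DIM)
        .map(|i| {
            // Hash (text, component index) so each component gets its own byte.
            let mut hasher = DefaultHasher::new();
            text.hash(&mut hasher);
            i.hash(&mut hasher);
            let hash_byte = (hasher.finish() & 0xFF) as u8;
            hash_byte as f32 / 127.5 - 1.0 // map 0..=255 onto [-1.0, 1.0]
        })
        .collect();
    // L2-normalize so cosine similarity reduces to a dot product.
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in &mut v {
            *x /= norm;
        }
    }
    v
}

fn main() {
    let a = mock_embedding("fn create_user(name: &str)");
    let b = mock_embedding("fn create_user(name: &str)");
    let c = mock_embedding("fn delete_user(id: u64)");
    println!("dim = {}, deterministic = {}, distinct = {}", a.len(), a == b, a != c);
}
```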
Drift is detected through a hybrid approach:
Hard Rules (Rule-based):
- Public API signature changed → Check signature hash mismatch
- Function removed → Code chunk deleted, but its doc chunk still exists
- New function added → Code chunk exists, no related doc found
- Parameter count changed → Signature comparison
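The hard rules above reduce to comparing the old and new versions of a symbol. The sketch below is hypothetical (`CodeSymbol`, `HardDrift`, and `check_hard_rules` are illustrative names), hashes signatures with `DefaultHasher` in place of whatever hash DocSentinel actually uses, and reports only the most specific finding per symbol.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Debug, Clone)]
struct CodeSymbol {
    name: String,
    params: Vec<String>,
    signature: String,
}

#[derive(Debug, PartialEq)]
enum HardDrift {
    SignatureChanged { name: String },
    ParamCountChanged { name: String, old: usize, new: usize },
    FunctionRemoved { name: String },
    Undocumented { name: String },
}

fn signature_hash(signature: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    signature.hash(&mut hasher);
    hasher.finish()
}

/// `old`/`new`: the symbol before and after a change; `has_doc`: a doc chunk references it.
fn check_hard_rules(old: Option<&CodeSymbol>, new: Option<&CodeSymbol>, has_doc: bool) -> Option<HardDrift> {
    match (old, new) {
        // Code chunk is gone but documentation still describes it.
        (Some(o), None) if has_doc => Some(HardDrift::FunctionRemoved { name: o.name.clone() }),
        // New symbol with no related doc chunk.
        (None, Some(n)) if !has_doc => Some(HardDrift::Undocumented { name: n.name.clone() }),
        (Some(o), Some(n)) if has_doc => {
            if o.params.len() != n.params.len() {
                // A parameter count change is the more specific finding.
                Some(HardDrift::ParamCountChanged {
                    name: n.name.clone(),
                    old: o.params.len(),
                    new: n.params.len(),
                })
            } else if signature_hash(&o.signature) != signature_hash(&n.signature) {
                Some(HardDrift::SignatureChanged { name: n.name.clone() })
            } else {
                None
            }
        }
        _ => None,
    }
}

/// Test helper for building symbols.
fn sym(name: &str, params: &[&str], signature: &str) -> CodeSymbol {
    CodeSymbol {
        name: name.to_string(),
        params: params.iter().map(|p| p.to_string()).collect(),
        signature: signature.to_string(),
    }
}

fn main() {
    let old = sym("create_user", &["name"], "pub fn create_user(name: &str) -> User");
    let new = sym("create_user", &["name", "email"], "pub fn create_user(name: &str, email: &str) -> User");
    // Some(ParamCountChanged { name: "create_user", old: 1, new: 2 })
    println!("{:?}", check_hard_rules(Some(&old), Some(&new), true));
    println!("{:?}", check_hard_rules(Some(&old), None, true));
}
```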
Soft Rules (Semantic similarity):
- Compute cosine similarity between code embedding and doc embeddings
- Similarity threshold: 0.7 (configurable)
- Top-K nearest docs: 5 (configurable)
- Significant drop detection (≥10% similarity decrease)
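The soft rules can be sketched as ranking doc chunks by cosine similarity, keeping the top K, and flagging either a drop from the previously stored score or a score below the threshold. The README does not say whether the 10% drop is relative or absolute; this sketch assumes relative. All names are hypothetical, and the example values mirror `similarity_threshold = 0.7` and `top_k = 5`.

```rust
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm = |v: &[f32]| v.iter().map(|x| x * x).sum::<f32>().sqrt();
    let denom = norm(a) * norm(b);
    if denom == 0.0 { 0.0 } else { dot / denom }
}

#[derive(Debug, PartialEq)]
enum SoftDrift {
    SignificantDrop { doc_index: usize, before: f32, after: f32 },
    BelowThreshold { doc_index: usize, score: f32 },
}

/// `previous[i]` is the similarity stored for doc `i` at the last scan (None if never scored).
fn check_soft_rules(
    code: &[f32],
    docs: &[Vec<f32>],
    previous: &[Option<f32>],
    threshold: f32,
    top_k: usize,
) -> Vec<SoftDrift> {
    // Score every doc chunk and keep the `top_k` most similar, highest first.
    let mut similarity_scores: Vec<(usize, f32)> = docs
        .iter()
        .enumerate()
        .map(|(doc_index, d)| (doc_index, cosine(code, d)))
        .collect();
    similarity_scores.sort_by(|a, b| b.1.total_cmp(&a.1));
    similarity_scores.truncate(top_k);

    let mut findings = Vec::new();
    for (doc_index, score) in similarity_scores {
        match previous.get(doc_index).copied().flatten() {
            // Assumption: "≥10% decrease" is relative to the previously stored score.
            Some(before) if before > 0.0 && (before - score) / before >= 0.10 => {
                findings.push(SoftDrift::SignificantDrop { doc_index, before, after: score })
            }
            _ if score < threshold => findings.push(SoftDrift::BelowThreshold { doc_index, score }),
            _ => {}
        }
    }
    findings
}

fn main() {
    let code = [1.0_f32, 0.0];
    let docs: Vec<Vec<f32>> = vec![vec![1.0, 0.0], vec![0.6, 0.8], vec![0.0, 1.0]];
    let previous: [Option<f32>; 3] = [Some(1.0), Some(0.9), None];
    for finding in check_soft_rules(&code, &docs, &previous, 0.7, 5) {
        println!("{finding:?}");
    }
}
```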
Drift Event Structure:
{
"id": "uuid",
"severity": "High|Medium|Low|Critical",
"description": "Human-readable summary",
"evidence": "Technical details",
"confidence": 0.0-1.0,
"related_code_chunks": ["id1", "id2"],
"related_doc_chunks": ["id1"],
"suggested_fix": "LLM-generated (optional)",
"status": "Pending|Accepted|Ignored|Fixed"
}
When drift is detected and an LLM is configured:
- Trigger: Only after rule-based detection, not for every scan
- Context provided: Old code, new code, related docs, drift evidence
- Prompt engineering: Optimized for drift explanation + fix generation
- Response format: JSON with summary, reason, suggested_fix, confidence
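The drift event structure shown above and the LLM response fields could be modeled in Rust roughly as follows. This is a sketch with hypothetical names, assuming the JSON reply has already been parsed (a JSON crate is omitted to keep it dependency-free). In line with the no-silent-fixes principle, attaching an LLM analysis only records a suggestion: the status stays `Pending` until the user applies it with `docsentinel fix`. Keeping the lower of the rule-based and LLM confidence values is an assumed policy.

```rust
/// Declared in ascending order so that derived `Ord` ranks Critical highest.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Severity {
    Low,
    Medium,
    High,
    Critical,
}

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    Pending,
    Accepted,
    Ignored,
    Fixed,
}

#[allow(dead_code)]
#[derive(Debug, Clone)]
struct DriftEvent {
    id: String,
    severity: Severity,
    description: String,
    evidence: String,
    confidence: f32,
    related_code_chunks: Vec<String>,
    related_doc_chunks: Vec<String>,
    suggested_fix: Option<String>,
    status: Status,
}

/// The LLM reply's fields (summary, reason, suggested_fix, confidence), already parsed.
struct LlmAnalysis {
    summary: String,
    reason: String,
    suggested_fix: String,
    confidence: f32,
}

impl DriftEvent {
    /// Record the LLM's explanation and proposal; never applies the fix or changes status.
    fn attach_llm_analysis(&mut self, analysis: LlmAnalysis) {
        self.evidence = format!(
            "{}\nLLM summary: {}\nLLM reason: {}",
            self.evidence, analysis.summary, analysis.reason
        );
        self.suggested_fix = Some(analysis.suggested_fix);
        // Assumed policy: keep the more conservative of the two confidence values.
        self.confidence = self.confidence.min(analysis.confidence.clamp(0.0, 1.0));
    }
}

/// Demo helper with placeholder description and evidence.
fn new_event(id: &str, severity: Severity, confidence: f32) -> DriftEvent {
    DriftEvent {
        id: id.to_string(),
        severity,
        description: "Signature of `create_user` changed".to_string(),
        evidence: "parameter count 1 -> 2".to_string(),
        confidence,
        related_code_chunks: vec![],
        related_doc_chunks: vec![],
        suggested_fix: None,
        status: Status::Pending,
    }
}

fn main() {
    let mut events = vec![new_event("low-1", Severity::Low, 0.6), new_event("crit-1", Severity::Critical, 0.9)];
    // Show the most severe issues first.
    events.sort_by(|a, b| b.severity.cmp(&a.severity));
    events[0].attach_llm_analysis(LlmAnalysis {
        summary: "Docs still describe the one-argument form".to_string(),
        reason: "An `email` parameter was added".to_string(),
        suggested_fix: "Document the new `email` parameter.".to_string(),
        confidence: 0.8,
    });
    let e = &events[0];
    println!("{} {:?} {:?} confidence={}", e.id, e.severity, e.status, e.confidence);
}
```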
Supported providers:
- Ollama (local, default model: `llama2`)
- OpenAI-compatible (Anthropic, Together, local APIs)
- Custom endpoint support with API key authentication
Use cases:
- `docsentinel scan --with-llm`: Run drift analysis with the LLM
- `docsentinel fix <id>`: Use the LLM to generate fix suggestions
- `docsentinel generate --with-llm`: Generate natural-language docs from code
- `Ctrl+C`, `Ctrl+Q`: Quit; `?`: Show help
- `i`, `Enter`: View issues; `s`: Run scan; `q`: Quit
- `↑`/`k`, `↓`/`j`: Navigate; `Enter`: View details; `f`: Open fix editor; `x`: Ignore issue; `Esc`: Back to dashboard
- `e`: Edit fix; `a`: Apply fix; `Esc`: Cancel
┌─────────────────────────────────────────────────────────────┐
│ DocSentinel │
├─────────────────────────────────────────────────────────────┤
│ CLI (clap) TUI (ratatui) │
├─────────────────────────────────────────────────────────────┤
│ Drift Detection Engine │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────┐ │
│ │ Hard Rules │ │ Soft Rules │ │ Semantic Similarity │ │
│ └─────────────┘ └─────────────┘ └─────────────────────┘ │
├─────────────────────────────────────────────────────────────┤
│ ┌─────────────────┐ ┌─────────────────────────────────┐ │
│ │ Code Extraction │ │ Documentation Extraction │ │
│ │ (tree-sitter) │ │ (pulldown-cmark) │ │
│ └─────────────────┘ └─────────────────────────────────┘ │
├─────────────────────────────────────────────────────────────┤
│ ┌─────────────────┐ ┌─────────────────────────────────┐ │
│ │ Git Integration │ │ SQLite Storage │ │
│ │ (git2) │ │ (rusqlite) │ │
│ └─────────────────┘ └─────────────────────────────────┘ │
├─────────────────────────────────────────────────────────────┤
│ ┌─────────────────────────────────────────────────────┐ │
│ │ LLM Integration (Ollama / OpenAI-compatible) │ │
│ └─────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
- Language Support: Only Rust and Python (JavaScript/TypeScript, Go, Java planned)
- LLM Required: Advanced drift explanation requires Ollama or compatible LLM (basic rules work without it)
- TUI: Terminal UI requires terminal with cursor support (not tested in Windows Terminal)
- Large Repositories: Performance untested on >10K files (potential optimization needed)
- Binary Compatibility: Release binary tested on Linux, macOS/Windows support expected
- Drift Detection: Currently signature-based (behavioral drift via embeddings requires LLM)
See Roadmap for upcoming features addressing these limitations.
cargo build
cargo test
cargo run -- init
cargo run -- scan
cargo run -- tui
DocSentinel occupies a unique niche as a local-first, AST-based documentation drift detection tool. Unlike most documentation tools that focus on validation or linting, DocSentinel detects semantic inconsistency between code and documentation over time.
| Tool | Approach | Core Strength | Limitations | Local-First |
|---|---|---|---|---|
| DocSentinel | AST extraction + semantic embeddings + drift rules | Multi-language (Rust/Python), Git-native, TUI, offline-capable | Only Rust and Python supported so far | ✅ Yes |
| GenLint | Change watching + consistency checks | Cloud integration (GitHub/Jira/Confluence), automated scanning | - | ❌ No (SaaS) |
| Optic | OpenAPI spec diffing | Breaking change prevention, accurate API docs | OpenAPI only, not general code | ❌ No |
| Spectral | OpenAPI linter with custom rules | Highly configurable, quality enforcement | OpenAPI only | ❌ No |
| docsig | Signature validation | Simple, focused approach | Python docstrings only, signature-level checks | ✅ Yes |
| checkdoc | Markdown quality linting | Format enforcement, basic checks | No code awareness | ✅ Yes |
| diffsitter | AST-based semantic diffs | Tree-sitter powered, ignores formatting | Diff tool only, no drift tracking | ✅ Yes |
| resemble | AST + cosine similarity (Rust) | Structural code comparison | Rust only, library not full tool | ✅ Yes |
| tree-sitter-mcp | Code structure for AI | Fast search, 15+ languages | Analysis only, no drift detection | ✅ Yes |
- Git-Native Workflow: Operates on commit ranges, not just file snapshots
- Semantic Understanding: Uses tree-sitter AST extraction, not regex patterns
- Embedding-Powered Search: Finds related docs via vector similarity (not just keyword matching)
- Explainability Over Automation: Every drift event shows evidence, no silent fixes
- Local-First: Full functionality without network or cloud dependencies (LLM optional)
- Language Coverage: Supports Rust and Python (v1), with extensible architecture
| Feature | DocSentinel | GenLint | Action |
|---|---|---|---|
| CI/CD Integration | ❌ Missing | ✅ GitHub Actions | Add workflow examples |
| Pre-commit Hooks | ✅ Auto-install | - | Document hooks integration |
| Web Dashboard | ❌ CLI only | ✅ Available | Could add in future phase |
| Multi-repo Support | ❌ Single repo | ❌ Single repo | Design choice, not gap |
| Slack/Discord Notifications | ❌ Missing | ✅ Available | Could add webhook support |
- Phase 1: Core scanning and drift detection
- Phase 2: LLM explanation and fix proposal
- Phase 3: TUI refinement
- Phase 4: Ecosystem Integration
  - GitHub Actions workflow for drift checking
  - Pre-commit hook auto-installation
  - Webhook notifications for drift events
  - VS Code extension for inline warnings
- Phase 5: Enhanced Detection
  - Additional language support (JavaScript/TypeScript, Go, Java)
  - Configurable hard rules (custom drift patterns)
  - Diff visualization in TUI
  - Historical drift trends and analytics
- Phase 6: Collaboration Features
  - Team drift dashboards (self-hosted)
  - Pull request integration with drift summaries
  - Drift review approval workflows
- Phase 7: Enterprise (Open Core + Paid)
  - Self-hosted cloud version for teams
  - Advanced role-based permissions
  - Audit logs and compliance reporting
  - Priority support and SLAs
MIT OR Apache-2.0
We welcome contributions! DocSentinel is designed with modularity in mind, making it easy to extend with new languages, drift rules, and embedding providers.
Language Support:
- Add new tree-sitter parsers in `src/extract/code.rs`
- Implement language-specific signature extraction logic
- Add tests for new language parsing
Drift Rules:
- Add custom hard rules in `src/drift/rules.rs`
- Implement new soft rule patterns
- Improve rule confidence scoring
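For contributors adding drift rules, one way to structure custom hard rules is a small trait. The `DriftRule` trait and the example rule below are hypothetical and do not reflect the current contents of `src/drift/rules.rs`; they only illustrate the shape of a pluggable rule that returns evidence when it fires, in keeping with the evidence-first design.

```rust
/// Hypothetical plug-in interface for custom hard rules (not DocSentinel's actual API).
trait DriftRule {
    fn name(&self) -> &'static str;
    /// Return evidence text if this rule detects drift for a (code, doc) pair.
    fn check(&self, code_signature: &str, doc_text: &str) -> Option<String>;
}

/// Example rule: code is marked #[deprecated] but the related doc section never says so.
struct DeprecationNotDocumented;

impl DriftRule for DeprecationNotDocumented {
    fn name(&self) -> &'static str {
        "deprecation-not-documented"
    }

    fn check(&self, code_signature: &str, doc_text: &str) -> Option<String> {
        let deprecated_in_code = code_signature.contains("#[deprecated");
        let mentioned_in_doc = doc_text.to_lowercase().contains("deprecated");
        if deprecated_in_code && !mentioned_in_doc {
            Some("symbol is #[deprecated] but its docs do not mention deprecation".to_string())
        } else {
            None
        }
    }
}

/// Run every registered rule and collect (rule name, evidence) for those that fire.
fn run_rules(rules: &[Box<dyn DriftRule>], code: &str, doc: &str) -> Vec<(&'static str, String)> {
    rules
        .iter()
        .filter_map(|rule| rule.check(code, doc).map(|evidence| (rule.name(), evidence)))
        .collect()
}

fn main() {
    let rules: Vec<Box<dyn DriftRule>> = vec![Box::new(DeprecationNotDocumented)];
    let code = "#[deprecated]\npub fn create_user_v1(name: &str)";
    println!("{:?}", run_rules(&rules, code, "Creates a user."));
    println!("{:?}", run_rules(&rules, code, "Deprecated: use create_user instead."));
}
```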
Integration:
- Add pre-commit hook installation scripts
- Implement GitHub Actions workflow examples
- Add CI/CD pipeline detection examples
Documentation:
- Update this README when adding new commands
- Add usage examples for new features
- Test `--help` output for clarity
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Make your changes following the existing code style
- Run tests: `cargo test`
- Run clippy: `cargo clippy -- -D warnings`
- Commit changes: `git commit -m "Add amazing feature"`
- Push: `git push origin feature/amazing-feature`
- Open a Pull Request
# Run all tests
cargo test
# Run with logging
RUST_LOG=debug cargo test
# Test specific module
cargo test extract::code::tests
# Run clippy (must pass)
cargo clippy -- -D warnings
We use DocSentinel to document the DocSentinel codebase. This ensures our own documentation remains up-to-date and verifies the tool's functionality in a real-world scenario.
This tool succeeds if developers trust it enough to run it regularly. Every design decision biases toward correctness, transparency, and respect for the user's workflow. Automation comes second. Trust comes first.