| 1 | +# CLAUDE.md |
| 2 | + |
| 3 | +This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. |
| 4 | + |
| 5 | +## Commands |
| 6 | + |
| 7 | +```bash |
| 8 | +# Run all tests |
| 9 | +python -m pytest tests/ -q |
| 10 | + |
| 11 | +# Run a single test file |
| 12 | +python -m pytest tests/test_cso_store.py -v |
| 13 | + |
| 14 | +# Run a single test by name |
| 15 | +python -m pytest tests/test_cso_store.py::test_write_and_query -v |
| 16 | + |
| 17 | +# Start MCP server (stdio transport, used by Cursor/Claude/Windsurf) |
| 18 | +python -m hcr.product.integrations.mcp_server_stdio |
| 19 | + |
| 20 | +# CLI |
| 21 | +hcr init # initialize .hcr/ in current project |
| 22 | +hcr status # show cognitive state summary |
| 23 | +hcr resume # show resume context |
| 24 | +``` |
| 25 | + |
| 26 | +## Architecture |
| 27 | + |
| 28 | +HCR is a persistent developer memory layer exposed as an MCP server. The key insight: AI tools are stateless, so HCR provides the stateful substrate they lack. |
| 29 | + |
| 30 | +### Storage: CSOs (Cognitive State Objects) |
| 31 | + |
| 32 | +`hcr/engine/cso/` is the core data layer. A `CSO` (`cso_model.py`) is a typed, causally-linked record — types include `DECISION`, `OBSERVATION`, `CONSTRAINT`, `RISK`, `TASK`, `OUTCOME`, `CLAIM`, `INTENT`, `ROLLBACK`, `TRIGGER`. Each CSO has explicit `causal_in`/`causal_out` edge lists forming a directed causal graph. `CSOStore` (`cso_store.py`) persists them in SQLite + WAL at `.hcr/cso.db`. Indexes on `type`, `created_at`, and `scope`. |
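The storage layout described above can be sketched roughly as follows. This is a minimal illustration, not the actual `CSOStore`: the class `SketchCSO`, the helper functions, the table schema, and every field name other than `causal_in`/`causal_out` are assumptions made for the example.

```python
import json
import sqlite3
import tempfile
import time
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class SketchCSO:
    """Hypothetical stand-in for a CSO; only causal_in/causal_out come from the text."""
    cso_id: str
    type: str          # e.g. "DECISION", "RISK", "TASK"
    scope: str
    content: str
    causal_in: list = field(default_factory=list)   # ids of CSOs that caused this one
    causal_out: list = field(default_factory=list)  # ids of CSOs this one caused
    created_at: float = field(default_factory=time.time)


def open_store(db_path):
    """Open a SQLite database in WAL mode with indexes on type, created_at, scope."""
    conn = sqlite3.connect(str(db_path))
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cso ("
        "cso_id TEXT PRIMARY KEY, type TEXT, scope TEXT, content TEXT, "
        "causal_in TEXT, causal_out TEXT, created_at REAL)"
    )
    for col in ("type", "created_at", "scope"):
        conn.execute(f"CREATE INDEX IF NOT EXISTS idx_cso_{col} ON cso({col})")
    return conn


def write_cso(conn, cso):
    """Persist one CSO; edge lists are stored as JSON text in this sketch."""
    conn.execute(
        "INSERT OR REPLACE INTO cso VALUES (?, ?, ?, ?, ?, ?, ?)",
        (cso.cso_id, cso.type, cso.scope, cso.content,
         json.dumps(cso.causal_in), json.dumps(cso.causal_out), cso.created_at),
    )
    conn.commit()


def query_by_type(conn, cso_type):
    """Return the ids of all CSOs of one type, oldest first."""
    rows = conn.execute(
        "SELECT cso_id FROM cso WHERE type = ? ORDER BY created_at", (cso_type,)
    ).fetchall()
    return [row[0] for row in rows]
```

Storing the edge lists as JSON columns keeps the sketch short; the real store may model edges differently.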
| 33 | + |
| 34 | +### Memory Fabric: `hcr/engine/memory/` |
| 35 | + |
| 36 | +The Cognitive State Fabric (CSF) — the intelligence layer over raw CSO storage: |
| 37 | + |
| 38 | +- **`centrality.py`** — `CausalCentralityScorer`: BFS transitive reachability on causal edges. CSOs that caused the most downstream effects score highest (0–1). |
| 39 | +- **`projection.py`** — `CognitiveProjection`: replaces naive `facts[-10:]` with centrality-ranked, decay-filtered live state. Called on every `hcr_get_state` and `hcr_preflight`. |
| 40 | +- **`prefetch.py`** — `ProjectionPrefetcher`: background thread triggered on file edit events. Caches `CognitiveProjection` so the next tool call is served from cache without recomputing the projection. |
| 41 | +- **`embedding_store.py`** — `EmbeddingStore`: sqlite-vec ANN store at `.hcr/embeddings.db`. Embeds only CSOs in the qualifying tiers (`commit/task/decision/constraint/risk/edited`), using Ollama `nomic-embed-text` and falling back to `sentence-transformers`. |
| 42 | +- **`implicit_graph.py`** — `generate_soft_links`: called in `_handle_file_edit` after embedding; uses semantic k-NN to auto-detect soft causal edges (similarity > 0.82) and back-writes them to the CSO. |
| 43 | +- **`episode_store.py`** — `BOCPDSegmenter`: Bayesian Online Changepoint Detection for segmenting event streams into work episodes. |
| 44 | +- **`fusion.py`** — `reciprocal_rank_fusion` (RRF, accepts optional learned `weights` tuple) and `mmr_select` (Maximum Marginal Relevance): fuse semantic + causal ranked lists; select diverse results for fixed token budget. RRF is called by `cso_impact.query_cso_impact` when an `embedding_store` is passed. |
| 46 | +- **`prospective.py`** — `get_triggered_csos`: returns `TRIGGER` CSOs matching the active file pattern; injected at rank-0 by `CognitiveProjection`. |
| 47 | +- **`feedback.py`** — `FeedbackStore`: SQLite store recording preflight retrievals and session outcomes. After `TRAINING_THRESHOLD=50` labelled samples, trains learned RRF weights via least-squares. `HCREngine._feedback_store` owns the instance. |
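The centrality idea from the list above (BFS transitive reachability over causal edges, scored 0–1) can be sketched as below. The function names and the graph representation are hypothetical, and normalizing by the largest reachable count is an assumption; the source only states that scores fall between 0 and 1.

```python
from collections import deque


def reachable_count(graph, start):
    """Count nodes reachable from `start` by following causal_out edges (BFS)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen) - 1  # exclude the start node itself


def centrality_scores(graph):
    """Map each node to a 0-1 score: its downstream count divided by the maximum."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    counts = {n: reachable_count(graph, n) for n in nodes}
    top = max(counts.values(), default=0)
    if top == 0:
        return {n: 0.0 for n in nodes}  # no edges: nothing caused anything
    return {n: counts[n] / top for n in nodes}
```

Here `graph` maps a CSO id to the ids it caused, so a decision with many transitive effects ends up with the highest score.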
| 48 | + |
| 49 | +### Engine: `hcr/engine/engine_api.py` |
| 50 | + |
| 51 | +`HCREngine` is the central object. Key responsibilities: |
| 52 | +- Owns `_cso_store`, `_prefetcher`, `_embedding_store`, `_feedback_store`, `_episode_segmenter` |
| 53 | +- `_handle_file_edit()` is the hot path: writes OBSERVATION CSO → symbolic verifier → prefetcher → embeds CSO → generates soft-links (back-written to CSO causal_in) |
| 54 | +- Parallel legacy state in `CognitiveState` (symbolic facts, causal graph) is kept for backwards compatibility, but CSOs are the v2.0 source of truth |
| 55 | + |
| 56 | +### Symbolic Reasoning: `hcr/engine/symbolic/` |
| 57 | + |
| 58 | +`SymbolicVerifier` (`verifier.py`) evaluates `DEFAULT_RULES` against a newly written CSO and emits RISK CSOs when rules fire. Rules live in `rules.py`. Rule evaluation failures are logged at DEBUG level (not silently swallowed). |
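A rough sketch of the verification loop described above: each rule is evaluated against the new CSO, firing rules yield RISK records, and rule failures are logged at DEBUG rather than swallowed. The rule signature (a callable returning a message or `None`), the dict-shaped CSO, and the logger name are all assumptions for illustration, not the real `SymbolicVerifier` API.

```python
import logging

logger = logging.getLogger("hcr.sketch.verifier")  # hypothetical logger name


def verify(cso, rules):
    """Run every rule against `cso`; return RISK records for rules that fire."""
    risks = []
    for rule in rules:
        try:
            message = rule(cso)
        except Exception:
            # A broken rule must not block the write path, but it is still recorded.
            logger.debug("rule %r failed", getattr(rule, "__name__", rule), exc_info=True)
            continue
        if message:
            risks.append({"type": "RISK", "content": message, "causal_in": [cso["id"]]})
    return risks


def large_edit_rule(cso):
    """Hypothetical rule: flag observations that touch many lines."""
    if cso.get("lines_changed", 0) > 500:
        return "large edit without an accompanying task"
    return None


def broken_rule(cso):
    """Hypothetical rule that raises, to show the failure path."""
    raise KeyError("missing field")
```

Linking each RISK back to the triggering CSO through `causal_in` keeps the emitted risks inside the causal graph.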
| 59 | + |
| 60 | +### Semantic Decay: `hcr/product/storage/semantic_decay.py` |
| 61 | + |
| 62 | +Tier-based fact retention. Each fact prefix (`commit:`, `task:`, `edited:`, `error:`, `cmd:`, `mcp_tool:`, `pattern:`, `observation:`) has a half-life (7 days → 15 min). `CognitiveProjection` uses this to compute effective half-life adjusted by centrality: `effective_hl = base / max(1 - centrality*0.8, 0.2)` — high-centrality CSOs decay slower. |
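To make the formula concrete, the sketch below evaluates it for a 15-minute base half-life. `effective_half_life` mirrors the expression quoted above; `decay_weight` is an assumption that applies standard exponential half-life decay, which the source does not spell out.

```python
def effective_half_life(base_seconds, centrality):
    """effective_hl = base / max(1 - centrality*0.8, 0.2), as quoted in the text."""
    return base_seconds / max(1 - centrality * 0.8, 0.2)


def decay_weight(age_seconds, half_life_seconds):
    """Assumed decay curve: the weight halves once per elapsed half-life."""
    return 0.5 ** (age_seconds / half_life_seconds)


base = 15 * 60  # 15-minute tier, in seconds
for centrality in (0.0, 0.5, 1.0):
    hl = effective_half_life(base, centrality)
    print(f"centrality={centrality:.1f} half-life={hl / 60:.1f} min")
```

Because centrality lies in 0–1, the denominator ranges from 1.0 down to 0.2, so the most central CSOs keep up to five times the base half-life.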
| 63 | + |
| 64 | +### MCP Server: `hcr/product/integrations/` |
| 65 | + |
| 66 | +`HCRMCPResponder` in `mcp_server.py` dispatches all MCP tool calls to modular `BaseMCPTool` subclasses in `tools/`. Each tool class lives in roughly its own file. Key tools: |
| 67 | +- `hcr_get_state` / `hcr_preflight` — main context-handoff tools for agents |
| 68 | +- `hcr_preflight` / `hcr_postflight` — agent lifecycle: preflight records retrieval for learned fusion; postflight records outcome signal |
| 69 | +- `hcr_record_file_edit`, `hcr_remember`, `hcr_fail`, `hcr_resolve` — write-path tools |
| 70 | +- `hcr_analyze_impact` — causal BFS + RRF semantic fusion (`cso_impact.py` + `embedding_store`) |
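The fusion step behind `hcr_analyze_impact` can be illustrated with a generic weighted reciprocal rank fusion. This is a sketch of the standard technique, not the signature of `reciprocal_rank_fusion` in `fusion.py`: the function name `rrf_sketch`, the constant `k=60`, and the tie-breaking rule are assumptions.

```python
def rrf_sketch(ranked_lists, weights=None, k=60):
    """Fuse several best-first rankings into one list, highest fused score first.

    Each item scores sum(weight / (k + rank)) over the lists that contain it,
    with ranks starting at 1.
    """
    if weights is None:
        weights = (1.0,) * len(ranked_lists)
    if len(weights) != len(ranked_lists):
        raise ValueError("need exactly one weight per ranked list")
    scores = {}
    for weight, ranking in zip(weights, ranked_lists):
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + weight / (k + rank)
    # Sort by descending score; break ties by item id for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))


semantic = ["a", "b", "c"]   # hypothetical embedding-similarity ranking
causal = ["b", "c", "a"]     # hypothetical causal-BFS ranking
fused = rrf_sketch([semantic, causal])
```

Passing a learned `weights` tuple shifts influence between the semantic and causal lists without changing the rankings themselves.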
| 71 | + |
| 72 | +All imports into `mcp_server.py` must be at module level (not inside functions/conditionals). |
| 73 | + |
| 74 | +### Project State on Disk |
| 75 | + |
| 76 | +``` |
| 77 | +.hcr/ |
| 78 | + cso.db # SQLite CSO store (WAL mode, indexes: type/created_at/scope) |
| 79 | + embeddings.db # sqlite-vec embedding store (WAL, synchronous=NORMAL) |
| 80 | + feedback.db # learned fusion weights (FeedbackStore) |
| 81 | + agents.db # AgentRegistry |
| 82 | + state.json # legacy CognitiveState (required for init check) |
| 83 | + decisions/ # legacy JSONL decision log (migrated to CSOs on init) |
| 84 | +``` |
| 85 | + |
| 86 | +## Tests |
| 87 | + |
| 88 | +`tests/conftest.py` has `collect_ignore` for non-pytest scripts. Tests are fully synchronous where possible; async tests use `pytest-asyncio` with `asyncio_mode = "auto"`. Mock `_get_embedding` on `EmbeddingStore` to avoid needing Ollama running. The full suite runs in ~57 seconds (179 tests, 4 skipped — skipped tests require a live LLM endpoint). |
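One way to mock `_get_embedding` so tests never contact Ollama is `unittest.mock.patch.object`. The class below is a hypothetical stand-in so the sketch runs on its own; the real `EmbeddingStore` constructor, the `embed` method, and the vector size are assumptions.

```python
from unittest.mock import patch


class EmbeddingStoreStandIn:
    """Hypothetical stand-in for EmbeddingStore, only to keep this sketch runnable."""

    def _get_embedding(self, text):
        raise RuntimeError("would call Ollama here")

    def embed(self, text):
        # Hypothetical public method that delegates to the private embedding call.
        return self._get_embedding(text)


def fake_embedding(text):
    """Deterministic fake vector; 8 dimensions chosen arbitrarily for the example."""
    return [float(len(text) % 7)] * 8


def embed_with_mock(text):
    """Embed `text` with the network-backed call replaced by the fake."""
    store = EmbeddingStoreStandIn()
    with patch.object(EmbeddingStoreStandIn, "_get_embedding", side_effect=fake_embedding):
        return store.embed(text)
```

In a real test the same `patch.object` call would target `EmbeddingStore` itself, and the patch is undone automatically when the `with` block exits.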