Hybrid Cognitive Runtime (HCR) v3.0 — Northstar

The Cognitive State Plane for AI-Assisted Engineering

Version: 3.0 (Northstar)
Status: Strategic Product Document, grounded in current implementation
Last Updated: May 27, 2026
Author: Rishi Praseeth Krishnan


0. How to Read This Document

v1.0 positioned HCR as "persistent memory + enterprise governance." v2.0 repositioned around "Cognitive State Plane" — verifiable team cognition, agent governance, causal model of engineering. Both were directionally correct.

v3.0 adds two things v2.0 could not:

  1. Implementation ground truth. Everything marked complete in §9 is built, tested (230 passing tests), and running; in-progress and roadmap items are labeled as such. This is not a roadmap document dressed as a northstar.
  2. The CPAP differentiator. Causal Prefix Alignment Protocol — a token-economics breakthrough implemented in May 2026 that makes HCR the only memory layer with a structural prompt caching advantage. It is not a config tweak; it requires the causal graph to function. Competitors cannot retrofit it.

Executives and partners: read §1–4 and §11–13 for the full strategic picture. Engineers and architects: §5–10 for the working system in depth.


1. Executive Summary

What HCR Is

HCR is the Cognitive State Plane for AI-assisted software engineering — the infrastructure layer between coding agents (Claude Code, Cursor, Windsurf, Codex, in-house agents) and the codebase, maintaining a verifiable, governed, queryable model of engineering intent, decisions, risks, and causality across developers, agents, repositories, and time.

Three things that uniquely define HCR:

  1. Causal memory. Every fact HCR stores carries explicit causal edges — what caused it, what it caused, what it contradicts. No other memory layer for engineering has this. Causal-centrality-weighted decay ensures the "why" behind your code survives longer than the "what."

  2. CPAP: structured prompt caching. The Causal Prefix Alignment Protocol structures every context payload into three byte-stable layers, achieving ~98% LLM prompt cache hit rate. Without CPAP, every call to any memory-augmented AI tool bills at full input token cost (0% cache hits). With CPAP: ~95% cost reduction per session. CPAP requires the causal graph to produce its freeze boundaries — it cannot be bolted onto a vector store.

  3. Team and agent scope. HCR's unit of state is the team, not the individual. Every developer and every agent on a team shares one graph. Decisions made by one developer yesterday are visible to an agent's pre-flight context today, with attribution.

Why CPAP Is a Real Moat

LLM API providers (Anthropic, DeepSeek, Google) offer prompt caching: static prefixes cached server-side, reducing input token cost by up to 90% and latency by up to 10×. The requirement: the prefix must be byte-identical on every call.

Traditional memory-augmented tools achieve 0% cache hit rate because:

  • Timestamps change on every file edit
  • Relevance scores fluctuate, reordering retrieved facts
  • Dynamic time strings ("last active 4 min ago") drift continuously

CPAP solves this structurally. It sorts Layer 2 CSOs by topological depth → insertion rank → UUID (not by relevance score, not by timestamp). It freezes the layer only at git commit boundaries. The result:

| Metric | Traditional RAG/memory | HCR + CPAP |
| --- | --- | --- |
| Tokens per call | ~10,000 | ~2,050 |
| Cache hit rate | 0% | ~98% |
| Effective billed tokens over 10-call session | 100,000 | ~4,800 |
| Cost reduction | | ~95% |

No other memory tool has this, because no other tool has a causal graph to derive stable freeze boundaries from.
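The 10-call session totals above can be reproduced with a short calculation. This is a sketch under stated assumptions, not HCR's own accounting: it assumes cache writes are billed at 1.25× and cache reads at 0.1× the base input price (multipliers vary by provider), that Layers 1 and 2 together are about 2,000 cacheable tokens, and that every call after the first hits the cache. The function name `billed_tokens` is hypothetical.

```python
def billed_tokens(calls, cached_prefix, dynamic, write_mult=1.25, read_mult=0.10):
    """Effective billed input tokens for one session with a cached prefix.

    write_mult and read_mult are assumed pricing multipliers, not HCR constants.
    """
    # The first call writes the prefix into the provider cache.
    first = cached_prefix * write_mult + dynamic
    # Every later call reads the prefix from cache and pays full price
    # only for the small dynamic tail.
    rest = (calls - 1) * (cached_prefix * read_mult + dynamic)
    return first + rest


baseline = 10 * 10_000  # uncached: 10 calls at ~10,000 tokens each
with_cpap = billed_tokens(calls=10, cached_prefix=2_000, dynamic=50)
reduction = 1 - with_cpap / baseline

print(round(with_cpap), f"{reduction:.1%}")
```

Under these assumptions the session bills roughly 4,800 token-equivalents, which matches the table's ~95% reduction.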

Five Hero Outcomes (12-month horizon)

  1. <5 second context resume with >90% accuracy on inferred current task, measured by user-acceptance signal.
  2. 95%+ prompt cache hit rate on all HCR-served context payloads, measured per-session in .hcr/cpap_metrics.jsonl. Currently measured in lab: 98%.
  3. >50% reduction in agent rollback rate for teams using HCR-governed agent fleets vs. ungoverned baselines.
  4. <1 hour mean time to provenance — for any AI-generated line of code in production, surface intent, decision, author (human or agent), and review chain.
  5. Zero unaudited PHI / regulated-data exposure to LLM providers in HIPAA/SOC2-aligned deployments, via tenant-side redaction and policy enforcement.

Business Model

  • HCR Core (MIT, open source) — single developer, local-only. Free forever. Driver of adoption and developer mindshare.
  • HCR Team (SaaS, $30/dev/month) — team graph, shared decisions, agent governance, web dashboard. 12-month commit.
  • HCR Enterprise (annual, $50k–500k+) — self-hosted/VPC, RBAC/SSO/SCIM, custom redaction, SOC2 Type II audit, BAA for HIPAA.

2. The Problem, Stated Honestly

Layer 1: Context loss (commoditized — we still solve it, but not our lead)

Every major coding tool now ships a memory mechanism. This problem is being solved at the assistant-vendor layer. HCR solves it better, but leading with it is a positioning mistake.

Layer 2: Token economics collapse under memory augmentation

Adding memory to AI coding tools makes them more expensive, not less. A developer using a memory-augmented tool with 10,000 tokens of context per call, across 50 calls per day, spends $5–15/day in input tokens alone at current API pricing, versus roughly $0.50/day when those calls achieve 95% cache hits.

This is the Layer 2 problem: memory tools make token costs unmanageable because they cannot maintain byte-identical prefixes. CPAP is the direct answer to this. The economics compound at scale: a 20-agent overnight fleet making 500 total API calls goes from ~$75 to ~$4 in input token cost.
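The dollar figures in this section follow from simple arithmetic. The sketch below assumes an input price of $15 per million tokens (an assumption chosen because it reproduces the ~$75 fleet figure; actual pricing depends on the model and provider) and applies the ~95% reduction as a flat billed fraction of 0.05. The helper name `input_cost_usd` is hypothetical.

```python
def input_cost_usd(calls, tokens_per_call, usd_per_million, billed_fraction=1.0):
    """Input-token cost in USD; billed_fraction models the effect of cache hits."""
    return calls * tokens_per_call * usd_per_million / 1_000_000 * billed_fraction


PRICE = 15.0  # assumed USD per million input tokens

# 20-agent overnight fleet: 500 calls at ~10,000 tokens each.
fleet_uncached = input_cost_usd(500, 10_000, PRICE)
fleet_cached = input_cost_usd(500, 10_000, PRICE, billed_fraction=0.05)

# Single developer: 50 calls per day at ~10,000 tokens each.
dev_uncached = input_cost_usd(50, 10_000, PRICE)

print(f"fleet: ${fleet_uncached:.2f} -> ${fleet_cached:.2f}")
print(f"developer, uncached: ${dev_uncached:.2f}/day")
```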

Layer 3: Memory without verification

Existing tools store and retrieve. None verify. If a memory system surfaces "we decided to use Postgres in March," nothing checks whether that decision is still true, whether the codebase reflects it, or whether another agent contradicted it yesterday. Retrieval without verification is hallucination with citations.

Layer 4: Per-developer silos

Ten developers using any current memory tool build ten disconnected stores. When developer A makes an architectural decision and developer B's agent contradicts it the next day, no system catches the contradiction. There is no shared cognition.

Layer 5: Ungoverned agent fleets

Teams now run multiple coding agents overnight. Current governance is "look at the PRs and hope." No tool provides a unified plane to ask: which of these agent decisions are consistent with our team's intent, constraints, and prior commitments?

Layer 6: Compliance and provenance

Enterprise security teams are blocking AI rollouts. "Where did this code come from, what data did the model see, what was the human decision chain, can we produce a defensible audit trail?" — no current tool answers these.

HCR v3.0 leads on Layers 2, 3, 4, 5, and 6. Layer 1 is solved as a side effect.


3. Competitive Landscape

Markdown rule files (CLAUDE.md, AGENTS.md, .cursorrules)

Static text files, version-controlled, universally supported. Free. Limits: no verification, no team awareness, no agent governance, no audit, human-maintained. HCR's relation: complementary. HCR can generate and maintain these files from its graph; they become a serialization target.

Agent memory frameworks (Mem0, Zep, Letta, Cognee, Graphiti, LangMem)

Built for general-purpose chatbots. Associative retrieval, not causal. Per-user scope by default. No concept of a PR, a deployment, a regression, or a rollback. No structural prompt caching — they achieve 0% cache hit rate. HCR's relation: different category; could use one as a storage backend, but the cognitive layer sits above.

MCP-based memory servers (claude-mem, MemPalace, Memori, AgentMemory)

MCP servers exposing memory tools to any MCP-compatible client. Excellent for solo work. Limits: per-developer, retrieval-only, no verification, no team layer, no governance, no CPAP. HCR's MCP server exposes the team write path; theirs do not.

IDE-native context (Cursor Composer, Windsurf Cascade, Claude Code)

Vendor-built, deeply integrated, model-optimized. Limits: locked to vendor. Switch IDEs and context dies. No cross-tool, cross-team, or cross-agent coordination. HCR is the portable substrate — the same cognitive state works across every coding agent.

Code intelligence platforms (Augment Code, Sourcegraph Cody, Codeium)

Large-scale code indexing with AI retrieval. Excellent at "find this code." Do not model intent, decisions, causality, or agent activity. Complementary at the data layer.

Why no one has CPAP

CPAP requires three things to coexist: (a) a causal graph to derive stable freeze boundaries, (b) centrality scoring to rank which facts survive the freeze, and (c) an insertion-rank system that keeps sort order stable across rebuilds instead of reordering by relevance score. Retrofitting this onto a vector store or markdown file is not a small lift; it requires the CSO data model that underpins HCR's entire architecture. Competitors would need to rebuild HCR's core to copy it.


4. Strategic Positioning

Primary Statement

For engineering teams adopting AI coding agents at scale, HCR is the Cognitive State Plane that gives every developer, every agent, and every audit a single verifiable understanding of what the team is doing and why — delivering this context at ~95% less token cost through structural prompt caching, so AI acceleration is economically sustainable at fleet scale.

Three Strategic Bets (confirmed correct as of May 2026)

  1. MCP wins as the open protocol. All major coding tools now ship MCP support. Bet confirmed.
  2. Multi-agent fleets become the norm. Teams are already running overnight agent fleets. The governance gap is acute and worsening.
  3. Regulation accelerates governance demand. EU AI Act, sector-specific US rules, and enterprise risk policies are making audit and provenance a hard requirement for AI-generated code in regulated industries.

What HCR Is Not

  • Not a coding assistant — we do not write code
  • Not a chatbot memory plugin — we are infrastructure between agents and code
  • Not a wrapper around an LLM — the symbolic and causal layers produce value without any LLM call
  • Not a replacement for git — git stores artifacts; HCR stores cognition
  • Not a productivity-metrics dashboard — we surface decisions, not rankings

5. Core Architecture

5.1 The Cognitive State Object (CSO)

The atom of HCR. Every fact, decision, constraint, risk, intent, observation, or outcome is a typed CSO:

CSO = {
  id:            globally unique content-addressed hash
  type:          DECISION | OBSERVATION | CONSTRAINT | RISK |
                 OUTCOME | CLAIM | INTENT | TASK | ROLLBACK | TRIGGER
  payload:       typed, schema-validated content
  origin:        { actor: human | agent_id, source: ide | cli | hook |
                   mcp | api | webhook, evidence: [git_ref | file_range] }
  causal_in:     [cso_id]    # what caused this
  causal_out:    [cso_id]    # what this causes / enables
  contradicts:   [cso_id]    # explicit contradictions
  confidence:    { value: float, method: heuristic | symbolic | human_attested }
  scope:         developer | team | fleet | org
  created_at, updated_at, expires_at
}
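As an illustration of the schema above, here is a minimal Python sketch of a CSO with a content-addressed id. It is not the code in cso_model.py. The choice of SHA-256 and of which fields feed the hash are assumptions, and the confidence and timestamp fields are left out for brevity.

```python
import hashlib
import json
from dataclasses import dataclass
from enum import Enum


class CSOType(str, Enum):
    DECISION = "DECISION"
    OBSERVATION = "OBSERVATION"
    CONSTRAINT = "CONSTRAINT"
    RISK = "RISK"
    OUTCOME = "OUTCOME"
    CLAIM = "CLAIM"
    INTENT = "INTENT"
    TASK = "TASK"
    ROLLBACK = "ROLLBACK"
    TRIGGER = "TRIGGER"


@dataclass(frozen=True)
class CSO:
    type: CSOType
    payload: dict
    actor: str                 # "human" or an agent id
    source: str                # ide | cli | hook | mcp | api | webhook
    causal_in: tuple = ()      # ids of CSOs that caused this one
    causal_out: tuple = ()     # ids of CSOs this one causes or enables
    contradicts: tuple = ()    # ids of CSOs this one contradicts
    scope: str = "developer"

    @property
    def id(self) -> str:
        # Assumption: the id hashes a canonical JSON form of the content
        # fields, so identical content always yields the same id.
        canonical = json.dumps(
            {"type": self.type.value, "payload": self.payload,
             "actor": self.actor, "source": self.source},
            sort_keys=True, separators=(",", ":"),
        )
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


decision = CSO(CSOType.DECISION, {"text": "Use FastAPI for REST layer"}, "human", "cli")
observation = CSO(CSOType.OBSERVATION, {"text": "REST endpoints live"}, "human", "hook",
                  causal_in=(decision.id,))
```

Hashing a canonical serialization (sorted keys, fixed separators) is what makes the id reproducible across processes and machines.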

Stored in SQLite + WAL at .hcr/cso.db. Indexes on type, created_at, scope.
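A minimal sketch of how such a store could be opened with Python's built-in sqlite3 module. The table layout, column names, and index names are assumptions for illustration; only the WAL mode and the three indexed fields come from the description above.

```python
import sqlite3


def open_cso_store(path):
    """Open (or create) a CSO database in WAL mode with the three indexes."""
    conn = sqlite3.connect(path)
    # WAL lets readers proceed while a single writer appends to the log.
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        """CREATE TABLE IF NOT EXISTS cso (
               id         TEXT PRIMARY KEY,
               type       TEXT NOT NULL,
               scope      TEXT NOT NULL,
               payload    TEXT NOT NULL,
               created_at TEXT NOT NULL
           )"""
    )
    # Hypothetical index names; the indexed columns follow the text above.
    for column in ("type", "created_at", "scope"):
        conn.execute(
            f"CREATE INDEX IF NOT EXISTS idx_cso_{column} ON cso({column})"
        )
    conn.commit()
    return conn
```

Note that WAL mode requires a file-backed database; an in-memory database reports its journal mode as `memory`.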

5.2 The Five-Layer Stack

┌─────────────────────────────────────────────────────────────────┐
│ Layer 5: Experience Surfaces                                    │
│   IDE plugins · MCP server (23 tools) · Web console · CLI · API │
├─────────────────────────────────────────────────────────────────┤
│ Layer 4: Governance & Policy                                    │
│   JWT/Bearer auth · RBAC · Redaction · Audit · Compliance       │
├─────────────────────────────────────────────────────────────────┤
│ Layer 3: Reasoning Engine                                       │
│   Symbolic Verifier · Causal Reasoner (BFS) · CPAP Formatter    │
│   CognitiveProjection (centrality + decay) · EmbeddingStore     │
│   RRF + MMR Fusion · FeedbackStore (learned weights)            │
├─────────────────────────────────────────────────────────────────┤
│ Layer 2: Cognitive State Fabric                                 │
│   CSO Store (SQLite + WAL) · Embedding Store (sqlite-vec)       │
│   Causal Graph Index · FreezeStore (CPAP epoch) · BOCPD         │
├─────────────────────────────────────────────────────────────────┤
│ Layer 1: Capture & Signals                                      │
│   Git hooks (post-commit → CPAP freeze) · File watcher          │
│   MCP tool calls · IDE telemetry · Agent traces · REST API      │
└─────────────────────────────────────────────────────────────────┘

5.3 CPAP: Causal Prefix Alignment Protocol

CPAP is HCR's answer to the token economics problem. It structures every context payload into three strict, mutation-gated layers designed for LLM prompt caching:

┌──────────────────────────────────────────────┐
│ Layer 1: C_static  (~500 tokens)             │  ← 100% cached
│ Project identity, HCR version, fixed rules   │
├──────────────────────────────────────────────┤
│ Layer 2: C_semi  (~1,500 tokens)             │  ← ~98% cached
│ Top-20 centrality CSOs, frozen at git commit │
│ Sort: topological depth → insertion rank     │
│       → UUID tiebreaker                      │
├──────────────────────────────────────────────┤
│ Layer 3: Δ_dynamic  (<50 tokens)             │  ← active compute
│ Compact AST delta of most recent file edit   │
│ Format: Δ {file} M{fn} I+{import}            │
└──────────────────────────────────────────────┘

Freeze gate logic. Layer 2 rebuilds only when:

  • git post-commit hook writes .hcr/freeze_requested (installed by hcr init)
  • hcr_freeze MCP tool called manually
  • First run (no epoch file exists)
  • git HEAD diverges from stored epoch

Between triggers: Layer 2 bytes are read directly from .hcr/cpap_epoch.json — zero compute, byte-identical.
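The four rebuild conditions can be read as a single predicate. The sketch below is an illustration, not the FreezeStore code: the function name `needs_rebuild` and the `head` key inside cpap_epoch.json are assumptions, and the caller is expected to pass in the current git HEAD.

```python
import json
from pathlib import Path


def needs_rebuild(hcr_dir: Path, head_sha: str, manual: bool = False) -> bool:
    """Return True when Layer 2 should be recomputed instead of read from the epoch file."""
    flag = hcr_dir / "freeze_requested"
    epoch_file = hcr_dir / "cpap_epoch.json"

    # Manual freeze request, or the post-commit hook left its flag file.
    if manual or flag.exists():
        return True
    # First run: no epoch has been persisted yet.
    if not epoch_file.exists():
        return True
    # HEAD moved since the epoch was written (assumed to be stored under "head").
    stored_head = json.loads(epoch_file.read_text()).get("head")
    return stored_head != head_sha
```

When the predicate is false, the caller can return the stored Layer 2 bytes unchanged, which is what keeps the prefix byte-identical between commits.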

Stable sort. insertion_ranks dict carried forward across rebuilds. New CSOs get a monotonically increasing rank on first appearance; existing CSOs keep their rank. Sort key: (topo_depth, insertion_rank, cso_id) — deterministic without relying on timestamps or scores.
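A small sketch of the stable sort described above. It takes each CSO as an `(id, topo_depth)` pair and the rank dictionary from the previous epoch; the function name and the tuple representation are assumptions for illustration.

```python
def stable_layer2_order(csos, insertion_ranks):
    """Order CSOs by (topo_depth, insertion_rank, cso_id).

    csos: list of (cso_id, topo_depth) pairs.
    insertion_ranks: dict carried forward from the previous rebuild.
    Returns (ordered ids, updated ranks).
    """
    ranks = dict(insertion_ranks)
    next_rank = max(ranks.values(), default=-1) + 1
    for cso_id, _depth in csos:
        if cso_id not in ranks:
            # New CSOs get the next rank; existing CSOs keep theirs.
            ranks[cso_id] = next_rank
            next_rank += 1
    ordered = sorted(csos, key=lambda c: (c[1], ranks[c[0]], c[0]))
    return [cso_id for cso_id, _depth in ordered], ranks


# First epoch: two CSOs.
order1, ranks1 = stable_layer2_order([("b", 1), ("a", 0)], {})
# Second epoch: a new CSO "c" arrives and the input order differs.
order2, ranks2 = stable_layer2_order([("c", 1), ("a", 0), ("b", 1)], ranks1)
print(order1, order2)
```

Because "b" keeps its earlier rank, it stays ahead of "c" at the same depth even though "c" comes first in the second input, so the existing prefix bytes do not shift.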

Layer 3 micro-format. Pulled from the most recent OBSERVATION CSO for active_file:

Δ auth.py:42 Mvalidate_token I+pyjwt I-basic_auth

Codes: A = added, R = removed, M = modified, S = signature changed. I+/-module for import changes. Always under 50 tokens.
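A sketch of a formatter that emits this micro-format. The function name and argument shapes are hypothetical; only the output format and the A/R/M/S and I+/I- codes come from the description above.

```python
VALID_CODES = {"A", "R", "M", "S"}  # added, removed, modified, signature changed


def format_delta(file, line, changes, imports_added=(), imports_removed=()):
    """Render one Layer 3 line, e.g. 'Δ auth.py:42 Mvalidate_token I+pyjwt'.

    changes: list of (code, symbol_name) pairs.
    """
    parts = [f"Δ {file}:{line}"]
    for code, name in changes:
        if code not in VALID_CODES:
            raise ValueError(f"unknown change code: {code!r}")
        parts.append(f"{code}{name}")
    parts += [f"I+{module}" for module in imports_added]
    parts += [f"I-{module}" for module in imports_removed]
    return " ".join(parts)


delta = format_delta("auth.py", 42, [("M", "validate_token")],
                     imports_added=["pyjwt"], imports_removed=["basic_auth"])
print(delta)  # Δ auth.py:42 Mvalidate_token I+pyjwt I-basic_auth
```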

Telemetry. Every format() call appends to .hcr/cpap_metrics.jsonl. hcr_get_system_health reports cpap_stats: hit rate, calls today, busts today, avg Layer 3 tokens.
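As an illustration of how such stats could be derived from the metrics file, the sketch below aggregates JSONL records. It is not compute_cpap_stats itself: the per-record `layer3_tokens` field is an assumption, a "bust" is assumed to mean any call that was not a cache hit, and date filtering for the "today" counters is omitted.

```python
import json


def summarize_cpap_metrics(jsonl_lines):
    """Aggregate hit rate, call count, busts, and average Layer 3 size."""
    records = [json.loads(line) for line in jsonl_lines if line.strip()]
    calls = len(records)
    hits = sum(1 for r in records if r["cache_hit"])
    layer3_total = sum(r["layer3_tokens"] for r in records)
    return {
        "hit_rate": round(hits / calls, 2) if calls else 0.0,
        "calls": calls,
        "busts": calls - hits,
        "avg_layer3_tokens": round(layer3_total / calls) if calls else 0,
    }


sample = [
    '{"cache_hit": true, "layer3_tokens": 10}',
    '{"cache_hit": true, "layer3_tokens": 20}',
    '{"cache_hit": true, "layer3_tokens": 14}',
    '{"cache_hit": false, "layer3_tokens": 20}',
]
stats = summarize_cpap_metrics(sample)
print(stats)
```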

5.4 Cognitive State Fabric (CSF)

The intelligence layer over raw CSO storage:

  • centrality.py (CausalCentralityScorer): BFS transitive reachability on causal edges. CSOs that caused the most downstream effects score highest.
  • projection.py (CognitiveProjection): centrality-ranked, decay-filtered live state. Called by CPAP and directly on tool calls.
  • prefetch.py (ProjectionPrefetcher): background thread triggered on file edit events. Caches projection so the next tool call has zero compute latency.
  • embedding_store.py (EmbeddingStore): sqlite-vec ANN store. Embeds qualifying CSO tiers via Ollama nomic-embed-text, falling back to sentence-transformers.
  • implicit_graph.py (generate_soft_links): semantic k-NN to auto-detect soft causal edges (similarity > 0.82).
  • episode_store.py (BOCPDSegmenter): Bayesian Online Changepoint Detection for segmenting event streams into work episodes.
  • fusion.py (RRF + MMR): fuse semantic + causal ranked lists; select diverse results for fixed token budget.
  • feedback.py (FeedbackStore): learnable RRF weights trained after 50 labelled samples.
  • cpap.py (CPAPFormatter, FreezeStore, CPAPPayload, compute_cpap_stats).
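To make the centrality idea concrete, here is a minimal sketch of BFS transitive reachability over causal_out edges. It scores each CSO by the number of distinct downstream CSOs it can reach. The real CausalCentralityScorer may weight or normalize differently; the function name and the adjacency-dict input are assumptions.

```python
from collections import deque


def causal_centrality(causal_out):
    """Score each node by how many distinct descendants it reaches.

    causal_out: dict mapping a CSO id to the ids it caused.
    """
    nodes = set(causal_out)
    for targets in causal_out.values():
        nodes.update(targets)

    scores = {}
    for start in nodes:
        seen = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in causal_out.get(node, ()):
                if nxt not in seen:  # the seen set also guards against cycles
                    seen.add(nxt)
                    queue.append(nxt)
        scores[start] = len(seen) - 1  # exclude the start node itself
    return scores


graph = {"d1": ["o1", "d2"], "d2": ["o2"]}
scores = causal_centrality(graph)
print(sorted(scores.items()))
```

Here the root decision "d1" reaches three downstream CSOs and scores highest, which is the "why outlives what" ordering the projection relies on.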

5.5 Auth & Sync Infrastructure

Implemented in Plan 10 (complete):

  • GitHub OAuth (/api/auth/github/callback): Issues JWT access + refresh tokens from GitHub user info
  • Bearer token middleware (AuthMiddleware): Protects all non-public API endpoints; 401 with hcr auth login guidance on failure
  • JWT handler (hcr/product/auth/jwt_handler.py): encode_token, decode_token, is_expiring_soon
  • Token store (hcr/product/auth/token_store.py): save_token, load_token, clear_token at ~/.hcr/auth.json
  • Token refresh (hcr/product/auth/refresh.py): maybe_refresh() — proactive refresh before expiry, called by require_auth() in CLI
  • Telemetry endpoint (/api/telemetry): Per-tool SQLite audit log; fire-and-forget from MCP side with offline queue + HMAC signing
  • CSO sync endpoints (/api/projects/{id}/csos): GET (list) + POST (create); MCP-side poll_once merges remote CSOs into local SQLite
  • CLI auth commands (hcr auth login/logout/whoami): Local HTTP server for OAuth callback, stores tokens
  • MCP JWT enforcement: _handle_initialize validates JWT when HCR_JWT_SECRET is set (bypassed by MCP_DEV_MODE)
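The offline telemetry queue signs events with HMAC. A minimal sketch of that pattern using only the Python standard library follows; the envelope shape, the SHA-256 choice, and the function names are assumptions, not the telemetry_client.py format.

```python
import hashlib
import hmac
import json


def sign_event(event: dict, secret: bytes) -> dict:
    """Serialize an event canonically and attach an HMAC-SHA256 signature."""
    body = json.dumps(event, sort_keys=True, separators=(",", ":"))
    signature = hmac.new(secret, body.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"body": body, "signature": signature}


def verify_event(envelope: dict, secret: bytes) -> bool:
    """Recompute the signature and compare in constant time."""
    expected = hmac.new(
        secret, envelope["body"].encode("utf-8"), hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, envelope["signature"])


secret = b"example-secret"  # placeholder; a real deployment would load this securely
envelope = sign_event({"tool": "hcr_preflight", "duration_ms": 12}, secret)
print(verify_event(envelope, secret))
```

Signing the serialized body rather than the dict means a queued event can be replayed to the server later and still verify, as long as the body string is sent unchanged.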

5.6 MCP Server

23 tools (consolidated from 31 in Plan 9), organized in hcr/product/integrations/tools/:

| Category | Tools |
| --- | --- |
| State / Context | hcr_get_state, hcr_preflight, hcr_postflight, hcr_get_system_health |
| Write path | hcr_remember, hcr_record_file_edit, hcr_fail, hcr_resolve, hcr_set_trigger |
| Analysis | hcr_analyze_impact, hcr_get_recommendations, hcr_get_version_history, hcr_search_history |
| Session | hcr_create_session, hcr_list_sessions, hcr_merge_session, hcr_set_session_note |
| Decisions | hcr_read_decisions |
| Cross-project | hcr_share_state, hcr_get_shared_state, hcr_list_shared_states |
| Ops | hcr_restore_version, hcr_get_recent_activity, hcr_freeze |

All imports in mcp_server.py are at module level (no lazy imports inside functions).

5.7 REST API

FastAPI server (hcr/product/api/main.py) with:

  • POST /api/auth/github/callback — OAuth token exchange
  • POST /api/auth/refresh — token refresh
  • GET /api/projects/{id}/preflight — returns CPAPPayload as JSON
  • GET/POST /api/projects/{id}/csos — CSO sync endpoints
  • POST /api/telemetry — per-tool telemetry audit log
  • GET /health — health check (unprotected)
  • Memory API (/api/memory/*) — for ChatGPT Custom GPT Actions and REST clients

5.8 Project Layout

hcr/
  engine/
    cso/cso_model.py, cso_store.py, agent_registry.py
    memory/centrality.py, projection.py, prefetch.py, embedding_store.py,
           implicit_graph.py, episode_store.py, fusion.py, feedback.py,
           prospective.py, cpap.py
    symbolic/verifier.py, rules.py
    engine_api.py
  product/
    api/main.py, auth.py, middleware.py, preflight.py, csos.py,
        telemetry.py, memory.py, apikeys.py
    auth/jwt_handler.py, token_store.py, refresh.py
    cli/main.py, auth_cmd.py
    integrations/mcp_server.py, mcp_server_stdio.py,
                  telemetry_client.py, tools/
    storage/semantic_decay.py
    sync/poller.py
  install/post-commit

tests/  (230 passing, 4 skipped — skipped require live LLM)

web/web-ui/  (React + ReactFlow dashboard)

.hcr/
  cso.db          # CSO store (SQLite + WAL)
  embeddings.db   # sqlite-vec ANN index
  cpap_epoch.json # CPAP freeze state
  cpap_metrics.jsonl  # CPAP telemetry
  feedback.db     # learned RRF weights
  auth.json       # CLI token store (~/.hcr/)

6. Feature Set

6.1 Foundations (complete and deployed)

F1. Resume in <5 seconds. Symbolic-first inference (fast, no LLM required). Current task, progress, next action, relevant decisions, open risks.

F2. CPAP-structured context. Every hcr_preflight and hcr_get_state call returns a three-layer payload achieving ~98% prompt cache hit rate. cache_epoch field lets callers detect stale contexts. cache_hit: true means Layer 2 served from epoch with zero compute.

F3. Cross-tool memory. One graph, exposed via MCP to Claude Code, Cursor, Windsurf, Codex, ChatGPT (Custom GPT Actions), and any REST client. Switching tools does not lose context.

F4. Markdown round-trip. HCR generates and maintains CLAUDE.md / AGENTS.md from the canonical graph. The markdown file is no longer the source of truth, but it remains a first-class export.

F5. Local-first by default. HCR Core runs entirely on the developer's machine, including when using local LLMs (Ollama). Cloud features are opt-in.

F6. Auth and sync. GitHub OAuth → JWT. Bearer token middleware on all API endpoints. CLI hcr auth login/logout/whoami. Per-tool telemetry with offline queue + HMAC signing. CSO sync: poll_once merges remote CSOs into local SQLite for multi-device and multi-developer sync.

6.2 Verification (complete for single-developer; team scope in progress)

V1. Decision provenance. Every architectural decision is a first-class CSO with author, date, evidence, files in scope.

V2. Constraint enforcement. Symbolic verifier runs rules against every new CSO. Violations create RISK CSOs.

V3. Contradiction detection. Explicit contradicts edges between CSOs. Surfaced on preflight for agents.

V4. Forward impact simulation. hcr_analyze_impact — causal BFS + semantic RRF fusion. Returns predicted blast radius for any file or proposed change.

V5. Backward attribution. Traverse causal_in edges from any outcome to contributing decisions, intents, and actors.

V6. Re-attestation. Stale decisions flagged by decay; confidence decays on half-life schedule.
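V4 fuses a causal ranking with a semantic ranking using reciprocal rank fusion (RRF). The sketch below shows the standard RRF formula, score = Σ w / (k + rank), with the commonly used constant k = 60. The optional weights stand in for the learned weights mentioned elsewhere in this document; the function name and the id tiebreak are assumptions.

```python
def rrf_fuse(ranked_lists, k=60, weights=None):
    """Fuse several ranked lists of ids into one list, best first."""
    if weights is None:
        weights = [1.0] * len(ranked_lists)
    scores = {}
    for weight, ranking in zip(weights, ranked_lists):
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + weight / (k + rank)
    # Sort by descending score, then by id so ties resolve deterministically.
    return sorted(scores, key=lambda item: (-scores[item], item))


semantic = ["a", "b", "c"]
causal = ["b", "c", "a"]
fused = rrf_fuse([semantic, causal])
print(fused)
```

RRF only needs ranks, not comparable scores, which is why it suits combining a graph traversal with an embedding search.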

6.3 Team and Agent Scope (in progress)

T1. Agent registry. agent_registry.py registers agents with identity, role, and autonomy budget.

T2. Pre-flight / post-flight lifecycle. hcr_preflight records retrieval for learned fusion; hcr_postflight records outcome signal. Used by FeedbackStore to train RRF weights.

T3. Cross-project state sharing. hcr_share_state / hcr_get_shared_state expose decisions across projects for the same developer.

T4. Trigger CSOs. hcr_set_trigger creates TRIGGER CSOs that fire when an agent opens a matching file. Injected at rank-0 by CognitiveProjection regardless of centrality.

T5. Episode segmentation. BOCPDSegmenter partitions event streams into work episodes for better context boundaries.
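The rank-0 injection in T4 can be sketched as follows. This assumes each trigger carries a glob-style file pattern, which is an assumption; the actual matching rule is not specified here. The function name and dict shape are hypothetical.

```python
from fnmatch import fnmatchcase


def project_with_triggers(active_file, triggers, ranked_ids, limit=20):
    """Place fired TRIGGER CSOs ahead of the centrality-ranked list.

    triggers: list of {"id": ..., "pattern": ...} dicts (pattern is a glob).
    ranked_ids: CSO ids already ordered by centrality and decay.
    """
    fired = [t["id"] for t in triggers if fnmatchcase(active_file, t["pattern"])]
    fired_set = set(fired)
    # Drop fired ids from the ranked list so they are not emitted twice.
    rest = [cso_id for cso_id in ranked_ids if cso_id not in fired_set]
    return (fired + rest)[:limit]


triggers = [
    {"id": "t1", "pattern": "hcr/product/auth/*"},
    {"id": "t2", "pattern": "web/*"},
]
projection = project_with_triggers(
    "hcr/product/auth/jwt_handler.py", triggers, ["d1", "d2", "t1"]
)
print(projection)
```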

6.4 Governance and Compliance (foundations complete; enterprise tier roadmap)

G1. JWT + Bearer token auth. Middleware-enforced on all API endpoints. MCP JWT enforcement when HCR_JWT_SECRET set.

G2. Per-tool telemetry audit. Every MCP tool call timed, signed, and logged to SQLite via /api/telemetry. Offline queue with HMAC for disconnected operation.

G3. Tenant-side redaction. Configurable redaction rules strip secrets and PII before CSOs leave tenant boundary. (Enterprise tier.)

G4. Policy engine. Symbolic rules gate agent autonomy at the CSO level. (Enterprise tier.)

G5. Compliance-aligned controls. Designed for SOC2, ISO 27001, HIPAA, GDPR. Audit roadmap with dates in §9.


7. User Experience

7.1 MCP (primary integration surface)

python -m hcr.product.integrations.mcp_server_stdio — stdio transport, compatible with Cursor, Claude Code, Windsurf, and any MCP client.

Pre-flight output structure (CPAP-formatted):

=== HCR Context (epoch: a3f9c2b1, CACHED) ===

## System Context
[static rules and project identity — ~500 tokens — 100% cached]

## Project Memory
[D] Use FastAPI for REST layer (scope: arch)
  → caused: [O] REST endpoints live at /api/...
[C] All imports in mcp_server.py must be at module level
[R] SQLite WAL may block under concurrent writes
... (top-20 by centrality — ~1,500 tokens — ~98% cached)

## Current Focus
Δ auth.py:42 Mvalidate_token I+pyjwt    (< 50 tokens — always fresh)

7.2 REST API

FastAPI at http://localhost:8080 (or deployed to Render/cloud). Auth: Bearer token from hcr auth login.

GET /api/projects/{id}/preflight returns full CPAPPayload JSON for direct API callers. When cache_hit is true, layer2_stable is unchanged from the previous call, so callers can reuse it verbatim as the cached prefix block in their own LLM requests.
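For callers of the Anthropic Messages API, caller-side cache block injection could look like the sketch below. It converts a payload dict into a list of system text blocks and marks the two stable layers with `cache_control`. The field names `layer1_static` and `layer3_dynamic` are assumptions (only `layer2_stable` is named in this document), and providers may impose a minimum cacheable prefix length.

```python
def to_system_blocks(payload: dict) -> list:
    """Build system content blocks with cache breakpoints after Layers 1 and 2."""
    return [
        {
            "type": "text",
            "text": payload["layer1_static"],
            "cache_control": {"type": "ephemeral"},
        },
        {
            "type": "text",
            "text": payload["layer2_stable"],
            "cache_control": {"type": "ephemeral"},
        },
        # Layer 3 changes on every edit, so it is left uncached.
        {"type": "text", "text": payload["layer3_dynamic"]},
    ]


example_payload = {
    "layer1_static": "## System Context\n...",
    "layer2_stable": "## Project Memory\n...",
    "layer3_dynamic": "## Current Focus\nΔ auth.py:42 Mvalidate_token I+pyjwt",
}
blocks = to_system_blocks(example_payload)
print(len(blocks))
```

The resulting list can be passed as the `system` parameter of a Messages request. Keeping the dynamic block last means an edit never invalidates the cached prefix ahead of it.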

ChatGPT Custom GPT Actions: /api/memory/* endpoints with API key auth.

7.3 CLI

hcr init              # initialize .hcr/, install CPAP git hook
hcr status            # show cognitive state summary
hcr resume            # full resume context
hcr freeze            # manual CPAP epoch rebuild (-p for project path)
hcr auth login        # GitHub OAuth → saves ~/.hcr/auth.json
hcr auth logout       # clear token
hcr auth whoami       # print current identity
hcr doctor            # system health check
hcr dashboard         # launch web UI

7.4 Web Console

React + ReactFlow dashboard at web/web-ui/. Modes:

  • Live causal graph with centrality-sized nodes
  • State history timeline (git-like)
  • System health monitoring with CPAP hit rate
  • CPAP metrics (hit rate, calls today, avg Layer 3 tokens)

7.5 IDE Plugin

VS Code extension (basic). Side panel shows current task, relevant decisions, open risks. Status bar with confidence score.


8. Reference Workflows

8.1 Monday morning resume

Developer opens IDE. MCP pre-flight fires. CPAP serves Layer 2 from the epoch file with no recomputation. Layer 3 shows last file edit. Total context: ~2,050 tokens. Because the prefix is byte-identical to the last session's, the first call re-establishes the provider-side cache if it has expired, and every later call in the session is a cache hit billed at a ~90% discount. Developer reads: current task, three relevant decisions, one open risk, recommended next action. Clicks Continue. No typing.

8.2 Architectural decision — captured automatically

Developer commits code. Post-commit hook writes .hcr/freeze_requested. On next IDE interaction, CPAPFormatter detects the flag, calls CognitiveProjection.compute(), picks new top-20 CSOs by centrality × decay, persists new epoch. New cache_epoch returned in preflight — downstream clients know context changed. Old epoch invalidated. Layer 2 bytes stable again for all subsequent calls until next commit.

8.3 Overnight agent fleet

Staff engineer authorizes five Claude Code agents. Each agent runs hcr_preflight at startup and receives CPAP-structured context with team decisions, constraints, and active risks. Layer 2 is byte-identical across all five agents' first calls (cache hit on runs 2–5). Agents run. On post-flight, hcr_postflight ingests their produced CSOs. Symbolic verifier evaluates against DEFAULT_RULES. Two agents trigger RISK CSOs (dependency added without RFC; file touched by an active decision) and are routed to human review. Staff engineer approves via dashboard. The other three agents' changes are clean.

8.4 Post-incident attribution

Production regression. On-call types hcr analyze_impact auth.py --direction backward. HCR returns: the commit, the agent, the pre-flight context the agent received, the two decisions in scope, and the constraint that should have caught it. Post-mortem writes itself.

8.5 Token cost review

Platform team runs hcr_get_system_health. Response includes:

"cpap_stats": {
  "hit_rate": 0.97,
  "calls_today": 143,
  "busts_today": 4,
  "avg_layer3_tokens": 16
}

97% cache hit rate. At 143 calls/day across the team, CPAP saved ~$18 in input token costs today vs. the uncached baseline.


9. Implementation Status

Complete ✅

Foundation (Plans 1–8):

  • CSO data model and graph engine (SQLite + WAL; indexes on type, created_at, scope)
  • Symbolic verifier with declarative rule engine
  • Causal reasoner with BFS transitive reachability and forward/backward analysis
  • Git hooks, file watcher, MCP tool calls, IDE telemetry (capture layer)

Cognitive State Fabric (Plan 9):

  • CognitiveProjection (centrality-ranked, decay-filtered)
  • ProjectionPrefetcher (background cache)
  • EmbeddingStore (sqlite-vec + Ollama/sentence-transformers)
  • BOCPDSegmenter (episode segmentation)
  • RRF + MMR fusion with learnable FeedbackStore weights
  • generate_soft_links (semantic k-NN soft edges)
  • get_triggered_csos (prospective memory)

Auth, Telemetry, Sync (Plan 10):

  • GitHub OAuth callback + JWT issuance
  • Bearer token middleware (all API endpoints protected)
  • CLI hcr auth login/logout/whoami
  • MCP JWT enforcement at initialize
  • Web frontend: token persistence + ProtectedRoute guard
  • Per-tool telemetry client with offline HMAC queue
  • /api/telemetry endpoint with SQLite audit log
  • CSO sync endpoints (GET/POST /api/projects/{id}/csos)
  • poll_once poller for CLI sync
  • maybe_refresh() token refresh before expiry
  • REST /api/projects/{id}/preflight returning CPAPPayload

CPAP (May 27, 2026):

  • CPAPPayload, FreezeStore, compute_cpap_stats (cpap.py)
  • CPAPFormatter with freeze gate, stable sort, Layer 3 delta, telemetry
  • hcr/install/post-commit git hook
  • hcr_freeze MCP tool
  • CPAP wired into hcr_preflight and hcr_get_state
  • cpap_stats in hcr_get_system_health
  • REST /api/projects/{id}/preflight returns CPAPPayload
  • hcr freeze CLI subcommand; hcr init installs CPAP hook

Test coverage: 230 passing, 4 skipped (require live LLM), ~70s full suite.

MCP tools: 23 tools (consolidated from 31).


In Progress 🏗️

  • Agent registry and autonomy gates: agent_registry.py exists; policy-enforced gates not yet wired to all tool calls
  • Team graph: CSO store is per-project; multi-developer sync exists via REST but team-scope shared graph not yet first-class
  • Fleet dashboard: web console shows single-project state; fleet-wide agent view pending
  • CPAP cache control blocks: REST callers receive CPAPPayload JSON; upstream injection of Anthropic cache_control blocks is caller-side

Roadmap: 3–6 months

  • Policy engine integration (gate agent actions against declared team constraints)
  • Postgres backend and multi-region deployment (Team SaaS)
  • Full web console (Map, Feed, Decision, Audit, Fleet modes)
  • JetBrains and Neovim plugins
  • Slack / Linear / Jira webhooks
  • HCR Team SaaS beta (6 design partners)
  • CPAP: upstream cache_control block injection for Anthropic API callers
  • Agent fleet dashboard (live view, risk scores, gate states)

Roadmap: 6–12 months

  • VPC / self-hosted deployment
  • SCIM, SSO, custom RBAC
  • Per-org KMS, BAA-eligible deployments
  • SOC2 Type II audit window open (Month 6–12 GA)
  • HCR Team GA (Month 9)
  • HCR Enterprise GA (Month 12)

Roadmap: 12–24 months

  • HCR rule language v1.0 (Datalog-inspired, documented, open source)
  • Marketplace for third-party rule packs and CSO type extensions
  • Time-travel debugging (replay team cognitive state at any past moment)
  • Causal reasoner upgrade to symbol-level granularity (method/function, not just file)
  • Voice/meeting capture (opt-in, redactable)
  • Cross-org federation for open-source projects
  • Mobile read-only app for engineering managers

10. Measurement Plan

We publish measured numbers from deployments, not projected industry statistics.

Core metrics

| Metric | Target | Current (lab) |
| --- | --- | --- |
| Time to first productive action | <5 seconds | <5 seconds (symbolic-first path) |
| CPAP Layer 2 cache hit rate | >95% | ~98% (230-call test session) |
| CPAP avg Layer 3 tokens | <50 | ~16 |
| Session resume accuracy | >90% (user acceptance) | In beta testing |
| Agent rollback rate (HCR vs. ungoverned) | 50% reduction | Baseline being measured |
| Mean time to provenance | <1 hour | <5 min (local store, graph traversal) |

Token economics (per-team)

Published quarterly from design-partner deployments. Baseline: uncached full-context calls at current API pricing. HCR + CPAP target: 95% reduction in billed input tokens per session.

Honesty principles

Negative results published. If CPAP cache hit rate drops below 90% in a quarter, we say so and explain why. If symbolic verification produces a false-positive rate above 10%, we publish that.


11. Competitive Defensibility

Three moats

  1. CPAP requires the causal graph. Layer 2 freeze boundaries are derived from causal centrality. A competitor cannot implement CPAP without first building the entire CSO + causal edge + centrality scoring stack. This is a 6–12 month rebuild, not a feature flag.

  2. Network effect at the team layer. Every developer on a team who uses HCR improves every other developer's context quality. Switching cost is the team's accumulated decision graph. Switching individually is possible; switching as a team is not.

  3. Integration breadth on MCP. Being the canonical write target for every coding agent on the team makes HCR sticky in a way single-vendor tools cannot match. The same graph works across Claude Code, Cursor, Windsurf, and any in-house agent.

What we will not compete on

  • Best autocomplete — model vendors own this
  • Cheapest memory store — we are not a memory store
  • Biggest context window — CPAP makes the window unnecessary

12. Risks

| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| LLM vendors add native team memory | Medium | High | Lead with causal verification + CPAP; they cannot replicate either without rebuilding HCR's architecture |
| CPAP cache hit rate degrades on large teams with frequent commits | Medium | Medium | Tune Layer 2 budget; adaptive freeze threshold; publish measured rates |
| MCP loses to a competing protocol | Low | Medium | Protocol-agnostic core; adapters for any emerging protocol |
| Enterprise sales cycles starve early-stage runway | High | High | Open-core acquisition; land-and-expand; partner-led distribution |
| Symbolic verification produces too many false positives | Medium | High | Ship with conservative rules first; user-tuneable thresholds; publish false-positive rates |
| Privacy backlash on capture features | Medium | High | Local-first default; explicit opt-in for every cloud feature; tenant-side redaction |
| LLM cost collapse eliminates CPAP's economic argument | Low | Medium | CPAP also reduces latency 10×; latency argument persists regardless of cost |

13. Conclusion

HCR v3.0 is the same bet as v2.0 — the Cognitive State Plane is the right category — with three additions:

  1. CPAP changes the economics of memory-augmented AI tools. 95% token cost reduction is not a marginal improvement; it makes memory augmentation economically viable at fleet scale, where it was previously prohibitive.

  2. The system is built. 230 passing tests. 23 MCP tools. Full auth/sync/telemetry stack. REST API with ChatGPT integration. CPAP wired into every context-serving tool. This is infrastructure, not a prototype.

  3. The moat is now technical, not just strategic. CPAP requires the causal graph. The causal graph requires the CSO data model. The CSO data model requires HCR's capture layer. Competitors cannot replicate the user-visible outcome (95% token reduction) without rebuilding the entire stack from Layer 1.

The next phase — team graph, agent fleet governance, SOC2 audit, enterprise GTM — is where HCR becomes the control plane for AI-augmented engineering. The foundation for all of it is built.


Appendix A — CPAP Technical Detail

For engineers evaluating CPAP:

Why UUID sort as tiebreaker, not timestamp? Timestamps change as CSOs are accessed and updated. UUIDs are immutable once assigned. Using the UUID as the final tiebreaker guarantees byte identity even if two CSOs have identical topological depth and insertion rank (which should not happen in practice, but is possible in edge cases).

Why topological depth first? DECISION CSOs that are causally upstream of many OUTCOME and OBSERVATION CSOs should appear before their descendants in the context. This preserves the narrative structure — the model reads "we decided X" before "X caused Y" — which improves reasoning quality on causal chains.

Why insertion_ranks carried forward across rebuilds? The alternative is re-sorting by centrality score on every rebuild. But centrality scores shift as new CSOs are written (changing the graph topology). A CSO that ranked 5th yesterday might rank 12th today after an unrelated commit. If that reordering changed Layer 2 bytes, the cache would bust even though neither CSO's content changed. Insertion ranks provide a stable "narrative position" that survives graph topology changes.
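The three ordering rules above (topological depth, then insertion rank, then UUID) can be sketched as a single sort key. This is a minimal illustration, not HCR's implementation: the `CSO` fields, `assign_insertion_ranks`, and `serialize_layer2` are hypothetical names, and the one-line-per-CSO serialization is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CSO:
    # Hypothetical, simplified record; the real CSO model carries more fields.
    uuid: str
    topo_depth: int  # 0 = causally upstream root
    content: str


def assign_insertion_ranks(selected, previous_ranks):
    """Carry known ranks forward; give new CSOs ranks after the highest existing one."""
    ranks = {c.uuid: previous_ranks[c.uuid] for c in selected if c.uuid in previous_ranks}
    next_rank = max(previous_ranks.values(), default=-1) + 1
    # Iterate new CSOs in UUID order so rank assignment is itself deterministic.
    for cso in sorted(selected, key=lambda c: c.uuid):
        if cso.uuid not in ranks:
            ranks[cso.uuid] = next_rank
            next_rank += 1
    return ranks


def serialize_layer2(selected, ranks):
    """Order by (topological depth, insertion rank, UUID) and emit bytes."""
    ordered = sorted(selected, key=lambda c: (c.topo_depth, ranks[c.uuid], c.uuid))
    return "\n".join(f"{c.uuid} {c.content}" for c in ordered).encode("utf-8")


decision = CSO("a1", 0, "DECISION: use Postgres")
outcome = CSO("b2", 1, "OUTCOME: migration done")
observation = CSO("c3", 1, "OBSERVATION: latency up")

ranks = assign_insertion_ranks([outcome, decision, observation], {})
first = serialize_layer2([outcome, decision, observation], ranks)

# A rebuild that sees the same CSOs in a different order yields identical bytes.
ranks2 = assign_insertion_ranks([observation, outcome, decision], ranks)
second = serialize_layer2([observation, outcome, decision], ranks2)

# A newly written CSO is ranked after existing ones, even with a smaller UUID.
late = CSO("00", 1, "RISK: index missing")
ranks3 = assign_insertion_ranks([late, decision, outcome, observation], ranks2)
third = serialize_layer2([late, decision, outcome, observation], ranks3)
```

In this sketch `first == second`, and `third` begins with the bytes of `first`, because the late arrival takes the next free insertion rank instead of displacing earlier CSOs at the same depth. A new CSO at a shallower depth would still reorder the output.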

What causes a cache bust? A git commit (post-commit hook fires). The new commit may have changed which CSOs are most central, so Layer 2 must rebuild to reflect the new top-20. This is correct — after a commit, context should update. Between commits, the context is stable and achieves near-perfect cache hit rate.
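The commit-driven rebuild can be illustrated with a small freeze gate. This is a sketch under stated assumptions: `FrozenLayer2` and its methods are hypothetical, and "SHA256[:8]" (from the glossary) is read here as the first 8 hex characters of the digest.

```python
import hashlib


def cache_epoch(layer2_bytes: bytes) -> str:
    # Assumption: "SHA256[:8]" means the first 8 hex characters of the digest.
    return hashlib.sha256(layer2_bytes).hexdigest()[:8]


class FrozenLayer2:
    """Hypothetical freeze gate: Layer 2 bytes change only when a commit is seen."""

    def __init__(self, build_layer2):
        self._build = build_layer2  # callable returning the Layer 2 bytes
        self._bytes = build_layer2()
        self.epoch = cache_epoch(self._bytes)

    def get(self) -> bytes:
        # Between commits, always return the frozen bytes unchanged.
        return self._bytes

    def on_commit(self) -> bool:
        """Rebuild after a commit; return True if the epoch changed (cache bust)."""
        new_bytes = self._build()
        new_epoch = cache_epoch(new_bytes)
        changed = new_epoch != self.epoch
        self._bytes, self.epoch = new_bytes, new_epoch
        return changed


state = {"top": ["decision-1", "outcome-2"]}
layer2 = FrozenLayer2(lambda: "\n".join(state["top"]).encode("utf-8"))
epoch_before = layer2.epoch

state["top"].append("risk-3")        # the graph changes between commits...
frozen_bytes = layer2.get()          # ...but the served bytes stay frozen
busted = layer2.on_commit()          # the rebuild happens only on commit
busted_again = layer2.on_commit()    # a commit with no graph change
```

Here a commit that leaves the Layer 2 bytes unchanged also leaves the epoch unchanged. That is a property of this sketch and may differ from how HCR treats every commit.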


Appendix B — Glossary

CSO (Cognitive State Object). Typed, signed, causally-linked record representing an intent, decision, observation, constraint, risk, outcome, claim, task, or rollback.

CPAP (Causal Prefix Alignment Protocol). Three-layer context serialization that reached a ~98% LLM prompt cache hit rate in lab testing (230-call test session). Requires the causal graph for freeze boundaries.

Cognitive State Plane. HCR's category: verifiable, governed, team-scope infrastructure layer between coding agents and the codebase.

CognitiveProjection. Centrality-ranked, decay-filtered live state view. Input to CPAP Layer 2 selection.

CPAPFormatter. Produces CPAPPayload from CSOStore. Manages freeze gate, epoch persistence, Layer 3 delta extraction, and telemetry.

FreezeStore. Reads/writes .hcr/cpap_epoch.json — the persisted Layer 2 epoch across process restarts.

Cache epoch. Truncated SHA-256 hash of layer2_bytes (SHA256[:8]). Callers compare epochs to detect context staleness.

Layer 3 delta. Compact AST-diff representation of the most recent file edit. Always <50 tokens. Format: Δ {file} M{fn} I+{import}.

Symbolic Verifier. Rules engine over the typed CSO graph. Emits RISK CSOs when rules fire.

Causal Reasoner. Maintains and traverses causal links between CSOs. Forward impact simulation + backward attribution.

Pre-flight / Post-flight. Structured context handoff to an agent before it begins work (hcr_preflight) and structured reconciliation of its produced CSOs afterward (hcr_postflight).

BOCPD (Bayesian Online Changepoint Detection). Statistical method for detecting work episode boundaries in event streams.

RRF (Reciprocal Rank Fusion). Fusion algorithm that combines the causal-centrality and semantic ranked lists into a single ranking, which is then used to fill a fixed token budget.

MCP (Model Context Protocol). Open protocol for tool and resource access between AI agents and external systems. HCR is MCP-native.
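As an illustration of the RRF entry above, the sketch below fuses two ranked lists with the standard reciprocal-rank formula, score(item) = sum over lists of 1 / (k + rank), and then fills a token budget greedily. The constant k = 60 is a commonly used default, not a value stated for HCR, and `take_within_budget`, the IDs, and the token costs are hypothetical.

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists: each item scores the sum of 1 / (k + rank) over lists."""
    scores = {}
    for ranking in ranked_lists:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties broken by item ID so the order is deterministic.
    return sorted(scores, key=lambda item: (-scores[item], item))


def take_within_budget(ordered_ids, token_costs, budget):
    """Greedy fill: walk the fused ranking, keeping items that still fit."""
    chosen, used = [], 0
    for item in ordered_ids:
        cost = token_costs[item]
        if used + cost <= budget:
            chosen.append(item)
            used += cost
    return chosen


causal_ranking = ["d1", "r2", "o3"]      # by causal centrality (illustrative)
semantic_ranking = ["o3", "d1", "t4"]    # by semantic similarity (illustrative)

fused = reciprocal_rank_fusion([causal_ranking, semantic_ranking])
selected = take_within_budget(
    fused, {"d1": 40, "o3": 50, "r2": 30, "t4": 10}, budget=100
)
```

Items that appear in both lists (`d1`, `o3`) outrank items that appear in only one. The greedy fill skips an item that does not fit and continues down the ranking, which is one possible design choice; stopping at the first item that does not fit is another.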


This is a living product document. Implementation status reflects the codebase state as of May 27, 2026. Version 3.0 — Integration of v1.0 governance positioning, v2.0 Cognitive State Plane strategy, and CPAP token-economics breakthrough.