
Add metrics framework, setup automation, and context router hardening#16

Open
stbiadmin wants to merge 1 commit into GMaN1911:main from stbiadmin:feature/setup-automation-and-metrics

Conversation

@stbiadmin

Summary

This PR adds three things that I believe claude-cognitive needs in order to deliver on its claims:

  1. A metrics framework to determine whether the system actually reduces token usage, or at least provides meaningful context injection and routing.
  2. An interactive setup skill that replaces the tedious manual setup with a 5-minute guided workflow
  3. 10 bug fixes in the context router code, including two critical ones (Windows breakage, stdin corruption)

The existing context routing and pool coordination code is solid. These additions make the system usable by people who didn't write it, and provable by people who want evidence it works.

Motivation

I set up claude-cognitive on a real project and hit these issues:

  • Default keywords only match the author's project. Running the router with generic prompts ("How does the API work?") produces zero activations. A new user who doesn't create keywords.json gets a system that silently does nothing.
  • Setup requires deep knowledge of the internals. Path resolution, fractal doc format, keyword mapping, co-activation rules, hook wiring - each requires reading source code to understand.
  • No way to verify it's working. The README claims 64-95% token savings but ships no tooling to measure this. The usage tracker exists but isn't wired into hooks.
  • Silent failures everywhere. When keywords don't match, the router outputs nothing. When keywords.json is malformed, it falls back silently. Users can't tell "working, nothing to inject" from "completely broken."

What changed

Metrics framework (scripts/metrics/, 5 files, ~2,800 lines)

JSONL-based event collection that hooks into the existing UserPromptSubmit, SessionStart, and Stop lifecycle. Captures per-turn injection size, keyword matches, attention tier distribution, and transition data.

The analyzer computes: token savings statistics (mean/median/percentiles), keyword hit rates, attention dynamics, coverage gaps (which docs never activate), and trends over time.
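The collection side can be sketched roughly as follows. This is an illustrative stand-in, not the actual collector: the file naming and field names are assumptions, but it shows the append-only JSONL pattern with daily rotation that the framework relies on.

```python
import json
import time
from pathlib import Path

def record_event(log_dir, event_type, **fields):
    """Append one event per line; append-only writes tolerate concurrent hooks."""
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    # Daily rotation: one file per calendar day keeps individual files small.
    path = log_dir / time.strftime("events-%Y-%m-%d.jsonl")
    event = {"ts": time.time(), "type": event_type, **fields}
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")
    return path
```

Each hook (UserPromptSubmit, SessionStart, Stop) would call this with its own event type and per-turn fields such as injection size and keyword matches.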

Reports use a practical three-scenario framing:

  • Baseline (CLAUDE.md only) - what you get without cognitive
  • With cognitive (baseline + targeted injection) - what the router provides
  • Dump everything (all docs) - the naive alternative

This replaces the original "99.9% savings" metric, which compared against a baseline nobody would/could actually use.

Setup automation (/cognitive-setup skill, install.sh)

A 6-phase skill that:

  1. Checks environment (Python version, existing config)
  2. Scans the codebase (language detection, module discovery, framework identification)
  3. Generates keywords.json from the analysis
  4. Creates fractal documentation stubs with proper <!-- WARM CONTEXT ENDS --> markers
  5. Verifies hook configuration
  6. Runs dry-run validation against project-relevant prompts

Each phase presents its output for review before writing files. The whole thing is idempotent.
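To make phases 2-3 concrete, here is a toy sketch of deriving a keywords.json draft from module names found in the codebase. The real analyzer is more sophisticated (language detection, framework identification); the heuristic and function name here are illustrative only.

```python
from pathlib import Path

def draft_keywords(root):
    """Map each Python module name to candidate source paths (toy heuristic)."""
    root = Path(root)
    mapping = {}
    for py in root.rglob("*.py"):
        # Normalize module names to hyphenated keywords, e.g. pool_loader -> pool-loader.
        keyword = py.stem.replace("_", "-")
        mapping.setdefault(keyword, []).append(str(py.relative_to(root)))
    return mapping
```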

install.sh handles the mechanical parts (copy scripts, merge hooks into settings.json, install skills).

Context router hardening (scripts/context-router-v2.py, +261 lines)

Bug fixes:

| Bug | Severity |
| --- | --- |
| import fcntl crashes on Windows | Critical |
| stdin read twice in except branch (data already consumed) | Critical |
| pinned config parsed but variable unused | Major |
| --diagnostics runs after save_state(), mutating state | Major |
| Session state file race between concurrent sessions | Major |
| datetime.utcnow() deprecated since Python 3.12 | Major |
| Threshold constants duplicated across files | Major |
| save_report() name parameter unsanitized (path traversal) | Major |
| argparse --help exits and kills hook process | Minor |
| _read_last_router_output() reads entire log file | Minor |
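For reviewers, the shape of the two critical fixes is roughly the following. This is a sketch with illustrative helper names, not the patch itself: guard the POSIX-only fcntl import behind a flag, and capture stdin exactly once so the except branch can reuse it.

```python
import json
import sys

# Fix 1: fcntl exists only on POSIX, so an unconditional import crashes on Windows.
try:
    import fcntl
    HAS_FCNTL = True
except ImportError:  # Windows: degrade to no-op locking
    HAS_FCNTL = False

def lock_exclusive(f):
    """Best-effort exclusive lock; silently a no-op where fcntl is unavailable."""
    if HAS_FCNTL:
        fcntl.flock(f.fileno(), fcntl.LOCK_EX)

# Fix 2: a second stream.read() in the except branch returns "" because the
# data was already consumed, so read once up front and reuse the captured text.
def read_payload(stream=None):
    raw = (stream or sys.stdin).read()
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return {"prompt": raw}  # reuse the already-captured text
```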

New capabilities:

  • --validate "prompt" for dry-run testing without state mutation
  • --diagnostics for JSON diagnostic output
  • Non-silent failure messages on first 3 turns when nothing activates
  • Project-local .claude/ always preferred over global ~/.claude/
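The project-local-over-global preference amounts to a lookup order like the one below (function name and signature are illustrative, not the router's actual API):

```python
from pathlib import Path

def resolve_config(name, project_root, home):
    """Prefer the project-local .claude/ copy; fall back to the global one."""
    local = Path(project_root) / ".claude" / name
    if local.exists():
        return local
    return Path(home) / ".claude" / name
```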

Supporting additions

  • /cognitive-status skill for health checks (file presence, hook config, attention state)
  • /cognitive-state skill and standalone cognitive-state.py script for checking attention without burning context tokens
  • /cognitive-metrics skill for interactive analysis

What did NOT change

  • No changes to the pool coordination scripts (pool-loader.py, pool-extractor.py, pool-auto-update.py, pool-query.py)
  • No changes to the hook contract (stdin JSON in, stdout text out)
  • No changes to the .claude/settings.json schema
  • No new runtime dependencies beyond Python stdlib
  • All changes are additive. Existing configurations continue to work without modification.

How to review

There are a lot of changes. I recommend focusing on these, in order:

  1. scripts/context-router-v2.py - the bug fixes (search for HAS_FCNTL, raw = sys.stdin.read(), pinned)
  2. scripts/metrics/collector.py - how per-turn data is captured
  3. scripts/metrics/analyzer.py - the analysis logic
  4. .claude/skills/cognitive-setup/SKILL.md - the setup workflow instructions

These can be skimmed or skipped:

  • .claude/skills/*/SKILL.md (other than setup) - skill definitions (natural language instructions)
  • templates/ - example configurations

Testing

Verified by running against the claude-cognitive repo itself.

Design decisions

Why JSONL for metrics storage? Consistency with the existing attention_history.jsonl and instance_state.jsonl patterns. Append-only writes avoid corruption from concurrent hooks. Daily rotation keeps files manageable.

Why a skill instead of a standalone setup script? Users are likely already in a Claude Code session when they want to set up. Skills integrate with the existing workflow. The skill can also auto-install missing scripts, solving the chicken-and-egg problem.

Why reframe the token savings metric? The original metric compared against "inject every .md file on every turn," which inflates savings to 99.9%. The practical comparison is: without cognitive you get CLAUDE.md (399 tokens). With cognitive you get CLAUDE.md plus targeted injection (424 tokens for a quiet turn, more when keywords match). This is honest and useful.
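The arithmetic behind this reframing, using the figures quoted above (the 50,000-token "dump everything" figure is a made-up placeholder for illustration):

```python
def savings_pct(alternative_tokens, actual_tokens):
    """Percentage saved relative to an alternative; negative means overhead."""
    return 100.0 * (alternative_tokens - actual_tokens) / alternative_tokens

baseline = 399            # CLAUDE.md only
quiet_turn = 424          # CLAUDE.md + targeted injection, quiet turn
dump_everything = 50_000  # hypothetical: every .md file on every turn
```

Against the naive alternative the savings look like 99%+; against the realistic CLAUDE.md-only baseline, a quiet turn is actually a small overhead (about -6%), which is the honest number.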

Why not wire usage_tracker.py into hooks? The metrics collector already captures keyword effectiveness and file activation data per-turn, which overlaps with usage_tracker's purpose. Wiring it in would require parsing tool call results to infer file access, which is complex for unclear incremental value.
