feat(usage): per-conversation cache-hit rate in usage reports by mgoldsborough · Pull Request #409 · NimbleBrainInc/nimblebrain

mgoldsborough · 2026-06-10T02:16:40Z

Why

The cache-cost work (#401, #406, #408) is only as good as our ability to see it. This adds the standing signal — cache-hit rate per conversation — so we can confirm the win landed on a tenant and catch any future regression without re-running a one-off forensic pass.

What

computeCacheHitRate(tokens) = cacheRead / (input + cacheRead + cacheWrite) — the fraction of input-side tokens served from cache (cheap reads) rather than re-written or sent uncached.

High on a healthy long conversation (the growing prefix is read back each turn).
Low on a thrashing one (the prefix is re-written every turn) — exactly the pathology the audit found at 14–40%.
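As a minimal sketch of the formula above (not the code from this PR): the `TokenCounts` shape is an assumption whose field names are taken from the formula, and the zero-denominator guard mirrors the "0 with no input tokens" behavior listed under Tests.

```typescript
// Hypothetical token-count shape; field names follow the formula in the PR text.
interface TokenCounts {
  input: number;      // input tokens sent uncached
  cacheRead: number;  // input tokens served from the prompt cache
  cacheWrite: number; // input tokens written to the prompt cache
}

// Fraction of input-side tokens that were served from cache.
function computeCacheHitRate(tokens: TokenCounts): number {
  const denominator = tokens.input + tokens.cacheRead + tokens.cacheWrite;
  // No input-side tokens at all: report 0 rather than NaN from 0 / 0.
  if (denominator === 0) return 0;
  return tokens.cacheRead / denominator;
}

// A healthy turn reads most of the prefix back from cache.
const healthyRate = computeCacheHitRate({ input: 100, cacheRead: 700, cacheWrite: 200 });

// A thrashing turn re-writes most of the prefix instead of reading it.
const thrashingRate = computeCacheHitRate({ input: 100, cacheRead: 200, cacheWrite: 700 });

// Edge case: nothing on the input side.
const emptyRate = computeCacheHitRate({ input: 0, cacheRead: 0, cacheWrite: 0 });
```

With these illustrative numbers the healthy call works out to 700 / 1000 and the thrashing call to 200 / 1000, the latter inside the 14–40% band the audit reported.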

It's derived on the usage report's totals, per model, and per breakdown row (including the groupBy: "conversation" / per-day breakdown), so the unhealthy conversations surface. Because the usage tool returns the whole report as structuredContent, it flows through the tool, the usage bundle, and the UI automatically — no per-consumer change.

The field is optional on the interfaces and set once at report finalization, so construction sites don't change.
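The "optional field, set once at finalization" approach might look roughly like the sketch below. Every name here (`UsageRow`, `UsageReport`, `finalizeReport`, the `key` field, the 0.4 threshold) is hypothetical and stands in for the real report types, which this page does not show.

```typescript
interface TokenCounts { input: number; cacheRead: number; cacheWrite: number; }

// cacheHitRate is optional, so code that builds rows does not need to set it.
interface UsageRow extends TokenCounts { cacheHitRate?: number; }

interface UsageReport {
  totals: UsageRow;
  models: Record<string, UsageRow>;
  breakdown: Array<UsageRow & { key: string }>; // e.g. one row per conversation
}

function hitRate(t: TokenCounts): number {
  const denominator = t.input + t.cacheRead + t.cacheWrite;
  return denominator === 0 ? 0 : t.cacheRead / denominator;
}

// Single place where the derived field is attached: totals, each model, each row.
function finalizeReport(report: UsageReport): UsageReport {
  const withRate = <T extends TokenCounts>(row: T): T & { cacheHitRate: number } =>
    ({ ...row, cacheHitRate: hitRate(row) });

  const models: Record<string, UsageRow> = {};
  for (const name of Object.keys(report.models)) {
    models[name] = withRate(report.models[name]);
  }

  return {
    totals: withRate(report.totals),
    models,
    breakdown: report.breakdown.map((row) => withRate(row)),
  };
}

// Construction site: no mention of cacheHitRate anywhere.
const draft: UsageReport = {
  totals: { input: 200, cacheRead: 900, cacheWrite: 900 },
  models: { "model-a": { input: 200, cacheRead: 900, cacheWrite: 900 } },
  breakdown: [
    { key: "conv-healthy", input: 100, cacheRead: 700, cacheWrite: 200 },
    { key: "conv-thrashing", input: 100, cacheRead: 200, cacheWrite: 700 },
  ],
};

const finalized = finalizeReport(draft);

// Surface unhealthy conversations; the 0.4 cutoff is an illustrative assumption.
const unhealthyKeys = finalized.breakdown
  .filter((row) => row.cacheHitRate !== undefined && row.cacheHitRate < 0.4)
  .map((row) => row.key);
```

Deriving the field in one pass keeps it consistent across totals, models, and breakdown rows, and leaves the input report untouched.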

Tests

cacheHitRate = 0.7 on a 700-read / 200-write / 100-non-cached call, propagated to totals, models, and breakdown.
computeCacheHitRate returns 0 when there are no input tokens.
Updated the wire-format key-set contract test for the new key.
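For readers unfamiliar with key-set contract tests, the idea can be sketched as follows. This is an illustration under assumptions, not the repository's test: the `EXPECTED_TOTALS_KEYS` list and the sample object are reduced, hypothetical stand-ins for the real wire format.

```typescript
// Hypothetical, reduced key list; the real wire format has more fields.
const EXPECTED_TOTALS_KEYS: readonly string[] = [
  "cacheHitRate",
  "cacheRead",
  "cacheWrite",
  "input",
];

// Compare an object's own keys against the expected contract.
function keySetDiff(
  actual: object,
  expected: readonly string[],
): { missing: string[]; extra: string[] } {
  const actualKeys = Object.keys(actual);
  return {
    missing: expected.filter((k) => actualKeys.indexOf(k) === -1),
    extra: actualKeys.filter((k) => expected.indexOf(k) === -1),
  };
}

// A fully populated sample, as a finalized report would produce it.
const sampleTotals = { input: 100, cacheRead: 700, cacheWrite: 200, cacheHitRate: 0.7 };
const currentDiff = keySetDiff(sampleTotals, EXPECTED_TOTALS_KEYS);

// A payload from before the new key was added fails the contract.
const staleTotals = { input: 100, cacheRead: 700, cacheWrite: 200 };
const staleDiff = keySetDiff(staleTotals, EXPECTED_TOTALS_KEYS);
```

A test would assert that both `missing` and `extra` are empty, so adding a key to the wire format forces a deliberate update of the expected list.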

verify:static green; full unit suite (3,393) green.

Follow-on

The tenant-summary.py operator script (deployments repo) gets the same metric so that /tenant-summary shows it; that is the surface for confirming the win on the hq tenant. I'm handling that change alongside this PR.

Make cache health observable so the prompt-cache win is confirmable and future regressions are caught without a one-off forensic pass. `computeCacheHitRate` = cacheRead / (input + cacheRead + cacheWrite) — the fraction of input tokens served from cache (cheap reads) rather than re-written or sent uncached. A healthy long conversation trends high (the growing prefix is read back each turn); a thrashing one trends low. It's derived on the report's totals, each model, and each breakdown row (including the per-conversation breakdown) and flows automatically through the `usage` tool, the usage bundle, and the UI (the tool returns the whole report). Field is optional on the interfaces (set once at finalization) so construction sites don't change. Adds tests for the metric (0.7 on a 700-read/200-write/100-noncached call, propagated to totals/models/breakdown) and the no-input edge; updates the wire-format contract test for the new key.

Address QA on #409: the aggregator gained cacheHitRate but the lockstep type mirror in schemas/usage.ts (UsageModelEntry, UsageBreakdownEntry, and the inline totals of UsageReportOutput) did not, so the field reached structuredContent at runtime (the spread in usage.ts bypasses excess-property checks) but the typed shape every UI consumer imports had no such field. The metric was on the wire but invisible to the UI, contradicting the PR's "flows to the UI" claim.

- Add cacheHitRate?: number to the three mirror types; regenerate the web platform-schemas (codegen).
- Render it: a "Cache hit" row in the Tokens card of UsageTotalsCards (web/src/pages/settings/usage-shared.tsx), shown in both the self and org usage views.

Follow-ups (out of scope here): the usage bundle UI keeps a fourth hand-rolled copy of the shape (src/bundles/usage/ui/src/App.tsx) that needs the field if it should show the metric; and a drift guard asserting the mirror's key-set matches the aggregator would convert this silent failure into a compile/test failure.
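One way the proposed drift guard could work at compile time is sketched below. It is an assumption-laden illustration: the three interfaces are simplified stand-ins (only the name `UsageModelEntry` comes from the commit message), and `SameKeys` is a hypothetical helper, not something in the repository.

```typescript
// Stand-in for the aggregator's row type (hypothetical, reduced).
interface AggregatorModelEntry {
  input: number;
  cacheRead: number;
  cacheWrite: number;
  cacheHitRate?: number;
}

// Stand-in for the mirror in schemas/usage.ts after the fix.
interface UsageModelEntry {
  input: number;
  cacheRead: number;
  cacheWrite: number;
  cacheHitRate?: number;
}

// Stand-in for the mirror before the fix: the new key is missing.
interface StaleUsageModelEntry {
  input: number;
  cacheRead: number;
  cacheWrite: number;
}

// Resolves to `true` only when both types have exactly the same keys.
// The tuple wrappers stop the conditional type from distributing over unions.
type SameKeys<A, B> =
  [keyof A] extends [keyof B]
    ? ([keyof B] extends [keyof A] ? true : false)
    : false;

// Compiles only while the mirror and the aggregator agree on their keys.
const mirrorInSync: SameKeys<AggregatorModelEntry, UsageModelEntry> = true;

// For the stale mirror the type resolves to `false`, so assigning `true`
// here would be a compile error; that error is the drift guard firing.
const staleMirrorInSync: SameKeys<AggregatorModelEntry, StaleUsageModelEntry> = false;
```

This checks key names only, not value types or optionality, which matches the "key-set matches" wording of the follow-up; a stricter guard would also compare the property types.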

mgoldsborough force-pushed the feat/cache-hit-telemetry branch from dbec26b to d6f4074 on June 10, 2026 17:41

mgoldsborough added the qa-reviewed label (QA review completed with no critical issues) Jun 10, 2026

mgoldsborough merged commit 4299659 into main Jun 10, 2026
5 checks passed

mgoldsborough deleted the feat/cache-hit-telemetry branch June 10, 2026 18:59

mgoldsborough merged 2 commits into main from feat/cache-hit-telemetry
