
feat(usage): per-conversation cache-hit rate in usage reports #409

Merged
mgoldsborough merged 2 commits into main from feat/cache-hit-telemetry
Jun 10, 2026

Conversation

@mgoldsborough

Contributor

Why

The cache-cost work (#401, #406, #408) is only as good as our ability to see it. This adds the standing signal — cache-hit rate per conversation — so we can confirm the win landed on a tenant and catch any future regression without re-running a one-off forensic pass.

What

computeCacheHitRate(tokens) = cacheRead / (input + cacheRead + cacheWrite) — the fraction of input-side tokens served from cache (cheap reads) rather than re-written or sent uncached.

  • High on a healthy long conversation (the growing prefix is read back each turn).
  • Low on a thrashing one (the prefix is re-written every turn) — exactly the pathology the audit found at 14–40%.
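The formula above can be sketched as a small function. This is a minimal illustration, not the merged implementation: the `TokenCounts` shape and its field names are assumptions inferred from the formula, and the "thrashing" numbers are made up for contrast.

```typescript
// Hypothetical token-count shape; the real aggregator's field names may differ.
interface TokenCounts {
  input: number;      // input tokens sent uncached
  cacheRead: number;  // input tokens served from cache (cheap reads)
  cacheWrite: number; // input tokens written to cache
}

// Fraction of input-side tokens served from cache.
// Returns 0 when there are no input-side tokens, instead of NaN from 0 / 0.
function computeCacheHitRate(tokens: TokenCounts): number {
  const inputSide = tokens.input + tokens.cacheRead + tokens.cacheWrite;
  if (inputSide <= 0) return 0;
  return tokens.cacheRead / inputSide;
}

// Healthy: most of the prefix is read back from cache.
const healthy = computeCacheHitRate({ input: 100, cacheRead: 700, cacheWrite: 200 });   // 0.7
// Thrashing (illustrative numbers): most of the prefix is re-written.
const thrashing = computeCacheHitRate({ input: 100, cacheRead: 200, cacheWrite: 700 }); // 0.2
// No input-side tokens at all.
const empty = computeCacheHitRate({ input: 0, cacheRead: 0, cacheWrite: 0 });           // 0
```

Cache writes stay in the denominator, so a conversation that keeps re-writing its prefix scores low even though it technically uses the cache.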

It's derived on the usage report's totals, per model, and per breakdown row (including the groupBy: "conversation" and per-day breakdowns), so unhealthy conversations surface individually. Because the usage tool returns the whole report as structuredContent, the field flows through the tool, the usage bundle, and the UI automatically, with no per-consumer change.

The field is optional on the interfaces and set once at report finalization, so construction sites don't change.
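A rough sketch of the "optional field, set once at finalization" pattern described above. The `UsageRow`, `UsageReport`, and `finalizeReport` names are hypothetical stand-ins, not the repository's actual types; the point is only that the code building rows never has to mention `cacheHitRate`.

```typescript
// Hypothetical, simplified shapes; the real report types carry more fields.
interface UsageRow {
  key: string;
  input: number;
  cacheRead: number;
  cacheWrite: number;
  cacheHitRate?: number; // optional, so existing construction sites still type-check
}

interface UsageReport {
  totals: UsageRow;
  models: UsageRow[];
  breakdown: UsageRow[];
}

function computeCacheHitRate(row: UsageRow): number {
  const inputSide = row.input + row.cacheRead + row.cacheWrite;
  return inputSide > 0 ? row.cacheRead / inputSide : 0;
}

// Copy a row and attach its derived rate.
function withRate(row: UsageRow): UsageRow {
  return { ...row, cacheHitRate: computeCacheHitRate(row) };
}

// Single place where the derived field is attached to every level of the report.
function finalizeReport(report: UsageReport): UsageReport {
  return {
    totals: withRate(report.totals),
    models: report.models.map(withRate),
    breakdown: report.breakdown.map(withRate),
  };
}

// Construction site: builds rows without knowing about cacheHitRate.
const draft: UsageReport = {
  totals: { key: "totals", input: 100, cacheRead: 700, cacheWrite: 200 },
  models: [{ key: "model-a", input: 100, cacheRead: 700, cacheWrite: 200 }],
  breakdown: [{ key: "conversation-1", input: 100, cacheRead: 700, cacheWrite: 200 }],
};

const finalReport = finalizeReport(draft);
```

Deriving the rate in one place keeps totals, models, and breakdown rows consistent, at the cost of the field being absent on any report that skips finalization.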

Tests

  • cacheHitRate = 0.7 on a 700-read / 200-write / 100-non-cached call (700 / (100 + 700 + 200)), propagated to totals, models, and breakdown.
  • computeCacheHitRate returns 0 when there are no input-side tokens (zero denominator).
  • Updated the wire-format key-set contract test for the new key.
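For readers unfamiliar with key-set contract tests, the idea can be sketched as below. This is an assumption about the general shape of such a test, not a copy of the one in the repository; `EXPECTED_TOTALS_KEYS` and `keySetMatches` are hypothetical names, and the real totals object has more keys.

```typescript
// Hypothetical expected key set for a totals object on the wire.
const EXPECTED_TOTALS_KEYS: readonly string[] = [
  "cacheHitRate",
  "cacheRead",
  "cacheWrite",
  "input",
];

// True when the object's own keys are exactly the expected set (order-insensitive).
function keySetMatches(obj: object, expected: readonly string[]): boolean {
  const actual = Object.keys(obj).sort();
  const want = expected.slice().sort();
  return actual.length === want.length && actual.every((k, i) => k === want[i]);
}

const totalsOnWire = { input: 100, cacheRead: 700, cacheWrite: 200, cacheHitRate: 0.7 };
const totalsMissingKey = { input: 100, cacheRead: 700, cacheWrite: 200 };

const contractHolds = keySetMatches(totalsOnWire, EXPECTED_TOTALS_KEYS);      // true
const contractBroken = keySetMatches(totalsMissingKey, EXPECTED_TOTALS_KEYS); // false
```

A check like this fails as soon as a key is added or dropped without updating the expected list, which is why a new field means touching the contract test.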

verify:static green; full unit suite (3,393) green.

Follow-on

The tenant-summary.py operator script (deployments repo) gets the same metric so that /tenant-summary shows it; that's the surface for confirming the win on the hq tenant. I'm handling that change alongside this PR.

Make cache health observable so the prompt-cache win is confirmable and future
regressions are caught without a one-off forensic pass.

`computeCacheHitRate` = cacheRead / (input + cacheRead + cacheWrite) — the
fraction of input tokens served from cache (cheap reads) rather than re-written
or sent uncached. A healthy long conversation trends high (the growing prefix is
read back each turn); a thrashing one trends low.

It's derived on the report's totals, each model, and each breakdown row
(including the per-conversation breakdown) and flows automatically through the
`usage` tool, the usage bundle, and the UI (the tool returns the whole report).
Field is optional on the interfaces (set once at finalization) so construction
sites don't change.

Adds tests for the metric (0.7 on a 700-read/200-write/100-noncached call,
propagated to totals/models/breakdown) and the no-input edge; updates the
wire-format contract test for the new key.
@mgoldsborough force-pushed the feat/cache-hit-telemetry branch from dbec26b to d6f4074 on June 10, 2026 17:41
Address QA on #409: the aggregator gained cacheHitRate, but the lockstep type
mirror in schemas/usage.ts (UsageModelEntry, UsageBreakdownEntry, and the inline
totals of UsageReportOutput) did not. The field reached structuredContent at
runtime (the spread in usage.ts bypasses excess-property checks), but the typed
shape every UI consumer imports had no such field. The metric was on the wire
but invisible to the UI, contradicting the PR's "flows to the UI" claim.
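The failure mode described here can be reproduced in isolation. The snippet below is a simplified illustration with hypothetical type and variable names, not the code in usage.ts: TypeScript only applies excess-property checks to properties written explicitly in an object literal, so a spread carries extra fields through unchecked.

```typescript
// Hypothetical mirror type that was not updated: no cacheHitRate.
interface MirrorTotals {
  input: number;
  cacheRead: number;
  cacheWrite: number;
}

// What the aggregator produces after gaining the new field.
const aggregated = { input: 100, cacheRead: 700, cacheWrite: 200, cacheHitRate: 0.7 };

// Writing the extra property explicitly would be rejected by the compiler:
// const direct: MirrorTotals = { input: 100, cacheRead: 700, cacheWrite: 200, cacheHitRate: 0.7 };

// Spreading compiles without complaint, even though the extra field comes along.
const viaSpread: MirrorTotals = { ...aggregated };

// The field is on the object at runtime (so it reaches the wire) ...
const presentAtRuntime: boolean = "cacheHitRate" in viaSpread; // true

// ... but typed consumers cannot reach it:
// viaSpread.cacheHitRate; // would not compile: no such property on MirrorTotals
```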

- Add cacheHitRate?: number to the three mirror types; regenerate the web
  platform-schemas (codegen).
- Render it: a "Cache hit" row in the Tokens card of UsageTotalsCards
  (web/src/pages/settings/usage-shared.tsx), shown in both the self and org
  usage views.

Follow-ups (out of scope here): the usage bundle UI keeps a fourth hand-rolled
copy of the shape (src/bundles/usage/ui/src/App.tsx) that needs the field if it
should show the metric; and a drift guard asserting the mirror's key-set matches
the aggregator would convert this silent failure into a compile/test failure.
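One possible shape for the drift guard mentioned above is a compile-time key-set comparison. This is only a sketch under assumptions: `AggregatorTotals`, `MirrorTotals`, and `SameKeys` are hypothetical names, and the real guard could equally be a runtime test.

```typescript
// Hypothetical stand-ins for the aggregator's type and its hand-maintained mirror.
interface AggregatorTotals {
  input: number;
  cacheRead: number;
  cacheWrite: number;
  cacheHitRate?: number;
}

interface MirrorTotals {
  input: number;
  cacheRead: number;
  cacheWrite: number;
  cacheHitRate?: number;
}

// Resolves to `true` only when both types have exactly the same keys.
// Wrapping in tuples stops the conditional type from distributing over the key union.
type SameKeys<A, B> = [keyof A] extends [keyof B]
  ? ([keyof B] extends [keyof A] ? true : false)
  : false;

// If the mirror drifts, SameKeys resolves to `false` and this assignment stops compiling.
const totalsKeysInSync: SameKeys<AggregatorTotals, MirrorTotals> = true;

// Example of drift: a mirror that is missing the new field.
interface StaleMirrorTotals {
  input: number;
  cacheRead: number;
  cacheWrite: number;
}
// const stale: SameKeys<AggregatorTotals, StaleMirrorTotals> = true; // would not compile
const staleDetected: SameKeys<AggregatorTotals, StaleMirrorTotals> = false;
```

This guard compares key names only; it would not catch a key whose type changed on one side.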
@mgoldsborough added the qa-reviewed label (QA review completed with no critical issues) on Jun 10, 2026
@mgoldsborough merged commit 4299659 into main on Jun 10, 2026
5 checks passed
@mgoldsborough deleted the feat/cache-hit-telemetry branch on June 10, 2026 18:59