feat(usage): per-conversation cache-hit rate in usage reports #409
Merged
Make cache health observable so the prompt-cache win is confirmable and future regressions are caught without a one-off forensic pass. `computeCacheHitRate` = cacheRead / (input + cacheRead + cacheWrite) — the fraction of input tokens served from cache (cheap reads) rather than re-written or sent uncached. A healthy long conversation trends high (the growing prefix is read back each turn); a thrashing one trends low. It's derived on the report's totals, each model, and each breakdown row (including the per-conversation breakdown) and flows automatically through the `usage` tool, the usage bundle, and the UI (the tool returns the whole report). Field is optional on the interfaces (set once at finalization) so construction sites don't change. Adds tests for the metric (0.7 on a 700-read/200-write/100-noncached call, propagated to totals/models/breakdown) and the no-input edge; updates the wire-format contract test for the new key.
dbec26b to d6f4074
Address QA on #409: the aggregator gained cacheHitRate, but the lockstep type mirror in schemas/usage.ts (UsageModelEntry, UsageBreakdownEntry, and the inline totals of UsageReportOutput) did not. The field therefore reached structuredContent at runtime (the spread in usage.ts bypasses excess-property checks), but the typed shape every UI consumer imports had no such field. The metric was on the wire but invisible to the UI, contradicting the PR's "flows to the UI" claim.

- Add cacheHitRate?: number to the three mirror types; regenerate the web platform-schemas (codegen).
- Render it: a "Cache hit" row in the Tokens card of UsageTotalsCards (web/src/pages/settings/usage-shared.tsx), shown in both the self and org usage views.

Follow-ups (out of scope here):

- The usage bundle UI keeps a fourth hand-rolled copy of the shape (src/bundles/usage/ui/src/App.tsx) that needs the field if it should show the metric.
- A drift guard asserting that the mirror's key-set matches the aggregator would convert this silent failure into a compile/test failure.
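One possible shape for the drift guard mentioned in the follow-ups is a type-level key-set comparison. This is a minimal sketch, not the repo's code: the field lists on both interfaces, the `AggregatorModelEntry` name, and the `KeysMatch` helper are all hypothetical stand-ins, since the real shapes are not shown in this PR.

```typescript
// Hypothetical stand-in for the aggregator's own entry type.
interface AggregatorModelEntry {
  input: number;
  cacheRead: number;
  cacheWrite: number;
  cacheHitRate?: number;
}

// Hypothetical stand-in for the hand-maintained mirror in schemas/usage.ts.
interface UsageModelEntry {
  input: number;
  cacheRead: number;
  cacheWrite: number;
  cacheHitRate?: number;
}

// Resolves to `true` only when both types declare exactly the same keys.
type KeysMatch<A, B> = [keyof A] extends [keyof B]
  ? [keyof B] extends [keyof A]
    ? true
    : false
  : false;

// If one side gains or loses a key, KeysMatch becomes `false` and this
// assignment stops compiling, so the drift surfaces at build time.
const modelEntryKeysInSync: KeysMatch<AggregatorModelEntry, UsageModelEntry> = true;
```

The guard compares key names only, not value types, which would have caught the missing `cacheHitRate` here; a stricter check would also need to compare the property types.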
Why
The cache-cost work (#401, #406, #408) is only as good as our ability to see it. This adds the standing signal — cache-hit rate per conversation — so we can confirm the win landed on a tenant and catch any future regression without re-running a one-off forensic pass.
What
- `computeCacheHitRate(tokens)` = `cacheRead / (input + cacheRead + cacheWrite)`: the fraction of input-side tokens served from cache (cheap reads) rather than re-written or sent uncached.
- It's derived on the usage report's totals, per model, and per breakdown row (including the `groupBy: "conversation"` / per-day breakdown), so the unhealthy conversations surface. Because the `usage` tool returns the whole report as `structuredContent`, it flows through the tool, the usage bundle, and the UI automatically, with no per-consumer change.
- The field is optional on the interfaces and set once at report finalization, so construction sites don't change.
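The metric and its finalization step can be sketched as follows. This is an illustrative sketch under assumptions: the `TokenCounts` and `UsageRow` shapes and the `finalizeRows` helper are hypothetical names, and only the formula and the zero-input behaviour come from the description above.

```typescript
// Hypothetical shapes for illustration; the real interfaces are not shown in this PR.
interface TokenCounts {
  input: number;      // input tokens sent uncached
  cacheRead: number;  // input tokens served from cache
  cacheWrite: number; // input tokens written to cache
}

interface UsageRow {
  tokens: TokenCounts;
  cacheHitRate?: number; // optional: filled in once at finalization
}

// cacheRead / (input + cacheRead + cacheWrite), or 0 when there is no input at all.
function computeCacheHitRate(tokens: TokenCounts): number {
  const inputSide = tokens.input + tokens.cacheRead + tokens.cacheWrite;
  return inputSide === 0 ? 0 : tokens.cacheRead / inputSide;
}

// Assumed finalization step: stamp the rate on each row after aggregation is done.
function finalizeRows(rows: UsageRow[]): UsageRow[] {
  return rows.map((row) => ({ ...row, cacheHitRate: computeCacheHitRate(row.tokens) }));
}

const healthy = computeCacheHitRate({ input: 100, cacheRead: 700, cacheWrite: 200 });
const empty = computeCacheHitRate({ input: 0, cacheRead: 0, cacheWrite: 0 });
console.log(healthy, empty); // 0.7 0
```

Keeping `cacheHitRate` optional and computing it in one place means code that builds rows never has to supply it, which matches the "construction sites don't change" point.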
Tests
- `cacheHitRate` = 0.7 on a 700-read / 200-write / 100-non-cached call, propagated to totals, models, and breakdown.
- `computeCacheHitRate` = 0 with no input tokens.
- `verify:static` green; full unit suite (3,393) green.

Follow-on
The `tenant-summary.py` operator script (deployments repo) gets the same metric so `/tenant-summary` shows it; that's the surface for confirming the win on the hq tenant. Handling alongside.