api-proxy efficiency: prompt-cache injection, TTL upgrading, tool dropping, ANSI stripping (inspired by alxsuv/pino)

Background
alxsuv/pino is a ~500-line zero-dependency local reverse proxy that reports ~90% savings on Claude Code API costs by rewriting Anthropic /v1/messages requests before forwarding them upstream. AWF's api-proxy already sits in exactly the same position in the request path, and several of these techniques map directly onto its existing bodyTransform / makeModelBodyTransform infrastructure.

This issue tracks four concrete optimizations and one extensibility hook, all opt-in via environment variables.

1. Auto-inject prompt-cache breakpoints (AWF_ANTHROPIC_AUTO_CACHE=1)
Problem. Claude Code ships zero cache_control breakpoints on its tools array (~24k tokens), so the entire tool catalog is re-billed at full input price on every API round-trip. Additionally, the system prompt's cache_control: {type: "ephemeral"} has no ttl field and silently defaults to 5 minutes. A single turn that outlasts that window (long generation, slow tool call, user reading output) expires the cache and forces a re-write at 1.25× the input price on the next turn.

Solution. When processing Anthropic /v1/messages POST requests, inject cache_control breakpoints at up to four positions (Anthropic's breakpoint ceiling): {type: "ephemeral", ttl: "1h"} on the three static slots, and a shorter, configurable TTL on the rolling tail:

Slot	Target	TTL	Tokens saved/turn
1	Last tools entry	1h	~24 k
2	Last system block	1h	~8 k
3	Last block of messages[0] (static reminders)	1h	~5 k
4	Rolling tail (last block across all messages)	5m (configurable)	~15 k

The proxy also adds the anthropic-beta: extended-cache-ttl-2025-04-11 header to the upstream request.
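
A minimal sketch of the four-slot injection described above, operating on an already-parsed request body. The function names (injectCacheBreakpoints, addExtendedTtlBeta) are hypothetical and not part of the existing api-proxy. Two behaviours here are assumptions of this sketch rather than anything the issue specifies: string-valued system and message content is normalised to a single text block so a marker can be attached, and pre-existing top-level markers are cleared first so the total never exceeds four.

```javascript
// Hypothetical helpers; names are illustrative, not the api-proxy's actual API.
const last = (arr) => (arr && arr.length ? arr[arr.length - 1] : undefined);

// Wrap string content in a one-element block array so it can carry cache_control.
const toBlocks = (content) =>
  typeof content === 'string' ? [{ type: 'text', text: content }] : content;

function injectCacheBreakpoints(body, tailTtl = '5m') {
  const out = JSON.parse(JSON.stringify(body)); // work on a copy of the JSON body
  if (typeof out.system === 'string' && out.system) out.system = toBlocks(out.system);
  for (const m of out.messages ?? []) m.content = toBlocks(m.content);

  // Assumption: clear existing top-level markers so the result has at most four.
  const groups = [out.tools ?? [], out.system ?? [], ...(out.messages ?? []).map((m) => m.content)];
  for (const group of groups) for (const block of group) delete block.cache_control;

  const slots = [
    [last(out.tools), '1h'],                      // slot 1: last tools entry
    [last(out.system), '1h'],                     // slot 2: last system block
    [last(out.messages?.[0]?.content), '1h'],     // slot 3: last block of messages[0]
    [last(last(out.messages)?.content), tailTtl], // slot 4: rolling tail
  ];
  for (const [block, ttl] of slots) {
    // Skip missing slots, and skip the tail when it coincides with slot 3.
    if (block && !block.cache_control) block.cache_control = { type: 'ephemeral', ttl };
  }
  return out;
}

// Append the beta flag to any anthropic-beta value already on the request.
function addExtendedTtlBeta(headers) {
  const flag = 'extended-cache-ttl-2025-04-11';
  const flags = (headers['anthropic-beta'] ?? '').split(',').map((s) => s.trim()).filter(Boolean);
  if (!flags.includes(flag)) flags.push(flag);
  return { ...headers, 'anthropic-beta': flags.join(',') };
}
```

Because the 1h slots all precede the tail in tools, system, messages order, the shorter tail TTL always comes last in the request. Nested blocks (for example inside tool_result content) are not inspected by this sketch.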

Savings. Per-call input cost drops ~90% (Sonnet: ~$0.158 → ~$0.016). With 10–100+ round-trips per user message, a 30-trip agentic task saves roughly $4.15 (Sonnet) or $20.76 (Opus). Breakeven at turn 2; every turn after is pure cache-read at 0.1×.
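
The break-even claim can be checked with a small cost model. This is a simplified sketch using the multipliers quoted in this issue (0.1× read, 1.25× write at 5m, 2× write at 1h) and the token estimates from the table (37k static, 15k tail). The base price is a parameter. The function names are hypothetical. The model ignores output tokens and the incremental re-write of the moving tail, so it will not reproduce the dollar figures above exactly.

```javascript
// Multipliers relative to the base input price, as quoted in this issue.
const READ = 0.1;
const WRITE_5M = 1.25;
const WRITE_1H = 2;

// Total input cost of a session of `turns` round-trips over a fixed prefix.
function sessionCost({ staticTokens, tailTokens, pricePerMTok, turns, cached }) {
  if (turns <= 0) return 0;
  const unit = pricePerMTok / 1e6; // price per single token
  const total = staticTokens + tailTokens;
  if (!cached) return turns * total * unit;
  // First turn pays the write surcharge; later turns are modelled as pure reads.
  const firstTurn = (staticTokens * WRITE_1H + tailTokens * WRITE_5M) * unit;
  return firstTurn + (turns - 1) * total * READ * unit;
}

// Smallest number of turns at which the cached session is no more expensive.
function breakEvenTurn(params, maxTurns = 1000) {
  for (let turns = 1; turns <= maxTurns; turns++) {
    const withCache = sessionCost({ ...params, turns, cached: true });
    const without = sessionCost({ ...params, turns, cached: false });
    if (withCache <= without) return turns;
  }
  return Infinity;
}
```

With staticTokens: 37000, tailTokens: 15000 and an assumed base price of 3 (USD per million input tokens), breakEvenTurn returns 2, which agrees with the break-even point stated above.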

2. Upgrade ephemeral TTL 5m → 1h (AWF_ANTHROPIC_CACHE_TAIL_TTL)
Problem. Even when Claude Code does place a cache_control breakpoint, the missing ttl field means a silent 5-minute default. One slow turn and the cached prefix expires, forcing a re-write surcharge on the next call.

Solution. Rewrite every existing {type: "ephemeral"} breakpoint to {type: "ephemeral", ttl: "1h"} — except the rolling tail, which stays at 5m by default to avoid paying the 2× write multiplier on a breakpoint that moves every turn. Configurable via AWF_ANTHROPIC_CACHE_TAIL_TTL=1h to extend even the tail for sessions that stay inside a 1h window.
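
A sketch of the TTL rewrite under stated assumptions: upgradeCacheTtl is a hypothetical name, the rolling tail is taken to be the last block of the last message, and only markers that lack a ttl field are touched (an explicit ttl set by the client is left alone, which is a choice of this sketch, not something the issue specifies).

```javascript
// Hypothetical helper: fill in the missing ttl on existing ephemeral markers.
function upgradeCacheTtl(body, tailTtl = '5m') {
  const out = JSON.parse(JSON.stringify(body)); // work on a copy of the JSON body
  const messages = Array.isArray(out.messages) ? out.messages : [];

  // Rolling tail: last block of the last message, when its content is a block array.
  const lastContent = messages.length ? messages[messages.length - 1].content : undefined;
  const tail = Array.isArray(lastContent) ? lastContent[lastContent.length - 1] : undefined;

  // Every top-level position that can carry a cache_control marker.
  const blocks = [
    ...(Array.isArray(out.tools) ? out.tools : []),
    ...(Array.isArray(out.system) ? out.system : []),
    ...messages.flatMap((m) => (Array.isArray(m.content) ? m.content : [])),
  ];

  for (const block of blocks) {
    const marker = block.cache_control;
    if (marker && marker.type === 'ephemeral' && marker.ttl === undefined) {
      marker.ttl = block === tail ? tailTtl : '1h';
    }
  }
  return out;
}
```

Calling upgradeCacheTtl(body, process.env.AWF_ANTHROPIC_CACHE_TAIL_TTL || '5m') would give the configurable tail behaviour described above. A second pass changes nothing, because every marker already has a ttl.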

3. Drop unused tools (AWF_ANTHROPIC_DROP_TOOLS)
Problem. Claude Code's tool catalog includes tools that are never needed in many workflows (NotebookEdit, CronCreate, CronDelete, CronList, RemoteTrigger, PushNotification, Monitor). These consume ~3,300 tokens on every single round-trip, inflating both cost and context window usage.

Solution. Accept a comma-separated list of tool names in the AWF_ANTHROPIC_DROP_TOOLS environment variable. Before forwarding, strip the named tools from the tools array and scrub their names from system-prompt reminders. This works independently of caching; with caching enabled it also shrinks the cached tools prefix that has to be written.
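
A sketch of the parsing and filtering step. Both function names are hypothetical. Scrubbing tool names out of system-prompt reminders is deliberately left out, since the issue does not describe the format of those reminders.

```javascript
// Parse "NotebookEdit, CronCreate" into a Set of trimmed, non-empty names.
function parseDropList(value) {
  return new Set((value ?? '').split(',').map((s) => s.trim()).filter(Boolean));
}

// Return a body whose tools array omits every tool named in dropList.
function dropTools(body, dropList) {
  if (!Array.isArray(body.tools) || dropList.size === 0) return body;
  return { ...body, tools: body.tools.filter((tool) => !dropList.has(tool.name)) };
}
```

In the proxy this might be wired as dropTools(body, parseDropList(process.env.AWF_ANTHROPIC_DROP_TOOLS)). An unset variable yields an empty set, so the body passes through unchanged.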

4. Strip ANSI escape codes (AWF_ANTHROPIC_STRIP_ANSI=1)
Problem. Terminal output in tool_result blocks (bash output, grep output, etc.) contains SGR escape sequences for colours and formatting. These sequences are not semantically meaningful to the LLM but: (a) inflate token count, and (b) prevent tool_result blocks from caching cleanly — even minor colour-code differences cause a cache miss.

Solution. Strip ANSI SGR sequences from tool_result text content before forwarding. Roughly halves tool_result token count in colour-heavy outputs and enables clean cache hits across turns.
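
A sketch of the stripping step, assuming tool_result content is either a plain string or an array of blocks whose text blocks carry a text field. The function name is hypothetical. The pattern matches SGR sequences only (ESC, "[", numeric parameters, "m"); other escape sequences such as cursor movement are left untouched.

```javascript
// ESC [ <digits and semicolons> m  -- the SGR (colour/formatting) sequences.
const SGR_PATTERN = /\x1b\[[0-9;]*m/g;

// Remove SGR sequences from the text of every tool_result block.
function stripAnsiFromToolResults(body) {
  const out = JSON.parse(JSON.stringify(body)); // work on a copy of the JSON body
  for (const message of out.messages ?? []) {
    if (!Array.isArray(message.content)) continue;
    for (const block of message.content) {
      if (block.type !== 'tool_result') continue;
      if (typeof block.content === 'string') {
        block.content = block.content.replace(SGR_PATTERN, '');
      } else if (Array.isArray(block.content)) {
        for (const part of block.content) {
          if (part.type === 'text' && typeof part.text === 'string') {
            part.text = part.text.replace(SGR_PATTERN, '');
          }
        }
      }
    }
  }
  return out;
}
```

Text outside tool_result blocks is not modified, so user-typed or assistant-generated content that happens to contain escape sequences passes through as-is.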

5. Custom body-transform hook (AWF_ANTHROPIC_TRANSFORM_FILE)
Problem. Different teams need different request mutations — custom system-prompt edits, additional tool filtering, request augmentation with user context. Static env vars can't cover every use case.

Solution. Allow specifying a path to a JS file that exports transform(body). The api-proxy calls this before forwarding, using the return value as the outgoing body. The existing bodyTransform parameter in proxyRequest() is the natural extension point for this.
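
A sketch of how such a hook might be loaded, assuming a CommonJS module that exports a synchronous transform(body). loadCustomTransform is a hypothetical name, and the way its result would be chained into proxyRequest()'s bodyTransform is not shown because that signature is not given in this issue. Falling back to the original body when the hook returns nothing is a defensive assumption of this sketch.

```javascript
const path = require('path');

// Load the module at filePath and return a function that applies its transform.
function loadCustomTransform(filePath) {
  if (!filePath) return (body) => body; // hook not configured: identity

  const hook = require(path.resolve(filePath));
  if (typeof hook.transform !== 'function') {
    throw new Error(`${filePath} must export a transform(body) function`);
  }

  return (body) => {
    const result = hook.transform(body);
    // Assumption: a hook that returns null/undefined leaves the body unchanged.
    return result == null ? body : result;
  };
}
```

A hook file could be as small as `exports.transform = (body) => ({ ...body, metadata: { user_id: 'team-a' } });`. The file runs with the proxy's privileges, so the path should only ever come from trusted configuration.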

Implementation Notes
All optimizations are opt-in and isolated to POST /v1/messages on Anthropic port 10001.
pino is zero-dependency and ~500 lines; the same techniques can be implemented natively in AWF without adding dependencies.
The existing makeModelBodyTransform() / bodyTransform chain in containers/api-proxy/server.js is the right place to add these transforms — they compose cleanly.
Tests should cover: breakpoint injection count (≤4), TTL rewriting, rolling-tail detection, tool-drop count, ANSI strip correctness, and idempotency (running twice produces the same output as running once).
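
The composition and idempotency points above can be sketched generically. composeTransforms and isIdempotent are hypothetical helpers that assume each transform takes and returns a parsed JSON body; how they would attach to the real makeModelBodyTransform() / bodyTransform chain is not specified here.

```javascript
// Chain transforms left to right: the output of one becomes the input of the next.
function composeTransforms(...transforms) {
  return (body) => transforms.reduce((current, transform) => transform(current), body);
}

// True when applying the transform twice gives the same JSON as applying it once.
function isIdempotent(transform, body) {
  const once = transform(body);
  const twice = transform(once);
  return JSON.stringify(once) === JSON.stringify(twice);
}
```

A test suite could call isIdempotent on each transform individually and on the composed chain, using a representative captured request body as the fixture.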

References
Source of techniques: alxsuv/pino
Anthropic prompt caching docs: https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching
AWF api-proxy entry point: containers/api-proxy/server.js
Existing body-transform plumbing: makeModelBodyTransform() (line ~372), proxyRequest() bodyTransform param (line ~481)

Feature comparison between pino and AWF's api-proxy, showing what each one does that the other does not yet do:

Feature	pino	AWF api-proxy
Model alias rewriting	❌	✅
Rate limiting	❌	✅
Metrics / token tracking	❌	✅
Auto prompt-cache breakpoints	✅	❌
TTL upgrade 5m → 1h	✅	❌
Drop unused tools	✅	❌
Strip ANSI from tool_results	✅	❌
Custom transform hook	✅	❌
