Summary
When converting an Anthropic /v1/messages request to a Codex Responses payload, the proxy concatenates the inbound system array verbatim into instructions. Anthropic clients (notably Claude Code) prefix that array with a non-semantic header line whose hash changes on every request, so the upstream prompt prefix is unique per call and Codex's prompt cache never hits.
Background
Claude Code prefixes its system prompt with a line of the form:
x-anthropic-billing-header: cc_version=2.1.117.48f; cc_entrypoint=cli; cch=71fea;
The cch=<hash> portion regenerates on every request. The line carries no semantic value to the model — it's a billing/telemetry header. When forwarded into the Codex instructions field unchanged, every turn presents a brand-new prefix to the backend.
Root cause
src/transformers/request.ts, extractSystemPrompt:
function extractSystemPrompt(system: string | ContentBlock[] | undefined): string {
  if (!system) return "";
  if (typeof system === "string") return system;
  return system
    .map((block) => {
      if (block.type === "text" && block.text) return block.text;
      return "";
    })
    .filter(Boolean)
    .join("\n");
}
Every text block is forwarded as-is. There's no filter for the x-anthropic-billing-header: line (or any other non-semantic Anthropic header line). The result is assigned to instructions on the outbound Codex Responses payload.
There's also no prompt_cache_key set on the outbound payload, so the upstream has no stable cache routing key to fall back on either.
Impact
- Cost. Long multi-turn Claude Code sessions re-bill the full input context (often a large CLAUDE.md + tool catalog + accumulated message history) on every turn. Typical multiplier vs. correct cache reuse: 5–10×.
- Latency. Cache-miss prefills are noticeably slower than cache hits, so every turn after the first feels sluggish.
- Subscription throughput. Per-account rate limits exhaust faster than they should because effective input-token throughput is lower.
Suggested fix
Strip non-semantic Anthropic header lines before joining:
function stripNonSemanticSystemLines(text: string): string {
  return text
    .split("\n")
    .filter((line) => !line.trim().toLowerCase().startsWith("x-anthropic-billing-header:"))
    .join("\n")
    .trim();
}

function extractSystemPrompt(system: string | ContentBlock[] | undefined): string {
  if (!system) return "";
  if (typeof system === "string") return stripNonSemanticSystemLines(system);
  return system
    .map((block) => {
      if (block.type === "text" && block.text) return stripNonSemanticSystemLines(block.text);
      return "";
    })
    .filter(Boolean)
    .join("\n\n");
}
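A few edge cases of the proposed filter can be checked in isolation. The snippet below is a minimal sketch: it repeats stripNonSemanticSystemLines so it runs on its own, and the sample strings (version numbers, hashes, prompt text) are made up for illustration rather than captured from real traffic.

```typescript
// Copy of the filter proposed above, repeated so the snippet is self-contained.
function stripNonSemanticSystemLines(text: string): string {
  return text
    .split("\n")
    .filter((line) => !line.trim().toLowerCase().startsWith("x-anthropic-billing-header:"))
    .join("\n")
    .trim();
}

// Hypothetical inputs: same prompt body, different per-request hash.
const withHashA = "x-anthropic-billing-header: cc_version=0.0.0; cch=aaaaa;\nYou are a coding assistant.";
const withHashB = "x-anthropic-billing-header: cc_version=0.0.0; cch=bbbbb;\nYou are a coding assistant.";

// A block that contains nothing but the header, with odd casing and padding.
const headerOnly = "  X-Anthropic-Billing-Header: cch=ccccc;  ";

// A line that merely mentions the header mid-sentence must survive.
const mentionInline = "Do not echo the x-anthropic-billing-header: line.";

const strippedA = stripNonSemanticSystemLines(withHashA);
const strippedB = stripNonSemanticSystemLines(withHashB);
const strippedHeaderOnly = stripNonSemanticSystemLines(headerOnly);
const strippedMention = stripNonSemanticSystemLines(mentionInline);

console.log(strippedA === strippedB);             // both requests now yield the same text
console.log(JSON.stringify(strippedHeaderOnly));  // empty string, later dropped by filter(Boolean)
console.log(strippedMention === mentionInline);   // inline mention is left untouched
```

Because a header-only block collapses to an empty string, the existing .filter(Boolean) in extractSystemPrompt removes it, so no stray blank separator ends up at the start of instructions.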
Optional second hardening: in the Codex request payload, set prompt_cache_key to a stable session-scoped value (e.g. derived from a conversation identifier). The Responses API uses this field for cache routing across requests with otherwise-equivalent prefixes.
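One possible shape for that hardening is sketched below. Everything except the prompt_cache_key field name is an assumption: CodexPayload is a hypothetical subset of the outbound payload, sessionId stands for whatever stable conversation identifier the proxy can obtain, and the small FNV-1a hash is only there to avoid forwarding the raw identifier (a stronger hash may be preferable in practice, since 32 bits can collide).

```typescript
// Hypothetical subset of the outbound Codex Responses payload.
interface CodexPayload {
  instructions: string;
  prompt_cache_key?: string;
}

// 32-bit FNV-1a over UTF-16 code units, returned as 8 hex characters.
// Deterministic, so the same session id always maps to the same key.
function fnv1a32(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// Hypothetical helper: attaches a session-scoped key when an id is available,
// and leaves the payload unchanged otherwise.
function withPromptCacheKey(payload: CodexPayload, sessionId: string | undefined): CodexPayload {
  if (!sessionId) return payload;
  return { ...payload, prompt_cache_key: `session-${fnv1a32(sessionId)}` };
}

const base: CodexPayload = { instructions: "You are a coding assistant." };
const turn1 = withPromptCacheKey(base, "conversation-123");
const turn2 = withPromptCacheKey(base, "conversation-123");
const otherSession = withPromptCacheKey(base, "conversation-124");
const noSession = withPromptCacheKey(base, undefined);

console.log(turn1.prompt_cache_key === turn2.prompt_cache_key);        // stable within a session
console.log(turn1.prompt_cache_key !== otherSession.prompt_cache_key); // differs across sessions
```

The key should not depend on anything that changes per request (timestamps, the cch hash, message count), otherwise it reintroduces the same instability as the header line.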
Reproduction
- Send any Anthropic /v1/messages request whose system array begins with a text block containing x-anthropic-billing-header: cc_version=...; cch=<hash>; as its first line. (Real Claude Code traffic does this automatically.)
- Observe the outbound Codex Responses payload: the instructions field starts with the billing-header line.
- Send a second request with a different cch=<hash> to simulate a fresh Claude Code session. The two outbound instructions differ in their first line, so prefix-based caching at the upstream cannot hit between them.
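The divergence can also be measured offline without a live upstream. The following is a rough sketch, not the project's test code: joinSystemBlocks mirrors the array branch of the current extractSystemPrompt, while commonPrefixLength, systemFor, and the sample header values are hypothetical. It compares characters, which only approximates the token-level prefix matching an upstream cache would perform.

```typescript
type ContentBlock = { type: string; text?: string };

// Mirrors the array branch of the current extractSystemPrompt (no filtering).
function joinSystemBlocks(system: ContentBlock[]): string {
  return system
    .map((block) => (block.type === "text" && block.text ? block.text : ""))
    .filter(Boolean)
    .join("\n");
}

// Number of leading characters two strings have in common.
function commonPrefixLength(a: string, b: string): number {
  const limit = Math.min(a.length, b.length);
  let i = 0;
  while (i < limit && a[i] === b[i]) i++;
  return i;
}

// Hypothetical system array: a billing-header block followed by a long, stable prompt.
function systemFor(cch: string): ContentBlock[] {
  return [
    { type: "text", text: `x-anthropic-billing-header: cc_version=0.0.0; cc_entrypoint=cli; cch=${cch};` },
    { type: "text", text: "You are a coding assistant. ".repeat(200) },
  ];
}

const first = joinSystemBlocks(systemFor("aaaaa"));
const second = joinSystemBlocks(systemFor("bbbbb"));

const shared = commonPrefixLength(first, second);
const firstLineLength = first.indexOf("\n");

// The shared prefix ends inside the first line, long before the stable prompt body starts.
console.log({ shared, firstLineLength, total: first.length });
```

With the header line present, the shared prefix stops at the cch value, so the long stable body that follows never contributes to a reusable prefix.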