System prompt forwarded with per-request `x-anthropic-billing-header` line — defeats upstream prompt cache

## Summary

When converting an Anthropic `/v1/messages` request to a Codex Responses payload, the proxy concatenates the inbound `system` array verbatim into `instructions`. Anthropic clients (notably Claude Code) prefix that array with a non-semantic header line whose hash changes on every request, so the upstream prompt prefix is unique per call and Codex's prompt cache never hits.

## Background

Claude Code prefixes its system prompt with a line of the form:

```
x-anthropic-billing-header: cc_version=2.1.117.48f; cc_entrypoint=cli; cch=71fea;
```

The `cch=<hash>` portion regenerates on every request. The line carries no semantic value to the model — it's a billing/telemetry header. When forwarded into the Codex `instructions` field unchanged, every turn presents a brand-new prefix to the backend.
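To make the effect concrete, here is a minimal sketch that measures how much of the prompt two consecutive turns share when only the `cch` value differs. The helper `commonPrefixLength`, the second hash value `0b3c2`, and the stand-in prompt body are all made up for illustration. It assumes the upstream cache matches on a leading prefix, as the issue describes.

```typescript
// Length of the longest shared leading substring of two strings.
function commonPrefixLength(a: string, b: string): number {
  const limit = Math.min(a.length, b.length);
  let i = 0;
  while (i < limit && a[i] === b[i]) i++;
  return i;
}

// Everything in the header line up to (not including) the hash value.
const headerPrefix =
  "x-anthropic-billing-header: cc_version=2.1.117.48f; cc_entrypoint=cli; cch=";

// Stand-in for a large, otherwise identical system prompt (hypothetical content).
const body = "You are a coding assistant.\n".repeat(200);

// Two turns that differ only in the hash; "0b3c2" is an invented value.
const turn1 = `${headerPrefix}71fea;\n${body}`;
const turn2 = `${headerPrefix}0b3c2;\n${body}`;

// The shared prefix ends inside the first line, before any real prompt content.
const shared = commonPrefixLength(turn1, turn2);
console.log(shared === headerPrefix.length); // true
```

The thousands of identical characters after the header never count toward the shared prefix, because the divergence happens before them.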

## Root cause

`src/transformers/request.ts`, `extractSystemPrompt`:

```ts
function extractSystemPrompt(system: string | ContentBlock[] | undefined): string {
  if (!system) return "";
  if (typeof system === "string") return system;

  return system
    .map((block) => {
      if (block.type === "text" && block.text) return block.text;
      return "";
    })
    .filter(Boolean)
    .join("\n");
}
```

Every `text` block is forwarded as-is. There's no filter for the `x-anthropic-billing-header:` line (or any other non-semantic Anthropic header line). The result is assigned to `instructions` on the outbound Codex Responses payload.

There's also no `prompt_cache_key` set on the outbound payload, so even a stable upstream cache routing key isn't available as a fallback.

## Impact

- **Cost.** Long multi-turn Claude Code sessions re-bill the full input context (often a large `CLAUDE.md` + tool catalog + accumulated message history) on every turn. Typical multiplier vs. correct cache reuse: 5–10×.
- **Latency.** Cache-miss prefills are noticeably slower than cache hits, so every turn after the first feels sluggish.
- **Subscription throughput.** Per-account rate limits exhaust faster than they should because effective input-token throughput is lower.

## Suggested fix

Strip non-semantic Anthropic header lines before joining:

```ts
function stripNonSemanticSystemLines(text: string): string {
  return text
    .split("\n")
    .filter((line) => !line.trim().toLowerCase().startsWith("x-anthropic-billing-header:"))
    .join("\n")
    .trim();
}

function extractSystemPrompt(system: string | ContentBlock[] | undefined): string {
  if (!system) return "";
  if (typeof system === "string") return stripNonSemanticSystemLines(system);

  return system
    .map((block) => {
      if (block.type === "text" && block.text) return stripNonSemanticSystemLines(block.text);
      return "";
    })
    .filter(Boolean)
    .join("\n\n");
}
```

Optional second hardening: in the Codex request payload, set `prompt_cache_key` to a stable session-scoped value (e.g. derived from a conversation identifier). The Responses API uses this field for cache routing across requests with otherwise-equivalent prefixes.
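A rough sketch of that second hardening follows. The `CodexPayload` shape, the `withPromptCacheKey` helper, the `proxy:` namespace prefix, and the `sessionId` argument are assumptions for illustration, not names from the proxy's codebase. Where a stable session identifier comes from is left open. One possibility, if the client sends it, is the inbound request's `metadata.user_id`.

```typescript
// Hypothetical minimal shape: only the fields this sketch touches.
interface CodexPayload {
  instructions: string;
  prompt_cache_key?: string;
}

// Attach a session-scoped cache key. The key must stay identical across
// all turns of one conversation and must not include per-request data.
function withPromptCacheKey(payload: CodexPayload, sessionId: string): CodexPayload {
  // Namespacing is an illustrative choice to keep proxy keys distinct
  // from keys set by other clients of the same account.
  return { ...payload, prompt_cache_key: `proxy:${sessionId}` };
}

// Usage: two turns of the same session receive the same key.
const turnA = withPromptCacheKey({ instructions: "system prompt" }, "session-1");
const turnB = withPromptCacheKey({ instructions: "system prompt" }, "session-1");
console.log(turnA.prompt_cache_key === turnB.prompt_cache_key); // true
```

This only helps once the billing-header line is stripped. A stable key cannot make two different prefixes match.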

## Reproduction

1. Send any Anthropic `/v1/messages` request whose `system` array begins with a `text` block containing `x-anthropic-billing-header: cc_version=...; cch=<hash>;` as its first line. (Real Claude Code traffic does this automatically.)
2. Observe the outbound Codex Responses payload — the `instructions` field starts with the billing-header line.
3. Send a second request with a different `cch=<hash>` to simulate the next Claude Code request. The two outbound `instructions` differ in their first line, so prefix-based caching at the upstream cannot hit between them.
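The steps above could be turned into a small regression check along these lines. It inlines the functions from the suggested fix so that it runs on its own. The `ContentBlock` type is reduced to the two fields used here, and the `makeSystem` helper, the prompt text, and the second hash value are invented for the example.

```typescript
// Reduced stand-in for the proxy's ContentBlock type (assumption).
type ContentBlock = { type: string; text?: string };

function stripNonSemanticSystemLines(text: string): string {
  return text
    .split("\n")
    .filter((line) => !line.trim().toLowerCase().startsWith("x-anthropic-billing-header:"))
    .join("\n")
    .trim();
}

function extractSystemPrompt(system: string | ContentBlock[] | undefined): string {
  if (!system) return "";
  if (typeof system === "string") return stripNonSemanticSystemLines(system);
  return system
    .map((block) =>
      block.type === "text" && block.text ? stripNonSemanticSystemLines(block.text) : ""
    )
    .filter(Boolean)
    .join("\n\n");
}

// Build a system array whose first block starts with the billing header.
const makeSystem = (cch: string): ContentBlock[] => [
  {
    type: "text",
    text:
      `x-anthropic-billing-header: cc_version=2.1.117.48f; cc_entrypoint=cli; cch=${cch};\n` +
      "You are a coding assistant.",
  },
  { type: "text", text: "Project instructions go here." },
];

// Two requests that differ only in the hash ("0b3c2" is an invented value).
const first = extractSystemPrompt(makeSystem("71fea"));
const second = extractSystemPrompt(makeSystem("0b3c2"));

console.log(first === second); // true
console.log(first.includes("x-anthropic-billing-header")); // false
```

With the unpatched `extractSystemPrompt`, the same two inputs would produce different strings, which is the failure the reproduction demonstrates.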

