This capability implements OpenAI-compatible behavior for POST /v1/responses, including request validation, streaming events, non-streaming aggregation, and OpenAI-style error envelopes. The scope is limited to what the ChatGPT upstream can provide; unsupported features are explicitly rejected.
See openspec/specs/responses-api-compat/spec.md for normative requirements.
- Responses as canonical wire format: Internally we treat Responses as the source of truth to avoid divergent streaming semantics.
- Strict validation: Required fields and mutually exclusive fields are enforced up front to match official client expectations.
- Cursor alias compatibility: Cursor UI model labels may append reasoning or speed suffixes to GPT-5 slugs; those are normalized to canonical upstream fields before forwarding.
- No truncation support: Requests that include
truncationare rejected because upstream does not support it. - Compact as a separate contract: Standalone compact is treated as a canonical opaque context-window contract, not as a variant of buffered normal
/responses.
- Upstream limitations determine available modalities, tool output, and overflow handling.
store=trueis rejected; responses are not persisted.includevalues must be on the documented allowlist.truncationis rejected.previous_response_idis forwarded whenconversationis absent, but theconversation + previous_response_idconflict remains rejected.- HTTP
/v1/responsesand HTTP/backend-api/codex/responsesnow use a server-side upstream websocket session bridge by default so repeated compatible requests can keep upstream response/session continuity without forcing clients onto the public websocket route. - Codex-affinity HTTP bridge sessions can optionally use a conservative first-request prewarm (
generate=false), but that behavior now stays behind an explicit flag so production defaults do not pay an extra upstream request unless operators opt in. - When operators configure a multi-instance bridge ring, deterministic owner enforcement now applies only to hard continuity keys such as
x-codex-turn-stateand explicit session headers. Prompt-cache-derived bridge keys remain stable for local reuse, but in gateway-safe mode a non-owner replica may tolerate that locality miss and create or reuse a local session instead of failing withbridge_instance_mismatch. - Codex-facing websocket routes now advertise
x-codex-turn-stateduring websocket accept and honor client-provided turn-state on reconnect so routing can stay sticky at turn granularity even when the public websocket reconnects. - HTTP responses routes now also return
x-codex-turn-stateheaders so clients that persist response headers can promote later HTTP requests from prompt-cache affinity to stronger Codex-session continuity. /v1/responses/compactkeeps a final-JSON contract and preserves the raw upstream/codex/responses/compactpayload shape as the canonical next context window instead of rewriting it through buffered/codex/responsesstreaming.- Compact transport failures fail closed with respect to semantics: no surrogate
/codex/responsesfallback and no local compact-window reconstruction. - Compact transport may use bounded same-contract retries only for safe pre-body transport failures and
401 -> refresh -> retry. /v1/responses/compactis supported only when the upstream implements it.prompt_cache_keyaffinity on OpenAI-style routes is intentionally bounded by a dashboard-managed freshness window, unlike durable backendsession_idor dashboard sticky-thread routing.- Codex-native direct websocket
/backend-api/codex/responsestreats upstreamprevious_response_idas an ephemeral anchor. If that anchor goes stale, the proxy must mask rawprevious_response_not_founddetails and emit a sanitizedcodex_previous_response_staleclassifier so compatible Codex clients can soft-reset and retry withoutprevious_response_id.
codex-lb accepts the OpenAI/Codex service_tier field on Responses and Chat
Completions compatible routes. The legacy fast spelling is accepted as an
alias and is forwarded upstream as the canonical priority tier.
Fast Mode is request-level intent, not a local speed guarantee. The upstream Codex backend decides the actual tier for each completed response. codex-lb therefore records three separate values in request logs:
requestedServiceTier: what the client or API key asked for, after alias normalization.actualServiceTier: what upstream reported in the completed response, when upstream included it.serviceTier: the effective billable tier. This usesactualServiceTierwhen present and falls back torequestedServiceTieronly when upstream omits the actual tier.
If a request is sent with service_tier: "fast" or service_tier: "priority"
and the completed row shows requestedServiceTier: "priority" but
actualServiceTier: "default", codex-lb forwarded the priority request and
upstream chose the default tier. That can happen even when websocket transport
is active.
For OpenCode or Codex-compatible clients, enable Fast Mode by sending a Responses request with:
{
"service_tier": "priority"
}Clients that expose Fast Mode as fast may keep using that spelling; codex-lb
normalizes it to priority before forwarding.
API keys can also force the tier for traffic that uses that key. Set the key's
enforced service tier to priority or fast; both values are stored and
returned as priority.
To verify a completed Fast Mode request:
Transportshould beWSif you are verifying the websocket Codex path.requestedServiceTiershould beprioritywhen the client requested Fast Mode or the API key enforced it.actualServiceTieris the upstream result.defaultmeans upstream did not grant priority for that response.
This distinction matters for quota and cost accounting: codex-lb prices the
request from the effective billable serviceTier, not from the requested tier
when upstream reports a different actual tier.
code_interpreter_call.outputscomputer_call_output.output.image_urlfile_search_call.resultsmessage.input_image.image_urlmessage.output_text.logprobsreasoning.encrypted_contentweb_search_call.action.sources
- Stream ends without terminal event: Emit
response.failedwithstream_incomplete. - Upstream error / no accounts: Non-streaming responses return an OpenAI error envelope with 5xx status.
- Compact upstream transport/client failure: Retry only inside
/codex/responses/compactwhen the failure is safely retryable; otherwise return an explicit upstream error without surrogate fallback. - HTTP bridge session closes or expires: The next compatible HTTP
/v1/responsesor/backend-api/codex/responsesrequest recreates a fresh upstream websocket bridge session; continuity is guaranteed only within the lifetime of one active bridged session. - Multi-instance routing without bridge owner policy: if operators do not configure a bridge ring or front-door affinity, continuity can still fragment across replicas. With a configured bridge ring, hard continuity keys still fail closed on the wrong replica, while gateway-safe prompt-cache requests may accept locality misses instead of failing.
- Codex websocket reconnects: Reconnect continuity now depends on the client replaying the accepted
x-codex-turn-state; generated turn-state is emitted on accept for backend Codex routes and echoed back when the client already supplies one. - Codex websocket stale previous-response anchors: Direct backend Codex websocket stale-anchor failures are surfaced as
response.failed/codex_previous_response_stalewithout the raw upstream code or missingresp_...id; OpenAI-compatible/v1/responseswebsocket clients continue to receive genericstream_incompletemasking. - Websocket handshake forbidden/not-found: Auto transport now fails loud on
403/404instead of silently hiding the websocket regression behind HTTP fallback. - Invalid request payloads: Return 4xx with
invalid_request_error.
- 401 →
invalid_api_key - 403 →
insufficient_permissions - 404 →
not_found - 429 →
rate_limit_exceeded - 5xx →
server_error
Non-streaming request/response:
// request
{ "model": "gpt-5.1", "input": "hi" }// response
{ "id": "resp_123", "object": "response", "status": "completed", "output": [] }Cursor-style model alias request:
{ "model": "gpt-5.4-mini-high", "input": "hi" }This forwards upstream as model: "gpt-5.4-mini" with reasoning.effort: "high".
- Pre-release: run unit/integration tests and optional OpenAI client compatibility tests.
- Smoke tests: stream a response, validate non-stream responses, and verify error envelopes.
- Post-deploy: monitor
no_accounts,upstream_unavailable, compact retry attempts, and compact failure phases, especially on direct compact requests. - Post-deploy: monitor HTTP bridge reuse/create/evict/reconnect counts and any
previous_response_not_foundor queue-saturation errors on/v1/responsesand/backend-api/codex/responses. - Post-deploy: monitor
capacity_exhausted_active_sessions, Codex-session bridge reuse/evict counts, websocket handshake 403/404 rates after the narrower auto-fallback policy, and backend Codex HTTP vs websocket cache-ratio gaps. - When tracing compact incidents, confirm that request logs and upstream logs show direct
/codex/responses/compactusage without surrogate/codex/responsesfallback. - Post-deploy: monitor
no_accounts,stream_incomplete, andupstream_unavailable. - Post-deploy: monitor
codex_previous_response_staleon/backend-api/codex/responses; recurring spikes mean clients are still relying on stale upstream anchors and should perform the documented full-context retry withoutprevious_response_id. - Websocket/Codex CLI tier verification runbook:
openspec/specs/responses-api-compat/ops.md