fix(codex): normalize output schemas for OpenAI Structured Outputs compliance by blevinson · Pull Request #1727 · coleam00/Archon

blevinson · 2026-05-19T19:17:08Z

Summary

When using provider: codex in workflow definitions, nodes with output_format fail with invalid_json_schema errors because OpenAI's Structured Outputs API has stricter schema requirements than Claude:

- Every type: "object" node must have additionalProperties: false
- Every type: "object" node must have a required array listing all property keys

Workflow authors writing provider-agnostic YAML typically omit both since Claude doesn't require them — breaking Codex workflows.

Changes

- Added normalizeSchemaForOpenAI() in the Codex provider (packages/providers/src/codex/provider.ts) that recursively walks the output schema tree and fills in the missing constraints before passing to the SDK
- Applied it in buildTurnOptions() for both outputFormat.schema and nodeConfig.output_format paths
- Updated the existing test to expect the normalized schema
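For readers without the diff open, the normalization described above can be sketched roughly as follows. This is a reconstruction from the PR description and the fragments quoted in the review comments, not the merged code. The `JsonSchema` alias, the `isSchemaObject` helper, and the sample schema are assumptions made for illustration.

```typescript
type JsonSchema = Record<string, unknown>;

// Hypothetical helper: true for plain (non-array, non-null) objects.
function isSchemaObject(value: unknown): value is JsonSchema {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

function normalizeSchemaForOpenAI(schema: JsonSchema): JsonSchema {
  // Copy the top level; nested schemas are replaced with normalized copies
  // below, so the caller's object is left untouched.
  const out: JsonSchema = { ...schema };

  if (out.type === 'object') {
    // Only fill in the constraint when the author did not set it.
    if (!('additionalProperties' in out)) {
      out.additionalProperties = false;
    }

    const props = out.properties;
    if (isSchemaObject(props)) {
      const normalizedProps: JsonSchema = {};
      for (const [key, value] of Object.entries(props)) {
        normalizedProps[key] = isSchemaObject(value) ? normalizeSchemaForOpenAI(value) : value;
      }
      out.properties = normalizedProps;

      // Keep the author's ordering and append any property keys that are missing.
      const existing = Array.isArray(out.required) ? (out.required as string[]) : [];
      const missing = Object.keys(normalizedProps).filter(k => !existing.includes(k));
      out.required = [...existing, ...missing];
    }
  }

  const items = out.items;
  if (out.type === 'array' && isSchemaObject(items)) {
    out.items = normalizeSchemaForOpenAI(items);
  }

  return out;
}

// Sample input: partial `required` at the root, nothing on the nested object.
const input: JsonSchema = {
  type: 'object',
  properties: {
    summary: { type: 'string' },
    modes: { type: 'object', properties: { voiced: { type: 'boolean' } } },
  },
  required: ['summary'],
};
const example = normalizeSchemaForOpenAI(input);
console.log(JSON.stringify(example, null, 2));
```

The sketch leaves an explicit `additionalProperties: true` alone, mirroring the `'additionalProperties' in out` check quoted in the review; whether the provider should override it instead is a separate design question.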

Test plan

- All 56 existing provider tests pass
- Type-check passes (tsc --noEmit)
- Lint + prettier pass (verified via pre-commit hooks)
- Tested end-to-end with the remotion-idea-to-video workflow using provider: codex / model: gpt-5.3-codex; both the fetch-source and qa-review output_format schemas were accepted by the OpenAI API (previously both failed with a 400)

Error before fix

Invalid schema for response_format 'codex_output_schema':
In context=(), 'additionalProperties' is required to be supplied and to be false.

Invalid schema for response_format 'codex_output_schema':
In context=('properties', 'modes'), 'required' is required to be supplied
and to be an array including every key in properties. Missing 'voiced'.
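The two errors above map onto two checks that can be run locally before a schema ever reaches the API. The following is a minimal sketch of such a pre-flight check. `findStrictViolations` and its message format are hypothetical; it covers only the two rules named in this PR and does not reproduce OpenAI's validator.

```typescript
type SchemaNode = Record<string, unknown>;

const isNode = (v: unknown): v is SchemaNode =>
  typeof v === 'object' && v !== null && !Array.isArray(v);

// Walks `properties` and `items` and reports every object node that breaks
// one of the two rules: additionalProperties must be false, and required
// must list every property key.
function findStrictViolations(schema: SchemaNode, path: string[] = []): string[] {
  const where = path.length > 0 ? path.join('.') : '(root)';
  const problems: string[] = [];

  if (schema.type === 'object') {
    if (schema.additionalProperties !== false) {
      problems.push(`${where}: additionalProperties must be false`);
    }

    const rawProps = schema.properties;
    const props: SchemaNode = isNode(rawProps) ? rawProps : {};
    const required: unknown[] = Array.isArray(schema.required) ? schema.required : [];

    for (const key of Object.keys(props)) {
      if (!required.includes(key)) {
        problems.push(`${where}: required is missing '${key}'`);
      }
    }
    for (const [key, child] of Object.entries(props)) {
      if (isNode(child)) {
        problems.push(...findStrictViolations(child, [...path, 'properties', key]));
      }
    }
  }

  const items = schema.items;
  if (schema.type === 'array' && isNode(items)) {
    problems.push(...findStrictViolations(items, [...path, 'items']));
  }

  return problems;
}

// A schema with the same two defects as the errors quoted above.
const violations = findStrictViolations({
  type: 'object',
  properties: {
    modes: {
      type: 'object',
      properties: { voiced: { type: 'boolean' } },
      additionalProperties: false,
    },
  },
  required: ['modes'],
});
console.log(violations);
```

A check like this could sit in a unit test over the workflow YAML, so a schema that would be rejected is caught without a network call.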

Made with Cursor

Summary by CodeRabbit

Bug Fixes
- Improved JSON schema normalization and enforcement for Codex structured outputs, making schema-based responses more reliable and preventing extra properties.
- More robust parsing of streamed/concatenated output with a fallback extractor, reducing false "non-JSON" warnings.
Tests
- Updated tests to verify schema normalization and the improved parsing behavior.

…mpliance

OpenAI's Structured Outputs API requires `additionalProperties: false` on every object node and `required` to list ALL property keys. Claude doesn't enforce either, so workflow authors writing provider-agnostic YAML typically omit both, causing 400 errors when switching to the Codex provider.

Add `normalizeSchemaForOpenAI()` in the Codex provider that recursively walks the schema tree and fills in the missing constraints before passing to the SDK. This makes all existing workflows work with Codex without any YAML changes.

Fixes the `invalid_json_schema` error on `output_format` nodes when using `provider: codex` in workflow definitions.

Co-authored-by: Cursor <cursoragent@cursor.com>

coderabbitai · 2026-05-19T19:17:23Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b861c3fc-e523-4a7c-a84a-fde48f99a62e

📥 Commits

Reviewing files that changed from the base of the PR and between 615d37a and 01f7781.

📒 Files selected for processing (1)

packages/providers/src/codex/provider.ts

📝 Walkthrough

Adds a recursive JSON-schema normalizer and a fallback extractor for streamed/concatenated JSON, wires normalization into Codex turn options, and updates structured-output parsing to use the extractor before emitting a non-JSON warning; tests adjusted to expect the normalized schema.

Changes

Schema normalization and parsing improvements

| Layer / File(s) | Summary |
| --- | --- |
| Schema normalization utility and integration `packages/providers/src/codex/provider.ts` | Adds `normalizeSchemaForOpenAI` and uses it in `buildTurnOptions` so `turnOptions.outputSchema` is the normalized `outputFormat.schema` or `nodeConfig.output_format`. |
| Streamed-text JSON extraction helper `packages/providers/src/codex/provider.ts` | Adds `extractLastJsonObject` that scans text for balanced top-level `{ ... }` regions and returns the last parseable JSON object. |
| Resilient structured-output parsing `packages/providers/src/codex/provider.ts` | On `turn.completed`, try `JSON.parse`, fall back to `extractLastJsonObject` on failure, and only emit the non-JSON system warning if extraction fails. |
| Test assertion for normalized schema `packages/providers/src/codex/provider.test.ts` | Updates the `sendQuery` test to expect the mock Codex SDK to receive the normalized `outputSchema` (`additionalProperties: false`, `required: ['summary']`). |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

coleam00/Archon#1162: Both PRs touch Codex buildTurnOptions / structured-output parsing and overlap on how outputSchema is derived and handled.

Poem

🐰 I hopped through schemas, neat and spry,
I closed loose props so none could fly.
When streams spit noise, I peeked the end,
Pulled out the JSON, made parsers mend,
A tidy hop, a careful try.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ⚠️ Warning | The PR description covers the problem, solution, and test validation, but lacks required sections from the template including UX journey, architecture diagrams, label snapshot, validation commands, security impact, compatibility assessment, human verification details, side effects analysis, and rollback plan. | Add missing template sections: UX Journey (before/after flows), Architecture Diagram, Label Snapshot, validation command evidence, Security Impact assessment, Compatibility/Migration details, Human Verification outcomes, Side Effects/Blast Radius analysis, and Rollback Plan. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'fix(codex): normalize output schemas for OpenAI Structured Outputs compliance' accurately describes the main change: adding schema normalization for OpenAI compatibility in the Codex provider. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


coderabbitai

🧹 Nitpick comments (3)

packages/providers/src/codex/provider.ts (2)

281-312: 💤 Low value

Consider handling JSON Schema composition keywords in future iteration.

The normalization covers properties and items, but OpenAI Structured Outputs also requires these constraints on schemas nested within oneOf, anyOf, allOf, and $defs/definitions. If workflow authors use these JSON Schema features, the object schemas nested under them won't be normalized.

Given the PR was end-to-end tested with real workflows, this is likely not urgent, but worth noting for future edge cases.

♻️ Sketch for handling composition keywords

 function normalizeSchemaForOpenAI(schema: Record<string, unknown>): Record<string, unknown> {
   const out = { ...schema };

   if (out.type === 'object') {
     // ... existing object handling
   }

   if (out.type === 'array' && typeof out.items === 'object' && out.items !== null) {
     out.items = normalizeSchemaForOpenAI(out.items as Record<string, unknown>);
   }

+  // Handle composition keywords
+  for (const keyword of ['oneOf', 'anyOf', 'allOf'] as const) {
+    if (Array.isArray(out[keyword])) {
+      out[keyword] = (out[keyword] as Record<string, unknown>[]).map(normalizeSchemaForOpenAI);
+    }
+  }
+
+  // Handle definitions
+  for (const keyword of ['$defs', 'definitions'] as const) {
+    if (typeof out[keyword] === 'object' && out[keyword] !== null) {
+      const defs = out[keyword] as Record<string, Record<string, unknown>>;
+      const normalizedDefs: Record<string, Record<string, unknown>> = {};
+      for (const [name, def] of Object.entries(defs)) {
+        normalizedDefs[name] = normalizeSchemaForOpenAI(def);
+      }
+      out[keyword] = normalizedDefs;
+    }
+  }

   return out;
 }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/providers/src/codex/provider.ts` around lines 281 - 312, The
normalizeSchemaForOpenAI function currently normalizes properties and items but
skips JSON Schema composition and definition containers; update
normalizeSchemaForOpenAI to also recursively traverse and normalize schemas
found under "oneOf", "anyOf", "allOf" (each is an array of schemas), and
definition containers like "$defs" and "definitions" (which are objects mapping
names to schemas), ensuring you call normalizeSchemaForOpenAI on each nested
schema entry and properly preserve array/object shapes; reference the existing
function normalizeSchemaForOpenAI and the keys "properties" and "items" as
examples of where recursion is applied and add analogous handling for those
composition/definitions keys.

284-296: 💤 Low value

Edge case: objects without properties won't get required array.

If properties is undefined or empty, the required array won't be set. OpenAI's strict mode might require required: [] for objects with no properties.

🛡️ Defensive fix for empty properties

   if (out.type === 'object') {
     if (!('additionalProperties' in out)) {
       out.additionalProperties = false;
     }
+    // Ensure required exists for all objects (OpenAI strict mode)
+    if (!Array.isArray(out.required)) {
+      out.required = [];
+    }
     if (typeof out.properties === 'object' && out.properties !== null) {
       const props = out.properties as Record<string, Record<string, unknown>>;
       const propKeys = Object.keys(props);

-      const existingRequired = Array.isArray(out.required) ? (out.required as string[]) : [];
+      const existingRequired = out.required as string[];
       const missingRequired = propKeys.filter(k => !existingRequired.includes(k));
       if (missingRequired.length > 0) {
         out.required = [...existingRequired, ...missingRequired];
       }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/providers/src/codex/provider.ts` around lines 284 - 296, For object
schemas (when out.type === 'object') ensure we handle cases with missing or
empty properties: still set out.additionalProperties = false if not present, and
if out.properties is not an object, is null, or has no keys then set
out.required to an empty array (or leave it if already an array). To implement,
update the block around out.properties/out.required so that if typeof
out.properties !== 'object' || out.properties === null ||
Object.keys(out.properties as Record<string, unknown> || {}).length === 0 you
assign out.required = Array.isArray(out.required) ? out.required : [] (and
optionally normalize out.properties = {}), otherwise preserve the existing logic
that computes missingRequired from Object.keys(props).

packages/providers/src/codex/provider.test.ts (1)

663-693: 💤 Low value

Test validates basic normalization; consider adding nested schema coverage.

The test correctly validates that additionalProperties: false is added and required is preserved. However, it doesn't exercise the recursive normalization for nested objects or array items. Consider adding a test with nested schemas.

💚 Example additional test for nested schemas

test('normalizes nested object schemas in properties and array items', async () => {
  mockRunStreamed.mockResolvedValue({
    events: (async function* () {
      yield { type: 'turn.completed', usage: defaultUsage };
    })(),
  });

  const schema = {
    type: 'object',
    properties: {
      nested: {
        type: 'object',
        properties: { value: { type: 'string' } },
        // Missing additionalProperties and required
      },
      items: {
        type: 'array',
        items: {
          type: 'object',
          properties: { id: { type: 'number' } },
        },
      },
    },
  };

  for await (const _ of client.sendQuery('test', '/workspace', undefined, {
    outputFormat: { type: 'json_schema', schema },
  })) {
    // consume
  }

  expect(mockRunStreamed).toHaveBeenCalledWith(
    'test',
    expect.objectContaining({
      outputSchema: expect.objectContaining({
        properties: expect.objectContaining({
          nested: expect.objectContaining({
            additionalProperties: false,
            required: ['value'],
          }),
          items: expect.objectContaining({
            items: expect.objectContaining({
              additionalProperties: false,
              required: ['id'],
            }),
          }),
        }),
      }),
    })
  );
});

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/providers/src/codex/provider.test.ts` around lines 663 - 693, Add a
new unit test that exercises recursive normalization of nested JSON schemas:
create a test (e.g. 'normalizes nested object schemas in properties and array
items') that sets mockRunStreamed to return a completed turn, defines a schema
with nested object in properties (missing additionalProperties and required) and
an array whose items are object schemas, call client.sendQuery with
outputFormat: { type: 'json_schema', schema }, consume the async iterator, and
assert mockRunStreamed was called with TurnOptions containing outputSchema where
nested property schemas and array items have additionalProperties: false and
required populated (use expect.objectContaining to check nested.required and
nested.additionalProperties on the nested property names).

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@packages/providers/src/codex/provider.test.ts`:
- Around line 663-693: Add a new unit test that exercises recursive
normalization of nested JSON schemas: create a test (e.g. 'normalizes nested
object schemas in properties and array items') that sets mockRunStreamed to
return a completed turn, defines a schema with nested object in properties
(missing additionalProperties and required) and an array whose items are object
schemas, call client.sendQuery with outputFormat: { type: 'json_schema', schema
}, consume the async iterator, and assert mockRunStreamed was called with
TurnOptions containing outputSchema where nested property schemas and array
items have additionalProperties: false and required populated (use
expect.objectContaining to check nested.required and nested.additionalProperties
on the nested property names).

In `@packages/providers/src/codex/provider.ts`:
- Around line 281-312: The normalizeSchemaForOpenAI function currently
normalizes properties and items but skips JSON Schema composition and definition
containers; update normalizeSchemaForOpenAI to also recursively traverse and
normalize schemas found under "oneOf", "anyOf", "allOf" (each is an array of
schemas), and definition containers like "$defs" and "definitions" (which are
objects mapping names to schemas), ensuring you call normalizeSchemaForOpenAI on
each nested schema entry and properly preserve array/object shapes; reference
the existing function normalizeSchemaForOpenAI and the keys "properties" and
"items" as examples of where recursion is applied and add analogous handling for
those composition/definitions keys.
- Around line 284-296: For object schemas (when out.type === 'object') ensure we
handle cases with missing or empty properties: still set
out.additionalProperties = false if not present, and if out.properties is not an
object, is null, or has no keys then set out.required to an empty array (or
leave it if already an array). To implement, update the block around
out.properties/out.required so that if typeof out.properties !== 'object' ||
out.properties === null || Object.keys(out.properties as Record<string, unknown>
|| {}).length === 0 you assign out.required = Array.isArray(out.required) ?
out.required : [] (and optionally normalize out.properties = {}), otherwise
preserve the existing logic that computes missingRequired from
Object.keys(props).

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 67fd02d5-213a-40fa-9c54-5bbf95cad06a

📥 Commits

Reviewing files that changed from the base of the PR and between 0adec22 and 615d37a.

📒 Files selected for processing (2)

packages/providers/src/codex/provider.test.ts
packages/providers/src/codex/provider.ts

Codex streams multiple intermediate JSON objects as progress updates during a turn. The accumulated text is a concatenation of all of them, which isn't valid JSON. When JSON.parse fails on the full text, we now extract the last complete top-level JSON object using brace-depth tracking; that object is the authoritative final answer.

Without this, structuredOutput was undefined for multi-message turns, causing downstream condition evaluators ($node.output.field) to fail with condition_json_parse_failed and skip conditional nodes.

Co-authored-by: Cursor <cursoragent@cursor.com>
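The brace-depth extraction described in this commit message can be sketched as below. This is an illustrative reconstruction, not the code in provider.ts. It assumes a single forward scan that tracks string and escape state so braces inside string values are ignored, and the return type is a choice made for this sketch.

```typescript
function extractLastJsonObject(text: string): Record<string, unknown> | undefined {
  let depth = 0; // current nesting level of { ... }
  let start = -1; // index where the current top-level object began
  let inString = false;
  let escaped = false;
  let last: Record<string, unknown> | undefined;

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];

    if (inString) {
      // Inside a JSON string: only look for the closing quote, honoring backslash escapes.
      if (escaped) escaped = false;
      else if (ch === '\\') escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }

    if (ch === '"') {
      // Quotes in surrounding prose (depth 0) are ignored on purpose.
      if (depth > 0) inString = true;
    } else if (ch === '{') {
      if (depth === 0) start = i;
      depth++;
    } else if (ch === '}' && depth > 0) {
      depth--;
      if (depth === 0) {
        try {
          last = JSON.parse(text.slice(start, i + 1)) as Record<string, unknown>;
        } catch {
          // Balanced braces but not valid JSON: keep the previous candidate.
        }
      }
    }
  }

  return last;
}

// Two progress objects concatenated; the second contains a brace inside a string.
const finalAnswer = extractLastJsonObject('{"step":1}{"step":2,"note":"a } in a string"}');
console.log(finalAnswer);
```

One limitation of this simplified version: an unmatched `{` earlier in the text keeps the depth above zero for the rest of the scan, so nothing after it is recovered. The real helper may handle that case differently.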

Wirasm · 2026-05-20T09:58:06Z

Review Summary

Verdict: minor-fixes-needed

Your PR adds a clean schema normalizer and JSON extractor for OpenAI Structured Outputs compliance. The approach is sound and error handling is solid — no blocking issues. The main gap is test coverage: two new pure functions have zero unit tests.

Blocking issues

(None — this is ready for a follow-up PR if you prefer, but the tests are the most impactful additions.)

Suggested fixes

- provider.ts:271 (normalizeSchemaForOpenAI): Add unit tests covering all branches: missing additionalProperties → false, no required → computed from keys, partial required → missing appended, nested objects → recursively normalized.
- provider.ts:340 (extractLastJsonObject): Add unit tests covering: single JSON, two concatenated JSONs (returns last), no braces, unmatched braces, strings containing {/}, escaped \", invalid then valid → last valid returned.
- provider.ts:655-665 (streamCodexEvents fallback path): Add integration test for the recovery path — when JSON.parse fails but extractLastJsonObject succeeds. Mock a stream with concatenated JSON and assert structuredOutput is populated correctly with no system warning.

Minor / nice-to-have

- provider.ts:271-277: Trim the JSDoc to one line. The "workflow YAML typically omits both" note is worth keeping; the two-item list is redundant with the function name.
- provider.ts:269: Use structuredClone(schema) instead of shallow copy { ...schema } for defensive mutation hygiene.
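To illustrate the difference behind the structuredClone suggestion, here is a small standalone sketch. The `Schema` type and sample values are assumptions for the example. Whether the shallow copy in provider.ts ever leaks a mutation depends on how the function treats nested nodes, which this snippet does not claim to show.

```typescript
type Schema = { type: string; properties: Record<string, { type: string }> };

const original: Schema = {
  type: 'object',
  properties: { summary: { type: 'string' } },
};

const shallow = { ...original }; // copies only the top-level keys
const deep = structuredClone(original); // copies the whole tree

// The spread copy still points at the same nested `properties` object.
const sharedAfterSpread = shallow.properties === original.properties;
const sharedAfterClone = deep.properties === original.properties;

// Writing through the shallow copy is therefore visible on the original,
// while the deep clone is unaffected.
shallow.properties.extra = { type: 'number' };

console.log(sharedAfterSpread, sharedAfterClone);
console.log('extra' in original.properties, 'extra' in deep.properties);
```

structuredClone is a global in current Node.js and browser runtimes. It copies the whole tree on every call, which is a reasonable cost for small output schemas.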

Compliments

The extractLastJsonObject JSDoc is excellent — the forward-scan + brace-depth approach and the "return last object for authoritative final answer" rationale are exactly the hidden invariants worth documenting. The error-recovery chain (direct parse → extract last → system warning) is explicit and preserves UX correctly.
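The recovery chain praised here (direct parse, then last-object extraction, then a system warning) can be sketched as a small function. `parseStructuredOutput`, the `ParseResult` type, and the warning text are hypothetical names for illustration. The extractor is passed in as a parameter, and `naiveExtractLast` is a deliberately simple stand-in that only handles flat objects, not the brace-depth helper from the PR.

```typescript
type ParseResult =
  | { kind: 'parsed'; value: unknown; via: 'direct' | 'fallback' }
  | { kind: 'warning'; message: string };

function parseStructuredOutput(
  text: string,
  extractLast: (raw: string) => unknown
): ParseResult {
  try {
    // Step 1: the whole accumulated text is valid JSON.
    return { kind: 'parsed', value: JSON.parse(text), via: 'direct' };
  } catch {
    // Step 2: try to recover the final object from concatenated output.
    const recovered = extractLast(text);
    if (recovered !== undefined) {
      return { kind: 'parsed', value: recovered, via: 'fallback' };
    }
    // Step 3: nothing usable; surface a warning instead of failing silently.
    return { kind: 'warning', message: 'Structured output was not valid JSON' };
  }
}

// Stand-in extractor for the demo: parses from the last '{' to the end.
function naiveExtractLast(raw: string): unknown {
  const i = raw.lastIndexOf('{');
  if (i < 0) return undefined;
  try {
    return JSON.parse(raw.slice(i));
  } catch {
    return undefined;
  }
}

const direct = parseStructuredOutput('{"ok":true}', naiveExtractLast);
const recovered = parseStructuredOutput('{"step":1}{"step":2}', naiveExtractLast);
const failed = parseStructuredOutput('not json', naiveExtractLast);
console.log(direct.kind, recovered.kind, failed.kind);
```

Returning a tagged result keeps the three outcomes explicit, which also makes the integration test suggested above straightforward to assert against.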

Reviewed via maintainer-review-pr workflow (Pi/Minimax). Aspects run: code-review, error-handling, test-coverage, comment-quality.

coderabbitai Bot reviewed May 19, 2026

fix(codex): normalize output schemas for OpenAI Structured Outputs compliance #1727
blevinson wants to merge 2 commits into coleam00:dev from blevinson:fix/codex-structured-output-schema
