fix(codex): normalize output schemas for OpenAI Structured Outputs compliance#1727
fix(codex): normalize output schemas for OpenAI Structured Outputs compliance#1727blevinson wants to merge 2 commits into
Conversation
…mpliance OpenAI's Structured Outputs API requires `additionalProperties: false` on every object node and `required` to list ALL property keys. Claude doesn't enforce either, so workflow authors writing provider-agnostic YAML typically omit both — causing 400 errors when switching to the Codex provider. Add `normalizeSchemaForOpenAI()` in the Codex provider that recursively walks the schema tree and fills in the missing constraints before passing to the SDK. This makes all existing workflows work with Codex without any YAML changes. Fixes the `invalid_json_schema` error on `output_format` nodes when using `provider: codex` in workflow definitions. Co-authored-by: Cursor <cursoragent@cursor.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughAdds a recursive JSON-schema normalizer and a fallback extractor for streamed/concatenated JSON, wires normalization into Codex turn options, and updates structured-output parsing to use the extractor before emitting a non-JSON warning; tests adjusted to expect the normalized schema. ChangesSchema normalization and parsing improvements
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (3)
packages/providers/src/codex/provider.ts (2)
281-312: 💤 Low valueConsider handling JSON Schema composition keywords in future iteration.
The normalization covers
properties,items, but OpenAI Structured Outputs also requires the constraints on schemas nested withinoneOf,anyOf,allOf, and$defs/definitions. If workflow authors use these JSON Schema features, their nested object schemas won't be normalized.Given the PR was end-to-end tested with real workflows, this is likely not urgent, but worth noting for future edge cases.
♻️ Sketch for handling composition keywords
function normalizeSchemaForOpenAI(schema: Record<string, unknown>): Record<string, unknown> { const out = { ...schema }; if (out.type === 'object') { // ... existing object handling } if (out.type === 'array' && typeof out.items === 'object' && out.items !== null) { out.items = normalizeSchemaForOpenAI(out.items as Record<string, unknown>); } + // Handle composition keywords + for (const keyword of ['oneOf', 'anyOf', 'allOf'] as const) { + if (Array.isArray(out[keyword])) { + out[keyword] = (out[keyword] as Record<string, unknown>[]).map(normalizeSchemaForOpenAI); + } + } + + // Handle definitions + for (const keyword of ['$defs', 'definitions'] as const) { + if (typeof out[keyword] === 'object' && out[keyword] !== null) { + const defs = out[keyword] as Record<string, Record<string, unknown>>; + const normalizedDefs: Record<string, Record<string, unknown>> = {}; + for (const [name, def] of Object.entries(defs)) { + normalizedDefs[name] = normalizeSchemaForOpenAI(def); + } + out[keyword] = normalizedDefs; + } + } return out; }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/providers/src/codex/provider.ts` around lines 281 - 312, The normalizeSchemaForOpenAI function currently normalizes properties and items but skips JSON Schema composition and definition containers; update normalizeSchemaForOpenAI to also recursively traverse and normalize schemas found under "oneOf", "anyOf", "allOf" (each is an array of schemas), and definition containers like "$defs" and "definitions" (which are objects mapping names to schemas), ensuring you call normalizeSchemaForOpenAI on each nested schema entry and properly preserve array/object shapes; reference the existing function normalizeSchemaForOpenAI and the keys "properties" and "items" as examples of where recursion is applied and add analogous handling for those composition/definitions keys.
284-296: 💤 Low valueEdge case: objects without properties won't get
requiredarray.If
propertiesis undefined or empty, therequiredarray won't be set. OpenAI's strict mode might requirerequired: []for objects with no properties.🛡️ Defensive fix for empty properties
if (out.type === 'object') { if (!('additionalProperties' in out)) { out.additionalProperties = false; } + // Ensure required exists for all objects (OpenAI strict mode) + if (!Array.isArray(out.required)) { + out.required = []; + } if (typeof out.properties === 'object' && out.properties !== null) { const props = out.properties as Record<string, Record<string, unknown>>; const propKeys = Object.keys(props); - const existingRequired = Array.isArray(out.required) ? (out.required as string[]) : []; + const existingRequired = out.required as string[]; const missingRequired = propKeys.filter(k => !existingRequired.includes(k)); if (missingRequired.length > 0) { out.required = [...existingRequired, ...missingRequired]; }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/providers/src/codex/provider.ts` around lines 284 - 296, For object schemas (when out.type === 'object') ensure we handle cases with missing or empty properties: still set out.additionalProperties = false if not present, and if out.properties is not an object, is null, or has no keys then set out.required to an empty array (or leave it if already an array). To implement, update the block around out.properties/out.required so that if typeof out.properties !== 'object' || out.properties === null || Object.keys(out.properties as Record<string, unknown> || {}).length === 0 you assign out.required = Array.isArray(out.required) ? out.required : [] (and optionally normalize out.properties = {}), otherwise preserve the existing logic that computes missingRequired from Object.keys(props).packages/providers/src/codex/provider.test.ts (1)
663-693: 💤 Low valueTest validates basic normalization; consider adding nested schema coverage.
The test correctly validates that
additionalProperties: falseis added andrequiredis preserved. However, it doesn't exercise the recursive normalization for nested objects or array items. Consider adding a test with nested schemas.💚 Example additional test for nested schemas
test('normalizes nested object schemas in properties and array items', async () => { mockRunStreamed.mockResolvedValue({ events: (async function* () { yield { type: 'turn.completed', usage: defaultUsage }; })(), }); const schema = { type: 'object', properties: { nested: { type: 'object', properties: { value: { type: 'string' } }, // Missing additionalProperties and required }, items: { type: 'array', items: { type: 'object', properties: { id: { type: 'number' } }, }, }, }, }; for await (const _ of client.sendQuery('test', '/workspace', undefined, { outputFormat: { type: 'json_schema', schema }, })) { // consume } expect(mockRunStreamed).toHaveBeenCalledWith( 'test', expect.objectContaining({ outputSchema: expect.objectContaining({ properties: expect.objectContaining({ nested: expect.objectContaining({ additionalProperties: false, required: ['value'], }), items: expect.objectContaining({ items: expect.objectContaining({ additionalProperties: false, required: ['id'], }), }), }), }), }) ); });🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/providers/src/codex/provider.test.ts` around lines 663 - 693, Add a new unit test that exercises recursive normalization of nested JSON schemas: create a test (e.g. 'normalizes nested object schemas in properties and array items') that sets mockRunStreamed to return a completed turn, defines a schema with nested object in properties (missing additionalProperties and required) and an array whose items are object schemas, call client.sendQuery with outputFormat: { type: 'json_schema', schema }, consume the async iterator, and assert mockRunStreamed was called with TurnOptions containing outputSchema where nested property schemas and array items have additionalProperties: false and required populated (use expect.objectContaining to check nested.required and nested.additionalProperties on the nested property names).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@packages/providers/src/codex/provider.test.ts`:
- Around line 663-693: Add a new unit test that exercises recursive
normalization of nested JSON schemas: create a test (e.g. 'normalizes nested
object schemas in properties and array items') that sets mockRunStreamed to
return a completed turn, defines a schema with nested object in properties
(missing additionalProperties and required) and an array whose items are object
schemas, call client.sendQuery with outputFormat: { type: 'json_schema', schema
}, consume the async iterator, and assert mockRunStreamed was called with
TurnOptions containing outputSchema where nested property schemas and array
items have additionalProperties: false and required populated (use
expect.objectContaining to check nested.required and nested.additionalProperties
on the nested property names).
In `@packages/providers/src/codex/provider.ts`:
- Around line 281-312: The normalizeSchemaForOpenAI function currently
normalizes properties and items but skips JSON Schema composition and definition
containers; update normalizeSchemaForOpenAI to also recursively traverse and
normalize schemas found under "oneOf", "anyOf", "allOf" (each is an array of
schemas), and definition containers like "$defs" and "definitions" (which are
objects mapping names to schemas), ensuring you call normalizeSchemaForOpenAI on
each nested schema entry and properly preserve array/object shapes; reference
the existing function normalizeSchemaForOpenAI and the keys "properties" and
"items" as examples of where recursion is applied and add analogous handling for
those composition/definitions keys.
- Around line 284-296: For object schemas (when out.type === 'object') ensure we
handle cases with missing or empty properties: still set
out.additionalProperties = false if not present, and if out.properties is not an
object, is null, or has no keys then set out.required to an empty array (or
leave it if already an array). To implement, update the block around
out.properties/out.required so that if typeof out.properties !== 'object' ||
out.properties === null || Object.keys(out.properties as Record<string, unknown>
|| {}).length === 0 you assign out.required = Array.isArray(out.required) ?
out.required : [] (and optionally normalize out.properties = {}), otherwise
preserve the existing logic that computes missingRequired from
Object.keys(props).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 67fd02d5-213a-40fa-9c54-5bbf95cad06a
📒 Files selected for processing (2)
packages/providers/src/codex/provider.test.tspackages/providers/src/codex/provider.ts
Codex streams multiple intermediate JSON objects as progress updates during a turn. The accumulated text is a concatenation of all of them, which isn't valid JSON. When JSON.parse fails on the full text, we now extract the last complete top-level JSON object using brace-depth tracking — that's the authoritative final answer. Without this, structuredOutput was undefined for multi-message turns, causing downstream condition evaluators ($node.output.field) to fail with condition_json_parse_failed and skip conditional nodes. Co-authored-by: Cursor <cursoragent@cursor.com>
Review SummaryVerdict: minor-fixes-needed Your PR adds a clean schema normalizer and JSON extractor for OpenAI Structured Outputs compliance. The approach is sound and error handling is solid — no blocking issues. The main gap is test coverage: two new pure functions have zero unit tests. Blocking issues(None — this is ready for a follow-up PR if you prefer, but the tests are the most impactful additions.) Suggested fixes
Minor / nice-to-have
ComplimentsThe Reviewed via maintainer-review-pr workflow (Pi/Minimax). Aspects run: code-review, error-handling, test-coverage, comment-quality. |
Summary
When using
provider: codexin workflow definitions, nodes withoutput_formatfail withinvalid_json_schemaerrors because OpenAI's Structured Outputs API has stricter schema requirements than Claude:type: "object"node must haveadditionalProperties: falsetype: "object"node must have arequiredarray listing all property keysWorkflow authors writing provider-agnostic YAML typically omit both since Claude doesn't require them — breaking Codex workflows.
Changes
normalizeSchemaForOpenAI()in the Codex provider (packages/providers/src/codex/provider.ts) that recursively walks the output schema tree and fills in the missing constraints before passing to the SDKbuildTurnOptions()for bothoutputFormat.schemaandnodeConfig.output_formatpathsTest plan
tsc --noEmit)remotion-idea-to-videoworkflow usingprovider: codex/model: gpt-5.3-codex— bothfetch-sourceandqa-reviewoutput_format schemas accepted by OpenAI API (previously both failed with 400)Error before fix
Made with Cursor
Summary by CodeRabbit
Bug Fixes
Tests