
fix: add strict:false to Responses API tools and gpt-5.4 to known models #8636

Merged
baxen merged 1 commit into main from fix/openai-gpt-5.4-responses-api-strict
Apr 21, 2026

Conversation


@baxen (Collaborator) commented Apr 18, 2026

Category: fix
User Impact: Users can now use OpenAI's gpt-5.4 model with goose for tool-calling workflows that previously failed silently or returned API errors.

Problem: GPT-5.4 requires the Responses API (/v1/responses), which defaults to strict: true for function tool definitions. MCP tool schemas are dynamically generated at runtime and don't conform to strict-mode requirements (e.g., missing additionalProperties: false), causing the API to reject tool calls. Additionally, gpt-5.4 was missing from the known models list, so it wouldn't get a proper context limit.

Solution: Explicitly set strict: false on tool definitions in create_responses_request so the Responses API accepts MCP tool schemas as-is — matching the non-strict default that Chat Completions used. Also added gpt-5.4 to OPEN_AI_KNOWN_MODELS with a 400k context limit. Both changes were developed using red-green TDD.

File changes

crates/goose/src/providers/formats/openai_responses.rs
Added "strict": false to each tool definition in create_responses_request. This prevents the Responses API from applying strict schema validation to MCP tool schemas that can't guarantee compliance. Also added a test (test_responses_tools_include_strict_false) that constructs a request with a tool and asserts the strict field is present and set to false.
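The resulting tool payload has roughly this shape. A std-only sketch mirroring the description above; the helper name and string-based JSON assembly are illustrative, not the actual goose code:

```rust
// Sketch: why strict must be false for MCP tools. Strict mode requires
// "additionalProperties": false and fully enumerated properties, which
// runtime-generated MCP schemas rarely guarantee.
fn tool_definition_json(name: &str, params_schema: &str) -> String {
    format!(
        r#"{{"type":"function","name":"{name}","parameters":{params_schema},"strict":false}}"#
    )
}

fn main() {
    let schema = r#"{"type":"object","properties":{"path":{"type":"string"}}}"#;
    let tool = tool_definition_json("list_files", schema);
    // The Responses API then accepts the schema as-is, matching the
    // non-strict default of Chat Completions.
    assert!(tool.contains(r#""strict":false"#));
}
```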

crates/goose/src/providers/openai.rs
Added ("gpt-5.4", 400_000) to OPEN_AI_KNOWN_MODELS so the model gets a proper context limit. Added a test (gpt_5_4_is_in_known_models) that asserts gpt-5.4 is present in the list.
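The lookup itself can be sketched as follows. The gpt-5.4 entry comes from this PR; the fallback constant and function name are illustrative assumptions:

```rust
// Known-models table mapping a model name to its context limit.
const OPEN_AI_KNOWN_MODELS: &[(&str, usize)] = &[("gpt-5.4", 400_000)];
// Hypothetical fallback for models not in the table.
const DEFAULT_CONTEXT_LIMIT: usize = 128_000;

fn context_limit(model: &str) -> usize {
    for &(name, limit) in OPEN_AI_KNOWN_MODELS {
        if name == model {
            return limit;
        }
    }
    DEFAULT_CONTEXT_LIMIT
}

fn main() {
    assert_eq!(context_limit("gpt-5.4"), 400_000);
    assert_eq!(context_limit("some-unknown-model"), 128_000);
}
```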

Reproduction Steps

  1. Configure goose with an OpenAI API key and set the model to gpt-5.4.
  2. Start a session and ask goose to perform a task that requires tool use (e.g., "list the files in the current directory").
  3. Verify that goose successfully calls tools and returns results without API errors about strict schema validation.


@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ee17f24905

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread crates/goose/src/providers/openai.rs Outdated
Comment on lines +60 to +63
```rust
("gpt-5.4", 400_000),
("gpt-5.4-mini", 400_000),
("gpt-5.4-nano", 128_000),
("gpt-5.4-pro", 400_000),
```

P2: Correct GPT-5.4 family context limits in known models

OPEN_AI_KNOWN_MODELS assigns gpt-5.4 and gpt-5.4-pro a 400_000 limit and gpt-5.4-nano a 128_000 limit, but the current model specs are 1,050,000 for gpt-5.4/gpt-5.4-pro and 400,000 for gpt-5.4-nano. Because these values are propagated into ModelConfig when provider defaults are applied, Goose will compact/truncate conversation history far earlier than necessary for these models, materially reducing available long-context behavior.



@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 12e2304d68


Comment thread crates/goose/src/providers/openai.rs Outdated
Comment on lines +60 to +63
```rust
("gpt-5.4", 400_000),
("gpt-5.4-mini", 400_000),
("gpt-5.4-nano", 128_000),
("gpt-5.4-pro", 400_000),
```

P2: Correct gpt-5.4 family context limits

OPEN_AI_KNOWN_MODELS hard-codes gpt-5.4/gpt-5.4-pro at 400_000 and gpt-5.4-nano at 128_000, but the repo’s canonical model table (crates/goose/src/providers/canonical/data/canonical_models.json) lists 1_050_000 for gpt-5.4/gpt-5.4-pro and 400_000 for gpt-5.4-nano. Because OpenAiProvider::metadata() publishes these literal values and ProviderEntry::normalize_model_config falls back to provider metadata when canonical lookup is unavailable, this underreports/undersizes context windows for these models.


@baxen force-pushed the fix/openai-gpt-5.4-responses-api-strict branch from 12e2304 to 36a396a on April 19, 2026, 21:32

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 36a396a3ec


Comment on lines +273 to +275
```rust
let base = normalized
    .strip_prefix("databricks-")
    .unwrap_or(&normalized);
```

P2: Handle goose-prefixed GPT-5.4 models in responses routing

DatabricksProvider::is_responses_model removes only the databricks- prefix before matching GPT-5 families, so names like goose-gpt-5.4/goose-gpt-5.4-high are treated as non-Responses models. In stream(), that sends these models through the Chat Completions path instead of create_responses_request, which can break tool-calling for GPT-5.4 variants. This is inconsistent with the rest of the codebase (for example ModelConfig::is_openai_reasoning_model and extract_reasoning_effort) that already recognizes goose- prefixes.
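The fix the reviewer implies can be sketched as a small normalization helper that strips every known prefix before family matching (names are illustrative; the actual goose code may differ):

```rust
// Strip both provider prefixes so goose-gpt-5.4 and databricks-gpt-5.4
// normalize to the same base name before family matching.
fn strip_known_prefixes(name: &str) -> &str {
    let mut base = name;
    for prefix in ["databricks-", "goose-"] {
        base = base.strip_prefix(prefix).unwrap_or(base);
    }
    base
}

fn main() {
    assert_eq!(strip_known_prefixes("goose-gpt-5.4"), "gpt-5.4");
    assert_eq!(strip_known_prefixes("databricks-gpt-5.4"), "gpt-5.4");
    assert_eq!(strip_known_prefixes("gpt-5.4"), "gpt-5.4");
}
```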


@baxen force-pushed the fix/openai-gpt-5.4-responses-api-strict branch from 36a396a to da0141d on April 21, 2026, 00:19

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: da0141deeb


Comment thread crates/goose/src/providers/utils.rs Outdated
```rust
/// automatically.
pub fn is_openai_responses_model(model_name: &str) -> bool {
    static RE: OnceLock<Regex> = OnceLock::new();
    let re = RE.get_or_init(|| Regex::new(r"(?i)^o\d|^gpt-5").unwrap());
```

P2: Keep o1-mini models out of Responses routing

is_openai_responses_model now matches every o* model via ^o\d, so o1-mini/o1-preview are treated as Responses models and routed through OpenAiProvider::stream’s Responses branch instead of Chat Completions. That bypasses the existing o1-mini safeguard in providers/formats/openai.rs::create_request (and the analogous Databricks formatter guard), so unsupported tool-calling models are no longer rejected early and instead fail at runtime API call time; this is especially risky because the canonical table marks openai/o1-mini as "tool_call": false in canonical_models.json.


@baxen force-pushed the fix/openai-gpt-5.4-responses-api-strict branch from da0141d to fc414a3 on April 21, 2026, 00:47

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fc414a3ab7


```rust
pub fn is_openai_responses_model(model_name: &str) -> bool {
    static RE: OnceLock<Regex> = OnceLock::new();
    let re =
        RE.get_or_init(|| Regex::new(r"(?i)(?:^|[-/])(?:o\d+(?:$|-)|gpt-5(?:$|[-.]))").unwrap());
```

P2: Keep o1-mini and o1-preview out of Responses routing

The new is_openai_responses_model regex matches any o\d family, which now classifies o1-mini and o1-preview as Responses models and routes them through the Responses path (OpenAiProvider::stream/DatabricksProvider::stream) instead of the Chat Completions formatter path that contains the existing early o1-mini safeguard. That means tool-enabled requests for these models now fail only at runtime API calls; this is inconsistent with the canonical capabilities table, which still marks openai/o1-mini and openai/o1-preview as tool_call: false in canonical_models.json (lines 80475-80510).
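One way to reconcile the broad family match with the tool_call: false models is an explicit deny-list, sketched here with std-only string checks standing in for the regex (all names and the deny-list mechanism are illustrative, not the actual goose code):

```rust
// Crude approximation of the o<digit> family match from the regex.
fn matches_o_series(name: &str) -> bool {
    name.strip_prefix('o')
        .and_then(|rest| rest.chars().next())
        .is_some_and(|c| c.is_ascii_digit())
}

// Route to the Responses API only when the model also supports tool calls.
fn is_responses_model(name: &str) -> bool {
    const NO_TOOL_CALL: &[&str] = &["o1-mini", "o1-preview"];
    if NO_TOOL_CALL.contains(&name) {
        return false; // keep the early rejection the chat formatter provided
    }
    matches_o_series(name) || name.starts_with("gpt-5")
}

fn main() {
    assert!(matches_o_series("o1-mini"));    // the broad match pulls it in
    assert!(!is_responses_model("o1-mini")); // the deny-list keeps it out
    assert!(is_responses_model("o4-mini"));
    assert!(is_responses_model("gpt-5.4"));
}
```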


@baxen force-pushed the fix/openai-gpt-5.4-responses-api-strict branch 2 times, most recently from efcfea7 to 8530161, on April 21, 2026, 03:30

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8530161bb6


```rust
pub fn is_openai_responses_model(model_name: &str) -> bool {
    static RE: OnceLock<Regex> = OnceLock::new();
    let re =
        RE.get_or_init(|| Regex::new(r"(?i)(?:^|[-/])(?:o\d+(?:$|-)|gpt-5(?:$|[-.]))").unwrap());
```

P2: Keep o-series chat fallback on OpenAI-compatible hosts

is_openai_responses_model now matches every o* family, which makes OpenAiProvider::should_use_responses_api route o1/o3/o4 models to /v1/responses whenever base_path is the default v1/chat/completions. This breaks OpenAI-compatible hosts that expose only Chat Completions (a common setup in declarative providers using chat-completions URLs), because stream() enters the Responses branch and has no 404 fallback back to chat. The regression is input-dependent (custom host + o* model), but it can turn previously working requests into hard endpoint errors.


@baxen force-pushed the fix/openai-gpt-5.4-responses-api-strict branch from 8530161 to 319a400 on April 21, 2026, 03:53

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 319a4001f1


Comment on lines +212 to 213
```rust
if !is_openai_responses_model(model_name) {
    return (model_name.to_string(), None);
```

P1: Decouple reasoning-effort extraction from Responses routing

extract_reasoning_effort now bails out whenever is_openai_responses_model is false, which reclassifies plain GPT-5 families (gpt-5, gpt-5-mini, gpt-5-nano, etc.) as non-reasoning. That change propagates into providers/formats/openai.rs::create_request (and Databricks equivalent), causing these models to use system role + max_tokens and to allow temperature instead of the reasoning path (reasoning_effort + max_completion_tokens), so requests for GPT-5 chat-completions models can be malformed or behaviorally downgraded.
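The decoupled behavior the reviewer asks for can be sketched as a plain suffix parse that never consults the routing predicate (the function name and suffix set are assumptions drawn from the surrounding discussion, not the actual goose code):

```rust
/// Split a model name into a base name plus an optional reasoning-effort
/// suffix, independently of how the model is routed. When no suffix is
/// present, return None so the API applies its own per-model default.
fn extract_reasoning_effort(model: &str) -> (String, Option<&'static str>) {
    for effort in ["minimal", "low", "medium", "high"] {
        if let Some(base) = model.strip_suffix(&format!("-{effort}")) {
            return (base.to_string(), Some(effort));
        }
    }
    (model.to_string(), None)
}

fn main() {
    assert_eq!(
        extract_reasoning_effort("gpt-5.4-high"),
        ("gpt-5.4".to_string(), Some("high"))
    );
    // Plain GPT-5 names keep working without a routing check.
    assert_eq!(extract_reasoning_effort("gpt-5-mini").1, None);
}
```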


Comment thread crates/goose/src/providers/utils.rs Outdated
```rust
/// work without provider-specific normalization.
pub fn is_openai_responses_model(model_name: &str) -> bool {
    static RE: OnceLock<Regex> = OnceLock::new();
    let re = RE.get_or_init(|| Regex::new(r"(?i)(?:^|[-/])(?:o\d+(?:$|-)|gpt-5[.-]\d)").unwrap());
```

P2: Route gpt-5-codex through Responses API

The new Responses-model regex only matches gpt-5 names with a numeric segment (gpt-5.2, gpt-5-4, etc.), so gpt-5-codex no longer matches and OpenAiProvider::should_use_responses_api now sends it to chat-completions. This is a regression from the prior codex-specific routing and can break tool-calling for users selecting the supported gpt-5-codex model on providers that require /v1/responses for codex variants.
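The regression is easy to demonstrate with a std-only approximation of the numeric-segment match, plus the explicit codex carve-out the reviewer wants restored (illustrative sketch, not the actual regex or goose code):

```rust
// Approximates the gpt-5[.-]<digit> portion of the PR's regex:
// matches gpt-5.4 or gpt-5-4 but not gpt-5-codex.
fn is_gpt5_numeric(name: &str) -> bool {
    match name.strip_prefix("gpt-5") {
        Some(rest) => {
            let mut chars = rest.chars();
            matches!(chars.next(), Some('.') | Some('-'))
                && chars.next().is_some_and(|c| c.is_ascii_digit())
        }
        None => false,
    }
}

// Keep the codex variant routed to /v1/responses alongside numeric names.
fn routes_to_responses(name: &str) -> bool {
    is_gpt5_numeric(name) || name == "gpt-5-codex"
}

fn main() {
    assert!(!is_gpt5_numeric("gpt-5-codex")); // the reported regression
    assert!(routes_to_responses("gpt-5-codex"));
    assert!(routes_to_responses("gpt-5.4"));
}
```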


@baxen force-pushed the fix/openai-gpt-5.4-responses-api-strict branch 2 times, most recently from 29ab29a to 217b67f, on April 21, 2026, 04:23
@baxen force-pushed the fix/openai-gpt-5.4-responses-api-strict branch from 217b67f to 4bab639 on April 21, 2026, 22:14
…ing effort lookup

- Unify is_openai_responses_model() regex for routing all o-series and
  gpt-5 family models to the Responses API across OpenAI and Databricks
  providers.
- extract_reasoning_effort() no longer defaults to "medium"; omits the
  reasoning field when no explicit suffix is provided, letting the API
  apply its own per-model default (avoids 400s on gpt-5-pro which only
  accepts "high").
- Add context window limits for new gpt-5 model variants.
- Set strict: false on Responses API tool schemas (MCP schemas are not
  strict-compatible).
- Add Responses API mock to session_id_propagation_test.
- Switch test fixtures from responses-routed models (gpt-5-nano, o4-mini)
  to gpt-4.1 for Chat Completions mock compatibility.
  TODO: add Responses API mock to OpenAiFixture for full coverage.

Signed-off-by: Bradley Axen <baxen@squareup.com>
@baxen force-pushed the fix/openai-gpt-5.4-responses-api-strict branch from 4bab639 to c5e8fbe on April 21, 2026, 22:20

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c5e8fbe998


```rust
let session_id = &session_b.session_id().0;
match via {
    SetModelVia::Dedicated => conn.set_model(session_id, "o4-mini").await.unwrap(),
    SetModelVia::Dedicated => conn.set_model(session_id, "gpt-4.1").await.unwrap(),
```

P2: Use a non-default model in model-switch ACP tests

TEST_MODEL was changed to gpt-4.1, so this set_model call now sets Session B to the same model it already has from TestConnectionConfig::default(). That makes run_model_set/run_config_option_model_set effectively no-ops for model switching, so they can pass even if the session model update path regresses. Switch Session B to a different chat-completions model to keep validating the model-change behavior.


```rust
// responses-routed models like o4-mini here.
let config = TestConnectionConfig {
    current_model: "o4-mini".to_string(),
    current_model: "gpt-4.1".to_string(),
```

P2: Restore actual mismatch in prompt-model-mismatch test

This test is intended to start with a server model different from TEST_MODEL, but after TEST_MODEL was updated to gpt-4.1, setting current_model to gpt-4.1 removes the mismatch entirely. The branch that forces set_model(TEST_MODEL) before prompting is no longer exercised, so regressions in mismatch handling can slip through undetected.


@baxen added this pull request to the merge queue Apr 21, 2026
Merged via the queue into main with commit ab0c40d Apr 21, 2026
23 checks passed
@baxen deleted the fix/openai-gpt-5.4-responses-api-strict branch April 21, 2026, 22:46
spikewang pushed a commit to spikewang/goose that referenced this pull request Apr 22, 2026
…els (aaif-goose#8636)

Signed-off-by: Bradley Axen <baxen@squareup.com>
lifeizhou-ap added a commit that referenced this pull request Apr 22, 2026
* main:
  feat: extend goose2 context window ux with auto-compaction (#8721)
  improve goose2 agent management flows (#8737)
  alexhancock/tui-improvements (#8736)
  fix: add strict:false to Responses API tools and gpt-5.4 to known models (#8636)
  persist and reliably apply chat model selection (#8734)
  merge goose-acp crate into goose (#8726)
  docs: AGENTS.md section on goose2 desktop backend architecture (#8732)
  feat: goose2 message bubble + action tray (#8720)
  consolidate provider ACP methods onto inventory (#8710)
  ci: declare and enforce MSRV of 1.91.1 (#8670)
  fix(ui): correct grammar in apps view description (#8668) (#8679)
  Stop load openai fast model for openapi compatible custom endpoint (#8644)
lifeizhou-ap added a commit that referenced this pull request Apr 22, 2026
* main: (41 commits)
  removed the specific code owner for documentation change (#8749)
  fix(providers): handle missing delta field in streaming chunks (#8700)
  refactor(providers): extract http_status module and rename handle_status_openai_compat (#8620)
  fix(providers/openai): accept streaming chunks with both reasoning fields (#8715)
  feat: associate threads with projects (#8745)
  upgrade goose sdk and tui to be compatible with the latest agentclientprotocol/sdk package (#8667)
  feat: extend goose2 context window ux with auto-compaction (#8721)
  improve goose2 agent management flows (#8737)
  alexhancock/tui-improvements (#8736)
  fix: add strict:false to Responses API tools and gpt-5.4 to known models (#8636)
  persist and reliably apply chat model selection (#8734)
  merge goose-acp crate into goose (#8726)
  docs: AGENTS.md section on goose2 desktop backend architecture (#8732)
  feat: goose2 message bubble + action tray (#8720)
  consolidate provider ACP methods onto inventory (#8710)
  ci: declare and enforce MSRV of 1.91.1 (#8670)
  fix(ui): correct grammar in apps view description (#8668) (#8679)
  Stop load openai fast model for openapi compatible custom endpoint (#8644)
  feat(hooks): add Husky git hooks for ui/goose2 (#8577)
  fix: links in chat could not be opened (#8544)
  ...