fix: add strict:false to Responses API tools and gpt-5.4 to known models #8636
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ee17f24905
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```rust
("gpt-5.4", 400_000),
("gpt-5.4-mini", 400_000),
("gpt-5.4-nano", 128_000),
("gpt-5.4-pro", 400_000),
```
Correct GPT-5.4 family context limits in known models
OPEN_AI_KNOWN_MODELS assigns gpt-5.4 and gpt-5.4-pro a 400_000 limit and gpt-5.4-nano a 128_000 limit, but the current model specs are 1,050,000 for gpt-5.4/gpt-5.4-pro and 400,000 for gpt-5.4-nano. Because these values are propagated into ModelConfig when provider defaults are applied, Goose will compact/truncate conversation history far earlier than necessary for these models, materially reducing available long-context behavior.
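The mechanics the reviewer describes can be sketched with a minimal, hypothetical lookup. The entries mirror the PR's diff, but the table constant and helper here are illustrative, not the repo's actual code:

```rust
// Hypothetical sketch of a hard-coded known-models table and lookup; the
// entries mirror the PR's values, but the helper itself is illustrative.
const KNOWN_MODELS: &[(&str, usize)] = &[
    ("gpt-5.4", 400_000),
    ("gpt-5.4-mini", 400_000),
    ("gpt-5.4-nano", 128_000),
    ("gpt-5.4-pro", 400_000),
];

/// Return the hard-coded context limit for a model, if known.
fn context_limit(model: &str) -> Option<usize> {
    KNOWN_MODELS
        .iter()
        .find(|(name, _)| *name == model)
        .map(|(_, limit)| *limit)
}

fn main() {
    // Whatever literal sits in the table is what downstream config inherits,
    // so an undersized value silently caps the usable context window.
    assert_eq!(context_limit("gpt-5.4"), Some(400_000));
    assert_eq!(context_limit("unknown-model"), None);
}
```

Because the literal is the single source of truth once provider defaults are applied, correcting it in one place fixes every downstream consumer.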
💡 Codex Review
Reviewed commit: 12e2304d68
```rust
("gpt-5.4", 400_000),
("gpt-5.4-mini", 400_000),
("gpt-5.4-nano", 128_000),
("gpt-5.4-pro", 400_000),
```
Correct gpt-5.4 family context limits
OPEN_AI_KNOWN_MODELS hard-codes gpt-5.4/gpt-5.4-pro at 400_000 and gpt-5.4-nano at 128_000, but the repo’s canonical model table (crates/goose/src/providers/canonical/data/canonical_models.json) lists 1_050_000 for gpt-5.4/gpt-5.4-pro and 400_000 for gpt-5.4-nano. Because OpenAiProvider::metadata() publishes these literal values and ProviderEntry::normalize_model_config falls back to provider metadata when canonical lookup is unavailable, this underreports/undersizes context windows for these models.
Force-pushed from 12e2304 to 36a396a
💡 Codex Review
Reviewed commit: 36a396a3ec
```rust
let base = normalized
    .strip_prefix("databricks-")
    .unwrap_or(&normalized);
```
Handle goose-prefixed GPT-5.4 models in responses routing
DatabricksProvider::is_responses_model removes only the databricks- prefix before matching GPT-5 families, so names like goose-gpt-5.4/goose-gpt-5.4-high are treated as non-Responses models. In stream(), that sends these models through the Chat Completions path instead of create_responses_request, which can break tool-calling for GPT-5.4 variants. This is inconsistent with the rest of the codebase (for example ModelConfig::is_openai_reasoning_model and extract_reasoning_effort) that already recognizes goose- prefixes.
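A minimal sketch of the fix the reviewer suggests, assuming the goal is simply to peel a `goose-` alias off after the `databricks-` prefix (the helper name is hypothetical):

```rust
/// Sketch of prefix normalization, assuming models may arrive with either a
/// "databricks-" or a "goose-" alias prefix. Helper name is hypothetical,
/// not the repo's actual function.
fn strip_known_prefixes(normalized: &str) -> &str {
    let base = normalized.strip_prefix("databricks-").unwrap_or(normalized);
    base.strip_prefix("goose-").unwrap_or(base)
}

fn main() {
    assert_eq!(strip_known_prefixes("goose-gpt-5.4-high"), "gpt-5.4-high");
    assert_eq!(strip_known_prefixes("databricks-gpt-5.4"), "gpt-5.4");
    assert_eq!(strip_known_prefixes("gpt-5.4"), "gpt-5.4");
}
```

With both prefixes stripped before the family match, `goose-gpt-5.4` variants would hit the same Responses-routing branch as the bare model names.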
Force-pushed from 36a396a to da0141d
💡 Codex Review
Reviewed commit: da0141deeb
```rust
/// automatically.
pub fn is_openai_responses_model(model_name: &str) -> bool {
    static RE: OnceLock<Regex> = OnceLock::new();
    let re = RE.get_or_init(|| Regex::new(r"(?i)^o\d|^gpt-5").unwrap());
```
Keep o1-mini models out of Responses routing
is_openai_responses_model now matches every o* model via ^o\d, so o1-mini/o1-preview are treated as Responses models and routed through OpenAiProvider::stream's Responses branch instead of Chat Completions. That bypasses the existing o1-mini safeguard in providers/formats/openai.rs::create_request (and the analogous Databricks formatter guard), so models that don't support tool calling are no longer rejected early and instead fail only when the API call is made at runtime; this is especially risky because the canonical table marks openai/o1-mini as "tool_call": false in canonical_models.json.
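One way to keep the legacy models out is an explicit deny-list before the family match, sketched here with plain string checks rather than the repo's regex (illustrative only):

```rust
/// Illustrative classifier, not the repo's regex: route o-series and gpt-5
/// families to the Responses API, but keep the legacy o1-mini/o1-preview
/// chat-only models out, since the canonical table marks them tool_call: false.
fn is_responses_model(model: &str) -> bool {
    let m = model.to_ascii_lowercase();
    // Explicit carve-out for the models with no tool-calling support.
    if m.starts_with("o1-mini") || m.starts_with("o1-preview") {
        return false;
    }
    // "o" followed by a digit (o1, o3, o4-mini, ...), or any gpt-5 family name.
    let o_series =
        m.starts_with('o') && m[1..].chars().next().is_some_and(|c| c.is_ascii_digit());
    o_series || m.starts_with("gpt-5")
}

fn main() {
    assert!(is_responses_model("o3"));
    assert!(is_responses_model("gpt-5.4-pro"));
    assert!(!is_responses_model("o1-mini"));
    assert!(!is_responses_model("gpt-4.1"));
}
```

The carve-out keeps the early rejection behavior the Chat Completions formatter used to provide, instead of deferring the failure to the API call.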
Force-pushed from da0141d to fc414a3
💡 Codex Review
Reviewed commit: fc414a3ab7
```rust
pub fn is_openai_responses_model(model_name: &str) -> bool {
    static RE: OnceLock<Regex> = OnceLock::new();
    let re =
        RE.get_or_init(|| Regex::new(r"(?i)(?:^|[-/])(?:o\d+(?:$|-)|gpt-5(?:$|[-.]))").unwrap());
```
Keep o1-mini and o1-preview out of Responses routing
The new is_openai_responses_model regex matches any o\d family, which now classifies o1-mini and o1-preview as Responses models and routes them through the Responses path (OpenAiProvider::stream/DatabricksProvider::stream) instead of the Chat Completions formatter path that contains the existing early o1-mini safeguard. That means tool-enabled requests for these models now fail only at runtime API calls; this is inconsistent with the canonical capabilities table, which still marks openai/o1-mini and openai/o1-preview as tool_call: false in canonical_models.json (lines 80475-80510).
Force-pushed from efcfea7 to 8530161
💡 Codex Review
Reviewed commit: 8530161bb6
```rust
pub fn is_openai_responses_model(model_name: &str) -> bool {
    static RE: OnceLock<Regex> = OnceLock::new();
    let re =
        RE.get_or_init(|| Regex::new(r"(?i)(?:^|[-/])(?:o\d+(?:$|-)|gpt-5(?:$|[-.]))").unwrap());
```
Keep o-series chat fallback on OpenAI-compatible hosts
is_openai_responses_model now matches every o* family, which makes OpenAiProvider::should_use_responses_api route o1/o3/o4 models to /v1/responses whenever base_path is the default v1/chat/completions. This breaks OpenAI-compatible hosts that expose only Chat Completions (a common setup in declarative providers using chat-completions URLs), because stream() enters the Responses branch and has no 404 fallback back to chat. The regression is input-dependent (custom host + o* model), but it can turn previously working requests into hard endpoint errors.
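The fallback the reviewer asks for can be sketched as a small decision helper; the enum, names, and the "host lacks Responses" flag are all hypothetical, standing in for whatever signal (e.g. a prior 404) the provider would actually track:

```rust
/// Hypothetical sketch of a chat fallback: prefer the Responses endpoint for
/// o-series/gpt-5 models, but degrade to Chat Completions when the host is
/// known not to expose /v1/responses.
#[derive(Debug, PartialEq)]
enum Endpoint {
    Responses,
    ChatCompletions,
}

fn pick_endpoint(model_wants_responses: bool, host_lacks_responses: bool) -> Endpoint {
    if model_wants_responses && !host_lacks_responses {
        Endpoint::Responses
    } else {
        Endpoint::ChatCompletions
    }
}

fn main() {
    assert_eq!(pick_endpoint(true, false), Endpoint::Responses);
    // An OpenAI-compatible host without /v1/responses keeps working.
    assert_eq!(pick_endpoint(true, true), Endpoint::ChatCompletions);
    assert_eq!(pick_endpoint(false, false), Endpoint::ChatCompletions);
}
```

The point is that routing stays a pure function of both the model family and the host's capabilities, so a chat-only compatible host never receives a hard endpoint error.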
Force-pushed from 8530161 to 319a400
💡 Codex Review
Reviewed commit: 319a4001f1
```rust
if !is_openai_responses_model(model_name) {
    return (model_name.to_string(), None);
```
Decouple reasoning-effort extraction from Responses routing
extract_reasoning_effort now bails out whenever is_openai_responses_model is false, which reclassifies plain GPT-5 families (gpt-5, gpt-5-mini, gpt-5-nano, etc.) as non-reasoning. That change propagates into providers/formats/openai.rs::create_request (and Databricks equivalent), causing these models to use system role + max_tokens and to allow temperature instead of the reasoning path (reasoning_effort + max_completion_tokens), so requests for GPT-5 chat-completions models can be malformed or behaviorally downgraded.
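A decoupled version of the suffix extraction can be sketched like this; the function shape is illustrative and makes no reference to Responses routing at all:

```rust
/// Illustrative sketch of suffix-based effort extraction decoupled from
/// Responses routing: "-low"/"-medium"/"-high" suffixes are peeled off
/// regardless of which API endpoint the base model is routed to.
fn extract_reasoning_effort(model_name: &str) -> (String, Option<String>) {
    for effort in ["low", "medium", "high"] {
        if let Some(base) = model_name.strip_suffix(&format!("-{effort}")) {
            return (base.to_string(), Some(effort.to_string()));
        }
    }
    // No explicit suffix: return None and let the API apply its own default.
    (model_name.to_string(), None)
}

fn main() {
    assert_eq!(
        extract_reasoning_effort("gpt-5-mini-high"),
        ("gpt-5-mini".to_string(), Some("high".to_string()))
    );
    assert_eq!(extract_reasoning_effort("gpt-5"), ("gpt-5".to_string(), None));
}
```

Keeping the effort parse independent of the routing predicate means plain GPT-5 chat-completions models still take the reasoning path when they carry a suffix.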
```rust
/// work without provider-specific normalization.
pub fn is_openai_responses_model(model_name: &str) -> bool {
    static RE: OnceLock<Regex> = OnceLock::new();
    let re = RE.get_or_init(|| Regex::new(r"(?i)(?:^|[-/])(?:o\d+(?:$|-)|gpt-5[.-]\d)").unwrap());
```
Route gpt-5-codex through Responses API
The new Responses-model regex only matches gpt-5 names with a numeric segment (gpt-5.2, gpt-5-4, etc.), so gpt-5-codex no longer matches and OpenAiProvider::should_use_responses_api now sends it to chat-completions. This is a regression from the prior codex-specific routing and can break tool-calling for users selecting the supported gpt-5-codex model on providers that require /v1/responses for codex variants.
Force-pushed from 29ab29a to 217b67f
Force-pushed from 217b67f to 4bab639
…ing effort lookup

- Unify is_openai_responses_model() regex for routing all o-series and gpt-5 family models to the Responses API across OpenAI and Databricks providers.
- extract_reasoning_effort() no longer defaults to "medium"; it omits the reasoning field when no explicit suffix is provided, letting the API apply its own per-model default (avoids 400s on gpt-5-pro, which only accepts "high").
- Add context window limits for new gpt-5 model variants.
- Set strict: false on Responses API tool schemas (MCP schemas are not strict-compatible).
- Add Responses API mock to session_id_propagation_test.
- Switch test fixtures from responses-routed models (gpt-5-nano, o4-mini) to gpt-4.1 for Chat Completions mock compatibility.

TODO: add Responses API mock to OpenAiFixture for full coverage.

Signed-off-by: Bradley Axen <baxen@squareup.com>
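The strict: false change described above can be sketched as follows; the JSON here is assembled by hand purely to show the shape of a tool definition, not how the repo actually builds it:

```rust
/// Minimal sketch of the strict flag on a Responses-API tool definition.
/// The JSON is hand-assembled for illustration; the real code presumably
/// builds it with a JSON library.
fn tool_definition(name: &str, params_schema: &str) -> String {
    format!(
        r#"{{"type":"function","name":"{name}","parameters":{params_schema},"strict":false}}"#
    )
}

fn main() {
    let tool = tool_definition("read_file", r#"{"type":"object"}"#);
    // With strict:false the API accepts MCP schemas that omit strict-mode
    // requirements such as additionalProperties:false.
    assert!(tool.contains(r#""strict":false"#));
}
```

Setting the flag explicitly matches the non-strict behavior Chat Completions applied by default to the same runtime-generated MCP schemas.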
Force-pushed from 4bab639 to c5e8fbe
💡 Codex Review
Reviewed commit: c5e8fbe998
```diff
     let session_id = &session_b.session_id().0;
     match via {
-        SetModelVia::Dedicated => conn.set_model(session_id, "o4-mini").await.unwrap(),
+        SetModelVia::Dedicated => conn.set_model(session_id, "gpt-4.1").await.unwrap(),
```
Use a non-default model in model-switch ACP tests
TEST_MODEL was changed to gpt-4.1, so this set_model call now sets Session B to the same model it already has from TestConnectionConfig::default(). That makes run_model_set/run_config_option_model_set effectively no-ops for model switching, so they can pass even if the session model update path regresses. Switch Session B to a different chat-completions model to keep validating the model-change behavior.
```diff
     // responses-routed models like o4-mini here.
     let config = TestConnectionConfig {
-        current_model: "o4-mini".to_string(),
+        current_model: "gpt-4.1".to_string(),
```
Restore actual mismatch in prompt-model-mismatch test
This test is intended to start with a server model different from TEST_MODEL, but after TEST_MODEL was updated to gpt-4.1, setting current_model to gpt-4.1 removes the mismatch entirely. The branch that forces set_model(TEST_MODEL) before prompting is no longer exercised, so regressions in mismatch handling can slip through undetected.
…els (aaif-goose#8636)

Signed-off-by: Bradley Axen <baxen@squareup.com>
* main:
  - feat: extend goose2 context window ux with auto-compaction (#8721)
  - improve goose2 agent management flows (#8737)
  - alexhancock/tui-improvements (#8736)
  - fix: add strict:false to Responses API tools and gpt-5.4 to known models (#8636)
  - persist and reliably apply chat model selection (#8734)
  - merge goose-acp crate into goose (#8726)
  - docs: AGENTS.md section on goose2 desktop backend architecture (#8732)
  - feat: goose2 message bubble + action tray (#8720)
  - consolidate provider ACP methods onto inventory (#8710)
  - ci: declare and enforce MSRV of 1.91.1 (#8670)
  - fix(ui): correct grammar in apps view description (#8668) (#8679)
  - Stop load openai fast model for openapi compatible custom endpoint (#8644)

* main: (41 commits)
  - removed the specific code owner for documentation change (#8749)
  - fix(providers): handle missing delta field in streaming chunks (#8700)
  - refactor(providers): extract http_status module and rename handle_status_openai_compat (#8620)
  - fix(providers/openai): accept streaming chunks with both reasoning fields (#8715)
  - feat: associate threads with projects (#8745)
  - upgrade goose sdk and tui to be compatible with the latest agentclientprotocol/sdk package (#8667)
  - feat: extend goose2 context window ux with auto-compaction (#8721)
  - improve goose2 agent management flows (#8737)
  - alexhancock/tui-improvements (#8736)
  - fix: add strict:false to Responses API tools and gpt-5.4 to known models (#8636)
  - persist and reliably apply chat model selection (#8734)
  - merge goose-acp crate into goose (#8726)
  - docs: AGENTS.md section on goose2 desktop backend architecture (#8732)
  - feat: goose2 message bubble + action tray (#8720)
  - consolidate provider ACP methods onto inventory (#8710)
  - ci: declare and enforce MSRV of 1.91.1 (#8670)
  - fix(ui): correct grammar in apps view description (#8668) (#8679)
  - Stop load openai fast model for openapi compatible custom endpoint (#8644)
  - feat(hooks): add Husky git hooks for ui/goose2 (#8577)
  - fix: links in chat could not be opened (#8544)
  - ...
Category: fix

User Impact: Users can now use OpenAI's gpt-5.4 model with goose for tool-calling workflows that previously failed silently or returned API errors.

Problem: GPT-5.4 requires the Responses API (`/v1/responses`), which defaults to `strict: true` for function tool definitions. MCP tool schemas are dynamically generated at runtime and don't conform to strict-mode requirements (e.g., missing `additionalProperties: false`), causing the API to reject tool calls. Additionally, gpt-5.4 was missing from the known models list, so it wouldn't get a proper context limit.

Solution: Explicitly set `strict: false` on tool definitions in `create_responses_request` so the Responses API accepts MCP tool schemas as-is, matching the non-strict default that Chat Completions used. Also added gpt-5.4 to `OPEN_AI_KNOWN_MODELS` with a 400k context limit. Both changes were developed using red-green TDD.

File changes
crates/goose/src/providers/formats/openai_responses.rs

Added `"strict": false` to each tool definition in `create_responses_request`. This prevents the Responses API from applying strict schema validation to MCP tool schemas that can't guarantee compliance. Also added a test (`test_responses_tools_include_strict_false`) that constructs a request with a tool and asserts the `strict` field is present and set to `false`.

crates/goose/src/providers/openai.rs

Added `("gpt-5.4", 400_000)` to `OPEN_AI_KNOWN_MODELS` so the model gets a proper context limit. Added a test (`gpt_5_4_is_in_known_models`) that asserts gpt-5.4 is present in the list.

Reproduction Steps
gpt-5.4.