fix(agent): repair Windows-path tool args; gate SD override (#1023) by kovtcharov · Pull Request #1027 · amd/gaia

kovtcharov · 2026-05-10T04:15:13Z

Why this matters

gaia sd <prompt> on Windows would generate an image successfully in Step 1 but then return "Image generation is not available — start GAIA with the --sd flag to enable it" in Step 2 — telling the user to enable a feature that was already on, while silently dropping the story output.

Two stacked bugs:

Parse fragility — the native tool_calls parser strictly required JSON-valid backslash escapes, but the LLM probabilistically emits C:\Users\Klaus (single-escape) which JSON rejects as Invalid \escape: \U. Now repaired before re-raise; idempotent and lossless on already-valid input.
Overaggressive verbose-failure override — the post-failure guard fired whenever generate_image had been called this turn, even when it succeeded. Now gated on the latest capability call actually erroring.

Test plan

pytest tests/unit/agents/test_parse_error_recovery.py — 5 new tests + 6 pre-existing all pass
pytest tests/unit/agents/ tests/unit/test_sd_agent.py tests/unit/test_tool_call_priority.py — 190 pass, no regressions
Manual on Windows: gaia sd "a forest" --logging-level DEBUG — Step 2 parses the Windows path, story appears in the final answer (not the misleading "--sd" message)

Closes #1023

`gaia sd <prompt>` on Windows would generate an image successfully in Step 1 but then return *"Image generation is not available — start GAIA with the `--sd` flag to enable it"* in Step 2 — telling the user to enable a feature that was already on, while silently dropping the story output. Two stacked bugs: A. Parse fragility: the native tool_calls parser strictly required JSON-valid backslash escapes. Smaller LLMs (Gemma-4-E4B-class) probabilistically emit Windows paths with single backslashes (`C:\Users\Klaus`), which strict JSON rejects as `Invalid \escape`. `_parse_llm_response` now retries once with `\X` (where X is not a valid JSON escape char) doubled to `\\X` — idempotent on already-valid input. Truly malformed JSON still raises loudly. B. Overaggressive verbose-failure override: the post-failure guard at `_process_query_impl` fired whenever `generate_image` had been *called* this turn — with no check on whether the call actually returned an error. When generate_image succeeded and a *different* tool's parse error provoked a verbose apology, the guard clobbered the model's reply with the misleading "not available" message. Track the latest capability-tool outcome and gate the override on `capability_tool_last_succeeded is False`.

Review pass on the #1023 fix surfaced one inconsistency and one missing regression guard. The new ``capability_tool_last_succeeded`` tracker matched ``tool_name.startswith(...)`` while the pre-existing ``has_tried_capability_tool`` check uses ``.lower().startswith(...)``. A model emitting ``Generate_Image`` would slip past the tracker (case mismatch) but still trigger ``has_tried_capability_tool`` (after lower) -- so on a *failed* call the gate evaluated ``True and (None is False)`` = False and the override silently failed to fire, leaking the verbose apology to the user instead of the canonical "not available" message. Both tracker call sites now apply ``.lower()`` to match. Tests: added ``test_override_still_fires_when_generate_image_failed`` as a regression guard for the legitimate failure path (without it, a future refactor could remove the gate entirely and only the post-success test would catch it), and replaced the previously-useless mixed-case success test with ``test_override_fires_after_mixed_case_capability_failure`` -- the success variant passed regardless of the bug because both branches yield "no override" when ``capability_tool_last_succeeded`` is ``None``; only the failure variant pins the ``.lower()`` semantics down. Red-green verified by stashing the ``.lower()`` fix and watching the new test fail with the exact verbose-apology leak.

kovtcharov requested a review from kovtcharov-amd as a code owner May 10, 2026 04:15

github-actions Bot added tests Test changes agents labels May 10, 2026

kovtcharov mentioned this pull request May 10, 2026

[Bug]: tool call parsing fails in SD step #1023

Open

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(agent): repair Windows-path tool args; gate SD override (#1023)#1027

fix(agent): repair Windows-path tool args; gate SD override (#1023)#1027
kovtcharov wants to merge 2 commits intomainfrom
fix/sd-windows-path-escape-1023

kovtcharov commented May 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

kovtcharov commented May 10, 2026

Why this matters

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant