fix: handle empty text and malformed JSON in parse_text for thinking+tools #1240
Open
gn00295120 wants to merge 2 commits into anthropics:main from
…tools

When using structured output with thinking and tool_use, parse_text() crashes on empty text blocks (from intermediate thinking-only turns) and on malformed JSON (from model generation artifacts).

- Skip parsing when text is empty/whitespace
- Skip structured output parsing on tool_use turns (intermediate)
- Add fallback JSON extraction for malformed text blocks
- Add comprehensive tests for all edge cases

Fixes anthropics#1204
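The order of checks described in this commit can be sketched as follows. This is a hypothetical stand-in (`parse_text_sketch`), not the PR's code: `json.loads` replaces the SDK's Pydantic-based `validate_json()`, and the recovery helper is assumed to be passed in as an optional `fallback` callable that returns a JSON substring or `None`.

```python
import json
from typing import Any, Callable, Optional


def parse_text_sketch(
    text: str,
    fallback: Optional[Callable[[str], Optional[str]]] = None,
) -> Any:
    """Hypothetical stand-in for parse_text(); json.loads replaces validate_json()."""
    # Thinking-only turns can leave an empty text block: return None instead of raising.
    if not text.strip():
        return None
    try:
        # Strict path: the whole text must be valid JSON.
        return json.loads(text)
    except json.JSONDecodeError:
        # Recovery path (assumed interface): ask the fallback for a JSON substring.
        candidate = fallback(text) if fallback is not None else None
        if candidate is None:
            raise  # nothing recoverable, so re-raise the original decode error
        return json.loads(candidate)
```

Checking for empty text before any validation keeps the strict path unchanged for well-formed responses, and text with no recoverable payload still raises, matching the "`'not json'` still raises" case in the test plan.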
Pull request overview
This PR addresses crashes and parsing failures in the structured-output response parsing pipeline when used alongside extended thinking and tool use (Issue #1204). It hardens parse_text() against empty text blocks and attempts to recover valid JSON when the model prepends malformed content, and it avoids parsing structured output on intermediate tool_use turns.
Changes:

- Update `parse_text()` to return `None` for empty/whitespace-only text and add a recovery path that attempts to extract the last JSON payload from malformed text.
- Update `parse_response()`/`parse_beta_response()` to skip structured parsing for `stop_reason="tool_use"` turns.
- Add a new unit test module covering `parse_text()`, `_extract_last_json()`, and `parse_response()`/`parse_beta_response()` edge cases.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/anthropic/lib/_parse/_response.py | Adds empty-text handling, JSON recovery extraction, and skips structured parsing on tool_use turns. |
| tests/lib/_parse/test_parse_text.py | Adds new tests for empty text, malformed prefix recovery, and tool-use turn behavior. |
Previously `_extract_last_json()` used `str.find()` to locate the *first* opening brace/bracket before the last closing token. When the input contains a malformed/partial JSON object in a prefix (e.g. an unterminated string), `find()` would return that broken start position and depth counting would produce a non-zero result, so the function fell through and the final valid JSON was never recovered.

Fix: replace the forward `find()` scan with a backward loop over every candidate `open_char` position (from `last_close` down to 0). We return the first (rightmost) position at which the depth count is balanced, which is the last complete JSON payload in the text regardless of what broken fragments appear before it.

Also remove two unused imports (`typing.Optional`, `unittest.mock.MagicMock`) from the test file that would fail `ruff check --select F401`, and add a new test `test_malformed_prefix_with_partial_object_recovers_last_json` that exercises the exact scenario the review raised.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
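A minimal sketch of the backward scan described in this commit, assuming the helper returns the matched substring (the real `_extract_last_json()` signature is not shown on this page, so `extract_last_json_sketch` is a hypothetical name). It counts only the bracket pair that matches the last closing token and, as an extra safeguard the commit does not mention, confirms each balanced candidate with `json.loads()`.

```python
import json
from typing import Optional

# Maps each closing token to its opener.
_OPENER_FOR = {"}": "{", "]": "["}


def extract_last_json_sketch(text: str) -> Optional[str]:
    """Return the last balanced JSON object/array in `text`, or None.

    Depth counting is naive: braces inside string literals are not skipped,
    which is why each balanced candidate is also checked with json.loads().
    """
    # Locate the last closing token of either kind.
    last_close = max(text.rfind("}"), text.rfind("]"))
    if last_close == -1:
        return None
    close_char = text[last_close]
    open_char = _OPENER_FOR[close_char]

    # Walk candidate openers from right to left (the backward loop), so a
    # broken fragment earlier in the text cannot hide the final payload.
    start = text.rfind(open_char, 0, last_close)
    while start != -1:
        depth = 0
        for ch in text[start : last_close + 1]:
            if ch == open_char:
                depth += 1
            elif ch == close_char:
                depth -= 1
        if depth == 0:
            candidate = text[start : last_close + 1]
            try:
                json.loads(candidate)
                return candidate
            except json.JSONDecodeError:
                pass  # balanced but not valid JSON; keep scanning left
        start = text.rfind(open_char, 0, start)
    return None
```

On `'{"broken": "unterminated\n\n{"valid": "json"}'`, a forward `find()` would start at the broken `{` and end with a depth of 1, whereas this loop tries the rightmost `{` first and returns `'{"valid": "json"}'`. For nested input such as `'note: {"a": {"b": 1}}'`, the inner `{` ends at depth -1 and is skipped, so the outer object is returned.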
Summary
Fixes #1204: `parse_text()` crashes when used with structured output + extended thinking + tool use.

Two bugs addressed:
1. Empty text crash: When the model returns `stop_reason="end_turn"` with only a thinking block and an empty text block, `parse_text("")` calls `validate_json("")`, which raises `ValidationError`.
2. Malformed JSON prefix: When the model prefixes the JSON payload with reasoning text or a partial generation artifact, `validate_json()` fails on the full string even though valid JSON exists at the end.

Changes
- `src/anthropic/lib/_parse/_response.py`:
  - `parse_text()`: Return `None` for empty/whitespace text instead of crashing
  - `parse_text()`: Add fallback JSON extraction via `_extract_last_json()`, which finds the last valid JSON object/array in malformed text
  - `parse_response()` / `parse_beta_response()`: Skip structured output parsing on `stop_reason="tool_use"` turns (intermediate tool-calling turns shouldn't have text parsed as structured output)
- `tests/lib/_parse/test_parse_text.py` (new): `parse_response` and `parse_beta_response`

Test plan
- `parse_text("")` returns `None` (no crash)
- `parse_text(" \n\t ")` returns `None`
- `parse_text('partial garbage\n\n{"valid": "json"}')` recovers correctly
- `parse_text('not json')` still raises `ValidationError`
- `parse_response()` with `stop_reason="tool_use"` skips text parsing
- `parse_response()` with `stop_reason="end_turn"` parses normally