
API does not cleanly separate tool-use mechanics (search queries, results, narration) from the final authored content #1277

@greening

Description


Here's an interaction between me and Claude:

> OK, here is the output from draft A:
>
> Generating Draft A...
>
> Done (606s) | tokens: 2,330,794 in / 16,405 out | words: 12 | cost: $36.19
>
> "Now let me verify the word count and do a final review:"

So disabling extended thinking didn't fix the problem. The 16,405 output tokens producing only 12 words means the streaming handler is still capturing non-content text — in this case, web search tool narration ("Now let me verify the word count and do a final review:").

And 2.3M input tokens — that's Opus doing dozens of web searches, each pulling in huge amounts of content.

Two separate problems:

The text extraction from streaming is broken for web search calls. When Claude uses web search tools, the response contains tool_use blocks, tool_result blocks, and text blocks interleaved. The streaming handler is probably only capturing the last text block (the narration) instead of the final substantive text.
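
To make the failure mode concrete, here is an illustrative sketch (not the project's actual code) of the interleaved content a web-search response can contain. The block shapes and the `web_search` name are assumptions based on the description above:

```python
# With web search, the response content is a flat, interleaved list of
# block types: tool_use, tool_result, and multiple text blocks.
blocks = [
    {"type": "text", "text": "## Chapter 3: County Overview\n\n...thousands of words of chapter content..."},
    {"type": "tool_use", "name": "web_search", "input": {"query": "county population 2024"}},
    {"type": "tool_result", "content": "...search results..."},
    {"type": "text", "text": "Now let me verify the word count and do a final review:"},
]

# A handler that keeps only the LAST text block gets the narration,
# not the chapter -- exactly the 12-word output seen above.
text_blocks = [b["text"] for b in blocks if b["type"] == "text"]
last = text_blocks[-1]
```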

Opus with web search is absurdly expensive for this task. 2.3M input tokens at $15/M = $34.50 just in input for one chapter.

Let me look at the streaming handler:

Read retry logic in county report generation script

The problem is on line 348: it takes the last text block, assuming that's the real content. But with web search, Claude emits multiple text blocks — the real content might be an earlier one, and the last one is just narration like "Now let me verify..."

The fix: instead of taking the last text block, take the longest text block — the actual chapter content will be much longer than any narration snippet.
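
A minimal sketch of that fix, assuming the blocks are dicts with `"type"` and `"text"` keys as in the interleaved stream described above:

```python
def extract_text(blocks):
    """Return the longest text block in the response content.

    Assumes the actual chapter content is far longer than any
    narration snippet like "Now let me verify...".
    """
    texts = [b["text"] for b in blocks if b["type"] == "text"]
    return max(texts, key=len) if texts else ""
```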

Verified Python syntax in generate_county_report.py

Done. The text extraction now picks the longest text block instead of the last one. This should capture the actual chapter content rather than trailing narration.

The $36 per chapter cost is a separate issue — that's Opus doing extensive web research. We can discuss whether to cap searches or switch to Sonnet for drafts after confirming the content extraction works.

Is there a chance the narration ends up longer than the chapter content in some outputs? Isn't there a better way to determine this?

You're right — taking the longest block is fragile. If Claude narrates extensively between searches, a narration block could exceed the content block.

A better approach: look at the structure of the content, not just length. The real chapter content will contain markdown headings (##), while narration blocks won't. So pick the text block that contains markdown section headings.

If multiple blocks have headings (unlikely), fall back to longest. If none have headings (also unlikely), fall back to longest.
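
A sketch of that structure-aware heuristic, preferring blocks with markdown headings and breaking ties (or handling the no-heading case) by length. The block shape is assumed, not taken from the script:

```python
import re

# Matches an ATX heading (# through ######) at the start of any line.
HEADING = re.compile(r"^#{1,6}\s", re.MULTILINE)

def extract_chapter(blocks):
    """Prefer text blocks containing markdown headings; fall back to longest."""
    texts = [b["text"] for b in blocks if b["type"] == "text"]
    if not texts:
        return ""
    with_headings = [t for t in texts if HEADING.search(t)]
    # If no block has a heading, consider all text blocks.
    return max(with_headings or texts, key=len)
```

This survives the case that breaks the length-only heuristic: a long run of narration still loses to a shorter block that actually contains section headings.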

Verified Python syntax in generate_county_report.py

Done. The extraction now:

- Prefers text blocks containing markdown headings (#, ##, etc.)
- If multiple blocks have headings, picks the longest
- If no blocks have headings (intro/conclusion case), falls back to longest
Seems like a huge API bug in Claude that you have to do this.

Agreed. The API should cleanly separate tool-use mechanics (search queries, results, narration) from the final authored content. Forcing consumers to heuristically pick the "real" text block from an interleaved stream of narration and content is brittle and error-prone.

The web search tool is relatively new, and this feels like a design gap they haven't closed yet — the API returns everything as flat text blocks with no semantic distinction between "Claude thinking out loud while researching" and "Claude's final answer." A content_type field or a dedicated final_response block would solve it cleanly.
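
Purely hypothetical sketch: neither a `content_type` field nor a `final_response` block exists in the current API. This just shows how, under the proposal above, extraction would collapse from a heuristic into a simple filter:

```python
def extract_final(blocks):
    """Collect blocks the API has marked as final authored content.

    "content_type" / "final_response" are the hypothetical names
    suggested above, not real API fields.
    """
    finals = [b.get("text", "") for b in blocks
              if b.get("content_type") == "final_response"]
    return "\n\n".join(finals)
```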

Worth filing at github.com/anthropics/anthropic-sdk-python/issues if you want to push on it.
