
fix(memory): count tokens for ToolCallBlock, ThinkingBlock, CitableBlock, CitationBlock#22153

Open
citizen204 wants to merge 1 commit into run-llama:main from citizen204:fix-21950-toolcall-token-counting

Conversation

@citizen204

Contributor

Summary

Memory._estimate_token_count() excluded ToolCallBlock from the scanned blocks entirely (it was filtered out alongside CachePoint), and had no counting branch for ThinkingBlock, CitableBlock, or CitationBlock — even though these three types were admitted into the blocks list.

For tool-using agents with substantial tool_kwargs, the FIFO queue accumulates far more real tokens than Memory believes, leaving the history well above token_limit and eventually surfacing as provider-side "prompt is too long" (HTTP 400) errors.
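To make the failure mode concrete, here is a minimal toy sketch of FIFO trimming driven by an estimator that ignores tool-call arguments. It is not the `Memory` implementation: `Msg`, `toy_tokens`, `estimate`, and `trim_fifo` are hypothetical names, and whitespace splitting stands in for a real tokenizer.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Msg:
    """Hypothetical stand-in for a chat message that may carry a tool call."""
    text: str = ""
    tool_kwargs: dict = field(default_factory=dict)


def toy_tokens(s: str) -> int:
    # Assumption: whitespace splitting approximates tokenization.
    return len(s.split())


def estimate(msgs: List[Msg], count_tool_calls: bool) -> int:
    """Estimate tokens; optionally skip tool-call arguments (the buggy behavior)."""
    total = 0
    for m in msgs:
        total += toy_tokens(m.text)
        if count_tool_calls and m.tool_kwargs:
            total += toy_tokens(str(m.tool_kwargs))
    return total


def trim_fifo(msgs: List[Msg], token_limit: int, count_tool_calls: bool) -> List[Msg]:
    """Drop the oldest messages until the estimate fits within token_limit."""
    kept = list(msgs)
    while kept and estimate(kept, count_tool_calls) > token_limit:
        kept.pop(0)
    return kept


# Four messages with short text but large tool arguments.
history = [
    Msg(text="run the query", tool_kwargs={"sql": "select " + "col " * 50})
    for _ in range(4)
]
TOKEN_LIMIT = 100

# Estimator blind to tool calls: nothing is trimmed, yet the real size exceeds the limit.
naive_kept = trim_fifo(history, TOKEN_LIMIT, count_tool_calls=False)
# Estimator that counts tool calls: the history is trimmed to fit.
fixed_kept = trim_fifo(history, TOKEN_LIMIT, count_tool_calls=True)
```

In this toy setup the blind estimator keeps every message because it only sees the short text, while the history it hands to the provider is well over the limit once tool arguments are counted.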

Fixes #21950

Changes

  • llama-index-core/llama_index/core/memory/memory.py:
    • Remove ToolCallBlock from the exclusion filter, leaving CachePoint as the only excluded type, so tool calls enter the blocks list.
    • Add counting branch for ToolCallBlock: tokenize tool_name + str(tool_kwargs).
    • Add counting branch for ThinkingBlock: use num_tokens when available, else tokenize content.
    • Add counting branch for CitableBlock: tokenize title + source, then recurse into inner content blocks.
    • Add counting branch for CitationBlock: tokenize title + source, then count cited_content.
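The counting branches listed above could look roughly like the following sketch. It does not reproduce the diff: the dataclasses are simplified stand-ins for the llama-index block types (field names follow the description above, their exact shapes are assumptions), `count_block` is a hypothetical helper, and the tokenizer is passed in as a plain callable. Title and source are tokenized separately here, which is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional


# Simplified stand-ins for the real block classes (assumed field shapes).
@dataclass
class TextBlock:
    text: str


@dataclass
class ToolCallBlock:
    tool_name: str
    tool_kwargs: dict = field(default_factory=dict)


@dataclass
class ThinkingBlock:
    content: Optional[str] = None
    num_tokens: Optional[int] = None


@dataclass
class CitableBlock:
    title: str
    source: str
    content: List[Any] = field(default_factory=list)


@dataclass
class CitationBlock:
    title: str
    source: str
    cited_content: Any = None


def count_block(block: Any, tokenize: Callable[[str], list]) -> int:
    """Hypothetical per-block token counter mirroring the branches described above."""
    if isinstance(block, TextBlock):
        return len(tokenize(block.text))
    if isinstance(block, ToolCallBlock):
        # Tool name plus the stringified arguments.
        return len(tokenize(block.tool_name)) + len(tokenize(str(block.tool_kwargs)))
    if isinstance(block, ThinkingBlock):
        # Prefer the recorded count when present; otherwise tokenize the content.
        if block.num_tokens is not None:
            return block.num_tokens
        return len(tokenize(block.content or ""))
    if isinstance(block, CitableBlock):
        head = len(tokenize(block.title)) + len(tokenize(block.source))
        # Recurse into the inner content blocks.
        return head + sum(count_block(b, tokenize) for b in block.content)
    if isinstance(block, CitationBlock):
        head = len(tokenize(block.title)) + len(tokenize(block.source))
        return head + count_block(block.cited_content, tokenize)
    # Unknown or excluded block types (for example, a cache marker) count as zero.
    return 0


# Whitespace splitting as a toy tokenizer.
tool_tokens = count_block(ToolCallBlock("search", {"q": "llama index"}), str.split)
```

Passing the tokenizer as an argument keeps the sketch independent of any particular tokenizer; a real implementation would use whatever tokenizer the memory object is configured with.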

…ock, CitationBlock

_estimate_token_count excluded ToolCallBlock from the scanned
blocks entirely, and had no counting branch for ThinkingBlock,
CitableBlock, or CitationBlock even though they were admitted
into the list.

For tool-using agents with large tool_kwargs this caused the
FIFO queue to stay well above the token_limit, surfacing as
provider-side "prompt is too long" (HTTP 400) errors.

Changes:
- Remove ToolCallBlock from the CachePoint-only exclusion so
  it is included in the blocks list.
- Add counting branches for ToolCallBlock (tool_name + kwargs
  serialized), ThinkingBlock (num_tokens if known, else content
  text), CitableBlock (title + source + inner content), and
  CitationBlock (title + source + cited_content).

Fixes run-llama#21950
dosubot (bot) added the size:M label (This PR changes 30-99 lines, ignoring generated files.) on Jun 26, 2026
@gautamvarmadatla

Contributor

hi! please see #21951 it has the fix already :)


Labels

size:M This PR changes 30-99 lines, ignoring generated files.


Development

Successfully merging this pull request may close these issues.

[Bug]: Tool-call tokens not counted by Memory, leading to "prompt is too long" (400) errors with AgentWorkflow

2 participants