fix(memory): count tokens for ToolCallBlock, ThinkingBlock, CitableBlock, CitationBlock #22153
Open
citizen204 wants to merge 1 commit into
…ock, CitationBlock

_estimate_token_count excluded ToolCallBlock from the scanned blocks entirely, and had no counting branch for ThinkingBlock, CitableBlock, or CitationBlock even though they were admitted into the list. For tool-using agents with large tool_kwargs this caused the FIFO queue to stay well above the token_limit, surfacing as provider-side "prompt is too long" (HTTP 400) errors.

Changes:
- Remove ToolCallBlock from the CachePoint-only exclusion so it is included in the blocks list.
- Add counting branches for ToolCallBlock (tool_name + kwargs serialized), ThinkingBlock (num_tokens if known, else content text), CitableBlock (title + source + inner content), and CitationBlock (title + source + cited_content).

Fixes run-llama#21950
Contributor
hi! please see #21951, it has the fix already :)
Summary
`Memory._estimate_token_count()` excluded `ToolCallBlock` from the scanned blocks entirely (it was filtered out alongside `CachePoint`), and had no counting branch for `ThinkingBlock`, `CitableBlock`, or `CitationBlock`, even though these three types were admitted into the `blocks` list.

For tool-using agents with substantial `tool_kwargs`, the FIFO queue accumulates far more real tokens than `Memory` believes, leaving the history well above `token_limit` and eventually surfacing as provider-side "prompt is too long" (HTTP 400) errors.

Fixes #21950
Changes
`llama-index-core/llama_index/core/memory/memory.py`:

- Remove `ToolCallBlock` from the `CachePoint`-only exclusion so it enters the blocks list.
- `ToolCallBlock`: tokenize `tool_name + str(tool_kwargs)`.
- `ThinkingBlock`: use `num_tokens` when available, else tokenize `content`.
- `CitableBlock`: tokenize `title + source`, then recurse into inner `content` blocks.
- `CitationBlock`: tokenize `title + source`, then count `cited_content`.
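
The counting branches listed above can be sketched roughly as follows. This is an illustration only, not the actual diff: the dataclasses are hypothetical stand-ins for the llama-index block types (only the field names mentioned in this PR are modelled), `estimate_block_tokens` is a made-up helper name, and the whitespace tokenizer is an assumption standing in for whatever tokenizer `Memory` is configured with.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


# Hypothetical stand-ins for the block types named in this PR.
# The real classes live in llama-index and carry more fields.
@dataclass
class TextBlock:
    text: str = ""


@dataclass
class CachePoint:
    pass


@dataclass
class ToolCallBlock:
    tool_name: str = ""
    tool_kwargs: Dict[str, Any] = field(default_factory=dict)


@dataclass
class ThinkingBlock:
    content: Optional[str] = None
    num_tokens: Optional[int] = None


@dataclass
class CitableBlock:
    title: str = ""
    source: str = ""
    content: List[Any] = field(default_factory=list)


@dataclass
class CitationBlock:
    title: str = ""
    source: str = ""
    cited_content: Optional[Any] = None


def whitespace_tokenizer(text: str) -> List[str]:
    # Assumption: a crude tokenizer used only to keep the sketch runnable.
    return text.split()


def estimate_block_tokens(
    blocks: List[Any],
    tokenize: Callable[[str], List[str]] = whitespace_tokenizer,
) -> int:
    """Sum an estimated token count over a list of content blocks."""
    total = 0
    for block in blocks:
        if isinstance(block, CachePoint):
            # CachePoint carries no prompt text, so it is the only type skipped.
            continue
        if isinstance(block, TextBlock):
            total += len(tokenize(block.text))
        elif isinstance(block, ToolCallBlock):
            # Tool name plus the stringified kwargs.
            total += len(tokenize(block.tool_name + str(block.tool_kwargs)))
        elif isinstance(block, ThinkingBlock):
            # Prefer the recorded count; fall back to tokenizing the content.
            if block.num_tokens is not None:
                total += block.num_tokens
            else:
                total += len(tokenize(block.content or ""))
        elif isinstance(block, CitableBlock):
            total += len(tokenize(block.title + block.source))
            # Recurse into the nested content blocks.
            total += estimate_block_tokens(block.content, tokenize)
        elif isinstance(block, CitationBlock):
            total += len(tokenize(block.title + block.source))
            if block.cited_content is not None:
                total += estimate_block_tokens([block.cited_content], tokenize)
    return total
```

With this shape, a message whose only block is a `ToolCallBlock` with large `tool_kwargs` contributes a non-zero count, which is the case the summary above describes as previously invisible to the FIFO trimming logic.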