fix: use per-model output token limits for Anthropic agent provider#5040
elevatingcreativity wants to merge 1 commit into Mintplex-Labs:master from
Conversation
Replaces hardcoded `max_tokens: 4096` in the Anthropic agent provider with a `MODEL_MAX_OUTPUT_TOKENS` lookup table mapping each model to its actual API-enforced output token limit. Claude 3.5 models use 8192, Claude 3.7 uses 64,000, legacy models retain 4096, and unknown models fall back to 4096 safely.

Fixes truncation of agent responses on modern Anthropic models. Also prevents 400 API errors that would occur if a hardcoded value exceeded an older model's output limit.

Adds Jest tests covering all model tiers and verifying the correct `max_tokens` value is sent in API calls.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
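As a sketch, the lookup described above might look like the following. The model IDs, the exact table contents, and the helper name are illustrative assumptions, not the merged code:

```javascript
// Illustrative per-model output-limit table (IDs and values are assumptions
// based on the PR description, not the repository's actual table).
const MODEL_MAX_OUTPUT_TOKENS = {
  "claude-3-7-sonnet-latest": 64000,
  "claude-3-5-sonnet-latest": 8192,
  "claude-3-5-haiku-latest": 8192,
  "claude-3-opus-latest": 4096, // legacy tier keeps the old limit
};
const DEFAULT_MAX_OUTPUT_TOKENS = 4096; // safe fallback for unknown models

function getMaxOutputTokens(model) {
  return MODEL_MAX_OUTPUT_TOKENS[model] ?? DEFAULT_MAX_OUTPUT_TOKENS;
}

// Both stream() and complete() would then pass the looked-up value instead
// of the hardcoded 4096, e.g.:
//   client.messages.create({ model, max_tokens: getMaxOutputTokens(model), ... })
console.log(getMaxOutputTokens("claude-3-7-sonnet-latest")); // 64000
console.log(getMaxOutputTokens("some-future-model")); // 4096
```

The fallback is what keeps unknown or future model IDs safe: they get the conservative legacy limit rather than a request-rejecting value.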
elevatingcreativity (Author, Contributor):

Hi @timothycarambat, I would like to bump this one; I didn't see it pulled into the latest release. It's a simple but important fix. I run into truncated output from Claude within AnythingLLM in many instances (several times today). It would be great to see this fix in the main branch of the code. Thanks for your consideration.
Closes #5039
Bug
The Anthropic agent provider hardcodes `max_tokens: 4096` for all API calls. Modern Claude models (Claude 3.5 and later) support output limits well above 4096 tokens, but the hardcoded value was set when older models had smaller limits. As a result, agent responses are silently cut off at 4096 tokens regardless of which model is selected.

Location: `server/utils/agents/aibitat/providers/anthropic.js`, where `max_tokens: 4096` is hardcoded in both the `stream()` and `complete()` methods.

Fix
Replaces the hardcoded `max_tokens: 4096` with a `MODEL_MAX_OUTPUT_TOKENS` lookup table mapping each Anthropic model to its actual API-enforced output token limit. Claude 3.5 models use 8192, Claude 3.7 uses 64,000, legacy models retain 4096, and any unknown or future models fall back to 4096 safely.

This also prevents 400 API errors that would occur if a hardcoded value exceeded an older model's output limit: the Anthropic API rejects requests where `max_tokens` exceeds the model's maximum rather than silently clamping it.

Test plan
Jest tests cover all model tiers and verify the correct `max_tokens` value is passed in actual API calls via mock:

`jest server/__tests__/utils/agents/aibitat/providers/anthropic.test.js`

🤖 Generated with Claude Code