
fix: use per-model output token limits for Anthropic agent provider#5040

Open
elevatingcreativity wants to merge 1 commit into Mintplex-Labs:master from elevatingcreativity:fix/anthropic-agent-max-tokens

Conversation


elevatingcreativity (Contributor) commented Feb 21, 2026

Closes #5039

Bug

The Anthropic agent provider hardcodes max_tokens: 4096 for all API calls. Modern Claude models (Claude 3.5 and later) support output limits well above 4096 tokens, but the hardcoded value was set when older models had smaller limits. As a result, agent responses are silently cut off at 4096 tokens regardless of which model is selected.

Location: server/utils/agents/aibitat/providers/anthropic.js, where max_tokens: 4096 is hardcoded in both the stream() and complete() methods.

Fix

Replaces the hardcoded max_tokens: 4096 with a MODEL_MAX_OUTPUT_TOKENS lookup table mapping each Anthropic model to its actual API-enforced output token limit. Claude 3.5 models use 8192, Claude 3.7 uses 64,000, legacy models retain 4096, and any unknown or future models fall back to 4096 safely.

This also prevents 400 API errors that would occur if a hardcoded value exceeded an older model's output limit — the Anthropic API rejects requests where max_tokens exceeds the model's maximum rather than silently clamping it.
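A minimal sketch of the lookup-table approach described above (the model IDs and helper name here are illustrative assumptions, not the PR's exact code):

```javascript
// Sketch of a per-model output token lookup with a safe fallback.
// Model IDs below are examples; the PR's actual table may differ.
const DEFAULT_MAX_OUTPUT_TOKENS = 4096;

const MODEL_MAX_OUTPUT_TOKENS = {
  // Legacy models keep the original limit
  "claude-3-opus-20240229": 4096,
  "claude-3-haiku-20240307": 4096,
  // Claude 3.5 models support larger outputs
  "claude-3-5-sonnet-20241022": 8192,
  "claude-3-5-haiku-20241022": 8192,
  // Claude 3.7 supports much larger outputs
  "claude-3-7-sonnet-20250219": 64000,
};

function maxOutputTokens(model) {
  // Unknown or future models fall back to the safe legacy limit,
  // avoiding 400 errors from requesting more than a model allows.
  return MODEL_MAX_OUTPUT_TOKENS[model] ?? DEFAULT_MAX_OUTPUT_TOKENS;
}
```

Both stream() and complete() would then pass `max_tokens: maxOutputTokens(this.model)` instead of the literal 4096.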

Test plan

  • Added Jest tests covering all model tiers (legacy, Claude 3.5, Claude 3.7, unknown)
  • Verified the correct max_tokens value is passed in actual API calls via mock
  • Run: jest server/__tests__/utils/agents/aibitat/providers/anthropic.test.js

🤖 Generated with Claude Code

Replaces hardcoded max_tokens: 4096 in the Anthropic agent provider with
a MODEL_MAX_OUTPUT_TOKENS lookup table mapping each model to its actual
API-enforced output token limit. Claude 3.5 models use 8192, Claude 3.7
uses 64000, legacy models retain 4096, and unknown models fall back to
4096 safely.

Fixes truncation of agent responses on modern Anthropic models. Also
prevents 400 API errors that would occur if a hardcoded value exceeded
an older model's output limit.

Adds Jest tests covering all model tiers and verifying the correct
max_tokens value is sent in API calls.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

elevatingcreativity commented Mar 5, 2026

Hi @timothycarambat, I'd like to bump this one; I didn't see it pulled into the latest release. It's a simple but important fix: I run into truncated output from Claude within AnythingLLM in many instances (several times today). It would be great to see this fix land in the main branch. Thanks for your consideration.



Development

Successfully merging this pull request may close these issues.

[BUG]: Anthropic agent responses truncate at 4096 tokens on modern models
