
Make default max_tokens configurable via environment variable #227

@dcox761

Description

The default max_tokens value (2048) is hardcoded in the schema. Different deployment scenarios need different defaults — for example, reasoning models work better with higher limits, while cost-sensitive deployments may want a lower default.

RooCode has a Maximum Output Tokens setting. RooCodeInc/Roo-Code#4036 suggests it was fixed in June 2025, but the headers are not passed through.

Additionally, the OpenAI API now has both max_tokens and max_completion_tokens parameters, with max_completion_tokens being preferred. The current code has duplicate logic for handling these two fields.

Proposed solution

  1. Add a DEFAULT_MAX_TOKENS environment variable in setting.py (default: 2048, to preserve existing behaviour)
  2. Use it as the schema default for max_tokens
  3. Compute effective_max_tokens in _parse_request, preferring max_completion_tokens over max_tokens, which eliminates the duplicate logic:
     effective_max_tokens = (
         chat_request.max_completion_tokens
         if chat_request.max_completion_tokens is not None
         else chat_request.max_tokens
     )
     inference_config = {"maxTokens": effective_max_tokens}

     # Example: raise default for reasoning models
     export DEFAULT_MAX_TOKENS=16384
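
A minimal sketch of how the three steps might fit together. It is not the project's actual code: `read_default_max_tokens`, `ChatRequest` (a plain dataclass standing in for the real schema class) and `build_inference_config` are hypothetical names, and falling back to 2048 on an unset, non-numeric or non-positive value is an assumption rather than something the proposal specifies.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

FALLBACK_MAX_TOKENS = 2048  # current hardcoded default, kept for compatibility


def read_default_max_tokens(env: Mapping[str, str] = os.environ) -> int:
    """Read DEFAULT_MAX_TOKENS; fall back to 2048 if unset or invalid (assumed policy)."""
    raw = env.get("DEFAULT_MAX_TOKENS", str(FALLBACK_MAX_TOKENS))
    try:
        value = int(raw)
    except ValueError:
        return FALLBACK_MAX_TOKENS
    return value if value > 0 else FALLBACK_MAX_TOKENS


# Step 1: resolved once at import time, as a settings module would do.
DEFAULT_MAX_TOKENS = read_default_max_tokens()


@dataclass
class ChatRequest:
    # Hypothetical stand-in for the real schema; only the two relevant fields.
    # Step 2: the schema default comes from the environment-derived setting.
    max_tokens: Optional[int] = DEFAULT_MAX_TOKENS
    max_completion_tokens: Optional[int] = None


def build_inference_config(chat_request: ChatRequest) -> dict:
    # Step 3: prefer max_completion_tokens, otherwise use max_tokens.
    effective_max_tokens = (
        chat_request.max_completion_tokens
        if chat_request.max_completion_tokens is not None
        else chat_request.max_tokens
    )
    return {"maxTokens": effective_max_tokens}
```

Under these assumptions, a request that sets neither field picks up whatever DEFAULT_MAX_TOKENS resolved to at startup, and a request that sets both uses max_completion_tokens.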

Files: src/api/setting.py, src/api/schema.py, src/api/models/bedrock.py


Labels: enhancement (New feature or request)
