
Make default max_tokens configurable via environment variable #227

@dcox761

Description

The default max_tokens value (2048) is hardcoded in the schema. Different deployment scenarios need different defaults — for example, reasoning models work better with higher limits, while cost-sensitive deployments may want a lower default.

RooCode has a setting for Maximum Output Tokens. RooCodeInc/Roo-Code#4036 suggests this was fixed in June 2025, but the headers are still not passed through.

Additionally, the OpenAI API now accepts both max_tokens and max_completion_tokens, with max_completion_tokens being the preferred parameter. The current code duplicates the handling logic for these two fields.
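
For illustration, an OpenAI-style client call sending the newer parameter might look like this (the model name is a placeholder, not part of this proposal):

from openai import OpenAI

client = OpenAI()
# max_completion_tokens is the newer parameter name; on endpoints that
# accept both, it should take precedence over the legacy max_tokens.
response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model
    messages=[{"role": "user", "content": "Hello"}],
    max_completion_tokens=512,
)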

Proposed solution

  1. Add a DEFAULT_MAX_TOKENS environment variable in setting.py (default: 2048 to preserve existing behaviour)
  2. Use it as the schema default for max_tokens (items 1 and 2 are sketched after this list)
  3. Compute effective_max_tokens in _parse_request, preferring max_completion_tokens over max_tokens, eliminating the duplicate logic:

effective_max_tokens = (
    chat_request.max_completion_tokens
    if chat_request.max_completion_tokens is not None
    else chat_request.max_tokens
)
inference_config = {"maxTokens": effective_max_tokens}

Example: raise the default for reasoning models:

export DEFAULT_MAX_TOKENS=16384
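
A minimal sketch of items 1 and 2, assuming the schema is a Pydantic model (the ChatRequest name is inferred from the chat_request variable above; anything not shown in this issue is an assumption):

# src/api/setting.py — read the env var once, defaulting to 2048
import os

DEFAULT_MAX_TOKENS = int(os.environ.get("DEFAULT_MAX_TOKENS", "2048"))

# src/api/schema.py — use the setting as the schema default
from pydantic import BaseModel, Field

from .setting import DEFAULT_MAX_TOKENS

class ChatRequest(BaseModel):
    # Legacy parameter; defaults to the configurable value
    max_tokens: int | None = Field(default=DEFAULT_MAX_TOKENS)
    # Newer parameter; preferred by _parse_request when both are set
    max_completion_tokens: int | None = None

With the schema default applied, effective_max_tokens in item 3 always resolves to a value, so no extra fallback is needed in bedrock.py.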

Files: src/api/setting.py, src/api/schema.py, src/api/models/bedrock.py
