Feature/configurable max tokens#228

Open
dcox761 wants to merge 2 commits into aws-samples:main from bluecrystalsolutions:feature/configurable-max-tokens

Conversation

@dcox761 dcox761 commented Mar 3, 2026

*Issue #227*

Description of changes:

  1. Add DEFAULT_MAX_TOKENS environment variable in setting.py (default: 2048 to preserve existing behaviour)
  2. Use it as the schema default for max_tokens
  3. Compute effective_max_tokens in _parse_request, preferring max_completion_tokens over max_tokens and eliminating duplicate logic:
```python
effective_max_tokens = (
    chat_request.max_completion_tokens
    if chat_request.max_completion_tokens is not None
    else chat_request.max_tokens
)
inference_config = {"maxTokens": effective_max_tokens}
```

```shell
# Example: raise default for reasoning models
export DEFAULT_MAX_TOKENS=16384
```

Files: src/api/setting.py, src/api/schema.py, src/api/models/bedrock.py
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

bcsdcox added 2 commits March 3, 2026 14:10
… var

Add DEFAULT_MAX_TOKENS environment variable (default: 2048, preserving
existing behaviour) so operators can tune the fallback max_tokens without
code changes.

Changes:
- setting.py: Add DEFAULT_MAX_TOKENS env var
- schema.py: Import DEFAULT_MAX_TOKENS; use it as ChatRequest.max_tokens default
- bedrock.py: Compute effective_max_tokens preferring max_completion_tokens
  (OpenAI newer field) over max_tokens (legacy), and use it consistently
  in inference_config and reasoning budget_tokens calculation

The effective_max_tokens logic ensures that clients sending
max_completion_tokens (the newer OpenAI field) are handled correctly,
while the env var gives operators control over the default when neither
field is specified by the client.
The comment already explains the intent (prefer max_completion_tokens
over max_tokens). Naming a specific client adds no value and will
become stale.
Member

@zxkane zxkane left a comment

Thanks for the PR! The refactoring to consolidate effective_max_tokens and eliminate duplicate logic is a nice improvement.

The main concern is around the default value for max_tokens. Per the Bedrock InferenceConfiguration docs, maxTokens is optional and defaults to the model's maximum when omitted. The OpenAI API also treats max_tokens as optional. Hardcoding 2048 artificially caps every model's output — for example, Claude Sonnet 4 supports 8192 and Opus 4 supports 32768 output tokens.

Rather than making the hardcoded 2048 configurable, the better fix would be to change the default to None and only include maxTokens in the Bedrock request when explicitly set by the client. This aligns with both the OpenAI API and Bedrock API behavior.

See inline comments for details.


```python
DEBUG = os.environ.get("DEBUG", "false").lower() != "false"
AWS_REGION = os.environ.get("AWS_REGION", "us-west-2")
DEFAULT_MAX_TOKENS = int(os.environ.get("DEFAULT_MAX_TOKENS", "2048"))
```
Member

A couple of concerns here:

1. Should the default really be 2048?

Per the Bedrock InferenceConfiguration docs, maxTokens is optional — when omitted, it defaults to the maximum allowed value for the model. The OpenAI API also treats max_tokens as optional (defaults to model max).

Hardcoding 2048 artificially caps every model's output. For example, Claude Sonnet 4's max output is 8192 tokens and Opus 4 supports up to 32768. Users who don't explicitly set max_tokens would get truncated responses without realizing why.

A better default would be None, which lets Bedrock use the model's native max — consistent with both the OpenAI API and the Bedrock API:

```python
# Let Bedrock use the model's native max output tokens when not specified
max_tokens: int | None = None
```

Then in _parse_request, only include maxTokens in inference_config when it's not None.

2. Unguarded int() conversion

If a user sets DEFAULT_MAX_TOKENS=abc or DEFAULT_MAX_TOKENS= (empty string), the application will crash at import time with an unhelpful ValueError. This is the only int() cast in the settings module. Consider wrapping with try/except and validating >= 1.

```diff
 top_p: float | None = Field(default=None, le=1.0, ge=0.0)
 user: str | None = None  # Not used
-max_tokens: int | None = 2048
+max_tokens: int | None = DEFAULT_MAX_TOKENS
```
Member

Related to the comment on setting.py — since maxTokens is optional in the Bedrock Converse API (defaults to model max), and max_tokens is also optional in the OpenAI API, the default here should arguably be None rather than DEFAULT_MAX_TOKENS (2048).

Also, other numeric fields in this model use Field() with validation constraints (e.g., temperature, top_p). Consider adding the same for consistency:

```python
max_tokens: int | None = Field(default=None, ge=1)
max_completion_tokens: int | None = Field(default=None, ge=1)
```

This would give users clear Pydantic validation errors for invalid values (e.g., max_tokens: 0 or max_tokens: -1) instead of opaque Bedrock API errors.

Comment on lines +777 to 785
```diff
 # Prefer max_completion_tokens (OpenAI newer field) over max_tokens (legacy).
 effective_max_tokens = (
     chat_request.max_completion_tokens
     if chat_request.max_completion_tokens is not None
     else chat_request.max_tokens
 )
 inference_config = {
-    "maxTokens": chat_request.max_tokens,
+    "maxTokens": effective_max_tokens,
 }
```
Member

The refactoring to consolidate effective_max_tokens is a nice improvement. A couple of things to note:

1. Behavior change: falsy check → is not None check

The old code in the Claude reasoning block used if chat_request.max_completion_tokens (falsy — treats 0 as False). The new code uses is not None (treats 0 as a valid value). This is more correct, but it's a subtle behavior change that could surface as a regression if any client sends max_completion_tokens: 0.
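A tiny sketch of the difference (the `old_pick`/`new_pick` helpers are illustrative only):

```python
# Old Claude-reasoning-block style: falsy check, so 0 falls through to max_tokens.
def old_pick(max_completion_tokens, max_tokens):
    return max_completion_tokens if max_completion_tokens else max_tokens

# New consolidated logic: only None falls through, so 0 is kept as an explicit value.
def new_pick(max_completion_tokens, max_tokens):
    return max_completion_tokens if max_completion_tokens is not None else max_tokens

assert old_pick(0, 2048) == 2048  # 0 silently ignored
assert new_pick(0, 2048) == 0     # 0 now forwarded to Bedrock
```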

2. effective_max_tokens can be None

If max_tokens defaults to None (as suggested above) and the client doesn't send max_completion_tokens either, effective_max_tokens will be None. In that case, we should omit maxTokens from inference_config entirely so Bedrock uses the model default, rather than sending maxTokens: None which would cause a ParamValidationError:

```python
effective_max_tokens = (
    chat_request.max_completion_tokens
    if chat_request.max_completion_tokens is not None
    else chat_request.max_tokens
)
inference_config = {}
if effective_max_tokens is not None:
    inference_config["maxTokens"] = effective_max_tokens
```

Comment on lines 830 to 832
```diff
 )
-inference_config["maxTokens"] = max_tokens
+inference_config["maxTokens"] = effective_max_tokens
 # unset topP - Not supported
```
Member

Minor: inference_config["maxTokens"] = effective_max_tokens on line 832 is now redundant — it was already set to the same value on line 785. In the old code this re-assignment was needed because a different local max_tokens was computed here, but after your refactoring both use effective_max_tokens. Could be removed for clarity.
