
Bug: Vertex AI incorrectly rejecting ~2MB requests with 413 '32MB limit exceeded' error #1028

@abhiwebshar

Description


Bug Report: Vertex AI incorrectly rejecting requests well under 32MB limit with 413 error

Summary

Vertex AI is incorrectly rejecting requests with 413 "Request exceeds the maximum allowed number of bytes. The maximum request size is 32 MB" errors when the actual request size is only ~2MB, well under the documented 32MB limit.

Environment

  • SDK Version: anthropic-sdk-python (latest)
  • Model: claude-sonnet-4@20250514
  • API: Vertex AI (not direct Anthropic API)
  • Region: us-east5-aiplatform.googleapis.com, asia-east1-aiplatform.googleapis.com
  • Platform: Python 3.11, macOS

Bug Description

Issue

When sending requests to Claude Sonnet 4 via Vertex AI with document payloads of ~800 KB to 2 MB, we consistently receive:

Error code: 413 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'Prompt is too long'}, 'request_id': 'req_vrtx_...'}

Evidence That This Is a Bug

  1. Request size is well under limit: Our payloads are ~2MB, far below the documented 32MB limit
  2. Inconsistent behavior: Similar payloads work fine with prompt caching enabled
  3. Rate limit test works: We successfully processed 376,926 cached tokens without issues
  4. Direct API comparison needed: This appears to be Vertex AI specific

Actual vs Expected Behavior

Expected: Requests under 32MB should be accepted
Actual: Requests as small as 2MB are rejected with 413 errors

Reproduction Details

Working Case (Cache Rate Limit Test)

# This works fine - 15 documents, ~400KB total
# (client is an AnthropicVertex instance; load_test_documents, system_prompt,
#  and content_blocks are our own helpers, defined elsewhere)
documents = load_test_documents(15)
response = client.messages.create(
    model="claude-sonnet-4@20250514",
    max_tokens=100,
    temperature=0,
    system=system_prompt,
    messages=[{"role": "user", "content": content_blocks}]
)
# Result: 100% success, 376,926 tokens from cache

Failing Case (Direct Document Analysis)

# This fails with 413 - 32 documents, ~2MB total
documents = load_documents(32)  # ~800KB-2MB payload
response = client.messages.create(
    model="claude-sonnet-4@20250514",
    max_tokens=8000,
    temperature=0,
    system=system_prompt,
    messages=[{"role": "user", "content": content_blocks}]
)
# Result: 413 "Request exceeds 32MB limit" - but payload is only ~2MB!
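As a sanity check on the size claim, the serialized request body can be measured before sending. A minimal stdlib-only sketch; the content blocks here are synthetic stand-ins for the real documents, and the field names mirror the Messages API request shape:

```python
import json

# Synthetic stand-in for the real document payload (~2 MB of text total)
content_blocks = [
    {"type": "text", "text": "lorem ipsum " * 5000}  # ~60 KB per block
    for _ in range(32)
]

request_body = {
    "model": "claude-sonnet-4@20250514",
    "max_tokens": 8000,
    "temperature": 0,
    "system": "You are a document analyst.",
    "messages": [{"role": "user", "content": content_blocks}],
}

# Size of the JSON body actually sent over the wire
size_bytes = len(json.dumps(request_body).encode("utf-8"))
print(f"Serialized request size: {size_bytes / 1_000_000:.2f} MB")
```

With these synthetic blocks the body comes out just under 2 MB, i.e. the same order of magnitude as the failing payloads and nowhere near 32 MB.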

Test Results

From our payload progression test:

Document Count | Payload Size | Result
---------------|--------------|-------------
5 docs         | ~147KB       | ❌ 413 Error
10 docs        | ~272KB       | ❌ 413 Error
15 docs        | ~384KB       | ❌ 413 Error
32 docs        | ~808KB       | ❌ 413 Error

All payloads are well under the 32MB limit, yet every one is rejected

Expected Fix

Vertex AI should accept requests up to the documented 32MB limit, not reject requests at ~2MB.

Impact

This bug prevents legitimate use of Claude Sonnet 4's large context capabilities via Vertex AI, forcing workarounds or fallback to direct Anthropic API.

Workaround

Currently using progressive caching strategies, but this shouldn't be necessary for payloads well under the documented limits.
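For reference, a hedged sketch of the caching workaround we use (the helper name is our own; the `cache_control` field is the Anthropic prompt-caching marker on content blocks):

```python
def build_cached_content(documents):
    """Build text content blocks, marking the final one as a cacheable prefix.

    `documents` is a hypothetical list of document strings; in our pipeline
    these come from our own loaders.
    """
    blocks = [{"type": "text", "text": doc} for doc in documents]
    if blocks:
        # Prompt caching: everything up to and including this block becomes
        # a reusable cached prefix on subsequent requests
        blocks[-1]["cache_control"] = {"type": "ephemeral"}
    return blocks

content_blocks = build_cached_content(["doc one text", "doc two text"])
```

The resulting `content_blocks` are passed as the user message content; on repeated calls the cached prefix is reused rather than reprocessed.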

Additional Context

  • Progressive caching works (suggests the issue is with direct document analysis)
  • Same code works on direct Anthropic API (suggests Vertex AI specific issue)
  • Error message claims 32MB limit but rejects much smaller payloads
  • Consistent across multiple regions (us-east5, asia-east1)

Request

Please investigate Vertex AI's payload size validation logic and fix the incorrect rejection of requests well under the 32MB limit.
