# Bug Report: Vertex AI incorrectly rejecting requests well under 32MB limit with 413 error

## Summary
Vertex AI is incorrectly rejecting requests with 413 "Request exceeds the maximum allowed number of bytes. The maximum request size is 32 MB" errors when the actual request size is only ~2MB, well under the documented 32MB limit.
## Environment
- SDK Version: anthropic-sdk-python (latest)
- Model: claude-sonnet-4@20250514
- API: Vertex AI (not direct Anthropic API)
- Regions: us-east5-aiplatform.googleapis.com, asia-east1-aiplatform.googleapis.com
- Platform: Python 3.11, macOS
## Bug Description

### Issue
When sending requests to Claude Sonnet 4 via Vertex AI with document payloads of ~800KB - 2MB, we consistently get:

```
Error code: 413 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'Prompt is too long'}, 'request_id': 'req_vrtx_...'}
```
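One detail worth flagging for the investigation: the error body reads "Prompt is too long", which in the Anthropic API usually refers to token count rather than request bytes. A rough back-of-envelope conversion from payload size to tokens (our own ~4 characters-per-token heuristic, not an official figure) may help rule token limits in or out:

```python
def approx_tokens(payload_bytes: int, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate for a plain-text payload (~4 chars/token heuristic)."""
    return int(payload_bytes / chars_per_token)

# e.g. the ~808KB 32-document payload from the progression test below
print(approx_tokens(808_000))  # -> 202000
```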
## Evidence That This Is a Bug
- Request size is well under limit: Our payloads are ~2MB, far below the documented 32MB limit
- Inconsistent behavior: Similar payloads work fine with prompt caching enabled
- Rate limit test works: We successfully processed 376,926 cached tokens without issues
- Direct API comparison needed: This appears to be Vertex AI specific
## Actual vs. Expected Behavior
**Expected:** Requests under 32MB should be accepted.
**Actual:** Requests as small as ~2MB are rejected with 413 errors.
## Reproduction Details

### Working Case (Cache Rate Limit Test)
```python
# This works fine - 15 documents, ~400KB total
documents = load_test_documents(15)
response = client.messages.create(
    model="claude-sonnet-4@20250514",
    max_tokens=100,
    temperature=0,
    system=system_prompt,
    messages=[{"role": "user", "content": content_blocks}],
)
# Result: 100% success, 376,926 tokens from cache
```

### Failing Case (Direct Document Analysis)
```python
# This fails with 413 - 32 documents, ~2MB total
documents = load_documents(32)  # ~800KB-2MB payload
response = client.messages.create(
    model="claude-sonnet-4@20250514",
    max_tokens=8000,
    temperature=0,
    system=system_prompt,
    messages=[{"role": "user", "content": content_blocks}],
)
# Result: 413 "Request exceeds 32MB limit" - but payload is only ~2MB!
```

### Test Results
From our payload progression test:
| Document Count | Payload Size | Result |
|---|---|---|
| 5 docs | ~147KB | ❌ 413 Error |
| 10 docs | ~272KB | ❌ 413 Error |
| 15 docs | ~384KB | ❌ 413 Error |
| 32 docs | ~808KB | ❌ 413 Error |
**All payloads are well under the 32MB limit, yet every request was rejected.**
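For reference, this is how the payload sizes above can be sanity-checked (a sketch: serializing the request body with `json.dumps` approximates the bytes on the wire, ignoring HTTP header overhead; the 2 MB document text here is synthetic):

```python
import json

def payload_size_bytes(request_body: dict) -> int:
    """Size in bytes of the JSON-serialized request body."""
    return len(json.dumps(request_body).encode("utf-8"))

# Synthetic ~2 MB text payload, mirroring the failing case above
body = {
    "model": "claude-sonnet-4@20250514",
    "max_tokens": 8000,
    "messages": [
        {"role": "user", "content": [{"type": "text", "text": "x" * 2_000_000}]}
    ],
}
size = payload_size_bytes(body)
assert size < 32 * 1024 * 1024  # far below the documented 32 MB limit
print(f"{size / 1_000_000:.1f} MB")  # -> 2.0 MB
```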
## Expected Fix
Vertex AI should accept requests up to the documented 32MB limit, not reject requests at ~2MB.
## Impact
This bug prevents legitimate use of Claude Sonnet 4's large context capabilities via Vertex AI, forcing workarounds or fallback to direct Anthropic API.
## Workaround
We are currently using progressive caching strategies, but this should not be necessary for payloads well under the documented limit.
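For completeness, a sketch of how the workaround builds its content blocks. The `cache_control` block shape follows Anthropic's prompt-caching documentation; `build_cached_blocks` is an illustrative helper of ours, not SDK API:

```python
def build_cached_blocks(documents: list[str]) -> list[dict]:
    """Wrap documents as text content blocks and mark the last one cacheable,
    so the entire document prefix up to that block is written to the cache."""
    blocks = [{"type": "text", "text": doc} for doc in documents]
    if blocks:
        # cache_control on the final block caches everything before it
        blocks[-1]["cache_control"] = {"type": "ephemeral"}
    return blocks

blocks = build_cached_blocks(["doc one", "doc two"])
# Only the last block carries cache_control; earlier blocks are plain text.
```

These blocks are then passed as the `content` of the user message, as in the repro snippets above.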
## Additional Context
- Progressive caching works (suggests the issue is with direct document analysis)
- Same code works on direct Anthropic API (suggests Vertex AI specific issue)
- Error message claims 32MB limit but rejects much smaller payloads
- Consistent across multiple regions (us-east5, asia-east1)
## Request
Please investigate Vertex AI's payload size validation logic and fix the incorrect rejection of requests well under the 32MB limit.