
Bug: Vertex AI incorrectly rejecting ~2MB requests with 413 '32MB limit exceeded' error #1028

@abhiwebshar

Description


Bug Report: Vertex AI incorrectly rejecting requests well under 32MB limit with 413 error

Summary

Vertex AI is incorrectly rejecting requests with 413 "Request exceeds the maximum allowed number of bytes. The maximum request size is 32 MB" errors when the actual request size is only ~2MB, well under the documented 32MB limit.

Environment

  • SDK Version: anthropic-sdk-python (latest)
  • Model: claude-sonnet-4@20250514
  • API: Vertex AI (not direct Anthropic API)
  • Region: us-east5-aiplatform.googleapis.com, asia-east1-aiplatform.googleapis.com
  • Platform: Python 3.11, macOS

Bug Description

Issue

When sending requests to Claude Sonnet 4 via Vertex AI with document payloads of ~800 KB to 2 MB, we consistently receive:

Error code: 413 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'Prompt is too long'}, 'request_id': 'req_vrtx_...'}

Evidence That This Is a Bug

  1. Request size is well under limit: Our payloads are ~2MB, far below the documented 32MB limit
  2. Inconsistent behavior: Similar payloads work fine with prompt caching enabled
  3. Rate limit test works: We successfully processed 376,926 cached tokens without issues
  4. Direct API comparison needed: This appears to be Vertex AI specific

Actual vs Expected Behavior

Expected: Requests under 32MB should be accepted
Actual: Requests as small as 2MB are rejected with 413 errors

Reproduction Details

Working Case (Cache Rate Limit Test)

# This works fine - 15 documents, ~400KB total
# (client is an AnthropicVertex instance; load_test_documents, system_prompt,
#  and content_blocks are our own helpers, defined elsewhere)
documents = load_test_documents(15)
response = client.messages.create(
    model="claude-sonnet-4@20250514",
    max_tokens=100,
    temperature=0,
    system=system_prompt,
    messages=[{"role": "user", "content": content_blocks}]
)
# Result: 100% success, 376,926 tokens from cache

Failing Case (Direct Document Analysis)

# This fails with 413 - 32 documents, ~2MB total
documents = load_documents(32)  # ~800KB-2MB payload
response = client.messages.create(
    model="claude-sonnet-4@20250514",
    max_tokens=8000,
    temperature=0,
    system=system_prompt,
    messages=[{"role": "user", "content": content_blocks}]
)
# Result: 413 "Request exceeds 32MB limit" - but payload is only ~2MB!
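As a sanity check on the size claim, the serialized request body can be measured before sending. A minimal stdlib-only sketch; the content blocks here are synthetic stand-ins for the real documents, and the field names mirror the Messages API request shape:

```python
import json

# Synthetic stand-in for the real document payload (~2 MB of text total)
content_blocks = [
    {"type": "text", "text": "lorem ipsum " * 5000}  # ~60 KB per block
    for _ in range(32)
]

request_body = {
    "model": "claude-sonnet-4@20250514",
    "max_tokens": 8000,
    "temperature": 0,
    "system": "You are a document analyst.",
    "messages": [{"role": "user", "content": content_blocks}],
}

# Size of the JSON body actually sent over the wire
size_bytes = len(json.dumps(request_body).encode("utf-8"))
print(f"Serialized request size: {size_bytes / 1_000_000:.2f} MB")
```

With these synthetic blocks the body comes out just under 2 MB, i.e. the same order of magnitude as the failing payloads and nowhere near 32 MB.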

Test Results

From our payload progression test:

Document Count | Payload Size | Result
---------------|--------------|-------------
5 docs         | ~147KB       | ❌ 413 Error
10 docs        | ~272KB       | ❌ 413 Error
15 docs        | ~384KB       | ❌ 413 Error
32 docs        | ~808KB       | ❌ 413 Error

All payloads are well under the 32MB limit, yet every one is rejected

Expected Fix

Vertex AI should accept requests up to the documented 32MB limit, not reject requests at ~2MB.

Impact

This bug prevents legitimate use of Claude Sonnet 4's large context capabilities via Vertex AI, forcing workarounds or fallback to direct Anthropic API.

Workaround

Currently using progressive caching strategies, but this shouldn't be necessary for payloads well under the documented limits.
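For reference, a hedged sketch of the caching workaround we use (the helper name is our own; the `cache_control` field is the Anthropic prompt-caching marker on content blocks):

```python
def build_cached_content(documents):
    """Build text content blocks, marking the final one as a cacheable prefix.

    `documents` is a hypothetical list of document strings; in our pipeline
    these come from our own loaders.
    """
    blocks = [{"type": "text", "text": doc} for doc in documents]
    if blocks:
        # Prompt caching: everything up to and including this block becomes
        # a reusable cached prefix on subsequent requests
        blocks[-1]["cache_control"] = {"type": "ephemeral"}
    return blocks

content_blocks = build_cached_content(["doc one text", "doc two text"])
```

The resulting `content_blocks` are passed as the user message content; on repeated calls the cached prefix is reused rather than reprocessed.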

Additional Context

  • Progressive caching works (suggests the issue is with direct document analysis)
  • Same code works on direct Anthropic API (suggests Vertex AI specific issue)
  • Error message claims 32MB limit but rejects much smaller payloads
  • Consistent across multiple regions (us-east5, asia-east1)

Request

Please investigate Vertex AI's payload size validation logic and fix the incorrect rejection of requests well under the 32MB limit.
