Claude Escalate v0.7.0 integrates Anthropic's Batch API for 50% cost reduction on non-interactive workloads. Batch requests are processed in the background (5 minutes to 24 hours) and cost 50% less than regular API calls.
Cost Comparison:
- Regular API: 100 × 2500 tokens = 250,000 tokens = $0.75 (Haiku)
- Batch API: 100 × 2500 × 0.5 = 125,000 tokens = $0.375
- Savings: $0.375 (50%)
- Bulk analysis: "Analyze all 100 files in repo for security"
- Overnight jobs: Scheduled security scans, code review batches
- Bulk documentation: Generate docs for all functions at once
- Batch processing: Process large datasets with Claude in background
- Report generation: Bulk analytics that can be generated asynchronously
Characteristics:
- Non-urgent (can wait 5-24 hours for results)
- Multiple similar requests (batch efficiency)
- Cost-sensitive (50% savings justify delay)
- Batch size: 10-100 requests per batch (configurable)
- Interactive queries: User is waiting for response
- Real-time code review: User needs immediate feedback
- Quick lookups: Single query that's fast anyway
- Production debugging: Need answer now, not tomorrow
Why not:
- User experience: 5min-24h delay unacceptable
- No batching opportunity (single request)
- Total cost already low (<$0.01)
# config.yaml
batch_api:
enabled: true
min_batch_size: 10 # Minimum requests to batch
max_batch_size: 100 # Maximum per batch
auto_batch_similar: true # Group similar queries
timeout_minutes: 30 # Auto-flush after 30 minBatch API automatically routes non-interactive workloads:
detector:
enabled: true
confidence_threshold: 0.6 # Confidence needed to batch
min_request_count: 5 # At least 5 req/30s = bulk
max_response_time_ms: 5000 # Timeout: can batch if >5s acceptable# Regular API (interactive)
$ claude-escalate --api regular "Analyze this file for security"
# Cost: ~$0.003, latency: <2s
# Batch API (non-interactive)
$ claude-escalate --api batch "Analyze all 50 files in repo for security"
# Cost: ~$0.0015 (50% savings), latency: 5-24h
# Job ID returned immediately// Go code using SDK
ctx := context.Background()
// Submit batch of requests
batch := &escalate.BatchRequest{
Requests: []*escalate.Request{
{Query: "Analyze file1.go for security"},
{Query: "Analyze file2.go for security"},
// ... 50 files total
},
}
jobID, err := client.SubmitBatch(ctx, batch)
// jobID returned immediately (e.g., "batch_abc123")
// Poll for results (in background job)
for {
status, err := client.GetBatchStatus(ctx, jobID)
if status.Completed {
results, err := client.GetBatchResults(ctx, jobID)
// Process results
break
}
time.Sleep(30 * time.Second) // Poll every 30s
}Scenario: Analyze 100 files for security issues
Without Batch API:
- 100 requests × 2500 tokens each = 250,000 tokens
- Cost: $0.75 (Haiku pricing)
- Latency: 200s (2s per request)
With Batch API:
- 100 requests × 2500 × 0.5 = 125,000 tokens
- Cost: $0.375 (50% discount)
- Latency: 5-24 hours (background processing)
Combined with Semantic Cache:
- Batch API: 50% savings
- Similar requests cached: 98% savings
- If 20 requests are similar (repeat queries):
- Batch: 80 × 0.5 = 40 units
- Cached: 20 × 0.02 = 0.4 units (98% saved)
- Total: 40.4 units vs 100 baseline = 60% savings!
Submit a batch of requests for processing:
type BatchRequest struct {
Requests []*Request
}
type Request struct {
Query string
Context string
Model string // Optional, default: Sonnet
MaxTokens int // Optional
}
jobID, err := client.SubmitBatch(ctx, batch)
// Returns job ID for polling (e.g., "batch_abc123")
// Error if batch invalid, network error, or quota exceededCheck status of submitted batch:
status, err := client.GetBatchStatus(ctx, jobID)
type BatchStatus struct {
JobID string
Status string // "queued", "processing", "completed", "failed"
SubmittedAt time.Time
StartedAt *time.Time
CompletedAt *time.Time
RequestCount int
CompletedCount int
FailedCount int
}Retrieve results from completed batch:
results, err := client.GetBatchResults(ctx, jobID)
type BatchResult struct {
Requests []*Request
Responses []*Response
Errors map[int]error // Map of request index to error
}Cancel a submitted batch (if not yet processing):
err := client.CancelBatch(ctx, jobID)Monitor batch vs regular API costs:
# View cost breakdown
$ claude-escalate metrics --cost
Batch API cost today: $1.23
Regular API cost today: $3.45
Batch savings: $2.22 (39%)# List active batch jobs
$ claude-escalate batch list
batch_abc123 processing 45/100 submitted 2h ago
batch_xyz789 completed 50/50 submitted 6h ago
# Check specific job
$ claude-escalate batch status batch_abc123
Status: processing
Progress: 45 of 100 requests completed
Estimated completion: 2h 15m
Cost so far: $0.56 (50% discount applied)Good:
[Request 1] Analyze file1.go
[Request 2] Analyze file2.go
[Request 3] Analyze file3.go
Bad (different models/contexts):
[Request 1] Analyze file1.go (context: security)
[Request 2] Summarize file2.go (context: documentation)
[Request 3] Find bugs in file3.go (context: code review)
Grouping similar requests in same batch = better cost optimization.
- Too small (5-10): Overhead of batch API not worth it, use regular API
- Optimal (10-100): Good balance of cost savings and processing time
- Too large (>100): May hit quotas, breaks up into multiple batches
Default: Auto-batching at 10-100 requests per batch.
Batch API latency: 5 minutes to 24 hours.
- Don't use for real-time interactions
- Schedule batch jobs for off-peak hours
- Plan results delivery asynchronously (webhooks, polling, email)
// Some requests in batch may fail
results, err := client.GetBatchResults(ctx, jobID)
for i, respErr := range results.Errors {
if respErr != nil {
log.Warnf("Request %d failed: %v", i, respErr)
// Retry that request with regular API
req := results.Requests[i]
resp, err := client.CreateMessage(ctx, req)
// Handle response
}
}- Too many concurrent batch requests
- Wait for some batches to complete before submitting more
- Check quota with:
claude-escalate batch quota
- Anthropic processes batches within 24 hours
- After 24h, batch is abandoned and no results returned
- Resubmit batch if needed
- Job ID is invalid or expired (older than 29 days)
- Check job ID format: should be
batch_... - Use:
claude-escalate batch listto see active jobs
- Too many requests to fetch results
- Wait 30-60 seconds before polling again
- Use: Exponential backoff (start at 30s, increase by 1.5x each retry)
Batch API is fully backward compatible. No code changes required.
To enable:
- Add
batch_api.enabled: trueto config.yaml - Restart gateway
- Batch API auto-detects non-interactive workloads
To use explicitly:
# Before
$ claude-escalate "analyze all files" # Uses regular API
# After
$ claude-escalate --batch "analyze all files" # Uses Batch API
# or
$ claude-escalate batch submit "analyze all files" # Explicit batch- CHANGELOG.md — v0.7.0 release notes
- API.md — Full gateway API reference
- ARCHITECTURE.md — How optimization layers work together