Batch API Integration Guide

Overview

Claude Escalate v0.7.0 integrates Anthropic's Batch API for non-interactive workloads. Batch requests are processed in the background (5 minutes to 24 hours) and cost 50% less than regular API calls.

Cost Comparison:

Regular API: 100 × 2,500 tokens = 250,000 tokens = $0.75 (Haiku)
Batch API: 250,000 tokens at 50% of the regular price = $0.375
Savings: $0.375 (50%). The token count is unchanged; the discount applies to the price.
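
The arithmetic above can be reproduced with a small helper. This is a sketch, not part of the Claude Escalate SDK: `costUSD` is a hypothetical name, and the rate of $3.00 per million tokens is simply what $0.75 for 250,000 tokens implies, so substitute the actual price of your model.

```go
package main

import "fmt"

// costUSD returns the price of `requests` requests of `tokensPerRequest`
// tokens each at `usdPerMTok` dollars per million tokens. `discount` is the
// fraction taken off the price (0.5 for the Batch API, 0 for regular calls).
func costUSD(requests, tokensPerRequest int, usdPerMTok, discount float64) float64 {
	tokens := float64(requests * tokensPerRequest)
	return tokens / 1_000_000 * usdPerMTok * (1 - discount)
}

func main() {
	const rate = 3.00 // assumption: rate implied by $0.75 per 250,000 tokens
	regular := costUSD(100, 2500, rate, 0)
	batch := costUSD(100, 2500, rate, 0.5)
	fmt.Printf("regular=$%.3f batch=$%.3f saved=$%.3f\n", regular, batch, regular-batch)
	// prints: regular=$0.750 batch=$0.375 saved=$0.375
}
```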

When to Use Batch API

✅ Good Use Cases

Bulk analysis: "Analyze all 100 files in repo for security"
Overnight jobs: Scheduled security scans, code review batches
Bulk documentation: Generate docs for all functions at once
Batch processing: Process large datasets with Claude in background
Report generation: Bulk analytics that can be generated asynchronously

Characteristics:

Non-urgent (can wait 5 minutes to 24 hours for results)
Multiple similar requests (batch efficiency)
Cost-sensitive (50% savings justify delay)
Batch size: 10-100 requests per batch (configurable)

❌ Poor Use Cases

Interactive queries: User is waiting for response
Real-time code review: User needs immediate feedback
Quick lookups: Single query that's fast anyway
Production debugging: Need answer now, not tomorrow

Why not:

User experience: a delay of 5 minutes to 24 hours is unacceptable
No batching opportunity (single request)
Total cost already low (<$0.01)

Configuration

Enable Batch API

# config.yaml
batch_api:
  enabled: true
  min_batch_size: 10         # Minimum requests to batch
  max_batch_size: 100        # Maximum per batch
  auto_batch_similar: true   # Group similar queries
  timeout_minutes: 30        # Auto-flush after 30 min

Auto-Detection Thresholds

The detector automatically routes non-interactive workloads to the Batch API:

detector:
  enabled: true
  confidence_threshold: 0.6  # Confidence needed to batch
  min_request_count: 5       # At least 5 requests within 30 s counts as bulk
  max_response_time_ms: 5000 # Batch only if the caller can tolerate more than 5 s of latency

Usage Examples

Example 1: Bulk File Analysis

# Regular API (interactive)
$ claude-escalate --api regular "Analyze this file for security"
# Cost: ~$0.003, latency: <2s

# Batch API (non-interactive)
$ claude-escalate --api batch "Analyze all 50 files in repo for security"
# Cost: ~$0.0015 per file (50% savings), latency: 5 min-24h
# Job ID returned immediately

Example 2: Scheduled Analysis

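// Go code using SDK
ctx := context.Background()

// Submit batch of requests
batch := &escalate.BatchRequest{
    Requests: []*escalate.Request{
        {Query: "Analyze file1.go for security"},
        {Query: "Analyze file2.go for security"},
        // ... 50 files total
    },
}

jobID, err := client.SubmitBatch(ctx, batch)
if err != nil {
    log.Fatalf("submit batch: %v", err)
}
// jobID returned immediately (e.g., "batch_abc123")

// Poll for results (in background job)
for {
    status, err := client.GetBatchStatus(ctx, jobID)
    if err != nil {
        log.Fatalf("get batch status: %v", err)
    }
    if status.Status == "failed" {
        log.Fatalf("batch %s failed", jobID)
    }
    // BatchStatus exposes a Status string, not a Completed flag
    if status.Status == "completed" {
        results, err := client.GetBatchResults(ctx, jobID)
        if err != nil {
            log.Fatalf("get batch results: %v", err)
        }
        _ = results // Process results
        break
    }
    time.Sleep(30 * time.Second) // Poll every 30s
}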

Example 3: Cost Comparison

Scenario: Analyze 100 files for security issues

Without Batch API:
  - 100 requests × 2500 tokens each = 250,000 tokens
  - Cost: $0.75 (Haiku pricing)
  - Latency: ~200s if sent sequentially (2s per request)

With Batch API:
  - 100 requests × 2,500 tokens each = 250,000 tokens (same token count)
  - Cost: $0.375 (50% discount on the price)
  - Latency: 5 minutes to 24 hours (background processing)

Combined with Semantic Cache:
  - Batch API: 50% savings
  - Similar requests cached: 98% savings
  - If 20 requests are similar (repeat queries):
    - Batch: 80 × 0.5 = 40 units
    - Cached: 20 × 0.02 = 0.4 units (98% saved)
    - Total: 40.4 units vs 100 baseline = 59.6% savings
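
The combined figure can be checked with a few lines of Go. This is only a worked version of the arithmetic above; `combinedUnits` is a hypothetical helper, and one "unit" is the cost of one regular, uncached request.

```go
package main

import "fmt"

// combinedUnits returns the total cost in units when `cached` of `total`
// requests are served from the cache at cacheFactor units each and the rest
// go through the Batch API at batchFactor units each.
func combinedUnits(total, cached int, batchFactor, cacheFactor float64) float64 {
	return float64(total-cached)*batchFactor + float64(cached)*cacheFactor
}

func main() {
	units := combinedUnits(100, 20, 0.5, 0.02)
	savings := (1 - units/100) * 100
	fmt.Printf("%.1f units, %.1f%% savings\n", units, savings)
	// prints: 40.4 units, 59.6% savings
}
```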

API Reference

SubmitBatch

Submit a batch of requests for processing:

type BatchRequest struct {
    Requests []*Request
}

type Request struct {
    Query     string
    Context   string
    Model     string // Optional, default: Sonnet
    MaxTokens int    // Optional
}

jobID, err := client.SubmitBatch(ctx, batch)
// Returns job ID for polling (e.g., "batch_abc123")
// Error if batch invalid, network error, or quota exceeded

GetBatchStatus

Check status of submitted batch:

status, err := client.GetBatchStatus(ctx, jobID)

type BatchStatus struct {
    JobID       string
    Status      string // "queued", "processing", "completed", "failed"
    SubmittedAt time.Time
    StartedAt   *time.Time
    CompletedAt *time.Time
    RequestCount int
    CompletedCount int
    FailedCount  int
}
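
When polling, two derived values tend to be useful: whether the job has reached a final state, and how far along it is. The sketch below redeclares only the fields it needs so that it compiles on its own. `Terminal` and `Progress` are hypothetical helpers, not SDK methods, and the sketch assumes that CompletedCount and FailedCount do not overlap.

```go
package main

import "fmt"

// BatchStatus repeats the fields of the struct above that this sketch uses.
type BatchStatus struct {
	Status         string // "queued", "processing", "completed", "failed"
	RequestCount   int
	CompletedCount int
	FailedCount    int
}

// Terminal reports whether the job is in a final state, so polling can stop.
func (s BatchStatus) Terminal() bool {
	return s.Status == "completed" || s.Status == "failed"
}

// Progress returns the fraction of requests that have finished, whether they
// succeeded or failed. It returns 0 for an empty batch.
func (s BatchStatus) Progress() float64 {
	if s.RequestCount == 0 {
		return 0
	}
	return float64(s.CompletedCount+s.FailedCount) / float64(s.RequestCount)
}

func main() {
	s := BatchStatus{Status: "processing", RequestCount: 100, CompletedCount: 45}
	fmt.Printf("%.0f%% done, terminal=%v\n", s.Progress()*100, s.Terminal())
	// prints: 45% done, terminal=false
}
```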

GetBatchResults

Retrieve results from completed batch:

results, err := client.GetBatchResults(ctx, jobID)

type BatchResult struct {
    Requests  []*Request
    Responses []*Response
    Errors    map[int]error // Map of request index to error
}

CancelBatch

Cancel a submitted batch (if not yet processing):

err := client.CancelBatch(ctx, jobID)

Monitoring

Cost Tracking

Monitor batch vs regular API costs:

# View cost breakdown
$ claude-escalate metrics --cost
Batch API cost today: $1.23
Regular API cost today: $3.45
Batch savings: $1.23 (50% off the $2.46 undiscounted batch cost)

Batch Job Tracking

# List active batch jobs
$ claude-escalate batch list
batch_abc123  processing  45/100  submitted 2h ago
batch_xyz789  completed   50/50   submitted 6h ago

# Check specific job
$ claude-escalate batch status batch_abc123
Status: processing
Progress: 45 of 100 requests completed
Estimated completion: 2h 15m
Cost so far: $0.56 (50% discount applied)

Best Practices

1. Batch Similar Requests

Good:

[Request 1] Analyze file1.go
[Request 2] Analyze file2.go
[Request 3] Analyze file3.go

Bad (different tasks/contexts):

[Request 1] Analyze file1.go (context: security)
[Request 2] Summarize file2.go (context: documentation)
[Request 3] Find bugs in file3.go (context: code review)

Grouping similar requests in the same batch improves cost optimization.

2. Use Appropriate Batch Size

Too small (<10): Batch API overhead is not worth it; use the regular API
Optimal (10-100): Good balance of cost savings and processing time
Too large (>100): May hit quotas; gets broken up into multiple batches

Default: Auto-batching at 10-100 requests per batch.
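
If you split a large job yourself instead of relying on auto-batching, the split is a plain slice chunking. The sketch below uses file names as stand-ins for requests; `chunk` is a hypothetical helper and the limit of 100 is the max_batch_size shown in the configuration.

```go
package main

import "fmt"

// chunk splits items into consecutive groups of at most size elements.
// It returns nil when size is not positive.
func chunk(items []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	var out [][]string
	for start := 0; start < len(items); start += size {
		end := start + size
		if end > len(items) {
			end = len(items)
		}
		out = append(out, items[start:end])
	}
	return out
}

func main() {
	files := make([]string, 250)
	for i := range files {
		files[i] = fmt.Sprintf("file%d.go", i+1)
	}
	for i, b := range chunk(files, 100) {
		fmt.Printf("batch %d: %d requests\n", i+1, len(b))
	}
	// prints three lines: 100, 100, and 50 requests
}
```

A final chunk smaller than the minimum batch size may be cheaper to send through the regular API.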

3. Plan for Latency

Batch API latency: 5 minutes to 24 hours.

Don't use for real-time interactions
Schedule batch jobs for off-peak hours
Plan results delivery asynchronously (webhooks, polling, email)

4. Error Handling

// Some requests in batch may fail
results, err := client.GetBatchResults(ctx, jobID)
if err != nil {
    log.Fatalf("get batch results: %v", err)
}
for i, respErr := range results.Errors {
    if respErr == nil {
        continue
    }
    log.Warnf("Request %d failed: %v", i, respErr)
    // Retry that request with regular API
    req := results.Requests[i]
    resp, err := client.CreateMessage(ctx, req)
    if err != nil {
        log.Warnf("Retry of request %d failed: %v", i, err)
        continue
    }
    _ = resp // Handle response
}

Troubleshooting

"Batch quota exceeded"

Too many concurrent batch requests
Wait for some batches to complete before submitting more
Check quota with: claude-escalate batch quota

"Batch timed out after 24h"

Anthropic processes batches within 24 hours
After 24h, requests that have not finished expire and return no results
Resubmit the expired requests if needed

"Batch not found"

Job ID is invalid or expired (older than 29 days)
Check job ID format: should be batch_...
Use: claude-escalate batch list to see active jobs

"Result rate limited"

Too many requests to fetch results
Wait 30-60 seconds before polling again
Use: Exponential backoff (start at 30s, increase by 1.5x each retry)

Migration from v0.6.0

Batch API is fully backward compatible. No code changes required.

To enable:

Add batch_api.enabled: true to config.yaml
Restart gateway
Batch API auto-detects non-interactive workloads

To use explicitly:

# Before
$ claude-escalate "analyze all files"  # Uses regular API

# After
$ claude-escalate --batch "analyze all files"  # Uses Batch API
# or
$ claude-escalate batch submit "analyze all files"  # Explicit batch
