
Cost Estimate for 2000 Samples

⚠️ Default Configuration: Critique DISABLED

By default, critique/validation is DISABLED to save costs. The estimates below show costs with critique enabled vs disabled.

Provider Task Distribution

Default Configuration (Critique Disabled):

  • OpenAI (gpt-4o-mini): Scenario generation
  • Anthropic (claude-3-5-haiku): Response generation (civilian + first responder)
  • Gemini: Not used (critique disabled)

With Critique Enabled:

  • OpenAI (gpt-4o-mini): Scenario generation
  • Anthropic (claude-3-5-haiku): Response generation (civilian + first responder)
  • Gemini (gemini-2.0-flash-exp): Critique/validation (one per response)

API Calls Per Sample

For each sample generated:

  1. 1 scenario call (OpenAI)
  2. 2 response calls (Anthropic) - one for civilian, one for first responder (parallel)
  3. 2 critique calls (Gemini) - SKIPPED BY DEFAULT

Total for 2000 samples (default):

  • OpenAI: 2,000 calls
  • Anthropic: 4,000 calls
  • Gemini: 0 calls

Total for 2000 samples (with critique):

  • OpenAI: 2,000 calls
  • Anthropic: 4,000 calls
  • Gemini: 4,000 calls
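The per-sample call arithmetic above can be sketched as a small helper (the function and key names are illustrative, not taken from the pipeline):

```python
def call_counts(n_samples: int, critique_enabled: bool = False) -> dict:
    """Per sample: 1 scenario call, 2 response calls, and 2 critique calls (if enabled)."""
    return {
        "openai": n_samples,                                  # scenario generation
        "anthropic": n_samples * 2,                           # civilian + first responder
        "gemini": n_samples * 2 if critique_enabled else 0,   # critique, one per response
    }

print(call_counts(2000))
# {'openai': 2000, 'anthropic': 4000, 'gemini': 0}
print(call_counts(2000, critique_enabled=True))
# {'openai': 2000, 'anthropic': 4000, 'gemini': 4000}
```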

Token Usage Estimates

1. OpenAI (gpt-4o-mini) - Scenario Generation

Per call:

  • Input: ~150 tokens (category + prompt template)
  • Output: ~75 tokens (2-4 sentence scenario)

For 2000 samples:

  • Input: 2,000 × 150 = 300,000 tokens
  • Output: 2,000 × 75 = 150,000 tokens

Pricing (as of Jan 2025):

  • Input: $0.150 per million tokens
  • Output: $0.600 per million tokens

Cost:

  • Input: 0.3M × $0.150 = $0.045
  • Output: 0.15M × $0.600 = $0.090
  • Total OpenAI: ~$0.14

2. Anthropic (claude-3-5-haiku) - Response Generation

Per call:

  • Input: ~200 tokens (role + scenario + prompt template)
  • Output: ~650 tokens (structured JSON with facts, uncertainties, analysis, guidance, confidence, quality_score)

For 4000 calls (2000 samples × 2 roles):

  • Input: 4,000 × 200 = 800,000 tokens
  • Output: 4,000 × 650 = 2,600,000 tokens

Pricing (as of Jan 2025):

  • Input: $0.800 per million tokens
  • Output: $4.00 per million tokens

Cost:

  • Input: 0.8M × $0.800 = $0.64
  • Output: 2.6M × $4.00 = $10.40
  • Total Anthropic: ~$11.04
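Each stage's cost is the same calculation with different inputs, so it can be expressed as one helper. A sketch using the token counts and per-million rates quoted above:

```python
def stage_cost(calls: int, in_tokens: int, out_tokens: int,
               in_rate: float, out_rate: float) -> float:
    """Cost in USD for one pipeline stage; rates are USD per million tokens."""
    input_cost = calls * in_tokens / 1_000_000 * in_rate
    output_cost = calls * out_tokens / 1_000_000 * out_rate
    return input_cost + output_cost

openai_scenarios = stage_cost(2000, 150, 75, 0.150, 0.600)     # $0.135 (~$0.14)
anthropic_responses = stage_cost(4000, 200, 650, 0.800, 4.00)  # $11.04
```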

3. Gemini (gemini-2.0-flash-exp) - Critique/Validation

⚠️ DISABLED BY DEFAULT

Per call (if enabled):

  • Input: ~300 tokens (instructions + JSON payload from response)
  • Output: ~650 tokens (corrected/validated JSON, similar size to response)

For 4000 calls (one per response):

  • Input: 4,000 × 300 = 1,200,000 tokens
  • Output: 4,000 × 650 = 2,600,000 tokens

Pricing (as of Jan 2025):

  • Input: $75 per million tokens ($0.075 per 1K tokens)
  • Output: $300 per million tokens ($0.30 per 1K tokens)

Cost (if enabled):

  • Input: 1.2M × $75 = $90.00
  • Output: 2.6M × $300 = $780.00
  • Total Gemini: ~$870.00

Total Cost Summary

Default Configuration (Critique Disabled) ⭐ RECOMMENDED

| Provider | Task | Cost |
| --- | --- | --- |
| OpenAI | Scenario generation (2K calls) | ~$0.14 |
| Anthropic | Response generation (4K calls) | ~$11.04 |
| Gemini | Not used | $0.00 |
| **TOTAL** | 2000 samples | **~$11.18** |

With Critique Enabled

| Provider | Task | Cost |
| --- | --- | --- |
| OpenAI | Scenario generation (2K calls) | ~$0.14 |
| Anthropic | Response generation (4K calls) | ~$11.04 |
| Gemini | Critique/validation (4K calls) | ~$870.00 |
| **TOTAL** | 2000 samples | **~$881.18** |

Cost Breakdown by Percentage

Default (Critique Disabled):

  • Anthropic: 98.7% of total cost
  • OpenAI: 1.3% of total cost
  • Gemini: 0% (not used)

With Critique Enabled:

  • Gemini: 98.7% of total cost
  • Anthropic: 1.3% of total cost
  • OpenAI: <0.1% of total cost

Cost Optimization Suggestions

Option 1: Keep Critique Disabled (Default) ⭐ BEST SAVINGS

Critique is disabled by default, saving ~$870 (a 98.7% reduction) compared to running it on every response.

Why it works:

  • Your response prompt already has STRICT RULES for JSON output
  • You have a SampleValidator that validates JSON after generation
  • If validation fails, the pipeline retries the entire sample (up to 3 times)
  • Modern LLMs (especially Claude Haiku) are very good at producing valid JSON

Cost: ~$11.18 for 2000 samples
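The validate-and-retry flow described above can be sketched as follows. The required keys come from the response schema listed earlier; the function names and validation logic are illustrative stand-ins, not the pipeline's actual SampleValidator:

```python
import json

MAX_RETRIES = 3  # matches the "up to 3 times" retry policy

REQUIRED_KEYS = {"facts", "uncertainties", "analysis",
                 "guidance", "confidence", "quality_score"}

def is_valid(raw: str) -> bool:
    """Minimal stand-in for SampleValidator: parse JSON and check required keys."""
    try:
        sample = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(sample, dict) and REQUIRED_KEYS <= sample.keys()

def generate_with_retries(generate_fn, scenario: str) -> dict:
    """Retry the whole sample until it validates, instead of paying for critique."""
    for _ in range(MAX_RETRIES):
        raw = generate_fn(scenario)  # one Anthropic response call
        if is_valid(raw):
            return json.loads(raw)
    raise RuntimeError(f"validation failed after {MAX_RETRIES} attempts")
```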


Option 2: Switch Critique Provider (if you enable it)

If you want to keep critique but reduce costs, consider using a cheaper model:

  • OpenAI gpt-4o-mini: ~$1.74 for critique (vs $870 for Gemini)
  • Anthropic claude-3-5-haiku: ~$11.36 for critique (vs $870 for Gemini)

Potential savings: ~$858-$868 (a 98.7-99.8% reduction), depending on the model chosen

Option 3: Conditional Critique (Smart Validation)

Only critique if initial validation fails:

  • First try: Generate response → Validate
  • If invalid: Run critique → Validate again
  • If still invalid: Retry entire sample

Potential savings: ~$650-800 (critique runs only on the ~15-30% of responses that fail initial validation)
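A sketch of that conditional flow; `respond`, `critique`, and `is_valid` are assumed interfaces standing in for the pipeline's actual calls:

```python
def respond_with_conditional_critique(scenario, respond, critique, is_valid):
    """Run the (expensive) critique call only when cheap validation fails."""
    response = respond(scenario)      # Anthropic response call
    if is_valid(response):
        return response               # ~70-85% of the time: no critique cost
    corrected = critique(response)    # Gemini critique call, failures only
    if is_valid(corrected):
        return corrected
    return None                       # signal the caller to retry the whole sample
```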

Option 4: Use Cached Inputs (if available)

  • OpenAI cached inputs: $0.075 per million (50% savings)
  • Anthropic cached inputs: $0.080 per million (90% savings)
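Note that caching discounts apply only to input tokens, which are a small share of the totals here (and caching assumes the prompt template prefix repeats across calls). A quick check with the Anthropic figures above:

```python
anthropic_input_m = 4000 * 200 / 1_000_000  # 0.8M input tokens
regular = anthropic_input_m * 0.800         # $0.64 at the standard rate
cached = anthropic_input_m * 0.080          # $0.064 at the cached rate (90% off)
saved = regular - cached                    # ~$0.58 of an ~$11.04 bill
```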

Notes

  1. Token estimates are approximate - Actual usage may vary based on:

    • Scenario complexity
    • Response length
    • Prompt variations
  2. Pricing may change - Verify current rates on each provider's pricing page before running

  3. Free tiers - Check if you have any free tier credits:

    • OpenAI: Often provides $5-18 free credits for new accounts
    • Anthropic: May offer free tier credits
    • Google: May offer free tier for Gemini API
  4. Volume discounts - Some providers offer discounts for high-volume usage

Alternative Configuration: Swapped Response & Critique

If you swap the providers (with critique enabled):

  • OpenAI (gpt-4o-mini): Scenario generation + Critique/validation
  • Gemini (gemini-2.0-flash-exp): Response generation
  • Anthropic: Not used

Cost Calculation for Swapped Configuration

OpenAI - Scenario Generation (unchanged)

  • Cost: ~$0.14 (same as current)

OpenAI - Critique/Validation (4000 calls)

Per call:

  • Input: ~300 tokens (instructions + JSON payload)
  • Output: ~650 tokens (corrected JSON)

For 4000 calls:

  • Input: 4,000 × 300 = 1,200,000 tokens
  • Output: 4,000 × 650 = 2,600,000 tokens

Cost:

  • Input: 1.2M × $0.150 = $0.18
  • Output: 2.6M × $0.600 = $1.56
  • Total OpenAI Critique: ~$1.74

Gemini - Response Generation (4000 calls)

Per call:

  • Input: ~200 tokens (role + scenario + prompt template)
  • Output: ~650 tokens (structured JSON)

For 4000 calls:

  • Input: 4,000 × 200 = 800,000 tokens
  • Output: 4,000 × 650 = 2,600,000 tokens

Cost:

  • Input: 0.8M × $75 = $60.00
  • Output: 2.6M × $300 = $780.00
  • Total Gemini Response: ~$840.00

Swapped Configuration Total

| Provider | Task | Cost |
| --- | --- | --- |
| OpenAI | Scenario (2K calls) + critique (4K calls) | ~$1.88 |
| Gemini | Response generation (4K calls) | ~$840.00 |
| Anthropic | Not used | $0.00 |
| **TOTAL** | 2000 samples | **~$841.88** |


Cost Comparison Summary

| Configuration | Total Cost | Extra Cost vs Default |
| --- | --- | --- |
| Default (Anthropic response, critique disabled) ⭐ | ~$11.18 | - |
| With critique (Anthropic response, Gemini critique) | ~$881.18 | +$870.00 |
| Swapped (Gemini response, OpenAI critique) | ~$841.88 | +$830.70 |
| Swapped + no critique (Gemini response, skip critique) | ~$840.14 | +$828.96 |

For 1000 Samples (Half the Cost)

Default Configuration (Critique Disabled):

  • OpenAI: ~$0.07
  • Anthropic: ~$5.52
  • Total: ~$5.59

With Critique Enabled:

  • OpenAI: ~$0.07
  • Anthropic: ~$5.52
  • Gemini: ~$435.00
  • Total: ~$440.59

Swapped Configuration (with critique):

  • OpenAI: ~$0.94 (scenario + critique)
  • Gemini: ~$420.00 (response)
  • Total: ~$420.94
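Since every cost in this estimate scales linearly with sample count, the totals generalize to any run size. A sketch using this document's per-2000-sample totals for the default and with-critique configurations:

```python
def estimate_cost(n_samples: int, critique_enabled: bool = False) -> float:
    """USD estimate, linearly scaled from the 2000-sample totals above."""
    per_sample = 11.18 / 2000        # scenario + responses (default config)
    if critique_enabled:
        per_sample += 870.00 / 2000  # Gemini critique, if enabled
    return round(n_samples * per_sample, 2)

print(estimate_cost(1000))        # 5.59
print(estimate_cost(1000, True))  # 440.59
```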