This gateway should run without surprise Cloudflare charges. The committed config is intentionally free-first:
- Workers AI is enabled with
WORKERS_AI_ENABLED = "true", but automatic routing keeps it behind non-Cloudflare providers so it is a fallback source of compute. - Every Workers AI call path must pass through
NEURON_BUDGET, which caps usage at 9,500 Neurons/day, 500 below Cloudflare's 10,000 Neurons/day free allocation. - Text and embedding debit estimates use Cloudflare's published per-model token pricing converted back into approximate neurons, plus a 20% buffer. Models without explicit pricing stay on fixed conservative estimates.
- Workers Logs/observability sampling is disabled in committed config because Workers Logs can create paid overage on Workers Paid plans.
- Worker CPU is capped at 10ms in committed config, matching the Workers Free per-invocation CPU limit.
- The unused Cloudflare Rate Limiting binding is not configured; request throttling uses
IpRateLimitDO.
Run the local guard before deployment prep:
pnpm audit:cloudflare-costspnpm check also runs this audit before typecheck and unit tests.
Do not enable Workers Logs, higher CPU limits, paid-plan-only bindings, or a Workers AI neuron cap above 9,500/day in committed config unless the task explicitly approves paid Cloudflare usage and records the expected monthly ceiling.
Reference points checked on 2026-05-09:
- Workers AI free allocation: 10,000 Neurons/day.
- Workers Free request/CPU posture: 100,000 requests/day and 10ms CPU/invocation.
- Workers Logs paid overage can apply on Workers Paid plans.
- D1, KV, and SQLite-backed Durable Objects have Free plan quotas that fail closed when exceeded on Free plans.