Free AI Gateway

OpenAI-compatible API gateway that routes requests across free LLM providers with health-aware model selection, capability-based filtering, and aggregated analytics. Powered by SaaS Maker.

Deployment & External Services

Concern	Service
Hosting	Cloudflare Workers (`free-ai-gateway`) — deployed via `wrangler deploy`
Database	Cloudflare D1 (`free-ai-gateway-db`) — anonymous aggregate analytics
State / Caching	Cloudflare Durable Objects (`HealthStateDO`, `IpRateLimitDO`, `NeuronBudgetDO`); Cloudflare KV (`HEALTH_KV`)
Marketing / docs site	Astro + Starlight (`site/`), built into `site/dist` and served by the Worker via the `ASSETS` binding — no separate deploy
AI	Cloudflare Workers AI (`AI` binding) plus upstream provider free tiers (Groq, Gemini, OpenRouter, Cerebras, SambaNova, NVIDIA, Voyage)
CI/CD	GitHub Actions (`.github/workflows/cloudflare-deploy.yml`) — auto-deploy on push to `main`

Architecture & Toolkit

The gateway runs entirely on the Cloudflare ecosystem and is configured to stay within free-tier limits. Workers handle requests statelessly; shared state lives in Durable Objects and KV:

Compute: Cloudflare Workers (Hono + TypeScript + Zod)
Analytics: Cloudflare D1 (Serverless SQLite) — aggregates strictly anonymous counters (no individual request logs are stored).
State Management: Cloudflare Durable Objects track rate limiting and rolling model health (success rates, latencies).
Caching: Cloudflare KV stores fast, ephemeral health snapshots to keep edge routing instant.

Cloudflare Cost Posture

The committed Cloudflare config is free-first. Workers AI is enabled only as a fallback behind non-Cloudflare providers and is capped by NEURON_BUDGET at 9,500 neurons/day. Workers Logs sampling is off, CPU is capped to the Workers Free limit, and the unused Cloudflare Rate Limiting binding is not configured. Run pnpm audit:cloudflare-costs before deployment prep; pnpm check runs it automatically. See docs/cloudflare-cost-guardrails.md for the current guardrails.

Authentication & Project ID

Production gateway requests require a Bearer token:

Authorization: Bearer <GATEWAY_API_KEY>

All completion, response, embedding, image, video, and audio requests also require a project_id so analytics and rate accounting stay isolated by app.

You can provide the project ID in one of two ways:

As a field in the JSON body: "project_id": "my_project_123"
As a header: X-Gateway-Project-Id: my_project_123

Public health and model listing endpoints do not require a token. Data-generating /v1/* endpoints fail closed with 401 when GATEWAY_API_KEY is missing or invalid.

Operators can keep the legacy GATEWAY_API_KEY secret and add more accepted keys through GATEWAY_API_KEY_HASHES, a comma- or newline-separated list of SHA-256 hex digests. Entries may be labeled as label:sha256hex for rotation notes.

Use the operator key-ring workflow in docs/gateway-key-ops.md for new keys. The helper stores only labels and SHA-256 hashes locally; it never writes plaintext keys to the repo.
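The GATEWAY_API_KEY_HASHES format described above can be sketched as a small parser plus a digest check. This is an illustration of the documented format only, not the gateway's actual code: the names KeyHashEntry, parseKeyHashes, and isAcceptedKey are hypothetical, and the sketch assumes the digest is taken over the UTF-8 bytes of the plaintext key.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape for one parsed entry; not taken from the gateway source.
interface KeyHashEntry {
  label: string | null;
  hash: string; // lowercase SHA-256 hex digest
}

// SHA-256 of a plaintext key, hex-encoded (assumes UTF-8 input).
function sha256Hex(plaintext: string): string {
  return createHash("sha256").update(plaintext, "utf8").digest("hex");
}

// Split on commas or newlines, then peel off an optional "label:" prefix.
function parseKeyHashes(raw: string): KeyHashEntry[] {
  const entries: KeyHashEntry[] = [];
  for (const piece of raw.split(/[,\n]/)) {
    const item = piece.trim();
    if (item === "") continue;
    const sep = item.lastIndexOf(":");
    const label = sep === -1 ? null : item.slice(0, sep);
    const hash = item.slice(sep + 1).toLowerCase();
    // Skip anything that is not a 64-character hex digest.
    if (/^[0-9a-f]{64}$/.test(hash)) entries.push({ label, hash });
  }
  return entries;
}

// True when the presented key hashes to one of the accepted digests.
function isAcceptedKey(presented: string, entries: KeyHashEntry[]): boolean {
  const digest = sha256Hex(presented);
  return entries.some((entry) => entry.hash === digest);
}
```

Only digests are ever compared, so the secret never has to hold a plaintext key. A production check would typically also use a constant-time comparison.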

Chat Models

Use model: "auto" to let the gateway pick the best available model, or specify an exact model ID.

Capability Legend

TC = Tool/function calling
JM = JSON mode (structured output)
V = Vision (image input)
CTX = Context window (tokens)
MOT = Max output tokens

Workers AI (free via Cloudflare)

Model ID	Actual Model	Tier	CTX	MOT	Daily Limit
`workers-ai-llama-3.3-70b`	`@cf/meta/llama-3.3-70b-instruct-fp8-fast`	high	131k	4k	200
`workers-ai-deepseek-r1-32b`	`@cf/deepseek-ai/deepseek-r1-distill-qwen-32b`	high	32k	4k	200
`workers-ai-qwen-14b`	`@cf/qwen/qwen1.5-14b-chat-awq`	medium	8k	2k	300
`workers-ai-llama-8b`	`@cf/meta/llama-3.1-8b-instruct`	medium	131k	4k	500
`workers-ai-gemma-7b`	`@cf/google/gemma-7b-it-lora`	medium	8k	2k	500
`workers-ai-mistral-7b`	`@cf/mistral/mistral-7b-instruct-v0.1`	low	8k	2k	500
`workers-ai-llama-3b`	`@cf/meta/llama-3.2-3b-instruct`	low	131k	4k	800
`workers-ai-llama-1b`	`@cf/meta/llama-3.2-1b-instruct`	low	131k	4k	1000
`workers-ai-phi-2`	`@cf/microsoft/phi-2`	low	2k	1k	800

Groq (free tier)

Model ID	Actual Model	Tier	TC	JM	V	CTX	MOT	Daily Limit
`groq-llama-70b`	`llama-3.3-70b-versatile`	high	Y	Y		131k	32k	300
`groq-gpt-oss-120b`	`openai/gpt-oss-120b`	high	Y	Y		32k	8k	200
`groq-kimi-k2`	`moonshotai/kimi-k2-instruct`	high	Y	Y		131k	8k	300
`groq-qwen3-32b`	`qwen/qwen3-32b`	high	Y	Y		32k	8k	500
`groq-gpt-oss-20b`	`openai/gpt-oss-20b`	medium	Y	Y		32k	8k	500
`groq-llama4-maverick`	`meta-llama/llama-4-maverick-17b-128e-instruct`	medium	Y	Y	Y	131k	8k	500
`groq-llama4-scout`	`meta-llama/llama-4-scout-17b-16e-instruct`	medium	Y	Y	Y	131k	8k	500
`groq-llama-8b`	`llama-3.1-8b-instant`	low	Y	Y		131k	8k	1500

Gemini (free tier)

Model ID	Actual Model	Tier	TC	JM	V	CTX	MOT	Daily Limit
`gemini-2.5-pro`	`gemini-2.5-pro`	high	Y	Y	Y	1M	64k	50
`gemini-2.5-flash`	`gemini-2.5-flash`	high	Y	Y	Y	1M	64k	500
`gemini-2.0-flash`	`gemini-2.0-flash`	medium	Y	Y	Y	1M	8k	1000
`gemini-2.0-flash-lite`	`gemini-2.0-flash-lite`	low	Y	Y	Y	1M	8k	1500
`gemini-2.5-flash-lite`	`gemini-2.5-flash-lite`	low	Y	Y	Y	1M	8k	1500

OpenRouter (needs OPENROUTER_API_KEY)

Model ID	Actual Model	Tier	TC	JM	V	CTX	MOT	Daily Limit
`openrouter-hermes-405b-free`	`nousresearch/hermes-3-llama-3.1-405b:free`	high	Y	Y		131k	4k	50
`openrouter-llama-70b-free`	`meta-llama/llama-3.3-70b-instruct:free`	high	Y	Y		131k	4k	50
`openrouter-gpt-oss-120b-free`	`openai/gpt-oss-120b:free`	high	Y	Y		32k	8k	50
`openrouter-qwen3-next-80b-free`	`qwen/qwen3-next-80b-a3b-instruct:free`	high	Y	Y		32k	8k	50
`openrouter-qwen3-coder-free`	`qwen/qwen3-coder:free`	high	Y	Y		131k	8k	50
`openrouter-mistral-small-24b-free`	`mistralai/mistral-small-3.1-24b-instruct:free`	medium	Y	Y	Y	32k	8k	100
`openrouter-gemma3-27b-free`	`google/gemma-3-27b-it:free`	medium		Y	Y	131k	8k	100
`openrouter-stepfun-flash-free`	`stepfun/step-3.5-flash:free`	medium		Y		32k	8k	100
`openrouter-gemma3-12b-free`	`google/gemma-3-12b-it:free`	medium		Y	Y	131k	8k	100
`openrouter-nvidia-nemotron-12b-free`	`nvidia/nemotron-nano-12b-v2-vl:free`	medium		Y	Y	32k	4k	100

Cerebras (needs CEREBRAS_API_KEY)

Model ID	Actual Model	Tier	TC	JM	V	CTX	MOT	Daily Limit
`cerebras-gpt-oss-120b`	`gpt-oss-120b`	high	Y	Y		32k	8k	300
`cerebras-llama-8b`	`llama3.1-8b`	low	Y	Y		131k	8k	1000

SambaNova (free tier, needs SAMBANOVA_API_KEY)

Model ID	Actual Model	Tier	TC	JM	CTX	MOT	Daily Limit
`sambanova-llama-70b`	`Meta-Llama-3.3-70B-Instruct`	high	Y	Y	131k	8k	500
`sambanova-deepseek-v3`	`DeepSeek-V3-0324`	high	Y	Y	131k	8k	300
`sambanova-qwen3-32b`	`Qwen3-32B`	high	Y	Y	32k	8k	500

NVIDIA NIM (free tier, needs NVIDIA_API_KEY)

Model ID	Actual Model	Tier	TC	JM	CTX	MOT	Daily Limit
`nvidia-llama-70b`	`meta/llama-3.3-70b-instruct`	high	Y	Y	131k	8k	500
`nvidia-deepseek-r1`	`deepseek-ai/deepseek-r1`	high		Y	131k	8k	300
`nvidia-qwen-32b`	`qwen/qwen3-32b`	high	Y	Y	32k	8k	500

Agentic Use (Tool Calling & Structured Output)

The gateway automatically routes requests to capable models when you use agentic features:

Send tools → gateway only picks models with tool calling support
Send response_format: { type: "json_object" } → gateway only picks models with JSON mode
If no capable model is available, the gateway returns a 503 rather than silently failing

This makes the gateway compatible with agent frameworks such as LangChain, CrewAI, and the Vercel AI SDK.
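The capability filtering described above can be pictured as two steps: derive the required capabilities from the request, then keep only models that have all of them. The sketch below is an illustration under assumptions, not the gateway's routing code: the type and function names are hypothetical, and treating an empty tools array as "no tools required" is a guess. The three catalog rows are copied from the model tables.

```typescript
type Capability = "tools" | "json" | "vision";

interface CatalogModel {
  id: string;
  capabilities: Capability[];
}

// The parts of an OpenAI-style chat request that matter for capability routing.
interface ChatRequestLike {
  tools?: unknown[];
  response_format?: { type: string };
}

// Derive which capabilities a request needs.
function requiredCapabilities(body: ChatRequestLike): Capability[] {
  const required: Capability[] = [];
  if (body.tools !== undefined && body.tools.length > 0) required.push("tools");
  if (body.response_format?.type === "json_object") required.push("json");
  return required;
}

// Keep only models that support every required capability.
function capableModels(models: CatalogModel[], required: Capability[]): CatalogModel[] {
  return models.filter((m) => required.every((cap) => m.capabilities.includes(cap)));
}

// Rows copied from the Groq, OpenRouter, and Gemini tables.
const catalog: CatalogModel[] = [
  { id: "groq-llama-70b", capabilities: ["tools", "json"] },
  { id: "openrouter-gemma3-27b-free", capabilities: ["json", "vision"] },
  { id: "gemini-2.5-flash", capabilities: ["tools", "json", "vision"] },
];
```

An empty result from capableModels corresponds to the 503 case: no model can serve the request, so the gateway reports it rather than picking an incapable one.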

Tool Calling Example

curl $GATEWAY_URL/v1/chat/completions \
  -H "Authorization: Bearer $GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "project_id": "demo_project",
    "messages": [{"role": "user", "content": "What is the weather in SF?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather",
        "parameters": {
          "type": "object",
          "properties": { "location": { "type": "string" } },
          "required": ["location"]
        }
      }
    }],
    "tool_choice": "auto"
  }'

JSON Mode Example

curl $GATEWAY_URL/v1/chat/completions \
  -H "Authorization: Bearer $GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "project_id": "demo_project",
    "messages": [{"role": "user", "content": "List 3 colors as JSON"}],
    "response_format": { "type": "json_object" }
  }'

Embedding Models (6 models, 3 providers)

Embeddings require an explicit model — auto is not supported.

Model ID	Provider	Notes
`gemini-embedding-001`	Gemini	Default, highest priority
`voyage-3.5-lite`	Voyage AI	Fallback #1
`voyage-3-lite`	Voyage AI	Fallback #2
`@cf/baai/bge-large-en-v1.5`	Workers AI	1024-dim, largest
`@cf/baai/bge-base-en-v1.5`	Workers AI	768-dim, balanced
`@cf/baai/bge-small-en-v1.5`	Workers AI	384-dim, fastest

Aliases — these map to gemini-embedding-001:

text-embedding-3-small
text-embedding-3-large
text-embedding-004
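The alias list above amounts to a lookup table. As a minimal client-side sketch (the function name resolveEmbeddingModel is hypothetical, and throwing on auto is this sketch's choice rather than documented gateway behavior):

```typescript
// Aliases listed above, all mapping to the default Gemini embedding model.
const EMBEDDING_ALIASES = new Map<string, string>([
  ["text-embedding-3-small", "gemini-embedding-001"],
  ["text-embedding-3-large", "gemini-embedding-001"],
  ["text-embedding-004", "gemini-embedding-001"],
]);

// Resolve a requested embedding model ID to the ID that would actually be used.
function resolveEmbeddingModel(requested: string): string {
  if (requested === "auto") {
    // Embeddings require an explicit model.
    throw new Error('embeddings require an explicit model; "auto" is not supported');
  }
  return EMBEDDING_ALIASES.get(requested) ?? requested;
}
```

Code written against OpenAI's embedding model names can therefore keep its model string unchanged when pointed at the gateway. The vectors come from a different model, so do not mix them with embeddings produced elsewhere.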

Quick Health Check

Before making your first request, verify the gateway is live and see which providers are healthy right now — no auth required:

curl https://free-ai-gateway.sarthakagrawal927.workers.dev/v1/routing/status

A healthy response lists providers with their latency, headroom, and cooldown status. Any provider with "degraded": false is ready to route to. If all providers show "degraded": true, check back in a few minutes — the gateway auto-recovers as provider rate limits reset.

What to expect on the free tier

This gateway aggregates each provider's free tier — it is best-effort, not an SLA. Plan for the following:

Per-IP rate limit: ~10 requests burst, ~20 requests/minute sustained. Bursting harder will get you 429s — back off and retry.
Per-model daily caps: each model has a daily request budget (see the "Daily Limit" column in the model tables above). The gateway tracks remaining headroom and skips models that are out of quota.
Provider hiccups are normal: upstream free tiers throttle, time out, or briefly disappear. The gateway transparently retries across providers, so you only see a 503 when every capable model is degraded. That usually clears within a few minutes as rate-limit windows reset.
No "exactly this model" guarantee with model: "auto": the gateway routes to the best healthy model that matches your required capabilities (tools, JSON, vision, context). Pin a specific model ID if you need determinism.
If you need higher throughput or SLAs: bring your own paid provider keys and call them directly — the free tiers are intentionally capped to keep the gateway free for everyone.
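Since 429 and 503 are the two "try again later" responses named above, a client can wrap its calls in a small backoff helper. The following is a sketch under assumptions: the names, the 1 second base delay, and the 30 second cap are illustrative choices, not values published by the gateway.

```typescript
// Statuses the text above describes as transient.
function isRetryable(status: number): boolean {
  return status === 429 || status === 503;
}

// Exponential backoff: 1s, 2s, 4s, ... capped at capMs. attempt is zero-based.
function backoffDelayMs(attempt: number, baseMs = 1000, capMs = 30000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Call send() up to maxAttempts times, sleeping between retryable responses.
async function withRetry<T extends { status: number }>(
  send: () => Promise<T>,
  maxAttempts = 4,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let last = await send();
  for (let attempt = 0; attempt < maxAttempts - 1 && isRetryable(last.status); attempt++) {
    await sleep(backoffDelayMs(attempt));
    last = await send();
  }
  return last;
}
```

A typical call would be withRetry(() => fetch(url, init)). The last response is returned even if it is still a 429 or 503, so the caller decides what to do once retries are exhausted.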

API Endpoints

Base URL: https://free-ai-gateway.sarthakagrawal927.workers.dev

Chat Completions

POST /v1/chat/completions

curl $GATEWAY_URL/v1/chat/completions \
  -H "Authorization: Bearer $GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "project_id": "demo_project",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": false
  }'

Equivalent with the OpenAI SDK (TypeScript/JavaScript):

import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.GATEWAY_API_KEY,
  baseURL: 'https://free-ai-gateway.sarthakagrawal927.workers.dev/v1',
});

const response = await client.chat.completions.create(
  {
    model: 'auto',
    messages: [{ role: 'user', content: 'Hello' }],
  },
  // extra_body is a Python SDK option; in Node, send the project ID as a header.
  { headers: { 'X-Gateway-Project-Id': 'my_project' } },
);
console.log(response.choices[0].message.content);

Equivalent with the OpenAI SDK (Python):

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["GATEWAY_API_KEY"],
    base_url="https://free-ai-gateway.sarthakagrawal927.workers.dev/v1",
)

response = client.chat.completions.create(
    model="auto",
    extra_body={"project_id": "my_project"},
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)

Field	Type	Required	Description
`model`	string	No	Model ID or `auto` (default)
`messages`	array	Yes*	OpenAI-format messages
`prompt`	string	Yes*	Shorthand when `messages` is omitted
`stream`	boolean	No	Enable SSE streaming (default false)
`temperature`	number	No	0–2
`max_tokens`	number	No	1–8192
`reasoning_effort`	string	No	`auto`, `low`, `medium`, `high`
`tools`	array	No	OpenAI-format tool definitions
`tool_choice`	string/object	No	`none`, `auto`, `required`, or `{type: "function", function: {name: "..."}}`
`response_format`	object	No	`{type: "text"}` or `{type: "json_object"}`

*Either messages or prompt is required.
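With stream: true, an OpenAI-compatible endpoint normally answers with server-sent events: one data: line per JSON chunk, ending with data: [DONE]. Assuming the gateway follows that framing (this README does not spell it out), the text can be reassembled as in the sketch below. extractStreamText is a hypothetical helper that works on text already received in full; a real client also has to buffer partial lines across network reads.

```typescript
// Minimal shape of one streamed chat chunk in the OpenAI format.
interface StreamChunk {
  choices?: { delta?: { content?: string } }[];
}

// Concatenate the content deltas from a block of SSE text.
function extractStreamText(sseText: string): { text: string; done: boolean } {
  let text = "";
  let done = false;
  for (const line of sseText.split(/\r?\n/)) {
    // Ignore blank separators, comments, and other SSE fields.
    if (!line.startsWith("data:")) continue;
    const payload = line.slice("data:".length).trim();
    if (payload === "") continue;
    if (payload === "[DONE]") {
      done = true;
      break;
    }
    const chunk = JSON.parse(payload) as StreamChunk;
    text += chunk.choices?.[0]?.delta?.content ?? "";
  }
  return { text, done };
}
```

The OpenAI SDKs do this parsing for you when you pass stream: true, so the helper is mainly useful with raw fetch or curl output.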

Responses API

POST /v1/responses

OpenAI Responses API compatible. Non-streaming only. Internally proxies to /v1/chat/completions.

Embeddings

POST /v1/embeddings

curl $GATEWAY_URL/v1/embeddings \
  -H "Authorization: Bearer $GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-embedding-001",
    "project_id": "demo_project",
    "input": ["text to embed"]
  }'

Models

GET /v1/models

Lists all available models with health status and routing metadata. Browser requests render the searchable model catalog; API clients still receive JSON.

Routing Status

GET /v1/routing/status

Returns the current text-model fallback order with provider, latency, headroom, cooldown, and degradation reasons. This is a read-only operator endpoint for answering "which providers are usable right now, and what will the gateway try next?" without spending provider quota.

Health

GET /health

Returns model health snapshots. Browser requests render the operator health dashboard; API clients still receive JSON.

Analytics

GET /v1/analytics?project_id=<id>&days=<n>

Returns aggregate request volume and success rates broken down by provider, model, project, and day. This read-only endpoint is public so the owner dashboard at /dashboard can show all stats without a token. Token-spending chat, embedding, image, video, and audio routes still require GATEWAY_API_KEY.
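A small helper keeps the two query parameters correctly encoded when the URL is built in code. This is a sketch: analyticsUrl is a hypothetical name, and requiring days to be a positive integer is an assumption about what the endpoint accepts.

```typescript
// Build the analytics URL for one project over the last `days` days.
function analyticsUrl(baseUrl: string, projectId: string, days: number): string {
  if (!Number.isInteger(days) || days < 1) {
    throw new RangeError("days must be a positive integer");
  }
  const url = new URL("/v1/analytics", baseUrl);
  // searchParams handles percent-encoding of the project ID.
  url.searchParams.set("project_id", projectId);
  url.searchParams.set("days", String(days));
  return url.toString();
}
```

Because the endpoint is public, the resulting URL can be fetched without an Authorization header.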

Response Extensions

All responses include an x_gateway field:

{
  "x_gateway": {
    "provider": "groq",
    "model": "llama-3.3-70b-versatile",
    "attempts": 1,
    "reasoning_effort": "auto",
    "request_id": "abc123"
  }
}
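SDK response types do not know about x_gateway, so typed clients need a small guard to read it. The sketch below assumes the field layout shown in the example above; GatewayMeta and readGatewayMeta are hypothetical names, and treating reasoning_effort and request_id as optional is a defensive choice rather than documented behavior.

```typescript
// Field layout taken from the example response above.
interface GatewayMeta {
  provider: string;
  model: string;
  attempts: number;
  reasoning_effort?: string;
  request_id?: string;
}

// Return the x_gateway block if it is present and well-formed, else null.
function readGatewayMeta(response: unknown): GatewayMeta | null {
  if (typeof response !== "object" || response === null) return null;
  const raw = (response as Record<string, unknown>)["x_gateway"];
  if (typeof raw !== "object" || raw === null) return null;
  const meta = raw as Record<string, unknown>;
  const provider = meta["provider"];
  const model = meta["model"];
  const attempts = meta["attempts"];
  if (typeof provider !== "string" || typeof model !== "string" || typeof attempts !== "number") {
    return null;
  }
  const effort = meta["reasoning_effort"];
  const requestId = meta["request_id"];
  return {
    provider,
    model,
    attempts,
    ...(typeof effort === "string" ? { reasoning_effort: effort } : {}),
    ...(typeof requestId === "string" ? { request_id: requestId } : {}),
  };
}
```

Logging provider, model, and attempts next to each call makes it easy to see which upstream actually served a model: "auto" request.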

SDK Usage

Works with the standard OpenAI SDK:

import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.GATEWAY_API_KEY,
  baseURL: 'https://free-ai-gateway.sarthakagrawal927.workers.dev/v1',
});

const response = await client.chat.completions.create(
  {
    model: 'auto',
    messages: [{ role: 'user', content: 'Hello' }],
  },
  // The Node SDK has no extra_body option, so pass the project ID as a header:
  { headers: { 'X-Gateway-Project-Id': 'my_project' } },
);

With Tool Calling (TypeScript)

const response = await client.chat.completions.create(
  {
    model: 'auto',
    messages: [{ role: 'user', content: 'What is the weather in SF?' }],
    tools: [{
      type: 'function',
      function: {
        name: 'get_weather',
        description: 'Get current weather for a location',
        parameters: {
          type: 'object',
          properties: { location: { type: 'string' } },
          required: ['location'],
        },
      },
    }],
    tool_choice: 'auto',
  },
  // Project ID travels as a header; extra_body exists only in the Python SDK.
  { headers: { 'X-Gateway-Project-Id': 'my_project' } },
);

Provider Routing

The gateway uses health-aware routing with capability filtering:

Tracks success rate, latency, and daily usage per model
Filters models by required capabilities (tool calling, JSON mode, vision)
Respects reasoning_effort to prefer models matching the requested tier
Automatically retries on failure with next-best model
Force a specific provider with X-Gateway-Force-Provider: groq header
Force a specific model with X-Gateway-Force-Model: llama-3.3-70b-versatile header
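The two force headers combine with the authentication and project headers documented earlier. A minimal sketch of a header builder (gatewayHeaders and its options type are hypothetical names; the header names themselves come from this README):

```typescript
interface GatewayHeaderOptions {
  apiKey: string;
  projectId: string;
  forceProvider?: string; // e.g. "groq"
  forceModel?: string; // e.g. "llama-3.3-70b-versatile"
}

// Build the request headers for a gateway call, adding force headers only when set.
function gatewayHeaders(opts: GatewayHeaderOptions): Record<string, string> {
  const headers: Record<string, string> = {
    "Authorization": `Bearer ${opts.apiKey}`,
    "Content-Type": "application/json",
    "X-Gateway-Project-Id": opts.projectId,
  };
  if (opts.forceProvider) headers["X-Gateway-Force-Provider"] = opts.forceProvider;
  if (opts.forceModel) headers["X-Gateway-Force-Model"] = opts.forceModel;
  return headers;
}
```

Leaving both force options unset keeps the default health-aware routing.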

Command Code is part of the provider catalog when COMMAND_CODE_API_KEY is configured. Its catalog mirrors the Provider API model list, with command-code-mimo-v2-5 / xiaomi/mimo-v2.5 prioritized; force it with X-Gateway-Force-Provider: command_code.

What happens when a free provider fails?

A single failing provider does not fail your request. For each call the gateway:

Picks the best candidate model (based on success rate, latency, and remaining daily headroom).
If the upstream provider returns an error, times out, or hits its free-tier rate limit, the gateway marks that model as cooled down and immediately retries with the next-best candidate from a different provider.
The retry loop continues until a model responds successfully — or, if every capable model is degraded, the gateway returns a 503 instead of silently dropping the request.
The response includes an x_gateway.attempts counter so you can see how many providers were tried before one succeeded.
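The retry loop described above can be sketched as follows. This is an illustration under stated assumptions, not the gateway's implementation: all names are hypothetical, the ranking rule (success rate first, then latency) is a simplification of "best candidate", and the sketch does not enforce the preference for a different provider on retry.

```typescript
interface Candidate {
  id: string;
  provider: string;
  successRate: number; // 0..1
  latencyMs: number;
  headroom: number; // remaining daily requests
}

interface RoutedResult<T> {
  value: T;
  model: string;
  attempts: number;
}

// Raised when no candidate could serve the request; maps to HTTP 503.
class NoCapableModelError extends Error {
  readonly status = 503;
}

// Illustrative ordering: drop exhausted models, then best success rate, then lowest latency.
function rank(candidates: Candidate[]): Candidate[] {
  return candidates
    .filter((c) => c.headroom > 0)
    .sort((a, b) => b.successRate - a.successRate || a.latencyMs - b.latencyMs);
}

// Try candidates in rank order, cooling down each one that fails.
async function routeWithFallback<T>(
  candidates: Candidate[],
  call: (candidate: Candidate) => Promise<T>,
  cooledDown: Set<string> = new Set<string>(),
): Promise<RoutedResult<T>> {
  let attempts = 0;
  for (const candidate of rank(candidates)) {
    if (cooledDown.has(candidate.id)) continue;
    attempts += 1;
    try {
      const value = await call(candidate);
      return { value, model: candidate.id, attempts };
    } catch {
      // Error, timeout, or rate limit: cool this model down and move on.
      cooledDown.add(candidate.id);
    }
  }
  throw new NoCapableModelError("every capable model is degraded");
}
```

The attempts value returned here plays the role of x_gateway.attempts: 1 means the first choice answered, and higher numbers mean fallbacks were needed.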

You don't need to manage per-provider keys, retries, or backoff in your app — the gateway handles all of that.

Where to check status: the public, no-auth endpoint GET /v1/routing/status returns the live fallback order with each provider's latency, headroom, cooldown, and degraded flag. If everything looks degraded, providers usually recover within a few minutes as their rate-limit windows reset.

curl https://free-ai-gateway.sarthakagrawal927.workers.dev/v1/routing/status

Rate Limits

The gateway enforces two layers of limits, both designed to keep the free tier usable for everyone:

Per-IP gateway limit — token bucket: 10 requests burst, ~20 requests/minute sustained. Excess requests return 429 Too Many Requests; wait a few seconds and retry.
Per-model daily caps — each upstream provider has its own free-tier quota (see the "Daily Limit" column in the model tables above). When a model is out of headroom or returns a rate-limit error, the gateway cools it down and falls back to the next-best capable model automatically. You only see a 503 when every capable model is degraded.

Rule of thumb: treat this as a development / hobby / prototyping gateway. If you need predictable production throughput, pin a specific model and bring your own provider keys.
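To see how the per-IP numbers interact, here is a minimal token bucket with the stated parameters: capacity 10 and a refill of 20 tokens per minute, which is one token every 3 seconds. The class is a sketch for intuition only; how IpRateLimitDO actually implements the limit is not shown in this README, and the approximate figures above suggest the real behavior may differ slightly.

```typescript
// Token bucket: starts full, refills continuously, each request costs one token.
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;
  private readonly capacity: number;
  private readonly refillPerMinute: number;

  constructor(capacity: number, refillPerMinute: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerMinute = refillPerMinute;
    this.tokens = capacity;
    this.lastRefillMs = nowMs;
  }

  // Returns true if the request is allowed, false if it should get a 429.
  tryTake(nowMs: number): boolean {
    const elapsedMs = Math.max(0, nowMs - this.lastRefillMs);
    const refill = (elapsedMs * this.refillPerMinute) / 60000;
    this.tokens = Math.min(this.capacity, this.tokens + refill);
    this.lastRefillMs = nowMs;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}
```

With new TokenBucket(10, 20, Date.now()), ten back-to-back requests pass, the eleventh is rejected, and one more is allowed roughly every 3 seconds after that. This is why waiting a few seconds after a 429 is usually enough.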

Development

pnpm install
cp .env.example .env  # fill provider keys
pnpm dev:local

Operator Key Management

pnpm keys:generate -- --label testing-20260603
pnpm keys:print-secret
pnpm keys:upload

pnpm keys:upload replaces the complete GATEWAY_API_KEY_HASHES secret. Seed ops/gateway-key-hashes.local.json with every existing hash before uploading.

Deploy

pnpm wrangler deploy

Active AI Task Log

This section is maintained by the SaaS Maker Active-AI product/design loop so future agents do not reopen duplicate UI tasks.

Business lane: P1 Explore
Rule: do not create another broad "improve the UI" task unless the acceptance criteria differ materially from the tasks listed here.
Source of truth for task status: SaaS Maker task board. README entries are durable context only.

Task	Status	Priority	Last known note
`2e1f9b66` free-ai: add public landing + obvious 'Get API key' CTA + quickstart	done	medium	2026-05-25 17:02:56
`fa8b0311` free-ai: add provider status proof to landing quickstart	done	medium	2026-05-26 — added health check block + routing status curl to README and getting-started docs
`95674068` free-ai: add fallback provider explanation	done	medium	2026-05-26 — added "What happens when a free provider fails?" explainer to README Provider Routing section, covering cooldown/retry behavior, `x_gateway.attempts`, and the `/v1/routing/status` check
`3d7d9a34` free-ai: add rate-limit expectation copy	done	medium	2026-05-26 — added "What to expect on the free tier" callout near Quick Health Check, expanded Rate Limits section with the two-layer model, and added a free-tier note under the getting-started "Check Provider Status" block and the landing-page quickstart card
`ac9adb19` free-ai: add curl-to-SDK quickstart bridge	done	medium	2026-05-26 — added TypeScript + Python SDK snippets after the curl Chat Completions example in README; added a "terminal / openai sdk" tab switcher to the landing-page code window in index.astro (no new dependencies)
`b26c3e9d` free-ai: add saved-cost and reliability proof strip	done	medium	2026-05-26 — added `.proof-strip` section to index.astro between hero and features: shows live free-requests count (fetched from public `/v1/stats/providers`), 8+ provider fallback chain, and estimated cost saved vs. GPT-4o mini pricing; no backend changes
`29a6cb34` free-ai: restore explicit Get API key CTA	done	medium	2026-05-27 — restored explicit 'Get API Key' CTA and icon to landing page hero and nav
