Plug Claude Code, Codex, Cursor, Cline, Copilot & Antigravity into FREE Claude / GPT / Gemini. Auto-fallback.
RTK + Caveman compression saves 15–95% tokens. Never hit limits.
~1.6B documented free tokens/month — up to ~2.1B in your first month with signup credits — aggregated across the free tiers, plus a long tail of permanently-free, no-cap providers, and the compression above stretches every one further. (how we count →)
Questions, provider tips, roadmap & support → Discord · Telegram · WhatsApp 🌍 Global / 🇧🇷 Brasil
🚀 Quick Start • 🎯 Combos • 🌐 Providers • 🔌 CLI & MCP • 🗜️ Compression • 🌍 Website
💥 The Promise • 🤔 Why • 🏆 What Sets Apart • 🤖 Compatible CLIs • 🖥️ Where It Runs • 🔒 Private • 🎬 In Action • 📚 Explore More • 📧 Support
| 🇺🇸 | 🇧🇷 | 🇪🇸 | 🇫🇷 | 🇮🇹 | 🇷🇺 | 🇨🇳 | 🇩🇪 | 🇯🇵 | 🇰🇷 | 🇮🇳 |
| 🇹🇭 | 🇻🇳 | 🇮🇩 | 🇲🇾 | 🇵🇭 | 🇸🇦 | 🇮🇱 | 🇦🇿 | 🇺🇦 | 🇵🇱 | 🇨🇿 |
| 🇳🇱 | 🇧🇬 | 🇩🇰 | 🇫🇮 | 🇳🇴 | 🇸🇪 | 🇭🇺 | 🇷🇴 | 🇸🇰 | 🇵🇹 |
Stacking free tiers by hand is painful — dozens of SDKs, dozens of rate limits, and no idea how much you actually have. OmniRoute aggregates the documented free tiers of 40+ provider pools / 500+ models into one honest number and shows it live on the dashboard (
/dashboard/free-tiers).
- ~1.6B free tokens / month (steady) — and up to ~2.1B in your first month with signup credits.
- Pool-deduped, honest — we count each shared free pool once, so the headline isn't inflated by rate-limit ceilings the way multi-billion competitor claims are. (Counting every rate limit 24/7 would read ~10B; we don't publish that.)
- Plus the un-countable — permanently-free, no-token-cap providers (SiliconFlow, Z.AI GLM-Flash, Kilo, OpenCode Zen…) and a $10 OpenRouter top-up that unlocks +24M/mo, both surfaced separately so they never inflate the headline.
- Per-model breakdown, live used / remaining for the current month, and a transparent terms flag per provider.
Preview mockup — a real screenshot lands once the
/dashboard/free-tierspage is validated. Full methodology (pool dedupe, credit tiers, provider terms): docs/reference/FREE_TIERS.md.
One endpoint. 231 providers. Never stop building — and let OmniRoute pick the cheapest one that works.
| 🚫 Never hit limits Auto-fallback across 231 providers in milliseconds. Quota out? Next provider takes over — zero downtime. |
💸 Save up to 95% tokens RTK + Caveman stacked compression cuts 15–95% of eligible tokens (~89% avg on tool-heavy sessions). |
🆓 $0 to start 50+ providers with a free tier, 11 free forever (Kiro, Qoder, Pollinations, LongCat…). No card needed. |
| 🔌 Every tool works 16+ coding agents — Claude Code, Codex, Cursor, Cline, Copilot, Antigravity — through one config. |
🧩 One endpoint OpenAI ↔ Claude ↔ Gemini ↔ Responses API translation. Point any tool at /v1 and it just works. |
🛡️ Production-grade Circuit breakers, TLS stealth, MCP (87 tools), A2A, memory, guardrails, evals. 14,965 tests. |
Stop juggling 10 dashboards, dead API keys, and surprise bills.
| ❌ The daily pain | ✅ How OmniRoute fixes it |
|---|---|
| 📉 Subscription quota expires unused every month | Maximize subscriptions — track quota, use every token before reset |
| 🛑 Rate limits stop you mid-coding | 4-tier auto-fallback — Subscription → API → Cheap → Free, in milliseconds |
🔥 Tool outputs (git diff, grep, logs) burn tokens |
RTK + Caveman compression — save 15–95% eligible tokens per request |
| 💸 Expensive APIs ($20–50/mo per provider) | Cost-optimized routing — auto-route to the cheapest viable model |
| 🧰 Each AI tool wants its own setup | One endpoint, every tool, one dashboard |
| 🌍 AI blocked in your country | 3-level proxy + TLS fingerprint stealth — use AI from anywhere |
┌──────────────────────────────────────────────────────────┐
│ Your IDE / CLI (Claude Code, Cursor, Cline…) │
└─────────────────────────┬──────────────────────────────────┘
│ http://localhost:20128/v1
▼
┌──────────────────────────────────────────────────────────┐
│ OmniRoute — Smart Router │
│ RTK + Caveman compression · 15 routing strategies │
│ Circuit breakers · TLS stealth · MCP · A2A · Guardrails │
└─────────────────────────┬──────────────────────────────────┘
┌─────────────┬────┴────────┬─────────────┐
▼ Tier 1 ▼ Tier 2 ▼ Tier 3 ▼ Tier 4
SUBSCRIPTION API KEY CHEAP FREE
Claude Code, DeepSeek, GLM $0.5, Kiro, Qoder,
Codex, Copilot Groq, xAI MiniMax $0.2 Pollinations
quota out? ───▶ budget hit? ─▶ budget hit? ─▶ always on
A combo is a chain of models OmniRoute routes across automatically. Quota runs out, a provider fails, or costs spike — the combo silently slides to the next model. This is what makes OmniRoute unbreakable. 🛡️
No combo to create. Set your model to auto (or a variant) and OmniRoute builds a virtual combo from your connected providers, scored live:
| Model ID | What it optimizes for |
|---|---|
auto |
🎯 Balanced default (LKGP — sticks to your last good provider) |
auto/coding |
🧑💻 Quality-first weights for code generation |
auto/fast |
⚡ Lowest latency first |
auto/cheap |
💰 Cheapest per token first |
auto/offline |
🔋 Most quota / rate-limit headroom first |
auto/smart |
🔭 Quality-first + 10% exploration to discover better models |
| Goal | Strategy / combo |
|---|---|
| 🥇 Drain my subscription before paying | priority / fill-first |
| ⚖️ Spread load across accounts | round-robin · weighted · p2c · least-used |
| 💸 Always cheapest viable model | cost-optimized · auto/cheap |
| 🧠 Hand off long context between models | context-relay · context-optimized |
| 🎲 Randomized / privacy routing | random · strict-random |
| 🤖 Just make it smart | auto (9-factor scoring) · lkgp · reset-aware |
The Auto-Combo engine scores every candidate on 9 factors (health, quota, cost, latency, success rate, freshness…) — see docs/routing/AUTO-COMBO.md.
| Layer | Scope | What it does |
|---|---|---|
| 🔌 Circuit breaker | whole provider | Stops hammering a provider that's failing upstream; auto-probes to recover |
| 💤 Connection cooldown | one account / key | Skips a rate-limited key while other keys keep serving |
| 🎯 Model lockout | provider + model | Quarantines just one quota-limited model, not the whole connection |
Combo: "always-on" Strategy: priority
1. cc/claude-opus-4-7 ← subscription (use it fully)
2. cx/gpt-5.5 ← second subscription
3. glm/glm-5.1 ← cheap backup ($0.5/1M)
4. kr/claude-sonnet-4.5 ← FREE, unlimited (never fails)
Result: 4 layers of fallback = zero downtime
📖 Auto-Combo Engine · Resilience Guide
| Feature | OmniRoute | Other routers |
|---|---|---|
| 🌐 Providers | 231 | 20–100 |
| 🆓 Free providers | 50+ (11 free forever) | 1–5 |
| 🔀 Routing strategies | 15 (priority, weighted, cost-optimized, context-relay…) | 1–3 |
| 🗜️ Token compression | RTK + Caveman stacked (15–95%) | None / 20–40% |
| 🧰 Built-in MCP server | 87 tools, 3 transports, 30 scopes | Rare |
| 🤝 A2A agent protocol | 6 skills, JSON-RPC 2.0 | None |
| 🧠 Memory (FTS5 + vector) | Yes | Rare |
| 🛡️ Guardrails (PII, injection, vision) | Yes | Rare |
| ☁️ Cloud agents | Codex, Devin, Jules | None |
| 🥷 TLS fingerprint stealth | JA3/JA4 via wreq-js | None |
| 🖥️ Multi-platform | Web · Desktop · Termux · PWA | Web only |
| 🌍 i18n | 42 locales | 0–4 |
📊 Detailed comparison vs LiteLLM, OpenRouter & Portkey → docs/comparison/OMNIROUTE_VS_ALTERNATIVES.md
Recent highlights from v3.8.20 → v3.8.33. Full history in
CHANGELOG.md.
- 🤖 One-command CLI/agent setup — a dedicated
setup-*command configures each coding tool to route through OmniRoute (Claude Code, Codex, Cline, Continue, Cursor, Roo Code, Kilo Code, Crush, Goose, Qwen Code, Aider, OpenCode, Gemini CLI);omniroute launch/omniroute launch-codexare zero-config launchers. → CLI Integrations - 🛰️ Remote mode — drive a remote OmniRoute from any machine with scoped access tokens (
omniroute connect/omniroute contexts/omniroute tokens). → Remote Mode - 🧭 Smarter auto-routing — OpenRouter-style
auto/<category>:<tier>combos (e.g.auto/coding:fast,auto/reasoning:pro), live Arena-ELO + models.dev model intelligence, per-step account allowlists, provider-wildcard combo steps, nested combo-ref execution, sticky weighted selection, andweb_search-aware routing to a configured model. → Auto-Combo - 🗜️ Pluggable compression — an async compression pipeline with Compression Studios, a stable LLMLingua-2 ONNX engine, RTK, delegated Anthropic Context Editing, and a unified settings panel with named profiles + an active-profile selector as the single source for per-engine on/off + level. → Compression
- 🕵️ Transparent MITM decrypt (TPROXY) — capture & translate traffic from CLIs that ignore proxy env vars, with a per-SNI certificate authority and a trust-store installer. → MITM/TPROXY
- 💸 Cost telemetry everywhere —
X-OmniRoute-*cost/usage headers on every endpoint (including media), a non-token cost engine, a cache-HITX-OmniRoute-Cost-Savedheader, and per-key USD spend quotas. → API Reference - 🧠 Memory you control — opt-in int8 vector quantization (Qdrant + sqlite-vec), memory off by default, and a per-request
x-omniroute-no-memoryheader. → Memory - 🛡️ Security — a prompt-injection guard across every LLM route (backed by a red-team suite), plus a free DuckDuckGo last-resort web search. → Guardrails
- 🤝 More providers & agents — Cursor Cloud Agent (a 4th cloud agent), a refreshed 231-provider catalog (OrcaRouter, Wafer AI, OpenAdapter, dit.ai, TokenRouter, …), Vertex AI media generation (speech / transcription / music / video), and one-click account import from CLIProxyAPI (
~/.cli-proxy-api/). → Providers - ⚡ Local performance & infra — a one-click local Redis launcher (
omniroute redis up, plus a dashboard Redis panel) and an optional Bifrost Go sidecar that offloads the hottest relay path (BIFROST_BASE_URL, with automatic fallback to the TypeScript path on timeout). → Environment
One config —
http://localhost:20128/v1— and every AI IDE or CLI runs on free & low-cost models.
Claude Code |
Codex CLI |
Gemini CLI |
![]() Cursor |
![]() Copilot |
![]() Continue |
OpenCode |
Kilo Code |
Droid |
![]() OpenClaw |
Kiro |
Command |
📖 Per-tool setup for all 16+ tools → docs/reference/CLI-TOOLS.md · 🧩 OpenCode plugin → @omniroute/opencode-provider
The most complete catalog of any open-source router: 231 providers, 50+ with a free tier, 11 free forever.
📖 Full machine-readable catalog → docs/reference/PROVIDER_REFERENCE.md
Same app, your machine, your rules. From a global npm install to your phone via Termux.
| Platform | Install | Highlights |
|---|---|---|
| 📦 npm (global) | npm install -g omniroute |
One command, any OS |
| 🐳 Docker | docker run … diegosouzapw/omniroute |
Multi-arch AMD64 + ARM64 |
| 🖥️ Desktop (Electron) | npm run electron:build |
Native window + system tray — Windows / macOS / Linux |
| 💪 ARM | native arm64 |
Raspberry Pi, ARM servers, Apple Silicon |
| 📱 Android (Termux) | pkg install nodejs-lts && npx -y omniroute |
Runs on your phone, 24/7, no root |
| 📲 PWA | "Add to Home Screen" | Fullscreen, offline, installable from browser |
| 🧩 OpenCode plugin | @omniroute/opencode-provider |
Native OpenCode integration |
| 🛠️ From source | npm install && npm run dev |
Hack on it, contribute |
📖 Docker Guide · Desktop · Termux · PWA · OpenCode
Your keys, your machine, your data. OmniRoute is a local proxy — it never phones home.
- 🏠 Runs 100% on your hardware — npm, Docker, desktop, or your phone. No OmniRoute cloud sits in the request path.
- 🔐 Credentials encrypted at rest — API keys & OAuth tokens sealed with AES-256-GCM.
- 🚫 Zero telemetry by default — your prompts go only to the providers you choose, nowhere else.
- 🛡️ Hardened gateway — API-key scoping, IP filtering, rate limits, prompt-injection guard, loopback-only process routes.
- 📜 MIT licensed & fully open-source — audit every line, self-host forever.
📖 Authorization · Guardrails · Compliance
OmniRoute isn't just a server — it's a full command-line cockpit with 60+ commands, plus open agent protocols so an AI agent can drive OmniRoute by itself.
omniroute # serve gateway + dashboard (port 20128)
omniroute chat # interactive TUI chat client (slash: /model /combo /skill /memory)
omniroute setup # guided first-run wizard
omniroute doctor # diagnose providers, ports, native depsOmniRoute on a server? Drive it from your laptop with the same CLI. Log in once with a scoped access token; every command then targets the remote.
omniroute connect 192.168.0.15 # password → scoped token, saved as a context
omniroute models list # ← runs against the REMOTE server
omniroute configure codex # ← picks a remote model, writes a local Codex profile
omniroute tokens create --name ci --scope read # mint narrower tokens for other machines
omniroute contexts use default # ← switch back to the local serverTokens are scoped read / write / admin; process-spawning routes stay loopback-only.
📖 Remote Mode
providers · oauth · keys · combo · nodes · models · cache · compression · cost · usage · quota · health · resilience · telemetry · logs · audit · mcp · a2a · cloud · memory · skills · eval · tunnel · backup · sync · webhooks · policy · pricing · translator · simulate …
Expose OmniRoute over MCP or A2A and any capable agent gets the keys to the whole gateway — routing, providers, combos, cache, compression, memory — autonomously.
| Protocol | Endpoint | Use it for |
|---|---|---|
| 🧰 MCP (stdio) | omniroute --mcp |
Plug into Claude Desktop, Cursor, any MCP client |
| 🌊 MCP (HTTP) | http://localhost:20128/api/mcp/stream |
Remote MCP — 87 tools, 30 scopes, full audit trail |
| 📡 MCP (SSE) | http://localhost:20128/api/mcp/sse |
Streaming MCP transport |
| 🤝 A2A | http://localhost:20128/.well-known/agent.json |
Agent-to-agent, JSON-RPC 2.0 + SSE, 6 skills |
# Give Claude Code the full OmniRoute toolset over MCP:
claude mcp add-server omniroute --type http --url http://localhost:20128/api/mcp/stream📖 MCP Server · A2A Server · Agent Protocols
Why use many token when few token do trick? Every request passes through OmniRoute's compression pipeline transparently — no client changes. It's now a stack of 9 composable engines that run in order and mix & match per routing combo — building on ideas from RTK, Caveman (⭐ 51K+), LLMLingua-2, and Troglodita (PT-BR).
Engines run in pipeline order; each is independently toggleable and configurable per combo:
| # | Engine | What it does |
|---|---|---|
| 1 | Session-Dedup | Drops content repeated across turns (content-addressed, cross-turn) |
| 2 | CCR | Archives large blocks behind retrieve markers, fetched on demand |
| 3 | RTK | Smart tool-result filtering, dedup & truncation (command-aware) |
| 4 | Headroom | Lossless tabular compaction of homogeneous JSON arrays (~30%+) |
| 5 | Caveman | Rule-based prose compression (~65–75% on output) |
| 6 | LLMLingua-2 | ML semantic pruning via MobileBERT ONNX — code-safe, async |
| 7 | Lite | Whitespace + image-URL trimming (latency-light baseline) |
| 8 | Aggressive | Summarization + progressive aging of old turns |
| 9 | Ultra | Heuristic token pruning with an optional small-model (SLM) tier |
Code blocks, URLs and structured data are always preserved byte-perfect. One-click presets combine the engines:
| Mode | Savings | Best for |
|---|---|---|
| 🪶 Lite | ~15% | Always-on safe default |
| 🪨 Standard (Caveman) | ~30% | Daily coding |
| ⚡ Aggressive | ~50% | Long tool-heavy sessions |
| 🔥 Ultra | ~75% | Maximum savings |
| 🧰 RTK | 60–90% | Shell/test/build/git output |
| 🔗 Stacked (RTK → Caveman) | 78–95% | Mixed prompts + tool logs |
Real example — Standard mode:
Before (69 tokens): "The reason your React component is re-rendering is likely because you're creating a new object reference on each render cycle. When you pass an inline object as a prop, React's shallow comparison sees it as a different object every time, which triggers a re-render. I would recommend using useMemo to memoize the object."
After (19 tokens): "New object ref each render. Inline object prop = new ref = re-render. Wrap in useMemo."
Same answer. 72% fewer tokens. Zero accuracy loss. ✅
PT-BR example — Troglodita mode:
Antes (42 tokens): "O problema é que o componente está re-renderizando porque uma nova referência de objeto está sendo criada em cada ciclo de renderização. Eu recomendaria usar useMemo."
Depois (12 tokens): "Re-render: ref nova cada ciclo (objeto inline recriado). Usar
useMemo."Mesma resposta. ~70% menos tokens. Precisão técnica intacta. ✅
Client (10,000 tok) ──▶ OmniRoute Compression (9 engines) ──▶ Provider (~1,080 tok, up to 95% saved)
Default stacked combo runs RTK → Caveman. When both act on the same tool/context payload, savings compound:
combined = 1 − (1 − RTK) × (1 − Caveman_input)
average = 1 − (1 − 0.80) × (1 − 0.46) = 89.2%
range = 78.4 – 94.6%Code blocks, URLs, JSON and structured data are always protected by the preservation engine. Auto-trigger compression by token threshold, or assign a compression pipeline per routing combo.
📖 COMPRESSION_GUIDE.md · RTK_COMPRESSION.md · COMPRESSION_ENGINES.md
1) Install & run
npm install -g omniroute
omnirouteDashboard at http://localhost:20128 · API at http://localhost:20128/v1.
2) Connect a FREE provider (no signup)
Dashboard → Providers → connect Kiro AI (free Claude unlimited) or OpenCode Free (no auth) → done.
3) Point your coding tool
Base URL: http://localhost:20128/v1
API Key: [copy from Dashboard → Endpoints]
Model: auto (zero-config smart routing — or any provider/model)4) Verify it's working
curl http://localhost:20128/v1/models -H "Authorization: Bearer YOUR_KEY"You should see your connected models listed. 🎉 That's it — start coding, and OmniRoute auto-routes & falls back for you.
If your client cannot send custom headers, OmniRoute also exposes tokenized compatibility aliases:
OpenAI catalog: http://localhost:20128/vscode/YOUR_KEY/
OpenAI models: http://localhost:20128/vscode/YOUR_KEY/models
OpenAI chat: http://localhost:20128/vscode/YOUR_KEY/chat/completions
OpenAI responses: http://localhost:20128/vscode/YOUR_KEY/responses
Ollama chat: http://localhost:20128/vscode/YOUR_KEY/api/chat
Ollama tags: http://localhost:20128/vscode/YOUR_KEY/api/tagsUse these only for clients that cannot attach Authorization: Bearer .... Header auth remains the preferred mode.
🐳 Docker
docker run -d --name omniroute --restart unless-stopped --stop-timeout 40 \
-p 20128:20128 -v omniroute-data:/app/data diegosouzapw/omniroute:latest🛠️ From source
cp .env.example .env && npm install
PORT=20128 npm run dev📦 pnpm
pnpm install -g omniroute && pnpm approve-builds -g && omniroute🐧 Arch Linux (AUR)
yay -S omniroute-bin && systemctl --user enable --now omniroute.service🔧 Nix (Flake)
# Using Nix flakes
nix develop
npm run dev
# Or using devbox
devbox run npm run dev📖 Docker Guide — Compose profiles, Caddy HTTPS, Cloudflare tunnels.
🦭 Podman
# 1. Build the image
podman build --target runner-base -t omniroute:base .
# 2. Fix data directory permissions for rootless Podman
mkdir -p data && podman unshare chown 1000:1000 ./data
# 3. Set runtime in .env, then run (see contrib/podman/ for Quadlet)
echo "CONTAINER_HOST=podman" >> .env
podman compose --profile base up -d📖 Podman Guide — Quadlet setup, podman-compose, Quadlet.
🎬 Made a video about OmniRoute? Open an issue or discussion with the link — we'll feature it here.
💰 Pricing at a glance & the $0 Free Stack (11 providers)
| Tier | Example | Cost |
|---|---|---|
| 💳 Subscription | Claude Code Pro / Codex / Copilot | $10–200/mo |
| 🔑 API Key (free tiers) | NVIDIA NIM, Cerebras, Groq | FREE |
| 💰 Cheap | GLM-5 $0.5/1M · MiniMax M2.5 $0.3/1M | pennies |
| 🆓 Free Forever | Kiro, Qoder, Qwen, Pollinations, LongCat | $0 |
The $0 Free Stack — combine into one unbreakable combo:
| Provider | Prefix | Free models | Quota |
|---|---|---|---|
| Kiro | kr/ |
Claude Sonnet 4.5, Haiku 4.5, Opus 4.6 | 50 credits/mo |
| Qoder | if/ |
kimi-k2-thinking, qwen3-coder-plus, deepseek-r1 | ♾️ Unlimited |
| Qwen | qw/ |
qwen3-coder-plus/flash/next | ♾️ Unlimited |
| Pollinations | pol/ |
GPT-5, Claude, Gemini, DeepSeek, Llama 4 | No key needed |
| LongCat | lc/ |
LongCat-Flash-Lite | 50M tokens/day 🔥 |
| Cloudflare AI | cf/ |
50+ models | 10K neurons/day |
| NVIDIA NIM | nvidia/ |
129 models | ~40 RPM |
| Cerebras | cerebras/ |
Qwen3 235B, GPT-OSS 120B | 1M tok/day |
💡 The dashboard "cost" is a savings tracker, not a bill — OmniRoute never charges you. A "$290 total cost" using free models means $290 saved.
📖 Complete free directory → docs/reference/FREE_TIERS.md — 25+ providers, quotas, base URLs.
🎯 Use Cases — ready-made combo playbooks
$0 forever:
1. kr/claude-sonnet-4.5 (Kiro — unlimited)
2. if/kimi-k2-thinking (Qoder — unlimited)
3. pol/gpt-5 (Pollinations — no key)
4. lc/longcat-flash-lite (50M tok/day backup)
Compression: aggressive (~50%) → double your free quota · Cost: $0/mo
24/7 no interruptions: chain 2 subscriptions → cheap → free for 5 layers of fallback.
Blocked region: free providers + global/per-provider proxy → access AI from any country.
Max savings: subscription + cheap backup + ultra compression (~75%) → ~$150–300/mo saved for heavy users.
🌍 Bypass geo-blocks — 3-level proxy + stealth
🇷🇺 🇨🇳 🇮🇷 🇨🇺 🇹🇷 In a blocked region? OmniRoute's 3-level proxy (Global / Per-Provider / Per-Connection) proxies API requests, OAuth flows, connection tests, token refresh & model sync.
- Protocols: HTTP/HTTPS, SOCKS5, authenticated proxies
- 🆓 1proxy marketplace — hundreds of free validated proxies, quality scores, auto-rotation
- Anti-detection — TLS fingerprint spoofing (
wreq-js), CLI fingerprint matching, proxy IP preservation
✨ Full feature list — 30+ capabilities (memory, evals, observability)
Routing: 15 strategies · task-aware smart routing · thinking budget controls · wildcard routing · system prompt injection.
Compatibility: OpenAI ↔ Claude ↔ Gemini ↔ Responses API · auto OAuth refresh (PKCE, 8 providers) · multi-account round-robin · Batch + Files API · live OpenAPI 3.0.
Protocols: MCP (87 tools, 3 transports, 30 scopes) · A2A (JSON-RPC 2.0, SSE, 6 skills) · ACP · cloud agents (Codex, Devin, Jules).
Plugins: custom plugin marketplace (system-configured registry URL with SSRF-guarded fetch) · install / enable / disable · Notion + Obsidian knowledge-base integrations (WebDAV file server, vault search, note CRUD).
Embedded services: one-click install & lifecycle management of local sidecar services (CLIProxy, NineRouter).
Quality & Ops: built-in Evals (golden-set: exact/contains/regex/custom) · guardrails (PII, injection, vision) · health dashboard · p50/p95/p99 telemetry · webhooks · compliance audit.
AI Agent Skills: drop-in markdown manifests — point any agent at a skills/*/SKILL.md manifest. 43 skills available.
📖 MCP Server · A2A Server · Resilience Guide · Features Gallery
📖 Setup, env vars & FAQ
| Env var | Default | Purpose |
|---|---|---|
PORT |
20128 |
API + dashboard port |
REQUIRE_API_KEY |
false |
Require API key for all requests |
DATA_DIR |
~/.omniroute |
Database & config storage |
Will I be charged by OmniRoute? No — it's free, open-source software on your machine. You only pay paid providers directly. OmniRoute has no billing system. Are FREE providers really unlimited? Yes — Kiro, Qoder, Pollinations, LongCat, Cloudflare. No catch. Will compression hurt quality? No — it only compresses the input; code, URLs, JSON are always protected. Does it work where AI is blocked? Yes — 3-level proxy + 1proxy marketplace reach all 231 providers.
🐛 Troubleshooting
| Problem | Quick fix |
|---|---|
| "Language model did not provide messages" | Provider quota exhausted → use a combo fallback |
| Rate limiting (429) | Add fallback: cc/claude → glm/glm-4.7 → if/kimi-k2-thinking |
| OAuth token expired | Auto-refreshed; if stuck, delete + re-auth in Providers |
unsupported_country_region_territory |
Configure proxy in Settings → Proxy |
| Docker SQLite locks | Use --stop-timeout 40 for clean WAL checkpoint |
| Node runtime errors | Use Node >=22.0.0 <23 or >=24.0.0 <27 |
🐛 Reporting a bug? Run npm run system-info and attach system-info.txt. 📖 docs/guides/TROUBLESHOOTING.md
📸 Dashboard screenshots
| Page | Screenshot | Page | Screenshot |
|---|---|---|---|
| Providers | ![]() |
Combos | ![]() |
| Analytics | ![]() |
Health | ![]() |
| Translator | ![]() |
Settings | ![]() |
| CLI Tools | ![]() |
Usage Logs | ![]() |
💬 Chat with the community — Discord, Telegram & WhatsApp (🌍 / 🇧🇷) links are at the top of this README.
- 🌍 Website: omniroute.online
- 🐙 GitHub: github.com/diegosouzapw/OmniRoute
- 🐛 Issues: report a bug (attach
npm run system-infooutput) - 🤝 Contributing: see CONTRIBUTING.md or pick a
good first issue
- Runtime: Node.js 22.x or 24.x LTS (24 LTS recommended) —
>=22.0.0 <23 || >=24.0.0 <27 - Language: TypeScript 6.0 — 100% TypeScript across
src/andopen-sse/(zeroanyin core modules since v2.0) - Framework: Next.js 16 + React 19 + Tailwind CSS 4
- Database: better-sqlite3 (SQLite) + LowDB (JSON legacy) — domain state, proxy logs, MCP audit, routing decisions, memory, skills
- Schemas: Zod (MCP tool I/O validation, API contracts)
- Protocols: MCP (stdio/HTTP) + A2A v0.3 (JSON-RPC 2.0 + SSE)
- Streaming: Server-Sent Events (SSE) + WebSocket bridge (
/v1/ws) - Auth: OAuth 2.0 (PKCE) + JWT + API Keys + MCP Scoped Authorization
- Testing: Node.js test runner + Vitest (14,965 test cases across 517 files — unit, integration, E2E, security, ecosystem)
- Platforms: Desktop (Electron), Android (Termux), PWA (any browser)
- CI/CD: GitHub Actions (auto npm publish + Docker Hub on release)
- Website: omniroute.online
- Package: npmjs.com/package/omniroute
- Docker: hub.docker.com/r/diegosouzapw/omniroute
- Resilience: Circuit breaker, exponential backoff, anti-thundering herd, TLS spoofing, auto-combo self-healing
| Document | Description |
|---|---|
| User Guide | Providers, combos, CLI integration, deployment |
| Setup Guide | Full install methods, CLI tool configs, protocol setup, timeout tuning |
| CLI Tools Guide | Per-tool setup for Claude Code, Codex, Cursor, Cline, OpenClaw, Kilo, Copilot |
| Remote Mode | Drive a remote OmniRoute (VPS) from your laptop CLI via scoped access tokens |
| Claude Code Config | Point Claude Code at OmniRoute (local/remote) with launch + per-model profiles |
| Quick Start | 3-step install → connect → configure |
| Document | Description |
|---|---|
| Docker Guide | Docker run, Compose profiles, Caddy HTTPS, tunnels, image tags |
| Podman Guide | Quadlet systemd integration, podman-compose, SELinux |
| VM Deployment | Complete guide: VM + nginx + Cloudflare setup |
| Fly.io Deployment | Deploy to Fly.io with persistent storage |
| Termux Guide | Run OmniRoute on Android via Termux |
| PWA Guide | Progressive Web App install, caching, architecture |
| Uninstall Guide | Clean removal for all install methods |
| Environment Config | Complete .env variables and references |
| Document | Description |
|---|---|
| Architecture | System architecture, data flow, and internals |
| Compression Guide | 7-option pipeline: off / lite / standard / aggressive / ultra / RTK / stacked |
| RTK Compression | Command-output compression, filters, trust, verify, raw-output recovery |
| Compression Engines | Caveman, RTK, stacked pipelines, dashboard/API/MCP surfaces |
| Compression Rules Format | JSON rule-pack schemas for Caveman and RTK filters |
| Compression Language Packs | Language detection and Caveman rule-pack authoring |
| Resilience Guide | Circuit breakers, cooldowns, queue, anti-thundering herd, TLS spoofing |
| Auto-Combo Engine | 9-factor scoring, mode packs, self-healing |
| Proxy Guide | 3-level proxy system, 1proxy marketplace, registry CRUD |
| Free Tiers | 25+ free API providers consolidated directory |
| Features Gallery | Visual dashboard tour with screenshots |
| Codebase Documentation | Beginner-friendly codebase walkthrough |
| Document | Description |
|---|---|
| API Reference | All endpoints with examples |
| OpenAPI Spec | OpenAPI 3.0 specification |
| MCP Server | 87 MCP tools, IDE configs, Python/TS/Go clients |
| MCP Server Guide | MCP installation, transports, and tool reference |
| A2A Server | JSON-RPC 2.0 protocol, skills, streaming, task mgmt |
| A2A Server Guide | A2A agent card, tasks, skills, and streaming |
| Document | Description |
|---|---|
| Contributing | Development setup and guidelines |
| Changelog | Full per-version release history |
| Security Policy | Vulnerability reporting and security practices |
| i18n Guide | 40+ language support, translation workflow, RTL |
| Release Checklist | Pre-release validation steps |
| Coverage Plan | Test coverage strategy and 14,965 test suite |
OmniRoute is shaped by a passionate open-source community. These individuals have made exceptional contributions that directly impact the quality, stability, and reach of the project. Thank you.
![]() oyi77 🥇 190 commits • +72K lines Analytics engine, SQL aggregations, proxy marketplace, test coverage |
![]() Chris Staley 🥈 72 commits • +5.7K lines SSE stream hardening, Responses API, Gemini pagination, test regression fixes |
![]() zenobit 🥉 62 commits • +24K lines CI/CD pipeline, i18n for 33 languages, Void Linux package, platform fixes |
![]() R.D. & Randi 🏅 107 commits • +28K lines Endpoints page, tunnel integrations, Docker workflows, A2A status, compression UI |
![]() benzntech 🏅 20 commits • +7.5K lines Electron desktop app, auto-updater, release build workflows, cross-platform CI |
🙏 These contributors' features, bug fixes, and infrastructure improvements are a core part of what makes OmniRoute reliable and feature-rich. Every pull request, every test case, and every i18n translation file matters. Open source is built by people like them.
- Fork the repository
- Create your feature branch (
git checkout -b feature/amazing-feature) - Commit your changes (
git commit -m 'Add amazing feature') - Push to the branch (
git push origin feature/amazing-feature) - Open a Pull Request
See CONTRIBUTING.md for detailed guidelines.
# Create a release — npm publish happens automatically
gh release create v3.8.2 --title "v3.8.2" --generate-notesOmniRoute stands on the shoulders of giants. It started as a fork of 9router and a TypeScript port of the Go project CLIProxyAPI — and from there, every subsystem below was inspired by an open-source project that got there first. Each one shaped a concrete piece of OmniRoute. This is our thank-you to all of them. 🙏
⭐ star counts as of June 2026 — go give these projects a star.
| Project | ⭐ | How it inspired OmniRoute |
|---|---|---|
| 9router · decolua | 17.9k | The original project this fork is built on — extended here with multi-modal APIs and a full TypeScript rewrite. |
| CLIProxyAPI · router-for-me | 37.8k | The Go implementation that inspired this JavaScript / TypeScript port. |
| LiteLLM · BerriAI | 50.8k | The AI gateway whose public pricing dataset feeds our cost-tracking sync and whose provider-normalization model informed our routing. |
| Project | ⭐ | How it inspired OmniRoute |
|---|---|---|
| Caveman · JuliusBrussee | 74.5k | The viral "why use many token when few token do trick" project — its caveman-speak philosophy powers our standard compression mode and 30+ filler/condensation rules. |
| RTK – Rust Token Killer · rtk-ai | 63.6k | High-performance command-output compression — inspired our RTK engine, JSON filter DSL, raw-output recovery and the stacked RTK → Caveman pipeline. |
| headroom · chopratejas | 33.6k | Reversible context-compression (SmartCrusher) — inspired our headroom engine and the ccr retrieve-marker pattern. |
| LLMLingua · Microsoft | 6.3k | Prompt-compression research (LLMLingua / LLMLingua-2) — inspired our async, code-safe, fail-open llmlingua engine. |
| llmlingua-2-js · atjsh | 27 | The JS/ONNX port (MobileBERT / XLM-RoBERTa) used as the worker-thread backend for our LLMLingua engine. |
| Troglodita · Lenine Júnior | 15 | PT-BR token compression — powers our pt-BR language pack: pleonasm reduction and filler removal tuned for Brazilian-Portuguese grammar. |
| Project | ⭐ | How it inspired OmniRoute |
|---|---|---|
| TOON · toon-format | 24.6k | Token-Oriented Object Notation — its columnar, header-plus-rows model shaped our tabular compaction stage. |
| GCF – Graph Compact Format · Blackwell Systems | 11 | Schema-aware "JSON for LLMs" notation — co-inspired our lossless homogeneous-array compaction with [N rows] markers. |
| token-optimizer-mcp · ooples | 409 | Brotli/SQLite cache + per-session context-delta — inspired our session-dedup engine. |
| token-savior · Mibayy | 993 | Bash-output compaction + MCP profiles — inspired our compression bail-out discipline and MCP tool-manifest reduction. |
| token-saver · ppgranger | 103 | Content-aware, per-file-type output compression with failure-aware bail-out — validated our per-type dispatch and minimum-gain skip. |
| token-optimizer · alexgreensh | 1.4k | "Find the ghost tokens" — its offload + recoverable-handle pattern informed our CCR offload thinking. |
| TokenMizer · Shweta-Mishra-ai | 1 | A session-graph + cross-turn line-dedup blueprint that informed our session-dedup design. |
| mcp-compressor · Atlassian Labs | 80 | MCP tool-schema/description compression — informed our MCP tool-manifest cardinality reduction. |
| RepoMapper · pdavis68 | 182 | Aider-style repo-map ranking — informed our repo-map / retrieval-ranking exploration. |
| quiet-shell-mcp · mrsimpson | 4 | Declarative shell-output reduction over MCP — validated our declarative bash-output compaction. |
| ts-morph · David Sherret | 6.1k | TypeScript Compiler API toolkit — inspired our parser-based comment removal that preserves string, template and regex literals. |
| Project | ⭐ | How it inspired OmniRoute |
|---|---|---|
| Mem0 · mem0ai | 58.9k | Universal memory layer — its proxy-as-write/read-boundary model shaped our memory architecture. |
| Letta (MemGPT) · letta-ai | 23.4k | Stateful agents with tiered memory — inspired our Context Control & Recovery (CCR) tiered model. |
| WFGY · onestardao | 1.8k | The ProblemMap taxonomy of 16 recurring RAG/LLM failure modes — the shared vocabulary in our troubleshooting guide. |
| Project | ⭐ | How it inspired OmniRoute |
|---|---|---|
| llm-interceptor · chouzz | 46 | MITM interception/analysis of coding-assistant ↔ LLM traffic — our Traffic Inspector ports its SSE merge, conversation normalization, host passthrough and secret masking (MIT). |
| ProxyBridge · InterceptSuite | 5.1k | Transparent per-process proxy routing — inspired our crash-safe MITM teardown, socket idle-timeouts, /proc process attribution and TPROXY capture. |
| Project | ⭐ | How it inspired OmniRoute |
|---|---|---|
| models.dev · SST / OpenCode | 5.1k | Open database of AI model specs, pricing and capabilities — synced natively into our model catalog. |
| React Flow / xyflow · xyflow | 37.1k | The node-based graph library powering our real-time Compression Studio and Combo/Routing Studio. |
| LangGraph · LangChain | 35.1k | LangGraph Studio's live workflow-graph visualization inspired our Studios' real-time cascade view. |
| Langfuse · Langfuse | 29.3k | Its trace → span → generation observability model shaped our Compression Studio waterfall. |
| Kiali · Kiali | 3.6k | Istio service-mesh observability — inspired our circuit-breaker badges and error-edge visuals in the Routing/Combo Studio. |
| lobe-icons · LobeHub | 2.1k | AI/LLM brand logos that render the provider icons across our dashboard. |
| Project | ⭐ | How it inspired OmniRoute |
|---|---|---|
| awesome-secure-defaults · tldrsec | 708 | A curated list of secure-by-default libraries that guides our security choices (Helmet.js, DOMPurify, ssrf-req-filter, safe-regex, Google Tink). |
OmniRoute is free and open source, built and maintained in the open. If it saves you time or money, consider supporting development:
- ⭐ Star the repo — it genuinely helps visibility
- 💖 GitHub Sponsors — fund ongoing maintenance and new providers
- 🐛 Report bugs and share feedback in Discussions
MIT License - see LICENSE for details.
⬆ Back to top · Built with ❤️ for the open-source AI community.
OmniRoute v3.8.24 · Node ≥22.0.0 · MIT License · omniroute.online



















