Intelligent LLM routing. One API, 345+ models, always the best one.
Route any prompt to the optimal AI model based on topic, complexity, and budget — with benchmark-verified selection across OpenAI, Anthropic, Google, DeepSeek, and xAI.
LLMs vary wildly in quality by task. GPT-4o dominates coding, Gemini excels at long context, DeepSeek leads math reasoning. ArcRouter picks the right model for every prompt — automatically.
- Hybrid semantic routing — lexical prefilter + benchmark shortlist + embedding reranking
- 24 topic categories with subcategory detection (code/frontend, math/calculus, science/physics, ...)
- 4 complexity tiers — SIMPLE, MEDIUM, COMPLEX, REASONING
- Agentic detection — auto-detects tool-use prompts, routes to tool-capable models
- 345+ models scored with real benchmarks (LiveBench, LiveCodeBench, HumanEval, GPQA)
- Council mode — query 3-7 models in parallel, return the consensus answer
- Semantic grouping — embedding similarity (paid) or Jaccard overlap (free) to find agreement
- Chairman escalation — when models disagree, a synthesis model resolves conflicts
- X-Agent-Step header — hint complexity per step (`simple-action`, `code-generation`, `reasoning`, `verification`)
- Workflow budgets — set a total USD budget per workflow, auto-downgrade at 60/80/95% spend
- Session model pinning — lock a model for a conversation via `session_id`
- Workflow telemetry — `GET /v1/workflow/{session_id}/usage` for spend, latency, tier distribution
- Prompt compression — lossless L1 (dedup) + L2 (whitespace) + L5 (JSON compaction), 10-15% token savings
- 5 direct providers — OpenAI, Anthropic, Google, DeepSeek, xAI (no middleman markup)
- OpenRouter fallback — 300+ long-tail models when direct access isn't available
- Max cost filtering — `max_cost` per request, graceful downgrade to cheapest viable model
- Routing metadata — every response includes `estimated_cost_usd`, `savings_vs_gpt4_pct`, `models_considered`
- Circuit breaker + fallback chains — auto-retry with up to 3 fallback models on failure
- Context-length filtering — excludes models with insufficient context window
- Model exclusion — `exclude_models` to block specific models per request
- x402 micropayments — pay per request with USDC on Base, no API key needed
- Stripe metered billing — traditional API key billing for teams
- Free tier — no auth required, free models, rate-limited
- Model aliases — `"claude"`, `"gpt"`, `"gemini"`, `"deepseek"` instead of full model IDs
- OpenAI-compatible API — drop-in replacement, same `/v1/chat/completions` format
- Streaming — SSE streaming with routing metadata in the first chunk
- Usage tracking — `GET /v1/usage` for per-key daily stats
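The free-tier Jaccard grouping mentioned above can be sketched as follows. This is an illustrative reconstruction, not the shipped implementation: `tokenize`, `jaccard`, and `groupAnswers` are hypothetical names, and the 0.5 similarity threshold is an assumption.

```ts
// Tokenize an answer into a set of lowercase word tokens.
function tokenize(text: string): Set<string> {
  return new Set(text.toLowerCase().match(/[a-z0-9]+/g) ?? []);
}

// Jaccard overlap: |A ∩ B| / |A ∪ B|, in [0, 1].
function jaccard(a: Set<string>, b: Set<string>): number {
  const inter = [...a].filter((t) => b.has(t)).length;
  const union = new Set([...a, ...b]).size;
  return union === 0 ? 0 : inter / union;
}

// Greedy clustering: each answer joins the first group it resembles.
// The largest group is returned first as the consensus candidate.
function groupAnswers(answers: string[], threshold = 0.5): string[][] {
  const groups: { tokens: Set<string>; members: string[] }[] = [];
  for (const answer of answers) {
    const tokens = tokenize(answer);
    const hit = groups.find((g) => jaccard(g.tokens, tokens) >= threshold);
    if (hit) hit.members.push(answer);
    else groups.push({ tokens, members: [answer] });
  }
  return groups.map((g) => g.members).sort((a, b) => b.length - a.length);
}
```

With three council answers where two agree, the agreeing pair surfaces as the first (largest) group; the paid tier would replace `jaccard` with embedding cosine similarity.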
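To make the compression levels concrete, here is a minimal sketch of the two simplest passes (L1 duplicate-line removal and L2 whitespace squeezing). Function names are assumptions for illustration; the open-source `arcrouter-classifier` package is the authoritative implementation.

```ts
// L1: drop exact-duplicate lines, keeping the first occurrence.
// Blank lines are always kept so paragraph structure survives.
function dedupLines(text: string): string {
  const seen = new Set<string>();
  return text
    .split("\n")
    .filter((line) => {
      const key = line.trim();
      if (key !== "" && seen.has(key)) return false;
      seen.add(key);
      return true;
    })
    .join("\n");
}

// L2: collapse runs of spaces/tabs and trim trailing whitespace.
function squeezeWhitespace(text: string): string {
  return text
    .split("\n")
    .map((line) => line.replace(/[ \t]+/g, " ").trimEnd())
    .join("\n");
}

const compress = (text: string): string => squeezeWhitespace(dedupLines(text));
```

Both passes only remove characters a model does not need to reproduce the meaning of the prompt, which is why the savings (10–15% of tokens on long conversations) come without quality loss.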
```sh
# TypeScript SDK
pnpm add @arcrouter/sdk

# MCP server for Claude Code / Cursor / Cline
claude mcp add arcrouter --transport http https://api.arcrouter.com/mcp

# Direct API (no auth required for free tier)
curl https://api.arcrouter.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Hello"}],"budget":"auto"}'
```

```ts
import { ArcRouter } from '@arcrouter/sdk';

const arc = new ArcRouter();
const res = await arc.chat('Explain quantum entanglement');
console.log(res.content);            // The answer
console.log(res.routing.model);      // e.g. "gemini-2.0-flash"
console.log(res.routing.topic);      // "science/physics"
console.log(res.routing.complexity); // "MEDIUM"
```

| Package | Description | Install |
|---|---|---|
| `@arcrouter/sdk` | TypeScript SDK — routing, council, streaming, workflows, x402 | `pnpm add @arcrouter/sdk` |
| `arcrouter-classifier` | Open-source prompt classifier — topic, complexity, compression | `pnpm add arcrouter-classifier` |
| `arcrouter-mcp` | MCP server for AI coding tools (3 tools: chat, models, health) | `claude mcp add arcrouter ...` |
| `awesome-arcrouter` | Community integrations and resources | — |
Prompt → Classify (topic + complexity + agentic) → Score 345+ models → Semantic rerank → Route → Fallback chain
- Classify — detect topic (24 categories), complexity (4 tiers), agentic intent, and confidence
- Score — rank models using real benchmarks, weighted by semantic relevance + value + reliability
- Route — hybrid semantic routing narrows to optimal model, filtered by budget + context length + cost
- Respond — direct provider call with circuit breaker, up to 3 fallback models on failure
- Compress — lossless prompt compression on long conversations (>5000 chars) to reduce token cost
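The Score step above can be sketched as a filter-then-rank pass. Everything here is a hedged illustration: the `ModelCard` fields, the 0.6/0.2/0.2 weights, and the value formula are assumptions, not ArcRouter's actual coefficients.

```ts
interface ModelCard {
  id: string;
  benchmark: number;     // 0..1 aggregate from LiveBench / LiveCodeBench etc.
  costPer1kUsd: number;  // price per 1k tokens
  reliability: number;   // 0..1 rolling success rate
  contextWindow: number; // tokens
}

// Hard-filter by cost and context length, then rank by a weighted sum of
// raw benchmark quality, value (quality per dollar), and reliability.
function scoreModels(
  models: ModelCard[],
  opts: { maxCostPer1kUsd: number; minContext: number },
): ModelCard[] {
  return models
    .filter((m) => m.costPer1kUsd <= opts.maxCostPer1kUsd)
    .filter((m) => m.contextWindow >= opts.minContext)
    .map((m) => ({
      m,
      score:
        0.6 * m.benchmark +                       // quality term
        0.2 * (m.benchmark / (1 + m.costPer1kUsd)) + // value term
        0.2 * m.reliability,                      // reliability term
    }))
    .sort((a, b) => b.score - a.score)
    .map((x) => x.m);
}
```

The hard filters come first so a budget cap or short context window can never be out-voted by a high benchmark score; the semantic rerank then only has to choose among already-viable models.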
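The circuit breaker and fallback chain in the Respond step can be approximated like this. The trip threshold, counter scheme, and function names are illustrative assumptions, shown only to convey the shape of the mechanism.

```ts
type CallFn = (model: string) => Promise<string>;

// Per-model consecutive-failure counters; a model with too many recent
// failures is skipped (breaker "open") until a success resets it.
const failures = new Map<string, number>();
const TRIP_AT = 3;

// Try the primary model, then up to 3 fallbacks, skipping tripped models.
async function callWithFallback(chain: string[], call: CallFn): Promise<string> {
  let lastErr: unknown;
  for (const model of chain) {
    if ((failures.get(model) ?? 0) >= TRIP_AT) continue; // breaker open: skip
    try {
      const out = await call(model);
      failures.set(model, 0); // success closes the breaker
      return out;
    } catch (err) {
      failures.set(model, (failures.get(model) ?? 0) + 1);
      lastErr = err; // fall through to the next model in the chain
    }
  }
  throw lastErr ?? new Error("all models in chain unavailable");
}
```

Resetting the counter on success keeps a briefly flaky provider in rotation, while a hard outage quickly routes all traffic to the fallbacks.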
- Website: arcrouter.com
- API: api.arcrouter.com
- Docs: arcrouter.com/docs
- npm: @arcrouter/sdk · arcrouter-classifier