# Routerly v0.1.5
Routerly is a self-hosted LLM gateway that sits between your application and any AI provider. Swap one URL, keep every existing SDK, and get full control over routing, cost, and access — without changing a line of application code.
## Core Features

### OpenAI & Anthropic Wire-Compatible Proxy
Drop-in replacement for the OpenAI and Anthropic APIs. Any SDK, tool, or framework that speaks either protocol works without modification — Python, Node.js, .NET, Go, Cursor, Continue.dev, LibreChat, LangChain, LlamaIndex, Open WebUI, and more.
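As a sketch of the one-URL swap: the request below is the standard OpenAI chat-completions wire format, and only the base address and bearer token change. The address, token, and model name here are placeholders, not values shipped with Routerly.

```python
import json

def build_chat_request(base_url, project_token, model, messages):
    """Build an OpenAI-compatible chat request aimed at the gateway."""
    url = base_url.rstrip("/") + "/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {project_token}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"model": model, "messages": messages})
    return url, headers, body

url, headers, body = build_chat_request(
    "http://localhost:8080",   # assumed Routerly address
    "rtly-project-token",      # assumed per-project token
    "gpt-4o-mini",
    [{"role": "user", "content": "Hello"}],
)
```

Pointing an existing SDK at the same base URL works the same way, since the path and payload are unchanged.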
### Intelligent Multi-Policy Routing
Each request is scored against up to 9 pluggable routing policies applied simultaneously. Routerly picks the best candidate and falls back automatically if a provider fails.
| Policy | What it does |
|---|---|
| `llm` | Asks a language model to pick the best candidate given request context |
| `cheapest` | Minimises cost per token |
| `health` | Deprioritises models with recent errors |
| `performance` | Favours models with lower average latency |
| `capability` | Matches models to task requirements (vision, tools, JSON mode…) |
| `context` | Filters models by context window size relative to the prompt |
| `budget-remaining` | Excludes models that would push a project over its budget |
| `rate-limit` | Steers traffic away from rate-limited providers |
| `fairness` | Balances load across candidates |
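The scoring idea can be pictured as follows: each policy maps a candidate to a score, totals are ranked, and everything after the winner forms the fallback chain. The policy functions and model fields below are illustrative, not Routerly's actual internals.

```python
def rank_candidates(models, policies):
    """Score every candidate under every policy; highest total first."""
    scored = [(sum(p(m) for p in policies), m) for m in models]
    scored.sort(key=lambda sm: sm[0], reverse=True)
    return [m for _, m in scored]

# Two toy policies over hypothetical model metadata.
cheapest = lambda m: 1.0 - min(m["cost_per_1k"], 1.0)    # favour low cost
health   = lambda m: 0.0 if m["recent_errors"] else 1.0  # avoid failing models

models = [
    {"name": "a", "cost_per_1k": 0.2, "recent_errors": False},
    {"name": "b", "cost_per_1k": 0.1, "recent_errors": True},
]
order = rank_candidates(models, [cheapest, health])
# order[0] is the primary choice; order[1:] is the fallback chain
```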
### Real-Time Cost Tracking & Budgets
Every request is priced at the token level. Costs accumulate per project, and hard limits (hourly / daily / weekly / monthly / per-request) block overspending before it happens.
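A minimal sketch of the idea, with made-up prices and limits; Routerly tracks these per project:

```python
def request_cost(tokens_in, tokens_out, price_in_per_1k, price_out_per_1k):
    """Price a call at the token level, with separate input/output rates."""
    return tokens_in / 1000 * price_in_per_1k + tokens_out / 1000 * price_out_per_1k

def allow_request(spent_so_far, estimated_cost, hard_limit):
    """Block the call *before* it runs if it would exceed the budget."""
    return spent_so_far + estimated_cost <= hard_limit

cost = request_cost(1200, 300, price_in_per_1k=0.005, price_out_per_1k=0.015)
ok = allow_request(spent_so_far=9.995, estimated_cost=cost, hard_limit=10.0)
# ok is False: the call would push the project past its hard limit
```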
### Multi-Tenant Project Isolation
Separate Bearer tokens per project. Each project has its own model pool, routing policies, and budget envelope — ideal for multi-tenant SaaS, team isolation, or dev/staging/prod separation.
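Conceptually, each bearer token resolves to its own isolated configuration. The tokens, model names, and field names below are invented for illustration:

```python
# Hypothetical per-project registry: each token owns a model pool
# and a budget envelope, so tenants never see each other's config.
projects = {
    "rtly-team-alpha": {"models": ["gpt-4o-mini", "claude-haiku"], "monthly_budget": 50.0},
    "rtly-team-beta":  {"models": ["llama3:8b"], "monthly_budget": 10.0},
}

def resolve(bearer_token):
    """Map an incoming bearer token to its project, or reject it."""
    project = projects.get(bearer_token)
    if project is None:
        raise PermissionError("unknown project token")
    return project

pool = resolve("rtly-team-alpha")["models"]
```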
### Streaming & Tool Calling
Full streaming SSE support and OpenAI-compatible tool/function calling pass-through, including streaming tool call deltas.
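On the client side, streamed tool calls arrive as indexed fragments of a JSON arguments string that are concatenated until the stream ends. A sketch with hand-written sample deltas (the chunk shape mirrors the OpenAI streaming format, simplified):

```python
import json

def accumulate_tool_calls(deltas):
    """Reassemble streaming tool-call deltas, keyed by tool-call index."""
    calls = {}
    for d in deltas:
        slot = calls.setdefault(d["index"], {"name": "", "arguments": ""})
        if "name" in d:
            slot["name"] += d["name"]
        if "arguments" in d:
            slot["arguments"] += d["arguments"]
    return calls

deltas = [
    {"index": 0, "name": "get_weather"},
    {"index": 0, "arguments": '{"city": "Par'},
    {"index": 0, "arguments": 'is"}'},
]
call = accumulate_tool_calls(deltas)[0]
args = json.loads(call["arguments"])  # valid JSON only once complete
```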
## Supported Providers
OpenAI · Anthropic · Google Gemini · Ollama (local) · Mistral · Cohere · xAI (Grok) · Any custom HTTP endpoint. Mix cloud and local models in the same project.
## Dashboard
Built-in React web dashboard with:
- Overview — live spend, call volume, success rate, and daily cost trend
- Models — manage registered models with provider badges and pricing info
- Projects — create and configure projects with routing policies and model pools
- Usage analytics — per-model breakdown of calls, tokens in/out, errors, and cost; filterable by period, project, and status
- Users & Roles — RBAC with configurable roles and permissions
- Light / dark theme
## CLI (`routerly`)
Full-featured admin CLI:
- `routerly status` — server health, reachability, uptime, model/project counts; `--json` for scripting
- `routerly model` — add, list, remove models
- `routerly project` — add, list, remove projects; manage model assignments
- `routerly auth` — manage multiple server accounts (login, logout, switch, rename, whoami)
- `routerly user` — add, list, remove users
- `routerly role` — add, list, edit, remove roles
- `routerly report` — pull usage and call reports
- `routerly service` — start/stop/configure the local service
## Installer
One-line install for macOS, Linux, and Windows:
```bash
curl -fsSL https://github.com/Inebrio/Routerly/releases/latest/download/install.sh | bash
```

The installer detects the platform, optionally installs Node.js 20+, builds packages, generates an encryption key, sets up an auto-start daemon, and walks through first-run setup (model, project, admin user). On existing installs it shows an Update / Reinstall / Uninstall menu.
A Docker Compose setup is also available for zero-dependency deployment.
## Design Philosophy
- No external database — config and usage data live in plain JSON files
- Fully decoupled — service, dashboard, and CLI can run on different machines
- No telemetry — your prompts and keys never leave your infrastructure
- Zero migration cost — one env-var change, nothing else