| title | Getting Started |
|---|---|
| description | Install and run FrugalRoute, send your first request, and connect any OpenAI-compatible client. |
FrugalRoute is a local-first LLM routing layer that sits between your application and your models. Instead of hardcoding a specific model, you describe what you need -- and FrugalRoute picks the cheapest capable model, starting with local inference via Ollama and escalating to cloud providers only when necessary.
It exposes an OpenAI-compatible API on localhost:3100. Any client that speaks the OpenAI chat completions protocol works -- the OpenAI SDK, the Anthropic SDK (via its OpenAI-compat mode), fetch, curl, or any HTTP client. FrugalRoute is not tied to OpenAI; it uses the OpenAI wire format as a universal interface.
You'll need:

- Ollama (for local model inference)
- Either Node.js (v16+) or Bun (v1.1+) to run FrugalRoute
- Optionally, API keys for cloud providers: OpenAI, Anthropic, Google (Gemini), Groq, Mistral, Kimi (Moonshot), or DeepSeek
Install globally with npm:

```bash
npm install -g frugalroute
```

Or run without installing:

```bash
npx frugalroute
# or with bun
bunx frugalroute
```

That's it -- `frugalroute` is now available globally and the server will start on http://localhost:3100.
If you want to hack on FrugalRoute or use unreleased features:
```bash
git clone https://github.com/SimplyLiz/FrugalRoute && cd FrugalRoute
bun install
bun run dev
```

Copy the example environment file and edit it:

```bash
cp .env.example .env
```

The defaults work for local-only usage. Add API keys for any cloud providers you want:
```bash
# .env
PORT=3100
OLLAMA_BASE_URL=http://localhost:11434

# Cloud providers -- all optional, add the ones you have
OPENAI_API_KEY=sk-your-key-here
ANTHROPIC_API_KEY=sk-ant-your-key-here
GOOGLE_API_KEY=your-key-here
GROQ_API_KEY=gsk_your-key-here
MISTRAL_API_KEY=your-key-here
KIMI_API_KEY=your-key-here
DEEPSEEK_API_KEY=your-key-here

EMBEDDING_MODEL=nomic-embed-text
DEFAULT_MAX_COST_PER_REQUEST=0.01
```

Each key you add registers that provider's models. No key = no registration, no errors. You can start with just Ollama and add cloud providers later.
FrugalRoute needs at least one local model and the embedding model for semantic routing:
```bash
ollama pull gemma3:4b
ollama pull nomic-embed-text
```
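You can verify both models are available before starting the server:

```bash
# Confirm the chat model and the embedding model were pulled
ollama list
```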
If you installed FrugalRoute globally via npm:

```bash
frugalroute
```

If running from source:

```bash
bun run dev
```

You should see the server listening on http://localhost:3100.
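To confirm the server is up and see which providers registered, you can list the available models. This assumes FrugalRoute also mirrors the standard OpenAI `GET /v1/models` route; if your build differs, check the API Reference.

```bash
# List registered models (assumes the standard OpenAI-style listing endpoint)
curl http://localhost:3100/v1/models
```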
Send a chat completion request with curl:
```bash
curl http://localhost:3100/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "user", "content": "Explain what a merkle tree is in two sentences." }
    ]
  }'
```

FrugalRoute will select the most cost-effective model that can handle the request, run inference, and return a standard OpenAI-shaped response.
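The response has the familiar chat-completion shape. The values below are illustrative only -- the `model` field reflects whichever model the router actually selected:

```json
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "gemma3:4b",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "A merkle tree is ..." },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 18, "completion_tokens": 42, "total_tokens": 60 }
}
```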
FrugalRoute speaks the OpenAI wire format, so any client that supports a custom base_url works. You do not need an OpenAI account or API key to use FrugalRoute -- Ollama runs locally by default with zero cost.
```bash
curl http://localhost:3100/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "user", "content": "What is a B-tree?" }
    ]
  }'
```

Omit the `model` field entirely to let the router decide, or use a model alias:
```bash
curl http://localhost:3100/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "fast",
    "messages": [
      { "role": "user", "content": "Format this as JSON: name=Alice age=30" }
    ]
  }'
```

Default aliases: `fast` -> local model, `smart` -> GPT-4o, `best` -> Claude Sonnet.
With the OpenAI Python SDK, point the client at FrugalRoute's base URL:

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3100/v1",
    api_key="unused",  # FrugalRoute does not require an API key by default
)

response = client.chat.completions.create(
    model="auto",  # let the router decide
    messages=[
        {"role": "user", "content": "What is a B-tree?"}
    ],
)
print(response.choices[0].message.content)
```

The same pattern works with the OpenAI Node SDK in TypeScript:

```ts
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:3100/v1",
  apiKey: "unused",
});

const response = await client.chat.completions.create({
  model: "auto",
  messages: [
    { role: "user", content: "What is a B-tree?" },
  ],
});
console.log(response.choices[0].message.content);
```

No SDK needed -- FrugalRoute is just an HTTP endpoint:
```ts
const response = await fetch("http://localhost:3100/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    messages: [{ role: "user", content: "What is a B-tree?" }],
  }),
});

const data = await response.json();
console.log(data.choices[0].message.content);
```

The Anthropic SDK can also be pointed at FrugalRoute:
```python
import anthropic

client = anthropic.Anthropic(
    base_url="http://localhost:3100",
    api_key="unused",
)
# Use the messages API -- FrugalRoute normalizes the format internally
```
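As a rough sketch of what a call looks like from here (the `"auto"` model value and `max_tokens` below are assumptions carried over from the OpenAI examples, not documented for this path):

```python
# Illustrative only -- assumes FrugalRoute accepts the same model aliases on
# its Anthropic-style messages path and normalizes the request internally.
response = client.messages.create(
    model="auto",
    max_tokens=256,  # required by the Anthropic messages API
    messages=[{"role": "user", "content": "What is a B-tree?"}],
)
print(response.content[0].text)
```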
The `model` field controls how a request is routed:

| Value | Behaviour |
|---|---|
| omitted or `"auto"` | Router picks the cheapest capable model based on your prompt |
| `"fast"` / `"smart"` / `"best"` | Model alias -- resolved to a real model ID before routing |
| `"gemma3-4b"` / `"gpt-4o"` / etc. | Direct model -- bypasses the router entirely |
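For example, pinning a specific model skips routing entirely. Any registered model ID works; `gpt-4o` below assumes you added an OpenAI key:

```bash
curl http://localhost:3100/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{ "role": "user", "content": "What is a B-tree?" }]
  }'
```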
Out of the box, FrugalRoute works with Ollama alone. If you don't set any cloud provider keys, the router uses local models exclusively -- every request costs $0.

Cloud providers are optional escalation targets. Add keys only if you want the router to escalate complex tasks (reasoning, coding) to more capable models when local confidence is low.
Where to go next:

- Configuration -- tune routing thresholds, cost limits, and provider priorities
- Routing -- understand capability matching, aliases, sticky sessions, and the escalation cascade
- Observability -- circuit breaker, latency tracking, and health probing
- API Reference -- full endpoint documentation