title: Getting Started
description: Install and run FrugalRoute, send your first request, and connect any OpenAI-compatible client.

Getting Started

FrugalRoute is a local-first LLM routing layer that sits between your application and your models. Instead of hardcoding a specific model, you describe what you need -- and FrugalRoute picks the cheapest capable model, starting with local inference via Ollama and escalating to cloud providers only when necessary.

It exposes an OpenAI-compatible API on localhost:3100. Any client that speaks the OpenAI chat completions protocol works -- the OpenAI SDK, the Anthropic SDK (FrugalRoute normalizes its messages format internally), fetch, curl, or any HTTP client. FrugalRoute is not tied to OpenAI; it uses the OpenAI wire format as a universal interface.

Prerequisites

  • Ollama (for local model inference)
  • Either Node.js (v16+) or Bun (v1.1+) to run FrugalRoute
  • Optionally, API keys for cloud providers: OpenAI, Anthropic, Google (Gemini), Groq, Mistral, Kimi (Moonshot), or DeepSeek
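
If you want to confirm these are in place before continuing, a quick version check covers them (Bun only if that's your runtime):

ollama --version
node --version   # or: bun --version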

Installation

From npm (recommended)

npm install -g frugalroute

Or run without installing:

npx frugalroute
# or with bun
bunx frugalroute

That's it. A global install makes the frugalroute command available everywhere; npx/bunx runs it without installing. Either way, the server starts on http://localhost:3100.

From source

If you want to hack on FrugalRoute or use unreleased features:

git clone https://github.com/SimplyLiz/FrugalRoute && cd FrugalRoute
bun install
bun run dev

Configuration

Copy the example environment file and edit it:

cp .env.example .env

The defaults work for local-only usage. Add API keys for any cloud providers you want:

# .env
PORT=3100

OLLAMA_BASE_URL=http://localhost:11434

# Cloud providers — all optional, add the ones you have
OPENAI_API_KEY=sk-your-key-here
ANTHROPIC_API_KEY=sk-ant-your-key-here
GOOGLE_API_KEY=your-key-here
GROQ_API_KEY=gsk_your-key-here
MISTRAL_API_KEY=your-key-here
KIMI_API_KEY=your-key-here
DEEPSEEK_API_KEY=your-key-here

EMBEDDING_MODEL=nomic-embed-text
DEFAULT_MAX_COST_PER_REQUEST=0.01

Each key you add registers that provider's models. No key = no registration, no errors. You can start with just Ollama and add cloud providers later.
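
For a local-only setup, a minimal .env is just the Ollama-related settings -- a sketch drawn from the example above, nothing new:

# .env -- local-only
PORT=3100
OLLAMA_BASE_URL=http://localhost:11434
EMBEDDING_MODEL=nomic-embed-text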

Pull local models

FrugalRoute needs at least one local model and the embedding model for semantic routing:

ollama pull gemma3:4b
ollama pull nomic-embed-text
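
To confirm both models are available before starting the server, list what Ollama has pulled; you should see gemma3:4b and nomic-embed-text in the output:

ollama list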

Start the server

If installed globally via npm:

frugalroute

If running from source:

bun run dev

You should see the server listening on http://localhost:3100.

First request

Send a chat completion request with curl:

curl http://localhost:3100/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "user", "content": "Explain what a merkle tree is in two sentences." }
    ]
  }'

FrugalRoute will select the most cost-effective model that can handle the request, run inference, and return a standard OpenAI-shaped response.
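
The response body is a standard OpenAI chat completion object. An abridged sketch (field values are illustrative -- the actual model and token counts depend on what the router picked):

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "gemma3:4b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "A Merkle tree is a hash tree in which ..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 18, "completion_tokens": 42, "total_tokens": 60 }
}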

Connecting your application

FrugalRoute speaks the OpenAI wire format, so any client that supports a custom base_url works. You do not need an OpenAI account or API key to use FrugalRoute -- Ollama runs locally by default with zero cost.

curl

curl http://localhost:3100/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "user", "content": "What is a B-tree?" }
    ]
  }'

Omit the model field entirely to let the router decide. Or use a model alias:

curl http://localhost:3100/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "fast",
    "messages": [
      { "role": "user", "content": "Format this as JSON: name=Alice age=30" }
    ]
  }'

Default aliases: fast -> local model, smart -> GPT-4o, best -> Claude Sonnet.

Python (OpenAI SDK)

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3100/v1",
    api_key="unused",  # FrugalRoute does not require an API key by default
)

response = client.chat.completions.create(
    model="auto",  # let the router decide
    messages=[
        {"role": "user", "content": "What is a B-tree?"}
    ],
)

print(response.choices[0].message.content)

TypeScript (OpenAI SDK)

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:3100/v1",
  apiKey: "unused",
});

const response = await client.chat.completions.create({
  model: "auto",
  messages: [
    { role: "user", content: "What is a B-tree?" },
  ],
});

console.log(response.choices[0].message.content);

TypeScript (fetch)

No SDK needed -- FrugalRoute is just an HTTP endpoint:

const response = await fetch("http://localhost:3100/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    messages: [{ role: "user", content: "What is a B-tree?" }],
  }),
});

const data = await response.json();
console.log(data.choices[0].message.content);

Python (Anthropic SDK)

The Anthropic SDK can also point at FrugalRoute -- override base_url and FrugalRoute normalizes the Anthropic messages format internally:

import anthropic

client = anthropic.Anthropic(
    base_url="http://localhost:3100",
    api_key="unused",
)

# Use the messages API -- FrugalRoute normalizes the format internally.
# (Illustrative completion: "auto" and max_tokens=256 are placeholder values, not documented defaults.)
response = client.messages.create(
    model="auto",
    max_tokens=256,
    messages=[{"role": "user", "content": "What is a B-tree?"}],
)

print(response.content[0].text)  # assumes the reply comes back Anthropic-shaped

What model should I set?

Value                              Behaviour
(omitted or "auto")                Router picks the cheapest capable model based on your prompt
"fast" / "smart" / "best"          Model alias -- resolved to a real model ID before routing
"gemma3-4b" / "gpt-4o" / etc.      Direct model -- bypasses the router entirely

Running without cloud keys

FrugalRoute works with just Ollama. If you don't set any cloud provider keys, the router uses local models exclusively. Cost: $0 for every request.

Cloud providers are optional escalation targets. Add keys only if you want the router to escalate complex tasks (reasoning, coding) to more capable models when local confidence is low.
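
When you're ready to add escalation, a single key is enough. For instance (illustrative -- any of the supported providers works the same way):

# .env -- local-first, with one cloud escalation target
OLLAMA_BASE_URL=http://localhost:11434
ANTHROPIC_API_KEY=sk-ant-your-key-here
DEFAULT_MAX_COST_PER_REQUEST=0.01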

Next steps

  • Configuration -- tune routing thresholds, cost limits, and provider priorities
  • Routing -- understand capability matching, aliases, sticky sessions, and the escalation cascade
  • Observability -- circuit breaker, latency tracking, and health probing
  • API Reference -- full endpoint documentation