Multi-model LLM routing with Claude-first architecture and intelligent fallbacks.
This is a standalone extraction from my production portfolio site. See it in action at danmonteiro.com.
You're building with multiple LLMs but:
- Hardcoding provider logic — switching models means rewriting code
- No graceful degradation — when Claude is rate-limited, your app breaks
- Wasting money — paying Opus prices for "what's the weather?" queries
- Complexity sprawl — different APIs, response formats, error handling for each provider
AI Orchestrator provides:
- One interface, any provider — Claude, GPT, and Gemini behind a unified API
- Intelligent routing — queries are classified and sent to the optimal model
- Automatic fallbacks — provider failures trigger seamless failover
- Cost optimization — simple queries use cheap/fast models, complex ones use flagship
import { Orchestrator } from 'ai-orchestrator';
const orchestrator = new Orchestrator();
const result = await orchestrator.process({
query: "Compare microservices vs monolith architectures",
mode: 'standard',
});
console.log(result.answer); // Detailed comparison
console.log(result.route); // 'deep' (auto-detected complexity)
console.log(result.model); // 'claude-opus-4-...'
console.log(result.costs); // { estimated: 0.0045, inputTokens: 1200, ... }
From production usage on my portfolio site:
| Metric | Before | After |
|---|---|---|
| Cost per query (avg) | $0.012 | $0.004 |
| Provider downtime impact | Full outage | Zero (fallback) |
| Simple query latency | 2.1s | 0.8s |
This isn't arbitrary vendor preference—it's an intentional architectural choice:
- Quality-cost balance: Claude Sonnet is the primary workhorse. Best reasoning per dollar for typical queries. Not the cheapest, not the most expensive—the sweet spot.
- Tiered complexity: Haiku handles fast/cheap auxiliary tasks (routing, reranking, classification). Opus is reserved for genuinely complex analysis. You don't bring a sledgehammer to hang a picture.
- Ecosystem alignment: Building on Claude means access to extended thinking, tool use, and MCP compatibility as the ecosystem evolves.
- Fallback resilience: GPT and Gemini act as safety nets, not primary choices. When Claude is unavailable, your app keeps working (see the sketch below).
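Because failover lives inside the orchestrator, callers don't need their own retry logic. A minimal sketch of what that looks like in practice, assuming the default Claude-first provider order and at least one fallback key configured:

```typescript
import { Orchestrator } from 'ai-orchestrator';

// Assumes ANTHROPIC_API_KEY plus at least one fallback key (e.g. OPENAI_API_KEY)
// is set; the default provider order tries Claude first, then the fallbacks.
const orchestrator = new Orchestrator();

// No try/catch needed for provider outages: if Claude is rate-limited or down,
// the same call resolves through a fallback provider instead of throwing.
const result = await orchestrator.process({
  query: 'Summarize the trade-offs of event-driven architectures',
});

// result.model reports whichever model actually served the request, so you can
// log or alert when traffic is being handled by a fallback.
console.log(result.model);
```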
Query → Router (Haiku) → Route Selection → Execution → Response
↓
┌─────────────────────┼─────────────────────┐
↓ ↓ ↓
Fast Route Standard Route Deep Route
(GPT-4o-mini) (Claude Sonnet) (Claude Opus)
↓ ↓ ↓
Simple facts Synthesis Complex analysis
The router itself uses Claude Haiku—fast and cheap for classification. This adds ~100ms of latency but saves significant cost by avoiding Opus calls for simple questions.
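Conceptually, the routing step looks something like the sketch below. This is a simplified illustration, not the actual src/router.ts implementation; classifyWithHaiku is a hypothetical stand-in for the real Haiku classification call, and the signal list mirrors the complexSignals option shown later.

```typescript
type Route = 'fast' | 'standard' | 'deep' | 'creative' | 'research';

// Mirrors the complexSignals router option; obvious signals skip the model call.
const COMPLEX_SIGNALS = ['analyze', 'compare', 'explain why'];

async function pickRoute(query: string): Promise<Route> {
  const q = query.toLowerCase();

  // Cheap keyword pre-check first.
  if (COMPLEX_SIGNALS.some((signal) => q.includes(signal))) {
    return 'deep';
  }

  // Otherwise classify with Haiku (~100ms, a fraction of a cent per call).
  return classifyWithHaiku(q);
}

// Hypothetical stand-in: the real router sends the query to Claude Haiku and
// parses its classification. Here we just approximate with query length.
async function classifyWithHaiku(query: string): Promise<Route> {
  return query.split(/\s+/).length > 15 ? 'standard' : 'fast';
}
```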
npm install ai-orchestrator
# Required: At least one provider
export ANTHROPIC_API_KEY="sk-ant-..."
# Optional: Fallback providers
export OPENAI_API_KEY="sk-..."
export GOOGLE_AI_API_KEY="..."
import { Orchestrator } from 'ai-orchestrator';
const orchestrator = new Orchestrator();
// Automatic routing
const result = await orchestrator.process({
query: "What is TypeScript?",
});
// → Uses 'fast' route (GPT-4o-mini)
// Force a specific route
const deepResult = await orchestrator.processWithRoute('deep', {
query: "Analyze the trade-offs between SQL and NoSQL databases",
});
// → Uses 'deep' route (Claude Opus)
The Orchestrator class is the main entry point for query processing.
const orchestrator = new Orchestrator({
// Optional: Custom provider order
providers: {
providerOrder: ['anthropic', 'openai', 'google'],
},
// Optional: Custom router configuration
router: {
complexSignals: ['analyze', 'compare', 'explain why'],
},
});
| Method | Description |
|---|---|
| `process(context)` | Route and execute a query automatically |
| `processWithRoute(route, context)` | Execute with a specific route |
| `getStats()` | Get routing statistics |
| `getAvailableProviders()` | List configured providers |
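The two inspection methods are handy for startup checks and dashboards. A small sketch; the return shapes in the comments are assumptions rather than a documented schema:

```typescript
// Which providers picked up API keys from the environment?
const available = orchestrator.getAvailableProviders();
console.log(available); // e.g. ['anthropic', 'openai']

// Rough view of how queries have been routed so far.
const stats = orchestrator.getStats();
console.log(stats); // e.g. { fast: 12, standard: 7, deep: 3 } (shape assumed)
```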
| Route | Primary Model | Use Case |
|---|---|---|
| `fast` | GPT-4o-mini | Simple factual questions |
| `standard` | Claude Sonnet | Balanced quality/speed |
| `deep` | Claude Opus | Complex analysis |
| `creative` | Claude Sonnet | Brainstorming, exploration |
| `research` | Gemini 1.5 Pro | Long context research |
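When you already know the workload, any route from the table can be forced with processWithRoute, for example sending long-context research straight to Gemini:

```typescript
// Skip classification and use the long-context research route directly.
const research = await orchestrator.processWithRoute('research', {
  query: 'Survey the major retrieval-augmented generation papers since 2020',
});

console.log(research.model); // a Gemini 1.5 Pro model id, per the route table above
```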
Access model configurations directly:
import { getModel, estimateCost, getCheapestModel } from 'ai-orchestrator';
const sonnet = getModel('claude-sonnet-4');
console.log(sonnet.inputCost); // 3.00 (USD per 1M input tokens)
const cost = estimateCost('claude-sonnet-4', 1000, 500);
console.log(cost); // Estimated cost in USD
const cheapest = getCheapestModel({ minContextWindow: 100000 });
console.log(cheapest.id); // 'gemini-1.5-flash'
Override the default system prompts per route:
const orchestrator = new Orchestrator({
executor: {
systemPrompts: {
standard: `You are a helpful coding assistant...`,
deep: `You are an expert software architect...`,
},
},
});
The provider layer can also be used directly:
import { ProviderManager } from 'ai-orchestrator';
const providers = new ProviderManager({
providerOrder: ['anthropic', 'openai'],
});
const result = await providers.chat(
[{ role: 'user', content: 'Hello!' }],
'You are a helpful assistant.',
{ temperature: 0.7 }
);
The router can be used standalone to inspect routing decisions:
import { IntelligentRouter } from 'ai-orchestrator';
const router = new IntelligentRouter();
const decision = await router.route({
query: "What are the implications of quantum computing for cryptography?",
});
console.log(decision.route); // 'deep'
console.log(decision.confidence); // 0.92
console.log(decision.reasoning); // 'Query contains complexity signals'
| Variable | Required | Description |
|---|---|---|
| `ANTHROPIC_API_KEY` | Yes* | Claude API key |
| `OPENAI_API_KEY` | No | GPT API key (fallback) |
| `GOOGLE_AI_API_KEY` | No | Gemini API key (fallback) |
*At least one provider key is required.
Override default models per provider:
const orchestrator = new Orchestrator({
providers: {
providerConfigs: {
anthropic: {
primaryModel: 'claude-3-5-sonnet-20241022',
fallbackModel: 'claude-3-haiku-20240307',
},
},
},
});
ai-orchestrator/
├── src/
│ ├── index.ts # Main exports + Orchestrator class
│ ├── model-registry.ts # Model configs, pricing, routes
│ ├── providers.ts # Claude, GPT, Gemini providers
│ ├── router.ts # Intelligent query routing
│ └── executor.ts # Route execution framework
├── examples/
│ └── basic-usage.ts
├── docs/
│ └── architecture.md
└── README.md
This repo provides context-aware model dispatch — routing queries to the right model based on complexity, ensuring efficient use of context and compute.
| Layer | Role | This Repo |
|---|---|---|
| Intra-session | Short-term memory | — |
| Document-scoped | Injected content | — |
| Retrieved | Long-term semantic memory | — |
| Dispatch | Route queries to optimal model | ai-orchestrator |
Context continuity isn't just about what context to include — it's also about which model should process it. Simple queries don't need Opus-level reasoning. Complex analysis shouldn't be handled by a fast/cheap model. The orchestrator makes this decision automatically.
Related repos:
- rag-pipeline — Semantic retrieval for context
- mcp-rag-server — RAG as MCP tools
- chatbot-widget — Session cache, Research Mode, conversation export
Contributions welcome! Please:
- Fork the repository
- Create a feature branch (`git checkout -b feat/add-new-provider`)
- Make changes with semantic commits
- Open a PR with a clear description
MIT License - see LICENSE for details.
Built with Claude Code.
Co-Authored-By: Claude <[email protected]>