Conduit

Conduit banner

CI Coverage Python 3.11+ License: MIT

Smart LLM routing that learns to cut costs without sacrificing quality.

Conduit learns which LLM to use for each type of query. Like a chess engine that improves with every game, it gets better at picking the right model as it sees more queries. Unlike static routers with fixed rules, Conduit adapts to your workload automatically.

The Problem

You have 5+ LLM options (GPT-4, Claude Opus, Gemini, etc.) with 100x price differences. Most teams fall into one of three traps:

  • Overspend: Route everything to expensive models "to be safe"
  • Underspend: Route everything to cheap models and get poor quality
  • Manual rules: Write brittle if/else logic that breaks when models update

There's a better way.

Why Conduit?

| Feature | Conduit | Static Routers | Manual Selection |
|---|---|---|---|
| Cost Optimization | ✅ Learns cheapest model per query type | ❌ Fixed rules, no learning | ❌ Expensive models everywhere |
| Quality | ✅ Balances cost vs quality via feedback | ⚠️ Depends on rules | ✅ High but wasteful |
| Adapts to Changes | ✅ Learns when models improve/degrade | ❌ Manual rule updates needed | ❌ Manual switching |
| Setup Time | ✅ 5 lines of code | ⚠️ Complex rule configuration | ✅ Simple but inefficient |
| Learning | ✅ Continuous adaptation | ❌ No learning | ❌ No learning |

Quick Start

Try it without API keys first:

python examples/hybrid_routing.py

This demo shows Conduit's learning in action with simulated LLM responses. No setup required.

With real LLMs (5 lines):

# Just 5 lines to start saving money
import asyncio
from conduit import Router, Query

async def main():
    router = Router()  # Auto-detects models from your API keys
    decision = await router.route(Query(text="What is 2+2?"))
    print(f"Route to: {decision.selected_model} (confidence: {decision.confidence:.0%})")

asyncio.run(main())

Starts learning immediately: Conduit begins routing from query 1 using Thompson Sampling (Bayesian exploration). Optional contextual algorithms (LinUCB) are available via the algorithm parameter.

How It Works

Think of Conduit as a recommendation system for LLMs. It learns from every query which models work best for which tasks.

Query → Analyze → Smart Selection → LLM Provider → Response
   ↓                                                    ↓
   └──────────── Learn & Improve ─────────────────────┘
  1. Analyze: Understand query complexity, domain, and context
  2. Smart Selection: Pick optimal model balancing cost, quality, and speed
  3. Execute: Route to selected model via PydanticAI or LiteLLM
  4. Learn: Track what worked and improve future routing decisions

Under the hood: Conduit uses Thompson Sampling, a multi-armed bandit algorithm that provides superior cold-start quality through Bayesian exploration (arXiv 2510.02850). 12 algorithms are available via the algorithm parameter (see docs/BANDIT_ALGORITHMS.md).

How Learning Works

This is Conduit's superpower: it learns from every query to make better routing decisions over time. Not just A/B testing, not just static rules, but real online learning that adapts to your workload.

The Feedback Loop (Concrete Example)

Imagine you're routing math questions. Here's what happens:

Week 1: Initial Exploration

router = Router(models=["gpt-4o-mini", "gpt-4o"])

# Route 100 math questions
for _ in range(100):
    decision = await router.route(Query(text="What is 2+2?"))
    # Execute with the selected model (your own LLM call)
    response = await execute_llm_call(decision.selected_model, "What is 2+2?")
    # Provide feedback so Conduit can learn
    await router.update(
        model_id=decision.selected_model,
        cost=response.cost,        # Actual cost
        quality_score=0.95,        # How good was the answer?
        latency=response.latency,  # How fast?
        features=decision.features # Query characteristics
    )

What Conduit learns:

  • gpt-4o-mini: $0.001/query, quality 0.95, latency 0.5s → Reward: 0.93
  • gpt-4o: $0.10/query, quality 0.95, latency 0.8s → Reward: 0.90

Week 2: Smarter Routing

# Route 100 more math questions
# Conduit now prefers gpt-4o-mini (>70% selection rate)
# Same quality, 100x cheaper - learned automatically!

Why This Matters

Traditional routers would need manual rules like "if query contains 'math', use cheap model". Conduit learns this automatically from feedback:

| Approach | Setup | Adapts to Changes | Example |
|---|---|---|---|
| Conduit | Zero config | ✅ Automatic | Learns "gpt-4o-mini good for math" from feedback |
| Static Rules | Write if/else rules | ❌ Manual updates | if "math" in query: use_mini breaks when models update |
| Manual | Pick per query | ❌ No learning | Waste time deciding, forget cheap options |

The Math (for the curious)

Conduit defaults to Thompson Sampling, a Bayesian bandit algorithm that:

  1. Models uncertainty for each LLM using Beta distributions
  2. Samples from each model's distribution to select
  3. Updates beliefs after every query based on success/failure

Reward formula: 0.7 × quality + 0.2 × cost_efficiency + 0.1 × speed

This means:

  • Cheap models with same quality get higher rewards (cost-efficient)
  • Fast models get bonus points (latency matters)
  • Quality always dominates (70% weight)
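
As a rough numeric illustration (a sketch only; how Conduit normalizes cost_efficiency and speed to 0-1 scores is internal and simply assumed here):

def composite_reward(quality: float, cost_efficiency: float, speed: float) -> float:
    # Default blend: quality dominates, cost and speed break ties
    return 0.7 * quality + 0.2 * cost_efficiency + 0.1 * speed

# Two models with identical quality: the cheaper, faster one earns the higher reward
print(composite_reward(quality=0.95, cost_efficiency=0.99, speed=0.90))  # 0.953
print(composite_reward(quality=0.95, cost_efficiency=0.10, speed=0.70))  # 0.755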

How Thompson Sampling works:

import numpy as np

# Each model has a Beta(alpha, beta) distribution:
# alpha = successes + 1, beta = failures + 1
alpha = {model: 1.0 for model in models}
beta = {model: 1.0 for model in models}

# Selection: sample from each model's distribution, pick the highest
samples = {model: np.random.beta(alpha[model], beta[model]) for model in models}
selected = max(samples, key=samples.get)

# Update: a reward >= 0.85 counts as a success
if reward >= 0.85:
    alpha[selected] += 1  # More confident the model is good
else:
    beta[selected] += 1   # More confident the model is bad

Why Thompson Sampling? Best cold-start performance via Bayesian exploration (arXiv 2510.02850).

For contextual routing (different models for different query types), use algorithm="linucb". See docs/BANDIT_ALGORITHMS.md for all options.
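
A minimal sketch, assuming algorithm is accepted by the Router constructor used in the quick start and takes the names listed in docs/BANDIT_ALGORITHMS.md:

from conduit import Router

# Default: Thompson Sampling (context-free, strong cold start)
router = Router()

# Contextual routing: model choice conditioned on query features
contextual_router = Router(algorithm="linucb")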

Verified Behavior

Our integration tests verify that the feedback loop works:

# test_feedback_loop_improves_routing
# Result: After 40 training queries, cheap model selected >70% of time
# Proves: System learned cost-efficiency automatically

See tests/integration/test_feedback_loop.py and docs/ARCHITECTURE.md for technical details.

Key Features

  • Learning System: Gets smarter with every query, learns your workload patterns automatically
  • Multi-Objective: Balance cost, quality, and speed based on what matters to you
  • 100+ LLM Providers: Direct support for 8 providers, extended via LiteLLM integration
  • Smart Caching: Faster embedding lookups on repeated queries (optional Redis)
  • Per-Query Control: Override optimization per query (quality-first, cost-first, speed-first)
  • Zero Config: Auto-detects available models from your API keys

Installation

Quick Install

# Clone and install
git clone https://github.com/ashita-ai/conduit.git
cd conduit
python3.13 -m venv .venv && source .venv/bin/activate
pip install -e .

# Set API keys (at least one required)
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-...

# Run example
python examples/hello_world.py

Full Setup

Prerequisites:

  • Python 3.10+ (3.13 recommended)
  • At least one LLM API key (OpenAI, Anthropic, Google, Groq, Mistral, Cohere, AWS Bedrock, HuggingFace)
  • Optional: Redis (caching), PostgreSQL (history)

Configuration (.env):

# LLM Providers (at least one)
OPENAI_API_KEY=your_key
ANTHROPIC_API_KEY=your_key

# Embedding (optional - defaults to free HuggingFace API)
EMBEDDING_PROVIDER=huggingface  # Options: huggingface, openai, cohere, sentence-transformers

# Optional
DATABASE_URL=postgresql://postgres:password@localhost:5432/conduit
REDIS_URL=redis://localhost:6379
LOG_LEVEL=INFO

Database setup (optional):

./migrate.sh  # or: psql $DATABASE_URL < migrations/001_initial_schema.sql

Using with LiteLLM

Access 100+ LLM providers through LiteLLM with Conduit's ML-powered routing:

from litellm import Router
from conduit_litellm import ConduitRoutingStrategy

# Configure LiteLLM with your models
router = Router(
    model_list=[
        {"model_name": "gpt-4", "litellm_params": {"model": "gpt-4o-mini"}},
        {"model_name": "claude-3", "litellm_params": {"model": "claude-3-haiku-20240307"}},
    ]
)

# Add Conduit's ML routing (replaces rule-based routing)
strategy = ConduitRoutingStrategy(use_hybrid=True)
ConduitRoutingStrategy.setup_strategy(router, strategy)

# LiteLLM now uses Conduit's learning-based model selection
response = await router.acompletion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)

Conduit learns which model performs best for each query type, automatically optimizing cost while maintaining quality. See docs/LITELLM_INTEGRATION.md for full configuration options.

User Preferences

Control optimization per query with 4 presets:

from conduit.core import Query, UserPreferences

# Optimize for minimum cost
query = Query(
    text="Simple calculation",
    preferences=UserPreferences(optimize_for="cost")
)

# Presets:
# - balanced: Default (70% quality, 20% cost, 10% latency)
# - quality:  Maximize quality (80% quality, 10% cost, 10% latency)
# - cost:     Minimize cost (40% quality, 50% cost, 10% latency)
# - speed:    Minimize latency (40% quality, 10% cost, 50% latency)

Customize weights in conduit.yaml - see examples/routing_options.py.

Examples

examples/
├── hello_world.py           # 5-line minimal example
├── routing_options.py       # Constraints, preferences, algorithms
├── feedback_loop.py         # Caching, learning, state persistence
├── production_feedback.py   # Explicit feedback (thumbs, ratings)
├── litellm_integration.py   # 100+ providers via LiteLLM
├── hybrid_routing.py        # UCB1 to LinUCB transition (no API needed)
└── integrations/            # LangChain, LlamaIndex, FastAPI, Gradio

Documentation

  • Architecture: docs/ARCHITECTURE.md - System design and components
  • Algorithms: docs/BANDIT_ALGORITHMS.md - ML algorithm details
  • Hybrid Routing: docs/HYBRID_ROUTING_ALGORITHMS.md - 4 configurable algorithm combinations, benchmarks, and state conversion
  • Fallback Chains: docs/FALLBACK.md - Automatic model fallback for production resilience
  • Embeddings: docs/EMBEDDING_PROVIDERS.md - Embedding configuration
  • Troubleshooting: docs/TROUBLESHOOTING.md - Common issues and solutions
  • LiteLLM: docs/LITELLM_INTEGRATION.md - Extended provider support
  • Development: AGENTS.md - Development guidelines and contribution guide
  • Strategic Decisions: notes/2025-11-18_business_panel_analysis.md

Comparison with Routing Alternatives

| Solution | Best For | Learning | Provider Support | Setup Complexity |
|---|---|---|---|---|
| Conduit | ML-based cost optimization | ✅ Continuous ML | 100+ via LiteLLM/PydanticAI | Low (5 lines) |
| RouteLLM | Research/custom ML routing | ✅ Requires training data | Custom integration | High |
| Martian | Simple static rules | ❌ Fixed rules | Limited | Low |
| Manual if/else | Full control, low volume | ❌ No learning | Any | None |

Not Competitors, We Integrate:

  • LiteLLM: Provider abstraction layer. Conduit uses LiteLLM to access 100+ providers, adds ML routing on top.
  • Portkey: Enterprise gateway (teams, SSO, observability). Could run Conduit behind Portkey for ML routing + enterprise features.
  • LangChain: LLM orchestration framework. Conduit integrates as a routing component (see examples/integrations/).

Think of it this way: LiteLLM/Portkey are the roads, Conduit is the GPS that picks the best route.

When Conduit shines:

  • You're spending $500+/month on LLM APIs (enough volume to benefit)
  • You have 1000+ queries/day with varying complexity
  • You use 3+ different models across your workload
  • You want automatic optimization without writing routing rules

When NOT to use Conduit (be honest with yourself):

  • Single model: If you only use one model, you don't need routing
  • Low volume: <100 queries/day? Simple round-robin is fine
  • Fixed patterns: If 100% of queries need the same model, no router helps
  • Need streaming: Conduit doesn't support streaming responses yet (post-1.0)
  • Enterprise compliance: Need SOC2, teams, SSO? Use Portkey instead

When to choose routing alternatives:

  • You have research/custom ML models and training data (→ RouteLLM)
  • You have simple static rules that never change (→ Martian)
  • You want to write all routing logic yourself (→ Manual if/else)

Tech Stack

  • Core: Python 3.10+, PydanticAI 1.14+, FastAPI
  • ML: NumPy 2.0+ (LinUCB matrices), contextual bandits
  • Storage: PostgreSQL (history), Redis (caching)
  • Embeddings: HuggingFace API (free default), OpenAI, Cohere, sentence-transformers

Status

Version: 0.1.0 (Pre-1.0)
Test Status: 100% passing (1000+ tests), 91% coverage
CI/CD: GitHub Actions with automated testing

Recent Additions

  • ✅ Production feedback system (pluggable adapters, idempotent, session-aware)
  • ✅ Thompson Sampling default (superior cold-start quality)
  • ✅ Lightweight API-based embeddings (HuggingFace default, no heavy dependencies)
  • ✅ Arbiter LLM-as-Judge (automatic quality evaluation)
  • ✅ LiteLLM feedback loop (zero-config learning)
  • ✅ 12 configurable algorithms (Thompson, LinUCB, UCB1, baselines, hybrid)
  • ✅ Confidence-weighted learning (partial observations, calibrated rewards)
  • ✅ Multi-objective rewards (quality + cost + latency)
  • ✅ User preferences (per-query optimization control)
  • ✅ Dynamic pricing (71+ models auto-updated)

Frequently Asked Questions

How much can I actually save?

Savings depend on your query mix. Redirecting queries from expensive models (GPT-4, Claude Opus) to cheaper ones (GPT-4o-mini, Claude Haiku) when quality allows can reduce costs significantly; the exact amount depends on your query distribution and quality requirements.

Example: If 60% of your queries can use cheap models without quality loss and 40% require expensive models, you'll save compared to using expensive models everywhere. Exact percentages require benchmarking on your workload.

How does Conduit balance cost and quality?

Conduit learns which models work best for different query types through continuous feedback. The system optimizes a multi-objective reward balancing quality, cost, and latency.

Feedback system:

  • Explicit feedback (thumbs, ratings, task success) via pluggable adapters
  • Implicit feedback (errors, retries, latency) weighted at 30%
  • Confidence-weighted updates for partial observations
  • Idempotent recording (safe retries, no double-counting)
  • Session-level feedback propagation for multi-turn conversations
  • Composite rewards balancing quality (70%), cost (20%), latency (10%) by default

You can adjust these weights per query using UserPreferences to prioritize quality over cost or vice versa.
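
For example, reusing the presets from the User Preferences section:

from conduit.core import Query, UserPreferences

# Prioritize quality for a harder query
hard = Query(
    text="Summarize this legal contract",
    preferences=UserPreferences(optimize_for="quality"),
)

# Prioritize cost for a trivial one
easy = Query(
    text="What is 2+2?",
    preferences=UserPreferences(optimize_for="cost"),
)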

How long until I see savings?

  • Immediate: Conduit starts routing from query 1 using Thompson Sampling (Bayesian exploration for quality-first cold start)
  • 2,000 queries: Switches to LinUCB (contextual, query-aware decisions)
  • Continuous: Adapts to model updates, pricing changes, new models as they're added

Quality-first cold start: The default Thompson Sampling algorithm uses Bayesian exploration to achieve better model selection during the learning phase compared to simpler exploration strategies.

What if I don't have Redis or PostgreSQL?

Conduit works without them - they're optional optimizations:

  • No Redis: Repeated queries require re-computing embeddings (slower), but routing still works
  • No PostgreSQL: No history persistence, but in-memory routing works fine

For production, Redis is recommended for embedding caching.

Can I use my own quality evaluation?

Yes. Provide explicit feedback via router.update():

# After executing a query
decision = await router.route(Query(text="What is 2+2?"))
response = await execute_llm_call(decision.selected_model, "What is 2+2?")

# Provide your quality evaluation
await router.update(
    model_id=decision.selected_model,
    cost=response.cost,
    quality_score=0.95,  # Your evaluation (0.0-1.0)
    latency=response.latency,
    features=decision.features,
)

Or use the built-in Arbiter (LLM-as-judge) for automatic evaluation.

Is this production-ready?

Pre-1.0 status: Core routing is stable and tested (1000+ tests passing, 91% coverage), but:

  • APIs may change before 1.0
  • Some features are experimental (PCA reduction, Arbiter evaluation)
  • Production use is recommended with monitoring and fallbacks in place

How does this compare to prompt optimization?

Complementary approaches:

  • Prompt optimization: Get better results from the same model
  • Conduit: Route queries to the right model for the task

Use both. Conduit's savings are independent of prompt quality.

What about streaming responses?

Not currently supported. Conduit focuses on request/response routing. Streaming support planned for post-1.0.

Development

# Tests
pytest --cov=conduit  # Run with coverage (must be >80%)

# Code quality (must pass before commit)
mypy conduit/         # Type checking
ruff check conduit/   # Linting
black conduit/        # Formatting

# Install git hooks (runs tests before push)
bash scripts/install-hooks.sh

Security

Automated Dependency Scanning: GitHub Dependabot monitors dependencies weekly for security vulnerabilities. See .github/dependabot.yml and repository Security tab.

Never commit credentials: Use environment variables in .env (gitignored). See AGENTS.md for security practices.

License

MIT License - see LICENSE file
