
# AI Router

Data-driven LLM model routing. Fetches cost, intelligence, and performance data from the Artificial Analysis API, computes the Pareto frontier, and generates tier-based routing profiles.

See `ROUTER_SPEC.md` for the runtime implementation spec (classifier, fallback logic, optimizations).
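Conceptually, the Pareto step keeps only models that no other model strictly dominates on both cost and intelligence. A minimal sketch (the `Model` shape and field names are illustrative, not the tracker's actual schema):

```typescript
// Hypothetical simplified model record; lower cost and higher
// intelligence are better.
interface Model {
  name: string;
  blendedCost: number;   // $/1M tokens
  intelligence: number;  // benchmark index
}

// A model is on the frontier if no other model is both cheaper and smarter.
// Sort by cost ascending, then keep each model that raises the running
// intelligence maximum.
function paretoFrontier(models: Model[]): Model[] {
  const sorted = [...models].sort((a, b) => a.blendedCost - b.blendedCost);
  const frontier: Model[] = [];
  let bestIntelligence = -Infinity;
  for (const m of sorted) {
    if (m.intelligence > bestIntelligence) {
      frontier.push(m);
      bestIntelligence = m.intelligence;
    }
  }
  return frontier;
}
```

Sorting once and scanning with a running maximum gives the frontier in O(n log n).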

## Setup

```sh
bun install
```

Get an API key from artificialanalysis.ai (free tier: 1,000 requests/day).

```sh
export AA_API_KEY=your_key_here
```

## Usage

### Quick start

```sh
bun run tracker:save
```

That's it. The tracker fetches model data, auto-calibrates tier thresholds from the cost/intelligence distribution, builds the model registry, generates all 5 routing profiles, and writes everything to `config/`.

### How thresholds work

Tier thresholds are auto-calibrated from the data by default. The system uses percentiles of the cost and intelligence distributions to derive boundaries:

| Tier      | Cost ceiling | Intelligence floor          |
| --------- | ------------ | --------------------------- |
| simple    | P35 cost     | P5 intelligence             |
| medium    | P60 cost     | P20 intelligence            |
| complex   | P85 cost     | P40 intelligence            |
| reasoning | P85 cost     | P40 intelligence + P30 math |
| premium   | uncapped     | P70 intelligence            |

Bands deliberately overlap so cost-effective models (e.g. MiniMax) appear across multiple tiers, creating a mixture. Within each tier, the profile strategy (cost-efficiency, maximize intelligence, etc.) determines the ranking.

As the model landscape shifts (new models, price changes, score updates), thresholds automatically adapt on each run.
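The percentile-based calibration can be sketched like this (linear interpolation between order statistics is an assumption; the tracker may use a different percentile method, and the function names here are hypothetical):

```typescript
// Hypothetical sketch of percentile-based tier calibration.
// Linear interpolation between sorted values is an assumption.
function percentile(values: number[], p: number): number {
  const sorted = [...values].sort((a, b) => a - b);
  const idx = (p / 100) * (sorted.length - 1);
  const lo = Math.floor(idx);
  const hi = Math.ceil(idx);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (idx - lo);
}

// Derive the "simple" tier bounds (P35 cost ceiling, P5 intelligence floor).
function calibrateSimpleTier(costs: number[], scores: number[]) {
  return {
    maxBlendedCost: percentile(costs, 35),   // cost ceiling
    minIntelligence: percentile(scores, 5),  // intelligence floor
  };
}
```

Because the bounds are relative to the current distribution, a wave of cheaper or smarter models shifts every tier boundary on the next run.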

After each run, `config/MODELS.md` shows the full breakdown of which models land in which tiers.

To override with manual thresholds, set `calibration.method` to `"manual"`:

```json
{
  "calibration": { "method": "manual" },
  "tiers": {
    "simple":    { "maxBlendedCost": 1.0,  "minIntelligence": 20 },
    "medium":    { "maxBlendedCost": 3.0,  "minIntelligence": 35 },
    "complex":   { "maxBlendedCost": 10.0, "minIntelligence": 50 },
    "reasoning": { "maxBlendedCost": 10.0, "minIntelligence": 50, "minMathIndex": 40 },
    "premium":   { "maxBlendedCost": null, "minIntelligence": 60 }
  }
}
```

## Commands

```sh
bun run tracker:save       # Fetch, calibrate, write config + changelog
bun run tracker:dry-run    # Fetch, calibrate, print report (no writes)
bun run tracker:explore    # Dump scatter table + Pareto frontier + auto thresholds
```

### Flags

| Flag         | Effect                                                      |
| ------------ | ----------------------------------------------------------- |
| `--explore`  | Dump scatter + Pareto frontier + auto thresholds, no writes |
| `--dry-run`  | Analyze + report, no writes (default)                       |
| `--save`     | Write config files + changelog                              |
| `--no-cache` | Force fresh API fetch (skip 6h local cache)                 |

## API caching

Responses are cached locally at `config/.aa-cache.json` with a 6-hour TTL. Repeated runs within that window don't hit the API. Use `--no-cache` to bypass.
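The freshness check amounts to comparing a cached timestamp against the 6-hour TTL; a sketch, assuming the cache file holds a `{ fetchedAt, data }` object (the actual shape of `config/.aa-cache.json` may differ):

```typescript
import { existsSync, readFileSync } from "node:fs";

const TTL_MS = 6 * 60 * 60 * 1000; // 6-hour TTL

// Assumed cache shape: { fetchedAt: epoch ms, data: API response }.
function isFresh(fetchedAt: number, now: number = Date.now()): boolean {
  return now - fetchedAt < TTL_MS;
}

// Returns cached data if present and fresh, otherwise null (caller refetches).
function readCache(path: string, now: number = Date.now()): unknown | null {
  if (!existsSync(path)) return null;
  const { fetchedAt, data } = JSON.parse(readFileSync(path, "utf8"));
  return isFresh(fetchedAt, now) ? data : null;
}
```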

## Development

```sh
bun test            # run tests
bun run typecheck   # tsc --noEmit
bun run lint        # biome check
bun run lint:fix    # biome check --write
```

## Automation

`.github/workflows/tracker.yml` runs the tracker weekly (Monday 9am UTC). On config changes it commits to a branch and opens a PR for review.
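That weekly cadence corresponds to a cron trigger like the following (a sketch; the actual workflow file may configure this differently):

```yaml
on:
  schedule:
    - cron: "0 9 * * 1" # Mondays 09:00 UTC
```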

Requires `AA_API_KEY` set as a repository secret.

## Data attribution

Model benchmarks, pricing, and performance data provided by Artificial Analysis.