One API endpoint. Any backend. Local first, cloud when needed. Zero configuration.
Nexus is a control plane for heterogeneous LLM inference — it routes requests and enforces policies; backends handle the heavy lifting.
| Version | Theme | Features | Status |
|---|---|---|---|
| v0.1 | Foundation | Registry, Health, Router, mDNS, CLI, Aliases, Fallbacks | ✅ Released |
| v0.2 | Observability | Prometheus metrics, Web Dashboard, Structured logging | ✅ Released |
| v0.3 | Cloud Hybrid | Cloud backends, Privacy zones, Capability tiers, Budget management | ✅ Released |
| v0.4 | Intelligence | Speculative router, Quality tracking, Embeddings, Queuing | ✅ Released |
| v0.5 | Orchestration | Pre-warming, Model lifecycle, Multi-tenant, Rate limiting | 🚧 In Progress |
| v1.0 | Complete Product | Management UI — full web-based control plane | Planned |
| ID | Feature | Version | Status | Spec |
|---|---|---|---|---|
| F01 | Core API Gateway | v0.1 | ✅ Complete | specs/004-api-gateway |
| F02 | Backend Registry | v0.1 | ✅ Complete | specs/001-backend-registry |
| F03 | Health Checker | v0.1 | ✅ Complete | specs/002-health-checker |
| F04 | CLI and Configuration | v0.1 | ✅ Complete | specs/003-cli-configuration |
| F05 | mDNS Discovery | v0.1 | ✅ Complete | specs/005-mdns-discovery |
| F06 | Intelligent Router | v0.1 | ✅ Complete | specs/006-intelligent-router |
| F07 | Model Aliases | v0.1 | ✅ Complete | specs/007-model-aliases |
| F08 | Fallback Chains | v0.1 | ✅ Complete | specs/008-fallback-chains |
| F09 | Request Metrics | v0.2 | ✅ Complete | specs/009-request-metrics |
| F10 | Web Dashboard | v0.2 | ✅ Complete | specs/010-web-dashboard |
| F11 | Structured Request Logging | v0.2 | ✅ Complete | specs/011-structured-logging |
| — | NII Extraction (Phase 1) | v0.3 | ✅ Complete | specs/012-nii-extraction |
| F12 | Cloud Backend Support | v0.3 | ✅ Complete | specs/013-cloud-backend-support |
| — | Control Plane (Phase 2) | v0.3 | ✅ Complete | specs/014-control-plane-reconciler |
| F13 | Privacy Zones & Capability Tiers | v0.3 | ✅ Complete | specs/015-privacy-zones-capability-tiers |
| F14 | Inference Budget Management | v0.3 | ✅ Complete | specs/016-inference-budget-mgmt |
| — | Quality + Queuing (Phase 2.5) | v0.4 | ✅ Complete | specs/017-quality-tracking-embeddings-queuing |
| F15 | Speculative Router | v0.4 | ✅ Complete | specs/018-speculative-router |
| F16 | Quality Tracking & Backend Profiling | v0.4 | ✅ Complete | specs/019-quality-tracking |
| F17 | Embeddings API | v0.4 | ✅ Complete | specs/020-embeddings-api |
| F18 | Request Queuing & Prioritization | v0.4 | ✅ Complete | specs/021-request-queuing |
| — | Fleet Intelligence (Phase 3) | v0.5 | ✅ Complete | specs/022-fleet-intelligence-model-lifecycle |
| F19 | Pre-warming & Fleet Intelligence | v0.5 | ✅ Complete | specs/022-fleet-intelligence-model-lifecycle |
| F20 | Model Lifecycle Management | v0.5 | ✅ Complete | specs/022-fleet-intelligence-model-lifecycle |
| F21 | Multi-Tenant Support | v0.5 | Planned | - |
| F22 | Rate Limiting | v0.5 | Planned | - |
| F23 | Management UI | v1.0 | Planned | - |
The initial release established Nexus as a working LLM gateway. Core infrastructure includes an in-memory backend registry with concurrent access (DashMap), background health checking with configurable intervals, and mDNS auto-discovery of Ollama and exo backends. The intelligent router scores backends by capability match, load, and latency using an exponential moving average. Model aliases allow friendly names (e.g., fast → mistral:7b) with up to 3-level chaining, and fallback chains provide automatic failover when a preferred model is unavailable. A full CLI (clap) supports serve, backends, models, health, config, and completions subcommands.
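The router's latency signal can be sketched as a small exponential moving average (EMA), seeded by the first sample and folded forward on each observation. This is a minimal illustration of the technique named above; the type and field names are hypothetical, not Nexus's actual API.

```rust
/// Smoothed latency tracker for one backend (illustrative sketch).
struct LatencyEma {
    alpha: f64,            // smoothing factor in (0, 1]; higher = more reactive
    value_ms: Option<f64>, // None until the first sample arrives
}

impl LatencyEma {
    fn new(alpha: f64) -> Self {
        Self { alpha, value_ms: None }
    }

    /// Fold a new latency sample into the running average.
    fn observe(&mut self, sample_ms: f64) {
        self.value_ms = Some(match self.value_ms {
            None => sample_ms, // first sample seeds the average
            Some(prev) => self.alpha * sample_ms + (1.0 - self.alpha) * prev,
        });
    }

    /// Current smoothed latency, or None if no samples have been seen.
    fn get(&self) -> Option<f64> {
        self.value_ms
    }
}

fn main() {
    let mut ema = LatencyEma::new(0.2);
    for sample_ms in [100.0, 120.0, 80.0, 300.0] {
        ema.observe(sample_ms);
    }
    // A single 300 ms spike moves the average only partway toward it.
    println!("smoothed latency: {:.1} ms", ema.get().unwrap());
}
```

A low `alpha` makes the score resistant to one-off spikes, at the cost of reacting slowly when a backend genuinely degrades.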
Added production-grade observability. Prometheus metrics (/metrics) expose request counters, latency histograms, and backend gauge metrics. A real-time web dashboard is embedded in the binary (rust-embed) with WebSocket-based live updates for backend status, model changes, and request history. Structured request logging provides per-request trace context with configurable output format (JSON or pretty-print) and per-component log levels.
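For context, the Prometheus text exposition format served at /metrics is line-oriented: one sample per line, with labels in braces. A minimal sketch of rendering such a line is below; the metric and label names are illustrative, not Nexus's actual metric names.

```rust
/// Render one Prometheus text-format sample line, e.g.
/// metric_name{key="value",...} 42
fn render_counter(name: &str, labels: &[(&str, &str)], value: u64) -> String {
    let rendered: Vec<String> = labels
        .iter()
        .map(|(k, v)| format!("{k}=\"{v}\""))
        .collect();
    format!("{name}{{{}}} {value}", rendered.join(","))
}

fn main() {
    let line = render_counter(
        "nexus_requests_total", // hypothetical metric name
        &[("backend", "ollama-local"), ("status", "200")],
        42,
    );
    println!("{line}");
}
```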
Extended Nexus from a local-only gateway to a hybrid local+cloud control plane. Cloud backend support (OpenAI, generic OpenAI-compatible) adds API key management and tiktoken-based token counting. Privacy zones enforce structural guarantees — backends are tagged as local, restricted, or public, and requests with sensitive data never route to public backends. Capability tiers ensure overflow only targets same-or-higher tier backends. Inference budget management tracks token spend with configurable limits and graceful degradation policies.
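The zone and tier rules above are ordering constraints, which makes them cheap to encode. A sketch under assumed names (the enums and the `eligible` function are hypothetical, not Nexus's actual types): deriving `Ord` on both enums gives "never route to a more public zone" and "same-or-higher tier only" as two comparisons.

```rust
// Variants are declared from most private to most public, so the derived
// ordering makes `Local < Restricted < Public`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum PrivacyZone {
    Local,      // never leaves the machine/LAN
    Restricted, // trusted remote backends
    Public,     // third-party cloud APIs
}

// Declared from least to most capable, so `Small < Medium < Large`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Tier {
    Small,
    Medium,
    Large,
}

/// A backend is eligible only if its zone is at least as private as the
/// request allows AND its tier is at least as capable as the request needs.
fn eligible(
    backend_zone: PrivacyZone,
    backend_tier: Tier,
    max_zone: PrivacyZone,
    min_tier: Tier,
) -> bool {
    backend_zone <= max_zone && backend_tier >= min_tier
}

fn main() {
    // A sensitive request (capped at Restricted) never reaches a Public backend.
    assert!(!eligible(PrivacyZone::Public, Tier::Large, PrivacyZone::Restricted, Tier::Small));
    // A local backend of sufficient tier is eligible.
    assert!(eligible(PrivacyZone::Local, Tier::Medium, PrivacyZone::Restricted, Tier::Medium));
    println!("zone/tier checks passed");
}
```

Encoding the policy as orderings rather than an allow-list is what makes the guarantee structural: a new zone or tier slots into the comparison without new routing code.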
Added intelligence features that make Nexus smarter about backend quality and request handling. The speculative router (specs/018-speculative-router) pre-analyzes request patterns and model history to predict the optimal backend before full payload inspection. Quality tracking (specs/019-quality-tracking) profiles backend response quality — tracking success rates, error rates, and latency per model to inform routing decisions. The Embeddings API (specs/020-embeddings-api) adds /v1/embeddings support with capability-aware routing to Ollama and OpenAI backends. Request queuing (specs/021-request-queuing) holds requests when all capable backends are at capacity instead of rejecting immediately, with priority support via the X-Nexus-Priority header.
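Priority-aware queuing of the kind described (hold rather than reject, higher X-Nexus-Priority served first) can be sketched with a max-heap keyed by priority, with a monotonic sequence number breaking ties FIFO. The `Queued` struct and dequeue logic here are an illustrative sketch, not Nexus's actual implementation.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

#[derive(Debug, PartialEq, Eq)]
struct Queued {
    priority: u8,      // parsed from X-Nexus-Priority; assume 0 if absent
    seq: Reverse<u64>, // reversed arrival counter: earlier arrivals win ties
    id: String,
}

// Compare by (priority, reversed sequence) so the BinaryHeap (a max-heap)
// pops the highest priority first and, within a priority, the oldest request.
impl Ord for Queued {
    fn cmp(&self, other: &Self) -> std::cmp::Ordering {
        (self.priority, self.seq).cmp(&(other.priority, other.seq))
    }
}
impl PartialOrd for Queued {
    fn partial_cmp(&self, other: &Self) -> Option<std::cmp::Ordering> {
        Some(self.cmp(other))
    }
}

fn main() {
    let mut queue = BinaryHeap::new();
    queue.push(Queued { priority: 0, seq: Reverse(1), id: "a".into() });
    queue.push(Queued { priority: 5, seq: Reverse(2), id: "b".into() });
    queue.push(Queued { priority: 5, seq: Reverse(3), id: "c".into() });

    // Highest priority first; FIFO within the same priority level.
    let order: Vec<String> = std::iter::from_fn(|| queue.pop().map(|q| q.id)).collect();
    assert_eq!(order, ["b", "c", "a"]);
    println!("dequeue order: {order:?}");
}
```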
This phase delivers fleet-level intelligence and operational control:
- Pre-warming & Fleet Intelligence (F19) ✅ — Predictive model loading based on usage patterns
- Model Lifecycle Management (F20) ✅ — Load/unload/migrate models via API
- Multi-Tenant Support (F21) — API keys and per-tenant quotas
- Rate Limiting (F22) — Per-backend and per-tenant protection
These remain permanently out of scope:
- Distributed inference — That's exo's job; Nexus routes to backends that handle this
- Model serving — Nexus doesn't run models, it routes to servers that do
- Model management — No model downloads, conversions, or storage
- GPU scheduling — Backends manage their own resources
- Training/fine-tuning — Inference only
- Stateful session management — Clients own conversation history (OpenAI API contract)
- Zero Configuration — mDNS discovery, sensible defaults, "just run it"
- Single Binary — Rust, no runtime dependencies
- OpenAI-Compatible — Strict API adherence; metadata in X-Nexus-* headers only
- Stateless by Design — Route requests, not sessions; clients own conversation history
- Explicit Contracts — 503 with actionable context over silent quality downgrades
- Precise Measurement — Per-backend tokenizer registry, VRAM-aware fleet management
See architecture.md for full technical details.