Nexus exposes an OpenAI-compatible API gateway that unifies local and cloud LLM backends behind a single endpoint. All responses follow the OpenAI format — Nexus-specific metadata is conveyed exclusively through X-Nexus-* headers.
For setup and configuration, see the Getting Started guide.
| Method | Path | Description |
|---|---|---|
| POST | `/v1/chat/completions` | Chat completion (streaming and non-streaming) |
| POST | `/v1/embeddings` | Generate text embeddings |
| GET | `/v1/models` | List available models from healthy backends |
| POST | `/v1/models/load` | Load model on specific backend (lifecycle API) |
| DELETE | `/v1/models/{id}` | Unload model from specific backend (lifecycle API) |
| POST | `/v1/models/migrate` | Migrate model between backends (lifecycle API) |
| GET | `/v1/fleet/recommendations` | Fleet intelligence recommendations (lifecycle API) |
| GET | `/health` | System health with backend/model counts |
| GET | `/v1/stats` | JSON stats: uptime, request counts, per-backend metrics |
| GET | `/metrics` | Prometheus text-format metrics |
| GET | `/` | Web dashboard (embedded, real-time via WebSocket) |
## POST /v1/chat/completions

OpenAI-compatible chat completion endpoint. Supports both streaming and non-streaming responses.
Request:
```json
{
  "model": "llama3:70b",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "Hello!" }
  ],
  "stream": true,
  "temperature": 0.7,
  "max_tokens": 1000
}
```

| Field | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | Model identifier (supports aliases) |
| `messages` | array | Yes | Conversation messages (system, user, assistant) |
| `stream` | boolean | No | Enable Server-Sent Events streaming (default: false) |
| `temperature` | number | No | Sampling temperature (0.0–2.0) |
| `max_tokens` | integer | No | Maximum tokens to generate |
Response (non-streaming):
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "llama3:70b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 10,
    "total_tokens": 30
  }
}
```

Response (streaming):
When `stream: true`, the response uses Server-Sent Events (SSE). Each event is a `data:` line containing a JSON chunk, terminated by `data: [DONE]`:

```
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hello"},"index":0}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"delta":{"content":"!"},"index":0,"finish_reason":"stop"}]}
data: [DONE]
```
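Because Nexus is OpenAI-compatible, any OpenAI client can consume this stream unchanged. A minimal sketch with the official `openai` Python package; the base URL and placeholder API key are assumptions, so adjust them for your deployment:

```python
from openai import OpenAI

# Point the standard OpenAI client at the Nexus gateway.
# Base URL and api_key are assumptions; whether Nexus enforces auth
# is not covered by this reference.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

stream = client.chat.completions.create(
    model="llama3:70b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    stream=True,
)

# Each chunk mirrors the chat.completion.chunk SSE payloads above.
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```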
## POST /v1/embeddings

OpenAI-compatible embeddings endpoint. Generates vector representations of text input. Works with Ollama and OpenAI backends that support embedding models.
Request:
```json
{
  "model": "nomic-embed-text",
  "input": "The quick brown fox jumps over the lazy dog"
}
```

The `input` field accepts a single string or an array of strings for batch embedding:

```json
{
  "model": "nomic-embed-text",
  "input": ["First document", "Second document", "Third document"]
}
```

| Field | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | Embedding model identifier |
| `input` | string \| string[] | Yes | Text to embed — single string or array of strings |
| `encoding_format` | string | No | Encoding format (e.g., "float") |
Response:
```json
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "embedding": [0.0023, -0.0091, 0.0152, ...],
      "index": 0
    }
  ],
  "model": "nomic-embed-text",
  "usage": {
    "prompt_tokens": 10,
    "total_tokens": 10
  }
}
```

| Field | Type | Description |
|---|---|---|
| `object` | string | Always "list" |
| `data` | array | Array of embedding objects |
| `data[].object` | string | Always "embedding" |
| `data[].embedding` | float[] | Vector representation of the input |
| `data[].index` | integer | Index corresponding to the input position |
| `model` | string | Model used to generate the embeddings |
| `usage.prompt_tokens` | integer | Number of tokens in the input |
| `usage.total_tokens` | integer | Total tokens processed |
Supported backends: Ollama (e.g., `nomic-embed-text`, `all-minilm`), OpenAI (e.g., `text-embedding-3-small`, `text-embedding-ada-002`).
Error responses:
- `400` — Empty input or invalid request format
- `404` — Model not found on any backend
- `502` — Backend agent not registered or agent error
- `503` — No healthy backend with embeddings support available
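A short batch-embedding sketch, again assuming the `localhost:8000` base URL and a placeholder API key:

```python
from openai import OpenAI

# Base URL and api_key are assumptions; adjust for your deployment.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

resp = client.embeddings.create(
    model="nomic-embed-text",
    input=["First document", "Second document", "Third document"],
)

# One embedding per input, in input order (see data[].index above).
for item in resp.data:
    print(item.index, len(item.embedding))
```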
## GET /v1/models

Lists all available models from healthy backends. Each entry corresponds to a specific model on a specific backend.
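Since the endpoint mirrors OpenAI's model listing, the standard client works unchanged (base URL and key are assumptions as before):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

# owned_by carries the backend name rather than an organization.
for model in client.models.list():
    print(model.id, model.owned_by)
```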
Response:
```json
{
  "object": "list",
  "data": [
    {
      "id": "llama3:70b",
      "object": "model",
      "created": 1700000000,
      "owned_by": "backend-name"
    }
  ]
}
```

## GET /health

System health check with backend and model counts.
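A minimal liveness-probe sketch using `requests`, assuming Nexus at `http://localhost:8000` (the port shown in the metrics section):

```python
import requests

# /health is served at the gateway root, not under /v1.
health = requests.get("http://localhost:8000/health", timeout=5).json()
assert health["status"] == "healthy", health
print(health["backends"])  # e.g. {'total': 3, 'healthy': 2, 'unhealthy': 1}
```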
Response:
```json
{
  "status": "healthy",
  "version": "0.4.0",
  "uptime_seconds": 3600,
  "backends": { "total": 3, "healthy": 2, "unhealthy": 1 },
  "models": { "total": 5 }
}
```

## GET /v1/stats

JSON stats endpoint for dashboards and debugging. Returns uptime, per-backend request counts, latency, and pending request depth.
Example response fields:
- `uptime_seconds` — time since Nexus started
- `total_requests` — aggregate request count
- `backends[]` — per-backend stats including request count, average latency, and pending depth
## GET /metrics

Prometheus text-format metrics. Configure your Prometheus scraper to target:

```
http://<nexus-host>:8000/metrics
```
Exported metrics include:
- Request counters and duration histograms
- Error rates
- Backend latency
- Token usage
- Fleet state gauges
- Reconciler pipeline timing
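Outside of Prometheus itself, the exposition text can be inspected with the `prometheus_client` parser. This is a sketch with the same assumed host and port; it prints whatever metric names Nexus actually exports:

```python
import requests
from prometheus_client.parser import text_string_to_metric_families

# Fetch and parse the Prometheus exposition text.
body = requests.get("http://localhost:8000/metrics", timeout=5).text
for family in text_string_to_metric_families(body):
    for sample in family.samples:
        print(sample.name, sample.labels, sample.value)
```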
## GET /

Embedded web dashboard (HTML/JS/CSS) with real-time monitoring via WebSocket. See the WebSocket Protocol documentation for details on the real-time update format.
## Response headers

Nexus adds `X-Nexus-*` response headers to expose routing decisions without modifying the OpenAI-compatible JSON body. This keeps Nexus fully transparent to existing OpenAI client libraries.
| Header | Description | Example |
|---|---|---|
| `X-Nexus-Backend` | Backend that handled the request | `local-ollama` |
| `X-Nexus-Backend-Type` | `local` or `cloud` | `local` |
| `X-Nexus-Route-Reason` | Why this backend was chosen | `capability-match` |
| `X-Nexus-Cost-Estimated` | Estimated cost in USD (cloud only) | `0.0023` |
| `X-Nexus-Privacy-Zone` | Privacy zone of the backend | `restricted` |
| `X-Nexus-Fallback-Model` | Model used if fallback occurred | `gpt-3.5-turbo` |
| `X-Nexus-Rejection-Reasons` | Why backends were excluded (on 503) | `privacy_zone_mismatch` |
| `X-Nexus-Rejection-Details` | Detailed rejection context (on 503) | JSON details |
## Request headers

| Header | Description |
|---|---|
| `X-Nexus-Strict` | Enforce same-or-higher capability tier (default behavior) |
| `X-Nexus-Flexible` | Allow higher-tier substitution when the exact tier is unavailable |
| `X-Nexus-Priority` | Queue priority: `high` or `normal` (default: `normal`). When all capable backends are at capacity and request queuing is enabled, high-priority requests are dequeued before normal-priority requests. Invalid values default to `normal`. |
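With the `openai` Python SDK, these can be attached per request through `extra_headers`. Note that the value for `X-Nexus-Flexible` below is a placeholder; this reference doesn't specify what value the flag headers expect:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

resp = client.chat.completions.create(
    model="llama3:70b",
    messages=[{"role": "user", "content": "Hello!"}],
    # Ask Nexus to dequeue this request ahead of normal-priority traffic
    # and allow a higher-tier substitute if the exact tier is unavailable.
    # "1" is a placeholder; the expected flag value is not documented here.
    extra_headers={"X-Nexus-Priority": "high", "X-Nexus-Flexible": "1"},
)
print(resp.choices[0].message.content)
```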
## Structured 503 errors

When no backend can serve a request, Nexus returns HTTP 503 with actionable context instead of a generic error. This follows the project principle of honest failures over silent quality downgrades.
```json
{
  "error": {
    "message": "No backend available for model 'gpt-4' with required capabilities",
    "type": "service_unavailable",
    "code": "no_available_backend",
    "context": {
      "required_tier": 4,
      "available_backends": ["ollama-local"],
      "privacy_zone_required": "restricted",
      "eta_seconds": null
    }
  }
}
```

The context object provides enough information for clients to take corrective action — for example, relaxing privacy constraints or falling back to a different model.
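A client-side sketch that acts on the structured 503, using the field names from the example above (base URL assumed as before):

```python
import requests

r = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={"model": "gpt-4",
          "messages": [{"role": "user", "content": "Hello!"}]},
    timeout=60,
)

if r.status_code == 503:
    err = r.json()["error"]
    ctx = err.get("context", {})
    # Rejection headers and context explain why routing failed.
    print("Rejected:", r.headers.get("X-Nexus-Rejection-Reasons"))
    print("Required tier:", ctx.get("required_tier"))
    # Corrective action: retry against a backend the gateway reports as up.
    print("Backends that are up:", ctx.get("available_backends", []))
else:
    print(r.json()["choices"][0]["message"]["content"])
```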