System Design

This reference describes how the Jan Server components fit together. Use it when reviewing cross-service changes or planning deployments.

1. System Overview

Jan Server is a microservices platform that exposes OpenAI-compatible APIs through Kong. Each service owns a focused domain:

LLM API (8080) - chat completions, conversations, projects, model catalog.
Response API (8082) - multi-step orchestration and MCP tool coordination.
Media API (8285) - binary ingestion, jan_* IDs, presigned URL management.
MCP Tools (8091) - JSON-RPC endpoint that proxies Serper/SearXNG search, scraping, file search, and SandboxFusion execution.
Memory Tools (8090) - semantic memory using BGE-M3 embeddings with caching and batch processing.
Realtime API (8186) - WebRTC session management via LiveKit for real-time audio/video communication.
Shared infrastructure - Kong (8000), Keycloak (8085), PostgreSQL, vLLM (8101), observability stack.

Note: Template API (8185) is a development scaffold and not part of the deployed stack.

Kong terminates TLS (in production), validates JWT/API keys, applies rate limits, and forwards requests to the internal services.

2. Architecture Layers

Layer	Components	Notes
Edge	Kong Gateway, Keycloak	Centralized auth, rate limiting, guest-token endpoint.
Application	LLM API, Response API, Media API, MCP Tools, Memory Tools, Realtime API	Written in Go using Gin + zerolog, configured via `pkg/config`.
Tooling	SearXNG, Serper, SandboxFusion, vector-store	Only accessible from MCP Tools.
Data/Storage	PostgreSQL (`api-db`, `keycloak-db`), S3-compatible storage	Media files live in object storage; metadata lives in PostgreSQL.
Inference	vLLM (local) or remote OpenAI-compatible providers	Selected per request using the provider metadata catalog.
Observability	OpenTelemetry Collector, Prometheus, Grafana, Jaeger	Enabled with `OTEL_ENABLED=true` + `make monitor-up`.

3. Component Diagram

             +------------------------------+
             |  External Clients / SDKs     |
             +---------------+--------------+
                             |
                             v
                   +-------------------+
                   |   Kong Gateway    | 8000
                   +---+---+----+------+
                       |   |    |
        +--------------+   |    +----------------+
        |                  |                     |
        v                  v                     v
  +-----------+    +---------------+      +---------------+
  |  LLM API  |    |  Response API |      |   Media API   |
  | (8080)    |    |    (8082)     |      |    (8285)     |
  +-----+-----+    +-------+-------+      +-------+-------+
        |                  |                     |
        |                  v                     |
        |        +-------------------+           |
        +------->|    MCP Tools      |<----------+
        |        |     (8091)        |
        |        +----+---+----+-----+
        |             |   |    |
        |             |   |    +--> SandboxFusion
        |             |   +-------> Vector Store
        |             +-----------> SearXNG / Serper
        |
        v
  +---------------+      +----------------+
  | Memory Tools  |      | Realtime API   |
  |   (8090)      |      |    (8186)      |
  +---------------+      +----------------+

Shared dependencies (not shown): PostgreSQL (api-db), S3/Object storage, Keycloak (JWT issuer), vLLM (8101), BGE-M3 (embeddings), LiveKit (WebRTC).

4. Request Lifecycles

Chat Completions

Client calls POST /v1/chat/completions on http://localhost:8000.
Kong validates the JWT/API key and forwards to llm-api:8080.
LLM API resolves jan_* placeholders via Media API, selects a provider (local vLLM or remote), and streams tokens back to the gateway.
Conversations/projects are persisted in PostgreSQL.

Response Orchestration

Client calls POST /responses/v1/responses (streaming optional).
Response API loads the conversation context and iteratively issues tools/list / tools/call requests to MCP Tools.
Tool executions are capped by RESPONSE_MAX_TOOL_DEPTH and TOOL_EXECUTION_TIMEOUT.
Final synthesis is delegated to LLM API and streamed back to the caller.

Media Handling

Upload via POST /media/v1/media (remote URL or data URL) or request a presigned upload with POST /media/v1/media/prepare-upload.
Media API deduplicates content, issues a jan_* ID, and stores metadata in PostgreSQL.
Other services embed the jan_* ID; LLM API resolves them to presigned URLs right before inference.

MCP JSON-RPC

Response API or external automation sends JSON-RPC requests to POST /v1/mcp.
MCP Tools validates the method (tools/list, tools/call, prompts/*, resources/*) and dispatches to the Serper/SearXNG/SandboxFusion clients.
Results are returned as SSE events (streaming) or plain JSON when the response fits a single chunk.

5. Data & Network Topology

Docker Compose defines two primary networks: jan-server_default (Kong + core services + databases) and jan-server_mcp-network (MCP-only helpers such as SearXNG, vector store, SandboxFusion).
Production deployments should mirror this split using Kubernetes namespaces or NetworkPolicies.
Persistent data:
- api-db (LLM/Response/Media metadata) - each service uses its own schema.
- keycloak-db - Keycloak realm and client configuration.
- Object storage (S3, MinIO, etc.) - Media files and presigned URLs.

6. Deployment Modes

Mode	Description	Commands
Local (recommended)	`make quickstart` prompts for providers, writes `.env`, and runs `docker compose up` with all services.	`make quickstart`
Profiles	Start a subset of services (API only, MCP only, GPU inference).	`make up-api`, `make up-mcp`, `make up-gpu`
Monitoring stack	Optional Prometheus/Grafana/Jaeger.	`make monitor-up`
Kubernetes	Use `k8s/jan-server` Helm chart. Values mirror `pkg/config` defaults.	`helm install jan ./k8s/jan-server -f values.yaml`

7. Change Impact Checklist

When modifying the system architecture:

Update the relevant service README and API docs.
Reflect new ports/paths in Kong configuration.
Adjust docs/architecture/services.md and docs/architecture/data-flow.md.
Regenerate configuration artifacts (make config-generate) if pkg/config changes.
Update Kubernetes values and Helm defaults as needed.

Maintainer: Jan Server Architecture Group - Last Reviewed: November 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

System Design

1. System Overview

2. Architecture Layers

3. Component Diagram

4. Request Lifecycles

Chat Completions

Response Orchestration

Media Handling

MCP JSON-RPC

5. Data & Network Topology

6. Deployment Modes

7. Change Impact Checklist

FilesExpand file tree

system-design.md

Latest commit

History

system-design.md

File metadata and controls

System Design

1. System Overview

2. Architecture Layers

3. Component Diagram

4. Request Lifecycles

Chat Completions

Response Orchestration

Media Handling

MCP JSON-RPC

5. Data & Network Topology

6. Deployment Modes

7. Change Impact Checklist