A comprehensive self-hosted AI server platform that provides OpenAI-compatible APIs, multi-tenant organization management, and AI model inference capabilities. Jan Server enables organizations to deploy their own private AI infrastructure with full control over data, models, and access.
Jan Server is a Kubernetes-native platform consisting of multiple microservices that work together to provide a complete AI infrastructure solution. It offers:
- OpenAI-Compatible API: Full compatibility with OpenAI's chat completion API
- Multi-Tenant Architecture: Organization and project-based access control
- AI Model Inference: Scalable model serving with health monitoring
- Database Management: PostgreSQL with read/write replicas
- Authentication & Authorization: JWT + Google OAuth2 integration
- API Key Management: Secure API key generation and management
- Model Context Protocol (MCP): Support for external tools and resources
- Web Search Integration: Serper API integration for web search capabilities
- Monitoring & Profiling: Built-in performance monitoring and health checks
The core API service that provides OpenAI-compatible endpoints and manages all client interactions.
Key Features:
- OpenAI-compatible chat completion API with streaming support
- Multi-tenant organization and project management
- JWT-based authentication with Google OAuth2 integration
- API key management at organization and project levels
- Model Context Protocol (MCP) support for external tools
- Web search integration via Serper API
- Comprehensive monitoring and profiling capabilities
- Database transaction management with automatic rollback
Technology Stack:
- Go 1.24.6 with Gin web framework
- PostgreSQL with GORM and read/write replicas
- JWT authentication and Google OAuth2
- Swagger/OpenAPI documentation
- Built-in pprof profiling with Grafana Pyroscope integration
The persistent data storage layer with enterprise-grade features.
Key Features:
- Read/write replica support for high availability
- Automatic schema migrations with Atlas
- Connection pooling and optimization
- Transaction management with rollback support
Before setting up Jan Server, ensure you have the following components installed:
⚠️ Important: Windows and macOS users can only run mock servers for development. Real LLM model inference with vLLM is only supported on Linux systems with NVIDIA GPUs.
- Docker Desktop
  - Windows: Download from Docker Desktop for Windows
  - macOS: Download from Docker Desktop for Mac
  - Linux: Follow the Docker Engine installation guide
- Minikube
  - Windows: choco install minikube, or download from minikube releases
  - macOS: brew install minikube, or download from minikube releases
  - Linux: curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64 && sudo install minikube-linux-amd64 /usr/local/bin/minikube
- Helm
  - Windows: choco install kubernetes-helm, or download from Helm releases
  - macOS: brew install helm, or download from Helm releases
  - Linux: curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
- kubectl
  - Windows: choco install kubernetes-cli, or download from kubectl releases
  - macOS: brew install kubectl, or download from kubectl releases
  - Linux: curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl" && sudo install kubectl /usr/local/bin/kubectl
If you plan to run real LLM models (not mock servers) and have an NVIDIA GPU:
- Install NVIDIA Container Toolkit: Follow the official NVIDIA Container Toolkit installation guide.
- Configure Minikube for GPU support: Follow the official minikube GPU tutorial for complete setup instructions.
- Start Minikube and configure Docker:
  minikube start
  eval $(minikube docker-env)
- Build and deploy all services:
  ./scripts/run.sh
- Access the services:
  - API Gateway: http://localhost:8080
  - Swagger UI: http://localhost:8080/api/swagger/index.html
  - Health Check: http://localhost:8080/healthcheck
  - Version Info: http://localhost:8080/v1/version
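Once the gateway is reachable on http://localhost:8080, a quick way to verify it is serving traffic is to hit the health check and send a chat completion request. The sketch below assumes an OpenAI-style /v1/chat/completions route with a Bearer token and uses placeholder values for the API key and model name; adjust them to match your deployment.

# Verify the gateway is healthy
curl http://localhost:8080/healthcheck

# Send a chat completion request (API key and model name are placeholders)
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -d '{
    "model": "<MODEL_NAME>",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": false
  }'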
- Start Minikube with GPU support:
  minikube start --gpus all
  eval $(minikube docker-env)
- Configure GPU memory utilization (if you have limited GPU memory):
  GPU memory utilization is configured in the vLLM Dockerfile. See the vLLM CLI documentation for all available arguments. To modify GPU memory utilization, edit the vLLM launch command in:
  - apps/jan-inference-model/Dockerfile (for Docker builds)
  - Helm chart values (for Kubernetes deployment)
- Build and deploy all services:
  # For GPU setup, modify run.sh to use GPU-enabled minikube
  # Edit scripts/run.sh and change "minikube start" to "minikube start --gpus all"
  ./scripts/run.sh
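For reference, a memory-constrained vLLM launch command might look like the sketch below. This is illustrative only: the model name and values are placeholders, and the actual command in apps/jan-inference-model/Dockerfile may differ, but --gpu-memory-utilization and --max-model-len are standard vLLM CLI arguments.

# Hypothetical vLLM launch command for a GPU with limited memory
# (adjust the model and values to match the actual Dockerfile)
vllm serve <MODEL_NAME> \
  --host 0.0.0.0 \
  --port 8000 \
  --gpu-memory-utilization 0.80 \
  --max-model-len 8192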
For production deployments, modify the Helm values in charts/jan-server/values.yaml and deploy using:
helm install jan-server ./charts/jan-server
The system is configured through environment variables defined in the Helm values file. Key configuration areas include:
- Database Connection: PostgreSQL connection strings for read/write replicas
- Authentication: JWT secrets and Google OAuth2 credentials
- API Keys: Encryption secrets for API key management
- External Services: Serper API key for web search functionality
- Model Integration: Jan Inference Model service URL
- JWT_SECRET: HMAC-SHA-256 secret for JWT token signing
- APIKEY_SECRET: HMAC-SHA-256 secret for API key hashing
- Database Credentials: PostgreSQL username, password, and database name
- SERPER_API_KEY: API key for web search functionality
- Google OAuth2: Client ID, secret, and redirect URL for authentication
- Model Service: URL for Jan Inference Model service communication
The system uses Helm charts for deployment configuration:
- Values Files: Configuration files for different environments
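As an example, you can generate the signing secrets listed above and pass them to the chart at install time. The openssl commands below are one common way to produce random secrets; the --set paths (env.JWT_SECRET and so on) are hypothetical, so check charts/jan-server/values.yaml for the actual value structure.

# Generate random secrets (one possible approach)
JWT_SECRET=$(openssl rand -hex 32)
APIKEY_SECRET=$(openssl rand -hex 32)

# Pass them to the Helm chart; the value paths below are illustrative,
# see charts/jan-server/values.yaml for the real keys
helm upgrade --install jan-server ./charts/jan-server \
  --set env.JWT_SECRET="$JWT_SECRET" \
  --set env.APIKEY_SECRET="$APIKEY_SECRET" \
  --set env.SERPER_API_KEY="<YOUR_SERPER_API_KEY>"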
jan-server/
├── apps/                      # Application services
│   ├── jan-api-gateway/       # Main API gateway service
│   │   ├── application/       # Go application code
│   │   ├── docker/            # Docker configuration
│   │   └── README.md          # Service-specific documentation
│   └── jan-inference-model/   # AI model inference service
│       ├── application/       # Python application code
│       └── Dockerfile         # Container configuration
├── charts/                    # Helm charts
│   └── jan-server/            # Main deployment chart
├── scripts/                   # Deployment and utility scripts
└── README.md                  # This file
# Build API Gateway
docker build -t jan-api-gateway:latest ./apps/jan-api-gateway
# Build Inference Model
docker build -t jan-inference-model:latest ./apps/jan-inference-model
The system uses Atlas for database migrations:
# Generate migration files
go run ./apps/jan-api-gateway/application/cmd/codegen/dbmigration
# Apply migrations
atlas migrate apply --url "your-database-url"
- Health Check Endpoints: Available on all services
- Model Health Monitoring: Automated health checks for inference models
- Database Health: Connection monitoring and replica status
- pprof Endpoints: Available on port 6060 for performance analysis
- Grafana Pyroscope: Continuous profiling integration
- Request Tracing: Unique request IDs for end-to-end tracing
- Structured Logging: JSON-formatted logs across all services
- Request/Response Logging: Complete request lifecycle tracking
- Error Tracking: Unique error codes for debugging
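For example, you can capture a CPU profile from the gateway's pprof endpoint. The sketch below assumes port 6060 is exposed by the jan-server-jan-api-gateway service and that the standard net/http/pprof routes are used; if the service does not expose that port, forward it from the pod directly.

# Forward the profiling port (assumes the service exposes 6060)
kubectl port-forward svc/jan-server-jan-api-gateway 6060:6060

# In another terminal, capture a 30-second CPU profile via the standard pprof route
go tool pprof "http://localhost:6060/debug/pprof/profile?seconds=30"

# Inspect heap allocations
go tool pprof "http://localhost:6060/debug/pprof/heap"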
- JWT Tokens: Secure token-based authentication
- Google OAuth2: Social authentication integration
- API Key Management: Scoped API keys for different access levels
- Multi-tenant Security: Organization and project-level access control
- Hashed API Keys: HMAC-SHA-256 hashing for sensitive data
- Secure Database Connections: SSL-enabled database connections
- Environment Variable Security: Secure handling of sensitive configuration
# Start local cluster
minikube start
eval $(minikube docker-env)
# Deploy services
./scripts/run.sh
# Access services
kubectl port-forward svc/jan-server-jan-api-gateway 8080:8080
# Update Helm dependencies
helm dependency update ./charts/jan-server
# Deploy to production
helm install jan-server ./charts/jan-server
# Upgrade deployment
helm upgrade jan-server ./charts/jan-server
# Uninstall
helm uninstall jan-server
Symptoms: The jan-server-jan-inference-model pod stays in Pending status.
Diagnosis Steps:
# Check pod status
kubectl get pods
# Get detailed pod information (replace with your actual pod name)
kubectl describe pod jan-server-jan-inference-model-<POD_ID>
Common Error Messages and Solutions:
0/1 nodes are available: 1 Insufficient nvidia.com/gpu. no new claims to deallocate, preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
Solution for Real LLM Setup:
- Ensure you have NVIDIA GPU and drivers installed
- Install NVIDIA Container Toolkit (see Prerequisites section)
- Start minikube with GPU support:
minikube start --gpus all
# Check pod logs to see the actual error
kubectl logs jan-server-jan-inference-model-<POD_ID>
Common vLLM startup issues:
- CUDA Out of Memory: Modify vLLM arguments in Dockerfile to reduce memory usage
- Model Loading Errors: Check if model path is correct and accessible
- GPU Not Detected: Ensure NVIDIA Container Toolkit is properly installed
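To confirm the cluster can actually see the GPU, the following checks can help; nvidia.com/gpu is the standard resource name advertised by the NVIDIA device plugin, and the nvidia-smi check depends on your minikube driver supporting it.

# Check whether the node advertises a GPU resource
kubectl describe nodes | grep -i "nvidia.com/gpu"

# Verify the driver is visible inside the minikube node
minikube ssh -- nvidia-smi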
Symptoms: Helm commands fail or charts won't install.
Solutions:
# Update Helm dependencies
helm dependency update ./charts/jan-server
# Check Helm status
helm list
# Uninstall and reinstall
helm uninstall jan-server
helm install jan-server ./charts/jan-server
- Swagger UI: Available at /api/swagger/index.html when running
- OpenAPI Specification: Auto-generated from code annotations
- Interactive Testing: Built-in API testing interface
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Submit a pull request