A comprehensive self-hosted AI server platform that provides OpenAI-compatible APIs, multi-tenant organization management, and AI model inference capabilities. Jan Server enables organizations to deploy their own private AI infrastructure with full control over data, models, and access.
Jan Server is a Kubernetes-native platform consisting of multiple microservices that work together to provide a complete AI infrastructure solution. It offers:
- OpenAI-Compatible API: Full compatibility with OpenAI's chat completion API
- Multi-Tenant Architecture: Organization and project-based access control
- AI Model Inference: Scalable model serving with health monitoring
- Database Management: PostgreSQL with read/write replicas
- Authentication & Authorization: JWT + Google OAuth2 integration
- API Key Management: Secure API key generation and management
- Model Context Protocol (MCP): Support for external tools and resources
- Web Search Integration: Serper API integration for web search capabilities
- Monitoring & Profiling: Built-in performance monitoring and health checks
The core API service that provides OpenAI-compatible endpoints and manages all client interactions.
Key Features:
- OpenAI-compatible chat completion API with streaming support
- Multi-tenant organization and project management
- JWT-based authentication with Google OAuth2 integration
- API key management at organization and project levels
- Model Context Protocol (MCP) support for external tools
- Web search integration via Serper API
- Comprehensive monitoring and profiling capabilities
- Database transaction management with automatic rollback
Technology Stack:
- Go 1.24.6 with Gin web framework
- PostgreSQL with GORM and read/write replicas
- JWT authentication and Google OAuth2
- Swagger/OpenAPI documentation
- Built-in pprof profiling with Grafana Pyroscope integration
The persistent data storage layer with enterprise-grade features.
Key Features:
- Read/write replica support for high availability
- Automatic schema migrations with Atlas
- Connection pooling and optimization
- Transaction management with rollback support
Before setting up Jan Server, ensure you have the following components installed:
⚠️ Important: Windows and macOS users can only run mock servers for development. Real LLM model inference with vLLM is only supported on Linux systems with NVIDIA GPUs.
- Docker Desktop
  - Windows: Download from Docker Desktop for Windows
  - macOS: Download from Docker Desktop for Mac
  - Linux: Follow the Docker Engine installation guide
- Minikube
  - Windows: choco install minikube, or download from minikube releases
  - macOS: brew install minikube, or download from minikube releases
  - Linux: curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64 && sudo install minikube-linux-amd64 /usr/local/bin/minikube
- Helm
  - Windows: choco install kubernetes-helm, or download from Helm releases
  - macOS: brew install helm, or download from Helm releases
  - Linux: curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
- kubectl
  - Windows: choco install kubernetes-cli, or download from kubectl releases
  - macOS: brew install kubectl, or download from kubectl releases
  - Linux: curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl" && sudo install kubectl /usr/local/bin/kubectl
If you plan to run real LLM models (not mock servers) and have an NVIDIA GPU:
- Install NVIDIA Container Toolkit: Follow the official NVIDIA Container Toolkit installation guide.
- Configure Minikube for GPU support: Follow the official minikube GPU tutorial for complete setup instructions.
- Start Minikube and configure Docker:
  minikube start
  eval $(minikube docker-env)
- Build and deploy all services:
  ./scripts/run.sh
- Access the services:
  - API Gateway: http://localhost:8080
  - Swagger UI: http://localhost:8080/api/swagger/index.html
  - Health Check: http://localhost:8080/healthcheck
  - Version Info: http://localhost:8080/v1/version
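Once the gateway is reachable on http://localhost:8080, a quick way to verify it is serving traffic is to hit the health check and send a chat completion request. The sketch below assumes an OpenAI-style /v1/chat/completions route with a Bearer token and uses placeholder values for the API key and model name; adjust them to match your deployment.

# Verify the gateway is healthy
curl http://localhost:8080/healthcheck

# Send a chat completion request (API key and model name are placeholders)
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -d '{
    "model": "<MODEL_NAME>",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": false
  }'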
- Start Minikube with GPU support:
  minikube start --gpus all
  eval $(minikube docker-env)
- Configure GPU memory utilization (if you have limited GPU memory):
  GPU memory utilization is configured in the vLLM Dockerfile. See the vLLM CLI documentation for all available arguments. To modify GPU memory utilization, edit the vLLM launch command in:
  - apps/jan-inference-model/Dockerfile (for Docker builds)
  - Helm chart values (for Kubernetes deployment)
- Build and deploy all services:
  # For GPU setup, modify run.sh to use GPU-enabled minikube
  # Edit scripts/run.sh and change "minikube start" to "minikube start --gpus all"
  ./scripts/run.sh
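For reference, a memory-constrained vLLM launch command might look like the sketch below. This is illustrative only: the model name and values are placeholders, and the actual command in apps/jan-inference-model/Dockerfile may differ, but --gpu-memory-utilization and --max-model-len are standard vLLM CLI arguments.

# Hypothetical vLLM launch command for a GPU with limited memory
# (adjust the model and values to match the actual Dockerfile)
vllm serve <MODEL_NAME> \
  --host 0.0.0.0 \
  --port 8000 \
  --gpu-memory-utilization 0.80 \
  --max-model-len 8192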
For production deployments, modify the Helm values in charts/jan-server/values.yaml and deploy using:
helm install jan-server ./charts/jan-server
The system is configured through environment variables defined in the Helm values file. Key configuration areas include:
- Database Connection: PostgreSQL connection strings for read/write replicas
- Authentication: JWT secrets and Google OAuth2 credentials
- API Keys: Encryption secrets for API key management
- External Services: Serper API key for web search functionality
- Model Integration: Jan Inference Model service URL
- JWT_SECRET: HMAC-SHA-256 secret for JWT token signing
- APIKEY_SECRET: HMAC-SHA-256 secret for API key hashing
- Database Credentials: PostgreSQL username, password, and database name
- SERPER_API_KEY: API key for web search functionality
- Google OAuth2: Client ID, secret, and redirect URL for authentication
- Model Service: URL for Jan Inference Model service communication
The system uses Helm charts for deployment configuration:
- Values Files: Configuration files for different environments
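As an example, you can generate the signing secrets listed above and pass them to the chart at install time. The openssl commands below are one common way to produce random secrets; the --set paths (env.JWT_SECRET and so on) are hypothetical, so check charts/jan-server/values.yaml for the actual value structure.

# Generate random secrets (one possible approach)
JWT_SECRET=$(openssl rand -hex 32)
APIKEY_SECRET=$(openssl rand -hex 32)

# Pass them to the Helm chart; the value paths below are illustrative,
# see charts/jan-server/values.yaml for the real keys
helm upgrade --install jan-server ./charts/jan-server \
  --set env.JWT_SECRET="$JWT_SECRET" \
  --set env.APIKEY_SECRET="$APIKEY_SECRET" \
  --set env.SERPER_API_KEY="<YOUR_SERPER_API_KEY>"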
jan-server/
├── apps/                      # Application services
│   ├── jan-api-gateway/       # Main API gateway service
│   │   ├── application/       # Go application code
│   │   ├── docker/            # Docker configuration
│   │   └── README.md          # Service-specific documentation
│   └── jan-inference-model/   # AI model inference service
│       ├── application/       # Python application code
│       └── Dockerfile         # Container configuration
├── charts/                    # Helm charts
│   └── jan-server/            # Main deployment chart
├── scripts/                   # Deployment and utility scripts
└── README.md                  # This file
# Build API Gateway
docker build -t jan-api-gateway:latest ./apps/jan-api-gateway
# Build Inference Model
docker build -t jan-inference-model:latest ./apps/jan-inference-model
The system uses Atlas for database migrations:
# Generate migration files
go run ./apps/jan-api-gateway/application/cmd/codegen/dbmigration
# Apply migrations
atlas migrate apply --url "your-database-url"
- Health Check Endpoints: Available on all services
- Model Health Monitoring: Automated health checks for inference models
- Database Health: Connection monitoring and replica status
- pprof Endpoints: Available on port 6060 for performance analysis
- Grafana Pyroscope: Continuous profiling integration
- Request Tracing: Unique request IDs for end-to-end tracing
- Structured Logging: JSON-formatted logs across all services
- Request/Response Logging: Complete request lifecycle tracking
- Error Tracking: Unique error codes for debugging
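For example, you can capture a CPU profile from the gateway's pprof endpoint. The sketch below assumes port 6060 is exposed by the jan-server-jan-api-gateway service and that the standard net/http/pprof routes are used; if the service does not expose that port, forward it from the pod directly.

# Forward the profiling port (assumes the service exposes 6060)
kubectl port-forward svc/jan-server-jan-api-gateway 6060:6060

# In another terminal, capture a 30-second CPU profile via the standard pprof route
go tool pprof "http://localhost:6060/debug/pprof/profile?seconds=30"

# Inspect heap allocations
go tool pprof "http://localhost:6060/debug/pprof/heap"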
- JWT Tokens: Secure token-based authentication
- Google OAuth2: Social authentication integration
- API Key Management: Scoped API keys for different access levels
- Multi-tenant Security: Organization and project-level access control
- Hashed API Keys: HMAC-SHA-256 hashing for sensitive data
- Secure Database Connections: SSL-enabled database connections
- Environment Variable Security: Secure handling of sensitive configuration
# Start local cluster
minikube start
eval $(minikube docker-env)
# Deploy services
./scripts/run.sh
# Access services
kubectl port-forward svc/jan-server-jan-api-gateway 8080:8080
# Update Helm dependencies
helm dependency update ./charts/jan-server
# Deploy to production
helm install jan-server ./charts/jan-server
# Upgrade deployment
helm upgrade jan-server ./charts/jan-server
# Uninstall
helm uninstall jan-server
Symptoms: The jan-server-jan-inference-model pod stays in Pending status.
Diagnosis Steps:
# Check pod status
kubectl get pods
# Get detailed pod information (replace with your actual pod name)
kubectl describe pod jan-server-jan-inference-model-<POD_ID>
Common Error Messages and Solutions:
0/1 nodes are available: 1 Insufficient nvidia.com/gpu. no new claims to deallocate, preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
Solution for Real LLM Setup:
- Ensure you have NVIDIA GPU and drivers installed
- Install NVIDIA Container Toolkit (see Prerequisites section)
- Start minikube with GPU support:
minikube start --gpus all
# Check pod logs to see the actual error
kubectl logs jan-server-jan-inference-model-<POD_ID>
Common vLLM startup issues:
- CUDA Out of Memory: Modify vLLM arguments in Dockerfile to reduce memory usage
- Model Loading Errors: Check if model path is correct and accessible
- GPU Not Detected: Ensure NVIDIA Container Toolkit is properly installed
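To confirm the cluster can actually see the GPU, the following checks can help; nvidia.com/gpu is the standard resource name advertised by the NVIDIA device plugin, and the nvidia-smi check depends on your minikube driver supporting it.

# Check whether the node advertises a GPU resource
kubectl describe nodes | grep -i "nvidia.com/gpu"

# Verify the driver is visible inside the minikube node
minikube ssh -- nvidia-smi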
Symptoms: Helm commands fail or charts won't install.
Solutions:
# Update Helm dependencies
helm dependency update ./charts/jan-server
# Check Helm status
helm list
# Uninstall and reinstall
helm uninstall jan-server
helm install jan-server ./charts/jan-server
- Swagger UI: Available at /api/swagger/index.html when running
- OpenAPI Specification: Auto-generated from code annotations
- Interactive Testing: Built-in API testing interface
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Submit a pull request