Jan Server

A comprehensive self-hosted AI server platform that provides OpenAI-compatible APIs, multi-tenant organization management, and AI model inference capabilities. Jan Server enables organizations to deploy their own private AI infrastructure with full control over data, models, and access.

🚀 Overview

Jan Server is a Kubernetes-native platform consisting of multiple microservices that work together to provide a complete AI infrastructure solution. It offers:

  • OpenAI-Compatible API: Full compatibility with OpenAI's chat completion API (see the example request after this list)
  • Multi-Tenant Architecture: Organization and project-based access control
  • AI Model Inference: Scalable model serving with health monitoring
  • Database Management: PostgreSQL with read/write replicas
  • Authentication & Authorization: JWT + Google OAuth2 integration
  • API Key Management: Secure API key generation and management
  • Model Context Protocol (MCP): Support for external tools and resources
  • Web Search Integration: Serper API integration for web search capabilities
  • Monitoring & Profiling: Built-in performance monitoring and health checks
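
As a concrete illustration of the OpenAI-compatible API, a chat completion request against a locally port-forwarded gateway could look like the following (a sketch: the /v1/chat/completions path and request shape follow the OpenAI convention, and the port, API key, and model name are placeholders):

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $JAN_API_KEY" \
  -d '{
    "model": "<your-model>",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'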

πŸ—οΈ System Architecture

[System architecture diagram — see the repository for the image]

📦 Services

Jan API Gateway

The core API service that provides OpenAI-compatible endpoints and manages all client interactions.

Key Features:

  • OpenAI-compatible chat completion API with streaming support
  • Multi-tenant organization and project management
  • JWT-based authentication with Google OAuth2 integration
  • API key management at organization and project levels
  • Model Context Protocol (MCP) support for external tools
  • Web search integration via Serper API
  • Comprehensive monitoring and profiling capabilities
  • Database transaction management with automatic rollback

Technology Stack:

  • Go 1.24.6 with Gin web framework
  • PostgreSQL with GORM and read/write replicas
  • JWT authentication and Google OAuth2
  • Swagger/OpenAPI documentation
  • Built-in pprof profiling with Grafana Pyroscope integration

PostgreSQL Database

The persistent data storage layer with enterprise-grade features.

Key Features:

  • Read/write replica support for high availability
  • Automatic schema migrations with Atlas
  • Connection pooling and optimization
  • Transaction management with rollback support

🚀 Quick Start

Prerequisites

Before setting up Jan Server, ensure you have the following components installed:

Required Components

⚠️ Important: Windows and macOS users can only run mock servers for development. Real LLM model inference with vLLM is only supported on Linux systems with NVIDIA GPUs.

  1. Docker Desktop

  2. Minikube

    • Windows: choco install minikube or download from minikube releases
    • macOS: brew install minikube or download from minikube releases
    • Linux: curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64 && sudo install minikube-linux-amd64 /usr/local/bin/minikube
  3. Helm

    • Windows: choco install kubernetes-helm or download from Helm releases
    • macOS: brew install helm or download from Helm releases
    • Linux: curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
  4. kubectl

    • Windows: choco install kubernetes-cli or download from kubectl releases
    • macOS: brew install kubectl or download from kubectl releases
    • Linux: curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl" && sudo install kubectl /usr/local/bin/kubectl

Optional: NVIDIA GPU Support (for Real LLM Models)

If you plan to run real LLM models (not mock servers) and have an NVIDIA GPU:

  1. Install NVIDIA Container Toolkit: Follow the official NVIDIA Container Toolkit installation guide

  2. Configure Minikube for GPU support: Follow the official minikube GPU tutorial for complete setup instructions.

Local Development Setup

Option 1: Mock Server Setup (Recommended for Development)

  1. Start Minikube and configure Docker:

    minikube start
    eval $(minikube docker-env)
  2. Build and deploy all services:

    ./scripts/run.sh
  3. Access the services by port-forwarding the API gateway:
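
    The service name below matches the one used in the Deployment section; once forwarded, the Swagger UI is available at http://localhost:8080/api/swagger/index.html.

    kubectl port-forward svc/jan-server-jan-api-gateway 8080:8080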

Option 2: Real LLM Setup (Requires NVIDIA GPU)

  1. Start Minikube with GPU support:

    minikube start --gpus all
    eval $(minikube docker-env)
  2. Configure GPU memory utilization (if you have limited GPU memory):

    GPU memory utilization is configured in the vLLM Dockerfile. See the vLLM CLI documentation for all available arguments.

    To modify GPU memory utilization, edit the vLLM launch command (see the sketch after these steps) in:

    • apps/jan-inference-model/Dockerfile (for Docker builds)
    • Helm chart values (for Kubernetes deployment)
  3. Build and deploy all services:

    # For GPU setup, modify run.sh to use GPU-enabled minikube
    # Edit scripts/run.sh and change "minikube start" to "minikube start --gpus all"
    ./scripts/run.sh
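
As an illustration of the memory tweak in step 2, vLLM's OpenAI-compatible server accepts a --gpu-memory-utilization argument (a standard vLLM CLI flag taking a fraction between 0 and 1; the model name and the exact launch command used in the Dockerfile are assumptions here):

# Cap vLLM at 80% of available GPU memory
python -m vllm.entrypoints.openai.api_server --model <model-name> --gpu-memory-utilization 0.80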

Production Deployment

For production deployments, modify the Helm values in charts/jan-server/values.yaml and deploy using:

helm install jan-server ./charts/jan-server

⚙️ Configuration

Environment Variables

The system is configured through environment variables defined in the Helm values file. Key configuration areas include:

Jan API Gateway Configuration

  • Database Connection: PostgreSQL connection strings for read/write replicas
  • Authentication: JWT secrets and Google OAuth2 credentials
  • API Keys: Encryption secrets for API key management
  • External Services: Serper API key for web search functionality
  • Model Integration: Jan Inference Model service URL

Security Configuration

  • JWT_SECRET: HMAC-SHA-256 secret for JWT token signing
  • APIKEY_SECRET: HMAC-SHA-256 secret used to protect stored API keys
  • Database Credentials: PostgreSQL username, password, and database name

External Service Integration

  • SERPER_API_KEY: API key for web search functionality
  • Google OAuth2: Client ID, secret, and redirect URL for authentication
  • Model Service: URL for Jan Inference Model service communication

Helm Configuration

The system uses Helm charts for deployment configuration:

  • Values Files: charts/jan-server/values.yaml holds the default configuration; separate values files can be layered on top for different environments, as shown below
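
For example, per-environment overrides and secrets can be supplied at install time rather than edited into the chart (values-prod.yaml and the value paths below are hypothetical and should be checked against charts/jan-server/values.yaml; -f and --set are standard Helm flags):

helm install jan-server ./charts/jan-server \
  -f values-prod.yaml \
  --set janApiGateway.env.JWT_SECRET="$JWT_SECRET" \
  --set janApiGateway.env.SERPER_API_KEY="$SERPER_API_KEY"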

🔧 Development

Project Structure

jan-server/
├── apps/                          # Application services
│   ├── jan-api-gateway/           # Main API gateway service
│   │   ├── application/           # Go application code
│   │   ├── docker/                # Docker configuration
│   │   └── README.md              # Service-specific documentation
│   └── jan-inference-model/       # AI model inference service
│       ├── application/           # Python application code
│       └── Dockerfile             # Container configuration
├── charts/                        # Helm charts
│   └── jan-server/                # Main deployment chart
├── scripts/                       # Deployment and utility scripts
└── README.md                      # This file

Building Services

# Build API Gateway
docker build -t jan-api-gateway:latest ./apps/jan-api-gateway

# Build Inference Model
docker build -t jan-inference-model:latest ./apps/jan-inference-model
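
If you build images outside the Minikube Docker daemon (i.e. without eval $(minikube docker-env)), load them into the cluster explicitly; minikube image load is a standard Minikube command:

# Make locally built images visible to the cluster
minikube image load jan-api-gateway:latest
minikube image load jan-inference-model:latest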

Database Migrations

The system uses Atlas for database migrations:

# Generate migration files
go run ./apps/jan-api-gateway/application/cmd/codegen/dbmigration

# Apply migrations
atlas migrate apply --url "your-database-url"
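
For example, against a local PostgreSQL instance (the connection string is a placeholder, and the migration directory path is an assumption; --url and --dir are standard Atlas flags):

atlas migrate apply \
  --url "postgres://jan:password@localhost:5432/jan?sslmode=disable" \
  --dir "file://migrations"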

📊 Monitoring & Observability

Health Monitoring

  • Health Check Endpoints: Available on all services
  • Model Health Monitoring: Automated health checks for inference models
  • Database Health: Connection monitoring and replica status

Performance Profiling

  • pprof Endpoints: Available on port 6060 for performance analysis
  • Grafana Pyroscope: Continuous profiling integration
  • Request Tracing: Unique request IDs for end-to-end tracing
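
For instance, to sample 30 seconds of CPU from a running gateway (go tool pprof and the /debug/pprof paths are Go's standard profiling tooling; that the gateway service exposes port 6060 in-cluster is an assumption to verify against the chart):

# Forward the profiling port, then collect a CPU profile
kubectl port-forward svc/jan-server-jan-api-gateway 6060:6060
go tool pprof "http://localhost:6060/debug/pprof/profile?seconds=30"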

Logging

  • Structured Logging: JSON-formatted logs across all services
  • Request/Response Logging: Complete request lifecycle tracking
  • Error Tracking: Unique error codes for debugging

🔒 Security

Authentication & Authorization

  • JWT Tokens: Secure token-based authentication
  • Google OAuth2: Social authentication integration
  • API Key Management: Scoped API keys for different access levels
  • Multi-tenant Security: Organization and project-level access control

Data Protection

  • Protected API Keys: API keys and other sensitive data are protected with HMAC-SHA-256
  • Secure Database Connections: SSL-enabled database connections
  • Environment Variable Security: Secure handling of sensitive configuration

🚀 Deployment

Local Development

# Start local cluster
minikube start
eval $(minikube docker-env)

# Deploy services
./scripts/run.sh

# Access services
kubectl port-forward svc/jan-server-jan-api-gateway 8080:8080

Production Deployment

# Update Helm dependencies
helm dependency update ./charts/jan-server

# Deploy to production
helm install jan-server ./charts/jan-server

# Upgrade deployment
helm upgrade jan-server ./charts/jan-server

# Uninstall
helm uninstall jan-server

πŸ› Troubleshooting

Common Issues and Solutions

1. LLM Pod Not Starting (Pending Status)

Symptoms: The jan-server-jan-inference-model pod stays in Pending status.

Diagnosis Steps:

# Check pod status
kubectl get pods

# Get detailed pod information (replace with your actual pod name)
kubectl describe pod jan-server-jan-inference-model-<POD_ID>

Common Error Messages and Solutions:

Error: "Insufficient nvidia.com/gpu"
0/1 nodes are available: 1 Insufficient nvidia.com/gpu. no new claims to deallocate, preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.

Solution for Real LLM Setup:

  1. Ensure you have NVIDIA GPU and drivers installed
  2. Install NVIDIA Container Toolkit (see Prerequisites section)
  3. Start minikube with GPU support:
    minikube start --gpus all
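
After restarting, confirm that the node actually advertises a GPU (nvidia.com/gpu is the standard NVIDIA device-plugin resource name; minikube is the default node name):

kubectl describe node minikube | grep -i "nvidia.com/gpu"
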
Error: vLLM Pod Keeps Restarting
# Check pod logs to see the actual error
kubectl logs jan-server-jan-inference-model-<POD_ID>

Common vLLM startup issues:

  1. CUDA Out of Memory: Modify vLLM arguments in Dockerfile to reduce memory usage
  2. Model Loading Errors: Check if model path is correct and accessible
  3. GPU Not Detected: Ensure NVIDIA Container Toolkit is properly installed
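
When the pod is crash-looping rather than Pending, the previous container's logs usually hold the real error (--previous is a standard kubectl flag):

kubectl logs jan-server-jan-inference-model-<POD_ID> --previous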

2. Helm Issues

Symptoms: Helm commands fail or charts won't install.

Solutions:

# Update Helm dependencies
helm dependency update ./charts/jan-server

# Check Helm status
helm list

# Uninstall and reinstall
helm uninstall jan-server
helm install jan-server ./charts/jan-server
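
If an install still fails, rendering the chart without applying it often pinpoints the problem (--dry-run and --debug are standard Helm flags):

helm install jan-server ./charts/jan-server --dry-run --debug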

📚 API Documentation

  • Swagger UI: Available at /api/swagger/index.html when running
  • OpenAPI Specification: Auto-generated from code annotations
  • Interactive Testing: Built-in API testing interface

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests for new functionality
  5. Submit a pull request
