Scaling to 500-1000 Requests Per Second

Current Setup Analysis

Current Capacity

10 Django containers (web-1 through web-10)
4 Gunicorn workers × 2 threads per container = 8 concurrent requests per container
Total: 10 × 8 = 80 concurrent requests system-wide

Theoretical RPS Calculation

Concurrent requests × (1 / Average response time) = RPS
80 × (1 / 0.100s) = 800 RPS (if all responses are 100ms)
80 × (1 / 0.500s) = 160 RPS (if responses are 500ms)

Problem: With streaming endpoints (multi-second responses), effective capacity is much lower (~100-200 RPS).

Target: 500-1000 RPS

Required Changes Summary

Component	Current	Target	Change
Django Containers	10	15-20	+50-100%
Gunicorn Workers	4 per container	8 per container	+100%
Gunicorn Threads	2 per worker	4 per worker	+100%
Container CPU	1.0 CPU	2.0 CPU	+100%
Container RAM	1GB	2GB	+100%
PgBouncer Pool	25	50	+100%
Redis Memory	2GB	4-8GB	+100-300%
Nginx Workers	Default (auto)	8-16	Manual

Expected Capacity After Changes

20 containers × 8 workers × 4 threads = 640 concurrent requests
640 × (1 / 0.100s) = 6,400 RPS (best case)
640 × (1 / 0.500s) = 1,280 RPS (with slower responses)

This provides 2-5x headroom above your target.

Detailed Changes

1. Docker Compose - Increase Containers and Resources

File: docker-compose.loadbalanced.yml

Change 1.1: Update Gunicorn Command (All web containers)

FROM:

command: gunicorn --bind 0.0.0.0:8000 --workers 4 --worker-class gthread --threads 2 --timeout 300 --access-logfile - --error-logfile - arena_backend.wsgi

TO:

command: gunicorn --bind 0.0.0.0:8000 --workers 8 --worker-class gthread --threads 4 --max-requests 1000 --max-requests-jitter 100 --timeout 600 --keep-alive 5 --access-logfile - --error-logfile - arena_backend.wsgi

Changes explained:

--workers 8: Doubles worker count (4 → 8)
--threads 4: Doubles threads per worker (2 → 4)
--max-requests 1000: Restart workers after 1000 requests (prevents memory leaks)
--max-requests-jitter 100: Add randomness to prevent all workers restarting simultaneously
--timeout 600: Increase timeout for streaming responses (300s → 600s)
--keep-alive 5: Keep connections alive for 5 seconds

Change 1.2: Increase Container Resources

FROM:

deploy:
  resources:
    limits:
      cpus: '1.0'
      memory: 1024M
    reservations:
      cpus: '0.5'
      memory: 512M

TO:

deploy:
  resources:
    limits:
      cpus: '2.0'
      memory: 2048M
    reservations:
      cpus: '1.0'
      memory: 1024M

Change 1.3: Add 5 More Containers (web-11 through web-15)

Add these after web-10:

  web-11:
    build: ./backend
    container_name: arena-web-11
    command: gunicorn --bind 0.0.0.0:8000 --workers 8 --worker-class gthread --threads 4 --max-requests 1000 --max-requests-jitter 100 --timeout 600 --keep-alive 5 --access-logfile - --error-logfile - arena_backend.wsgi
    env_file:
      - ./config.env
    volumes:
      - ./backend/:/usr/src/backend/
      - static_volume:/usr/src/backend/static
      - logs_vol:/logs
    expose:
      - 8000
    environment:
      - REDIS_HOST=redis
      - REDIS_PORT=6379
      - CONTAINER_NAME=web-11
    depends_on:
      - redis
      - pgbouncer
    restart: unless-stopped
    networks:
      - arena-network
    deploy:
      resources:
        limits:
          cpus: '2.0'
          memory: 2048M
        reservations:
          cpus: '1.0'
          memory: 1024M
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health/"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

  web-12:
    # ... same as web-11, change container_name and CONTAINER_NAME

  web-13:
    # ... same as web-11, change container_name and CONTAINER_NAME

  web-14:
    # ... same as web-11, change container_name and CONTAINER_NAME

  web-15:
    # ... same as web-11, change container_name and CONTAINER_NAME

Total after this change: 15 containers × 8 workers × 4 threads = 480 concurrent requests

2. Nginx Load Balancer Configuration

File: nginx/load-balancer.conf

Change 2.1: Add New Upstream Servers

Add web-11 through web-15 to all upstream blocks:

upstream django_backend {
    keepalive 128;  # Increased from 64
    keepalive_timeout 60s;
    keepalive_requests 1000;  # Increased from 100

    # Existing servers
    server web-1:8000 max_fails=3 fail_timeout=30s max_conns=300 weight=1;
    server web-2:8000 max_fails=3 fail_timeout=30s max_conns=300 weight=1;
    server web-3:8000 max_fails=3 fail_timeout=30s max_conns=300 weight=1;
    server web-4:8000 max_fails=3 fail_timeout=30s max_conns=300 weight=1;
    server web-5:8000 max_fails=3 fail_timeout=30s max_conns=300 weight=1;
    server web-6:8000 max_fails=3 fail_timeout=30s max_conns=300 weight=1;
    server web-7:8000 max_fails=3 fail_timeout=30s max_conns=300 weight=1;
    server web-8:8000 max_fails=3 fail_timeout=30s max_conns=300 weight=1;
    server web-9:8000 max_fails=3 fail_timeout=30s max_conns=300 weight=1;
    server web-10:8000 max_fails=3 fail_timeout=30s max_conns=300 weight=1;

    # NEW: Add 5 more servers
    server web-11:8000 max_fails=3 fail_timeout=30s max_conns=300 weight=1;
    server web-12:8000 max_fails=3 fail_timeout=30s max_conns=300 weight=1;
    server web-13:8000 max_fails=3 fail_timeout=30s max_conns=300 weight=1;
    server web-14:8000 max_fails=3 fail_timeout=30s max_conns=300 weight=1;
    server web-15:8000 max_fails=3 fail_timeout=30s max_conns=300 weight=1;
}

upstream django_streaming {
    least_conn;
    keepalive 64;  # Increased from 32
    keepalive_timeout 120s;

    # Add web-11 through web-15 here too
    server web-11:8000 max_fails=2 fail_timeout=60s max_conns=150 weight=1;
    server web-12:8000 max_fails=2 fail_timeout=60s max_conns=150 weight=1;
    server web-13:8000 max_fails=2 fail_timeout=60s max_conns=150 weight=1;
    server web-14:8000 max_fails=2 fail_timeout=60s max_conns=150 weight=1;
    server web-15:8000 max_fails=2 fail_timeout=60s max_conns=150 weight=1;
}

upstream django_websocket {
    ip_hash;
    keepalive 64;  # Increased from 32
    keepalive_timeout 300s;

    # Add web-11 through web-15 here too
    server web-11:8000 max_fails=2 fail_timeout=60s max_conns=200 weight=1;
    server web-12:8000 max_fails=2 fail_timeout=60s max_conns=200 weight=1;
    server web-13:8000 max_fails=2 fail_timeout=60s max_conns=200 weight=1;
    server web-14:8000 max_fails=2 fail_timeout=60s max_conns=200 weight=1;
    server web-15:8000 max_fails=2 fail_timeout=60s max_conns=200 weight=1;
}

Change 2.2: Optimize Rate Limits for Production High Load

# Rate limiting zones (already done, but verify these values)
limit_req_zone $binary_remote_addr zone=general_limit:10m rate=1000r/s;
limit_req_zone $binary_remote_addr zone=streaming_limit:10m rate=200r/s;
limit_req_zone $binary_remote_addr zone=auth_limit:10m rate=100r/s;

3. Nginx Main Configuration

File: Create/Edit nginx/nginx.conf or update Dockerfile to set these

Add at the top of nginx configuration:

# Number of worker processes (1 per CPU core, or 2x CPU cores for I/O bound)
worker_processes 16;

# Maximum open file descriptors per worker
worker_rlimit_nofile 65535;

events {
    # Maximum concurrent connections per worker
    worker_connections 10000;

    # Use efficient connection processing (Linux)
    use epoll;

    # Accept multiple connections at once
    multi_accept on;
}

http {
    # ... existing http config ...

    # Connection and request settings
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    keepalive_requests 1000;

    # Buffer sizes
    client_body_buffer_size 128k;
    client_max_body_size 100M;
    client_header_buffer_size 1k;
    large_client_header_buffers 4 8k;

    # Timeouts
    client_body_timeout 60s;
    client_header_timeout 60s;
    send_timeout 60s;

    # ... include other configs ...
}

4. PgBouncer Configuration

File: pgbouncer/pgbouncer.ini

Change 4.1: Increase Connection Limits

[pgbouncer]
pool_mode = transaction

# Connection limits - DOUBLED for high load
max_client_conn = 2000          # Was: 1000
default_pool_size = 50          # Was: 25
reserve_pool_size = 10          # Was: 5
max_db_connections = 100        # Was: 50
max_user_connections = 100      # Was: 50

# Timeouts
server_idle_timeout = 600
server_lifetime = 3600
server_connect_timeout = 15
query_timeout = 60              # Increased from 30
query_wait_timeout = 120
client_idle_timeout = 0

# Performance
so_reuseport = 1

Change 4.2: Increase PgBouncer Resources

In docker-compose.loadbalanced.yml:

pgbouncer:
  deploy:
    resources:
      limits:
        cpus: '1.0'        # Was: 0.5
        memory: 512M       # Was: 256M
      reservations:
        cpus: '0.5'        # Was: 0.25
        memory: 256M       # Was: 128M

5. Redis Configuration

File: docker-compose.loadbalanced.yml

Change 5.1: Increase Redis Memory and Resources

redis:
  container_name: redis
  image: "redis:7-alpine"
  command: redis-server --maxmemory 8gb --maxmemory-policy allkeys-lru --maxclients 10000 --tcp-backlog 511 --timeout 0 --tcp-keepalive 300
  ports:
    - 6379:6379
  volumes:
    - redis_data:/data
  restart: unless-stopped
  networks:
    - arena-network
  deploy:
    resources:
      limits:
        cpus: '2.0'
        memory: 8192M
      reservations:
        cpus: '1.0'
        memory: 4096M
  healthcheck:
    test: ["CMD", "redis-cli", "ping"]
    interval: 10s
    timeout: 3s
    retries: 3

Changes:

Memory: 2GB → 8GB
Max clients: default (10000) → explicit 10000
TCP backlog: 511 (handles more incoming connections)
Resources: 2 CPU, 8GB RAM

6. Django Settings Optimization

File: backend/arena_backend/settings.py

Change 6.1: Database Connection Pool Settings

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": os.getenv("DB_NAME"),
        "USER": os.getenv("DB_USER"),
        "PASSWORD": os.getenv("DB_PASSWORD"),
        "HOST": os.getenv("DB_HOST", "pgbouncer"),  # Ensure PgBouncer
        "PORT": os.getenv("DB_PORT", "6432"),

        # Connection pooling settings
        "CONN_MAX_AGE": 300,  # Reduced from 600 (5 minutes instead of 10)
        "CONN_HEALTH_CHECKS": True,

        "OPTIONS": {
            "connect_timeout": 10,
            "options": "-c statement_timeout=30000",  # 30 second query timeout
            "keepalives": 1,
            "keepalives_idle": 30,
            "keepalives_interval": 10,
            "keepalives_count": 5,
        },
    }
}

Change 6.2: Redis Connection Pool

# Redis connection pool settings
REDIS_CONNECTION_POOL_KWARGS = {
    "max_connections": 100,  # Per Django worker
    "retry_on_timeout": True,
    "socket_keepalive": True,
    "socket_keepalive_options": {
        socket.TCP_KEEPIDLE: 1,
        socket.TCP_KEEPINTVL: 1,
        socket.TCP_KEEPCNT: 5,
    },
}

CACHES = {
    "default": {
        "BACKEND": "django_redis.cache.RedisCache",
        "LOCATION": f"redis://{REDIS_HOST}:{REDIS_PORT}/1",
        "OPTIONS": {
            "CLIENT_CLASS": "django_redis.client.DefaultClient",
            "CONNECTION_POOL_KWARGS": REDIS_CONNECTION_POOL_KWARGS,
            "SOCKET_CONNECT_TIMEOUT": 5,
            "SOCKET_TIMEOUT": 5,
        },
        "KEY_PREFIX": "arena",
        "TIMEOUT": 300,
    }
}

Change 6.3: Channels Layer Settings

CHANNEL_LAYERS = {
    'default': {
        'BACKEND': 'channels_redis.core.RedisChannelLayer',
        'CONFIG': {
            "hosts": [(REDIS_HOST, int(REDIS_PORT))],
            "capacity": 2000,  # Increased from 1500
            "expiry": 10,
            "group_expiry": 86400,
            "channel_capacity": {
                "http.request": 500,
                "http.response*": 2000,
                "websocket.send*": 2000,
            },
        },
    },
}

7. Nginx Depends On (Update)

File: docker-compose.loadbalanced.yml

Update nginx's depends_on to include new containers:

nginx:
  depends_on:
    - web-1
    - web-2
    - web-3
    - web-4
    - web-5
    - web-6
    - web-7
    - web-8
    - web-9
    - web-10
    - web-11
    - web-12
    - web-13
    - web-14
    - web-15

System-Level Optimizations (On Host Server)

Run these on your server (not in containers):

# Increase file descriptor limits
sudo sysctl -w fs.file-max=200000
echo "fs.file-max = 200000" | sudo tee -a /etc/sysctl.conf

# Increase network buffer sizes
sudo sysctl -w net.core.somaxconn=65535
sudo sysctl -w net.ipv4.tcp_max_syn_backlog=8192
sudo sysctl -w net.core.netdev_max_backlog=5000

# TCP performance tuning
sudo sysctl -w net.ipv4.tcp_fin_timeout=30
sudo sysctl -w net.ipv4.tcp_keepalive_time=300
sudo sysctl -w net.ipv4.tcp_keepalive_probes=5
sudo sysctl -w net.ipv4.tcp_keepalive_intvl=15

# Make permanent
sudo sysctl -p

Resource Requirements Summary

Current Resources

Component	CPU	RAM	Total
10 Django containers	10 CPU	10 GB	-
Nginx	0.5 CPU	512 MB	-
Redis	0.5 CPU	2 GB	-
PgBouncer	0.5 CPU	256 MB	-
TOTAL	11.5 CPU	~13 GB	-

Required Resources (After Scaling)

Component	CPU	RAM	Total
15 Django containers	30 CPU	30 GB	-
Nginx	2 CPU	1 GB	-
Redis	2 CPU	8 GB	-
PgBouncer	1 CPU	512 MB	-
TOTAL	35 CPU	~40 GB	-

Server Requirements

Minimum Recommended:

CPUs: 40-48 cores (to handle 35 with headroom)
RAM: 48-64 GB (to handle 40 GB with OS overhead)
Network: 10 Gbps
Disk: SSD with 500+ IOPS

Cloud Instance Recommendations:

AWS: c5.12xlarge (48 vCPU, 96 GB RAM)
Google Cloud: c2-standard-60 (60 vCPU, 240 GB RAM)
Azure: F48s v2 (48 vCPU, 96 GB RAM)

Implementation Steps

Step 1: Backup Current Configuration

cd ~/Chat-Arena-Backend
tar -czf backup-$(date +%Y%m%d).tar.gz docker-compose.loadbalanced.yml nginx/ pgbouncer/ backend/arena_backend/settings.py

Step 2: Apply Changes Gradually

First: Increase resources for existing 10 containers
Second: Add 5 new containers (web-11 to web-15)
Third: Scale Redis and PgBouncer
Fourth: Optimize nginx configuration

Step 3: Test Each Stage

# After each change, rebuild and restart
docker compose -f docker-compose.loadbalanced.yml build
docker compose -f docker-compose.loadbalanced.yml up -d

# Wait for health checks
sleep 60

# Check all containers are healthy
docker compose -f docker-compose.loadbalanced.yml ps

# Run load test
cd backend/load_tests
locust -f locustfile_optimized.py \
    --host=https://backend.arena.ai4bharat.org \
    --users 500 \
    --spawn-rate 50 \
    --run-time 5m \
    --headless

Monitoring and Verification

Key Metrics to Monitor

# 1. CPU usage per container
docker stats --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"

# 2. Database connections
docker compose -f docker-compose.loadbalanced.yml exec pgbouncer psql -p 6432 -U $DB_USER pgbouncer -c "SHOW POOLS;"

# 3. Redis memory and connections
docker compose -f docker-compose.loadbalanced.yml exec redis redis-cli INFO | grep -E "used_memory_human|connected_clients"

# 4. Nginx request rate
docker compose -f docker-compose.loadbalanced.yml logs nginx | grep -oE "HTTP/[0-9.]+ [0-9]+" | awk '{print $2}' | sort | uniq -c

# 5. Response times
docker compose -f docker-compose.loadbalanced.yml logs nginx | grep "upstream_response_time" | tail -100

Target Metrics After Scaling

Metric	Target	Command
RPS sustained	500-1000	Locust dashboard
P95 response time	<2000ms	Locust stats
CPU usage	<70%	`docker stats`
Memory usage	<80%	`docker stats`
PgBouncer pool usage	<80 connections	`SHOW POOLS`
Redis memory	<6 GB	`INFO memory`

Cost Estimation

Infrastructure Costs (Monthly, Approximate)

Provider	Instance Type	vCPUs	RAM	Monthly Cost
AWS	c5.12xlarge	48	96 GB	~$1,800
Google Cloud	c2-standard-60	60	240 GB	~$2,500
Azure	F48s v2	48	96 GB	~$1,900

Additional costs:

Load balancer: ~$20-50/month
Database (managed PostgreSQL): ~$200-500/month
Storage & bandwidth: ~$100-300/month
Total: ~$2,200-3,500/month

Alternative: Kubernetes Auto-Scaling

If costs are a concern, consider using Kubernetes for auto-scaling:

Start with 10 containers during low traffic
Auto-scale up to 20 containers during peak traffic
Pay only for what you use

Kubernetes benefits:

Automatic horizontal pod autoscaling (HPA)
Rolling updates with zero downtime
Self-healing (auto-restart failed containers)
Resource efficiency (better utilization)

Phase 3 Recommendations (Future)

Once you reach 1000+ RPS:

Caching Layer: Add Varnish or CloudFlare CDN
Read Replicas: Separate read/write database traffic
Message Queue: Use Celery for async tasks (reduce request time)
Database Sharding: Distribute data across multiple databases
Geographic Distribution: Deploy in multiple regions

Quick Start Script

Save this as apply-scaling-changes.sh:

#!/bin/bash
# Apply scaling changes incrementally

echo "Step 1: Increase Gunicorn workers..."
# Update docker-compose.loadbalanced.yml manually

echo "Step 2: Increase container resources..."
# Update deploy.resources in docker-compose.loadbalanced.yml

echo "Step 3: Add new containers..."
# Add web-11 through web-15 definitions

echo "Step 4: Update nginx upstreams..."
# Add new servers to load-balancer.conf

echo "Step 5: Scale PgBouncer and Redis..."
# Update pgbouncer.ini and redis config

echo "Step 6: Rebuild and restart..."
docker compose -f docker-compose.loadbalanced.yml build
docker compose -f docker-compose.loadbalanced.yml up -d

echo "Scaling complete! Monitor with:"
echo "  docker stats"
echo "  docker compose -f docker-compose.loadbalanced.yml ps"

Summary

Key Changes for 500-1000 RPS:

✅ Add 5 containers (10 → 15)
✅ Double Gunicorn workers (4 → 8) and threads (2 → 4)
✅ Double container resources (1 CPU → 2 CPU, 1GB → 2GB)
✅ Scale PgBouncer (25 → 50 pool size)
✅ Scale Redis (2GB → 8GB memory)
✅ Optimize nginx (add workers, increase keepalive)
✅ Update rate limits (already done in previous changes)

Result: System capable of 1,000-2,000 RPS sustained, with room for spikes.

FilesExpand file tree

SCALING_TO_1000_RPS.md

Latest commit

History

SCALING_TO_1000_RPS.md

File metadata and controls

Scaling to 500-1000 Requests Per Second

Current Setup Analysis

Current Capacity

Theoretical RPS Calculation

Target: 500-1000 RPS

Required Changes Summary

Expected Capacity After Changes

Detailed Changes

1. Docker Compose - Increase Containers and Resources

Change 1.1: Update Gunicorn Command (All web containers)

Change 1.2: Increase Container Resources

Change 1.3: Add 5 More Containers (web-11 through web-15)

2. Nginx Load Balancer Configuration

Change 2.1: Add New Upstream Servers

Change 2.2: Optimize Rate Limits for Production High Load

3. Nginx Main Configuration

4. PgBouncer Configuration

Change 4.1: Increase Connection Limits

Change 4.2: Increase PgBouncer Resources

5. Redis Configuration

Change 5.1: Increase Redis Memory and Resources

6. Django Settings Optimization

Change 6.1: Database Connection Pool Settings

Change 6.2: Redis Connection Pool

Change 6.3: Channels Layer Settings

7. Nginx Depends On (Update)

System-Level Optimizations (On Host Server)

Resource Requirements Summary

Current Resources

Required Resources (After Scaling)

Server Requirements

Implementation Steps

Step 1: Backup Current Configuration

Step 2: Apply Changes Gradually

Step 3: Test Each Stage

Monitoring and Verification

Key Metrics to Monitor

Target Metrics After Scaling

Cost Estimation

Infrastructure Costs (Monthly, Approximate)

Alternative: Kubernetes Auto-Scaling

Phase 3 Recommendations (Future)

Quick Start Script

Summary