🚀 AI-CoScientist Connectome Deployment - Pre-Flight Checklist

Generated: 2025-10-26
Status: ✅ READY FOR DEPLOYMENT
Target: Connectome Server (8x RTX 3090)


✅ Local Verification Complete

1. Deployment Script

  • ✅ Script exists: scripts/deploy_to_connectome_hybrid.sh
  • ✅ Script is executable (chmod +x already applied)
  • ✅ Script validates: GPU prerequisites, Docker, API keys
  • ✅ Automated features: Password generation, health checks, monitoring setup

2. Environment Configuration

  • .env.local file exists with all API keys
  • ✅ NGC API key configured ⚠️ ROTATE AFTER DEPLOYMENT
  • ✅ OpenAI API key configured ⚠️ ROTATE AFTER DEPLOYMENT
  • ✅ Anthropic API key configured (optional) ⚠️ ROTATE AFTER DEPLOYMENT
  • ✅ GPU assignments: GPU 1 (Nemotron), GPU 5 (Embedder), GPU 6 (Reranker)

3. Docker Configuration

  • docker-compose.connectome.yml ready (11 services)
  • ✅ Infrastructure: postgres, redis, chromadb (3 services)
  • ✅ Nemotron GPU: nemotron-llm, nemo-embedder, nemo-reranker (3 services)
  • ✅ Application: api, celery-worker, celery-beat (3 services)
  • ✅ Monitoring: prometheus, grafana (2 services)
  • ✅ Health checks configured for all critical services

4. Documentation

  • ✅ Detailed guide: DEPLOY_TO_CONNECTOME_NOW.md
  • ✅ Technical docs: claudedocs/NEMOTRON_HYBRID_GUIDE.md
  • ✅ Deployment summary: DEPLOYMENT_COMPLETE_SUMMARY.md

📋 Pre-Deployment Actions Required

Step 0: Verify Connectome Server Access

Before starting deployment, verify:

# 1. Can you SSH to Connectome?
ssh your_username@connectome.server.address

# 2. Check available GPUs
nvidia-smi

# Expected: 8x RTX 3090, 24GB each
# Verify GPUs 1, 5, 6 have ≥20GB, ≥4GB, ≥4GB free respectively

# 3. Check Docker installation
docker --version
docker-compose --version

# 4. Check Docker GPU runtime
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

# 5. Verify disk space (need ~50GB for NIM containers + models)
df -h

# 6. Check if AI-CoScientist repo exists
ls -la ~/AI-CoScientist  # or your repo location
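The GPU and disk checks above can also be scripted. This sketch assumes a bash- or dash-compatible shell on the server and treats the checklist's GB figures as GiB, because nvidia-smi reports memory in MiB. `check_gpu_free` and `check_disk_free` are hypothetical helpers and are not part of deploy_to_connectome_hybrid.sh.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helpers; thresholds come from this checklist
# (GPU 1 >= 20 GiB, GPUs 5 and 6 >= 4 GiB, ~50 GiB of disk).

# Reads "index, free_mib" lines (nvidia-smi CSV, no header, no units) on stdin.
# Prints one line per problem and returns non-zero if any GPU falls short or is missing.
check_gpu_free() {
  awk -F', *' '
    BEGIN { need[1] = 20480; need[5] = 4096; need[6] = 4096; rc = 0 }
    ($1 in need) {
      seen[$1] = 1
      if ($2 + 0 < need[$1]) {
        printf "GPU %s: %s MiB free, need %s MiB\n", $1, $2, need[$1]
        rc = 1
      }
    }
    END {
      for (g in need) if (!(g in seen)) { printf "GPU %s: not found\n", g; rc = 1 }
      exit rc
    }'
}

# Succeeds if the filesystem holding $1 has at least $2 GiB available.
check_disk_free() {
  local path="$1" need_gib="$2" avail_kib
  # -P keeps each filesystem on one line; column 4 is available space in 1K blocks
  avail_kib=$(df -Pk "$path" | awk 'NR == 2 { print $4 }')
  [ -n "$avail_kib" ] || return 1
  if [ "$avail_kib" -lt $((need_gib * 1024 * 1024)) ]; then
    echo "$path: $((avail_kib / 1024 / 1024)) GiB free, need $need_gib GiB"
    return 1
  fi
}
```

On the server, pipe `nvidia-smi --query-gpu=index,memory.free --format=csv,noheader,nounits` into `check_gpu_free`, then run `check_disk_free /var/lib/docker 50` (Docker's default data directory; adjust the path if yours differs).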

Step 1: Transfer Files to Connectome

Option A: Git Pull (RECOMMENDED)

# On Connectome server
cd ~/AI-CoScientist  # or your repo location
git fetch origin
git checkout feature/nemotron-hybrid-integration
git pull origin feature/nemotron-hybrid-integration

# Verify you have the latest deployment script
ls -lh scripts/deploy_to_connectome_hybrid.sh

Option B: Direct File Transfer

# From your local machine
scp .env.local your_username@connectome:/path/to/AI-CoScientist/.env.production
scp scripts/deploy_to_connectome_hybrid.sh your_username@connectome:/path/to/AI-CoScientist/scripts/
scp docker-compose.connectome.yml your_username@connectome:/path/to/AI-CoScientist/

Step 2: Prepare Environment File on Connectome

# On Connectome server
cd ~/AI-CoScientist

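# If Option B already copied your .env.local here as .env.production, skip the
# heredoc below (it would overwrite that file) and only run the chmod.
# Otherwise, create the file from this template and replace every
# YOUR_*_HERE placeholder with a real key.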
cat > .env.production << 'EOF'
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# CRITICAL API KEYS
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

NGC_API_KEY=YOUR_NGC_API_KEY_HERE
OPENAI_API_KEY=YOUR_OPENAI_API_KEY_HERE

# Anthropic (optional - currently has model access issues)
# ANTHROPIC_API_KEY=YOUR_ANTHROPIC_API_KEY_HERE

# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# GPU CONFIGURATION
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

NEMOTRON_GPU_ID=1
NEMO_EMBEDDER_GPU_ID=5
NEMO_RERANKER_GPU_ID=6

ENVIRONMENT=production
DEBUG=false
LOG_LEVEL=INFO

HYBRID_MODE=true
USE_GPT4_FOR_EVALUATION=true
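# Claude disabled until its key is fixed (comment kept on its own line, since
# some env-file parsers treat trailing "# ..." text as part of the value)
USE_CLAUDE_FOR_EVALUATION=false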
USE_NEMOTRON_FOR_SUMMARIZATION=true
USE_NEMOTRON_FOR_EXTRACTION=true

# Ensemble weights (without Claude: 60% GPT-4, 40% Nemotron)
ENSEMBLE_WEIGHT_GPT4=0.60
ENSEMBLE_WEIGHT_CLAUDE=0.0
ENSEMBLE_WEIGHT_NEMOTRON=0.40

NEMOTRON_CONFIDENCE_THRESHOLD=0.75

# OpenAI Configuration
OPENAI_MODEL=gpt-4
OPENAI_TEMPERATURE=0.3
OPENAI_MAX_TOKENS=4096

# Nemotron Configuration
NIM_OPTIMIZATION_PROFILE=throughput
NEMOTRON_BASE_URL=http://nemotron-llm:8000/v1
NEMOTRON_MODEL=nvidia/nvidia-nemotron-nano-9b-v2
NEMOTRON_TEMPERATURE=0.7
NEMOTRON_MAX_TOKENS=2048

NEMO_EMBEDDER_URL=http://nemo-embedder:8000/v1
NEMO_EMBEDDER_MODEL=nvidia/llama-3.2-nv-embedqa-1b-v2
EMBEDDING_DIMENSION=1024

NEMO_RERANKER_URL=http://nemo-reranker:8000/v1
NEMO_RERANKER_MODEL=nvidia/llama-3.2-nv-rerankqa-1b-v2
RERANKER_TOP_K=5

# Database (POSTGRES_PASSWORD is auto-generated by the deployment script)
POSTGRES_USER=postgres
POSTGRES_DB=ai_coscientist
POSTGRES_PORT=5432

CHROMADB_HOST=chromadb
CHROMADB_PORT=8000
CHROMA_TELEMETRY=FALSE

REDIS_HOST=redis
REDIS_PORT=6379

# Application
APP_NAME=AI-CoScientist
APP_VERSION=1.0.0
API_PORT=8080

# Performance
UVICORN_WORKERS=4
CELERY_CONCURRENCY=4

# Monitoring
PROMETHEUS_PORT=9090
GRAFANA_USER=admin
GRAFANA_PORT=3000

# Paths
PAPERS_COLLECTION_DIR=./papers_collection
LOGS_DIR=./logs

CORS_ORIGINS=http://localhost,http://127.0.0.1
EOF

# Secure the file
chmod 600 .env.production
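Before running the deployment script, a quick sanity check of the file can catch leftover placeholders. This is a minimal sketch assuming the variable names in the template above and GNU `stat`; `env_get` and `validate_env` are hypothetical helpers. It flags missing or `YOUR_*_HERE` API keys, checks that the three ENSEMBLE_WEIGHT_* values sum to 1.0 (0.60 + 0.0 + 0.40 here), and confirms the file mode is 600.

```bash
#!/usr/bin/env bash
# Hypothetical checker for .env.production; variable names follow this checklist.

# Prints the value of KEY ($1) from env file ($2); the last assignment wins and
# any trailing " # comment" is stripped.
env_get() {
  awk -F= -v k="$1" '
    $1 == k { v = substr($0, length(k) + 2) }
    END { sub(/[ \t]+#.*$/, "", v); print v }' "$2"
}

# Prints one line per problem; returns non-zero if any problem was found.
validate_env() {
  local file="$1" rc=0 key val mode
  for key in NGC_API_KEY OPENAI_API_KEY; do
    val=$(env_get "$key" "$file")
    case $val in
      ''|YOUR_*_HERE) echo "$key is missing or still a placeholder"; rc=1 ;;
    esac
  done
  # The ensemble weights should add up to 1.0, allowing for float rounding
  if ! awk -v a="$(env_get ENSEMBLE_WEIGHT_GPT4 "$file")" \
           -v b="$(env_get ENSEMBLE_WEIGHT_CLAUDE "$file")" \
           -v c="$(env_get ENSEMBLE_WEIGHT_NEMOTRON "$file")" \
           'BEGIN { s = a + b + c; ok = (s > 0.999 && s < 1.001); exit (ok ? 0 : 1) }'; then
    echo "ENSEMBLE_WEIGHT_* values do not sum to 1.0"; rc=1
  fi
  # The file holds API keys, so only the owner should be able to read it
  mode=$(stat -c '%a' "$file")
  if [ "$mode" != 600 ]; then
    echo "$file has mode $mode, expected 600"; rc=1
  fi
  return "$rc"
}
```

Run `validate_env .env.production`; it prints one line per problem and returns non-zero if any were found.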

🚀 Deployment Execution

Run the Deployment Script

# On Connectome server
cd ~/AI-CoScientist

# Make sure script is executable (should already be)
chmod +x scripts/deploy_to_connectome_hybrid.sh

# Run deployment (roughly 17-19 minutes in total, going by the per-step estimates below)
./scripts/deploy_to_connectome_hybrid.sh

What the Script Does

  1. [1/9] Prerequisites Check (~30 seconds)

    • Verifies 8x RTX 3090 GPUs
    • Checks Docker GPU runtime
    • Validates NGC API key
  2. [2/9] Password Generation (~5 seconds)

    • Auto-generates secure PostgreSQL password
    • Auto-generates Redis password
    • Auto-generates Grafana admin password
    • Auto-generates app secret key
  3. [3/9] Docker Image Pull (~10 minutes)

    • Pulls Nemotron NIM containers (large downloads)
    • Pulls infrastructure images (postgres, redis, etc.)
  4. [4/9] Infrastructure Start (~60 seconds)

    • Starts postgres, redis, chromadb
    • Waits for health checks
  5. [5/9] Nemotron GPU Services (~3-5 minutes)

    • Starts nemotron-llm on GPU 1
    • Starts nemo-embedder on GPU 5
    • Starts nemo-reranker on GPU 6
    • Models download and load into VRAM
  6. [6/9] Database Migrations (~30 seconds)

    • Creates database schema
    • Sets up initial tables
  7. [7/9] Strategic Monitoring (~20 seconds)

    • Creates 8 literature sources
    • Sets up 4 alert rules
  8. [8/9] Application Services (~60 seconds)

    • Starts API, celery-worker, celery-beat
    • Starts prometheus, grafana
  9. [9/9] Health Verification (~30 seconds)

    • Checks all 11 services
    • Validates API endpoints
    • Verifies GPU allocations
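Step [2/9] generates these secrets automatically, and the script's exact method isn't documented here. Purely as an illustration, one common shell approach hex-encodes bytes read from /dev/urandom; `gen_secret` is a hypothetical name, not a function from the deployment script.

```bash
#!/usr/bin/env bash
# Illustrative only: not necessarily how deploy_to_connectome_hybrid.sh does it.

# Prints a random lowercase hex string built from N bytes of /dev/urandom
# (default 32 bytes, i.e. 64 hex characters), with no trailing newline.
gen_secret() {
  local nbytes="${1:-32}"
  head -c "$nbytes" /dev/urandom | od -An -v -tx1 | tr -d ' \n'
}
```

For example, `POSTGRES_PASSWORD=$(gen_secret)`. Hex output avoids characters such as `@`, `/` or `#` that would otherwise need escaping in a PostgreSQL connection URL or an env file.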

✅ Post-Deployment Verification

Check Service Status

# On Connectome server
docker-compose -f docker-compose.connectome.yml ps

# Expected: all 11 services "Up"; services with a health check also show "(healthy)"
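The `docker-compose ps` layout differs between Compose versions, so for a scriptable check it can be easier to ask `docker inspect` for each container's state and health-check status. A sketch under that assumption: `report_unhealthy` is a hypothetical helper that flags containers not in the `running` state, or whose health check reports anything other than `healthy`. With the template shown in the code comment, containers without a health check are printed as `none` and accepted.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads "name status health" lines on stdin, prints the
# containers that are not running or whose health check is not "healthy",
# and returns non-zero if any were printed. "none" = no health check defined.
#
# Example input source (on the server):
#   docker inspect --format '{{.Name}} {{.State.Status}} {{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}' \
#     $(docker-compose -f docker-compose.connectome.yml ps -q) | report_unhealthy
report_unhealthy() {
  awk '
    $2 != "running" || ($3 != "healthy" && $3 != "none") { print; bad = 1 }
    END { exit bad }'
}
```

A container whose health check is still `starting` (for example, a NIM service loading its model) is reported too, so rerun the check after a minute before treating it as a failure.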

Test API Health

curl http://localhost:8080/api/v1/health

# Expected: {"status":"healthy","services":{...}}
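Right after the script finishes, the Nemotron services may still be loading models, so a single health request can fail transiently. A minimal polling sketch; `retry` is a hypothetical helper, and the attempt count and delay in the usage line are arbitrary choices.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command until it succeeds or attempts run out.
# Usage: retry ATTEMPTS DELAY_SECONDS command [args...]
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt $i/$attempts failed: $*" >&2
    # No pause after the final attempt
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

For example, `retry 30 10 curl -fsS http://localhost:8080/api/v1/health` polls for roughly five minutes; `-f` makes curl exit non-zero on an HTTP error response, so that counts as a failed attempt.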

Test Hybrid RAG Status

curl http://localhost:8080/api/v1/hybrid-rag/status

# Expected: Shows GPT-4, Nemotron status, GPU assignments

Monitor GPU Utilization

nvidia-smi -l 1

# Expected:
# GPU 1: ~18GB VRAM used (nemotron-llm)
# GPU 5: ~4GB VRAM used (nemo-embedder)
# GPU 6: ~4GB VRAM used (nemo-reranker)

Test Hybrid Evaluation

curl -X POST http://localhost:8080/api/v1/hybrid-rag/evaluate \
  -H "Content-Type: application/json" \
  -d '{
    "paper_text": "Recent advances in deep learning have revolutionized natural language processing. Our novel transformer architecture achieves state-of-the-art results on multiple benchmarks.",
    "section": "abstract",
    "use_ensemble": true
  }'

# Expected: JSON response with scores from GPT-4 and Nemotron

🔐 Security: API Key Rotation

⚠️ CRITICAL: Rotate all API keys after successful deployment

1. Rotate NGC API Key

# Visit: https://org.ngc.nvidia.com/setup/api-key
# Generate new key
# Update .env.production: NGC_API_KEY=YOUR_NGC_API_KEY_HERE

2. Rotate OpenAI API Key

# Visit: https://platform.openai.com/api-keys
# Create new key, delete old one
# Update .env.production: OPENAI_API_KEY=YOUR_OPENAI_API_KEY_HERE

3. Rotate Anthropic API Key (when ready to use)

# Visit: https://console.anthropic.com/settings/keys
# Create new key, delete old one
# Update .env.production: ANTHROPIC_API_KEY=YOUR_ANTHROPIC_API_KEY_HERE

4. Restart Services After Key Rotation

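# `restart` keeps each container's existing environment and does not re-read
# .env.production, so recreate the containers to load the new keys
# (also recreate nemotron-llm, nemo-embedder, nemo-reranker if they read NGC_API_KEY)
docker-compose -f docker-compose.connectome.yml up -d --force-recreate api celery-worker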
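Editing keys by hand is error-prone. A minimal sketch for replacing one variable in .env.production in place, assuming the file uses plain KEY=value lines as in the template above; `set_env_var` is a hypothetical helper. The value reaches awk through the environment rather than through sed, so characters such as `/`, `&` or `\` in a key need no escaping, and the file is rewritten in place so it keeps its 600 mode.

```bash
#!/usr/bin/env bash
# Hypothetical helper: set KEY=VALUE in an env file.
# The first "KEY=" or "# KEY=" line is replaced (which also uncomments it),
# later duplicates are dropped, and the line is appended if the key is absent.
set_env_var() {
  local key="$1" value="$2" file="$3" tmp rc
  tmp=$(mktemp) || return 1   # mktemp creates the file with mode 600
  # ENVIRON passes the value through verbatim (awk -v would process escapes)
  KEY="$key" VALUE="$value" awk '
    BEGIN { k = ENVIRON["KEY"]; v = ENVIRON["VALUE"] }
    index($0, k "=") == 1 || index($0, "# " k "=") == 1 {
      if (!done) print k "=" v
      done = 1
      next
    }
    { print }
    END { if (!done) print k "=" v }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"   # overwrite contents, keeping the file's mode
  rc=$?
  rm -f "$tmp"
  return "$rc"
}
```

In bash, `read -rs newkey` reads a key without echoing it; then run `set_env_var OPENAI_API_KEY "$newkey" .env.production`, `unset newkey`, and apply the change to the running services with the command above.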

📊 Monitoring Dashboard Access

Grafana

URL: http://connectome-server:3000
Username: admin
Password: <check .env.production for GRAFANA_PASSWORD>

Prometheus

URL: http://connectome-server:9090

API Documentation

URL: http://connectome-server:8080/docs

🐛 Troubleshooting Quick Reference

Issue: GPU not detected

# Check nvidia-smi
nvidia-smi

# Test Docker GPU runtime
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

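# If it fails, install nvidia-container-toolkit (NVIDIA's apt repository must be
# configured first; see NVIDIA's installation guide), register it with Docker,
# then restart Docker
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker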

Issue: Nemotron service won't start

# Check logs
docker-compose -f docker-compose.connectome.yml logs nemotron-llm

# Common causes:
# 1. Invalid NGC_API_KEY
# 2. Insufficient GPU memory (need 20GB+ on GPU 1)
# 3. Model download timeout (retry)

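# Verify the NGC key: it lives in .env.production, not in your shell, so check
# the file, then test it against NGC's registry (the username is literally $oauthtoken)
grep -q '^NGC_API_KEY=.' .env.production && echo "NGC_API_KEY is set"
grep '^NGC_API_KEY=' .env.production | cut -d= -f2- | docker login nvcr.io -u '$oauthtoken' --password-stdin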

# Restart service
docker-compose -f docker-compose.connectome.yml restart nemotron-llm

Issue: Port already in use

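# Check what's using the ports
sudo lsof -i :8080  # API
sudo lsof -i :8000  # Nemotron LLM
sudo lsof -i :8001  # Embedder
sudo lsof -i :8002  # Reranker
sudo lsof -i :3000  # Grafana
sudo lsof -i :9090  # Prometheus

# Stop the conflicting service, or change the port (API_PORT, GRAFANA_PORT and
# PROMETHEUS_PORT are set in .env.production)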

Issue: Out of disk space

# Check disk usage
df -h

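# See what Docker is using
docker system df

# Safer first step: remove images not used by any container
docker image prune -a

# Last resort ⚠️ also removes stopped containers and unused volumes, which can
# include database or model-cache data if the stack is not running
docker system prune -a --volumes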

📈 Expected Resource Usage

GPU Memory (8x RTX 3090, 24GB each)

  • GPU 1: ~18GB / 24GB (75% - Nemotron LLM)
  • GPU 5: ~4GB / 24GB (17% - NeMo Embedder)
  • GPU 6: ~4GB / 24GB (17% - NeMo Reranker)
  • GPUs 0, 2, 3, 4, 7: Free for other workloads

Disk Space

  • Docker images: ~15GB
  • Model cache: ~20GB
  • Database: ~5GB (grows with papers)
  • Total: ~40GB + paper storage

Network Bandwidth

  • Initial deployment: ~10GB download (NIM containers)
  • Ongoing: ~1-2GB/day (paper downloads)

✅ Deployment Complete Checklist

  • All 11 services running (docker-compose ps)
  • API health check passes (curl localhost:8080/api/v1/health)
  • Hybrid RAG status shows GPU assignments
  • GPUs 1, 5, 6 show memory usage in nvidia-smi
  • Grafana accessible at port 3000
  • Test evaluation returns ensemble scores
  • API keys rotated (NGC, OpenAI, Anthropic)
  • .env.production secured (chmod 600)
  • Monitoring dashboards configured
  • Backup strategy planned

📚 Additional Documentation

  • Full Guide: DEPLOY_TO_CONNECTOME_NOW.md
  • Technical Details: claudedocs/NEMOTRON_HYBRID_GUIDE.md
  • Deployment Summary: DEPLOYMENT_COMPLETE_SUMMARY.md
  • API Documentation: http://localhost:8080/docs (after deployment)

🎉 Ready to Deploy!

All pre-flight checks passed. You can now:

  1. SSH to Connectome server
  2. Navigate to AI-CoScientist directory
  3. Run: ./scripts/deploy_to_connectome_hybrid.sh
  4. Wait roughly 17-19 minutes for deployment
  5. Verify all services are healthy
  6. Rotate API keys immediately

Questions or issues? Check DEPLOY_TO_CONNECTOME_NOW.md for detailed troubleshooting.