Generated: 2025-10-26
Status: ✅ READY FOR DEPLOYMENT
Target: Connectome Server (8x RTX 3090)

- ✅ Script exists: scripts/deploy_to_connectome_hybrid.sh
- ✅ Script is executable (chmod +x already applied)
- ✅ Script validates: GPU prerequisites, Docker, API keys
- ✅ Automated features: Password generation, health checks, monitoring setup
- ✅ .env.local file exists with all API keys
- ✅ NGC API key configured ⚠️ ROTATE AFTER DEPLOYMENT
- ✅ OpenAI API key configured ⚠️ ROTATE AFTER DEPLOYMENT
- ✅ Anthropic API key configured (optional) ⚠️ ROTATE AFTER DEPLOYMENT
- ✅ GPU assignments: GPU 1 (Nemotron), GPU 5 (Embedder), GPU 6 (Reranker)
- ✅ docker-compose.connectome.yml ready (11 services)
- ✅ Infrastructure: postgres, redis, chromadb (3 services)
- ✅ Nemotron GPU: nemotron-llm, nemo-embedder, nemo-reranker (3 services)
- ✅ Application: api, celery-worker, celery-beat (3 services)
- ✅ Monitoring: prometheus, grafana (2 services)
- ✅ Health checks configured for all critical services
- ✅ Detailed guide: DEPLOY_TO_CONNECTOME_NOW.md
- ✅ Technical docs: claudedocs/NEMOTRON_HYBRID_GUIDE.md
- ✅ Deployment summary: DEPLOYMENT_COMPLETE_SUMMARY.md

Before starting deployment, verify:
# 1. Can you SSH to Connectome?
ssh your_username@connectome.server.address
# 2. Check available GPUs
nvidia-smi
# Expected: 8x RTX 3090, 24GB each
# Verify GPUs 1, 5, 6 have ≥20GB, ≥4GB, ≥4GB free respectively
# 3. Check Docker installation
docker --version
docker-compose --version
# 4. Check Docker GPU runtime
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
# 5. Verify disk space (need ~50GB for NIM containers + models)
df -h
# 6. Check if AI-CoScientist repo exists
ls -la ~/AI-CoScientist # or your repo location

Option A: Git Pull (RECOMMENDED)
# On Connectome server
cd ~/AI-CoScientist # or your repo location
git fetch origin
git checkout feature/nemotron-hybrid-integration
git pull origin feature/nemotron-hybrid-integration
# Verify you have the latest deployment script
ls -lh scripts/deploy_to_connectome_hybrid.sh

Option B: Direct File Transfer
# From your local machine
scp .env.local your_username@connectome:/path/to/AI-CoScientist/.env.production
scp scripts/deploy_to_connectome_hybrid.sh your_username@connectome:/path/to/AI-CoScientist/scripts/
scp docker-compose.connectome.yml your_username@connectome:/path/to/AI-CoScientist/

# On Connectome server
cd ~/AI-CoScientist
# Copy your local .env.local as .env.production
# (or create it manually with the contents below)
cat > .env.production << 'EOF'
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# CRITICAL API KEYS
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
NGC_API_KEY=YOUR_NGC_API_KEY_HERE
OPENAI_API_KEY=YOUR_OPENAI_API_KEY_HERE
# Anthropic (optional - currently has model access issues)
# ANTHROPIC_API_KEY=YOUR_ANTHROPIC_API_KEY_HERE
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# GPU CONFIGURATION
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
NEMOTRON_GPU_ID=1
NEMO_EMBEDDER_GPU_ID=5
NEMO_RERANKER_GPU_ID=6
ENVIRONMENT=production
DEBUG=false
LOG_LEVEL=INFO
HYBRID_MODE=true
USE_GPT4_FOR_EVALUATION=true
# Claude evaluation is disabled until the Claude key is fixed
USE_CLAUDE_FOR_EVALUATION=false
USE_NEMOTRON_FOR_SUMMARIZATION=true
USE_NEMOTRON_FOR_EXTRACTION=true
# Ensemble weights (without Claude: 60% GPT-4, 40% Nemotron)
ENSEMBLE_WEIGHT_GPT4=0.60
ENSEMBLE_WEIGHT_CLAUDE=0.0
ENSEMBLE_WEIGHT_NEMOTRON=0.40
NEMOTRON_CONFIDENCE_THRESHOLD=0.75
# OpenAI Configuration
OPENAI_MODEL=gpt-4
OPENAI_TEMPERATURE=0.3
OPENAI_MAX_TOKENS=4096
# Nemotron Configuration
NIM_OPTIMIZATION_PROFILE=throughput
NEMOTRON_BASE_URL=http://nemotron-llm:8000/v1
NEMOTRON_MODEL=nvidia/nvidia-nemotron-nano-9b-v2
NEMOTRON_TEMPERATURE=0.7
NEMOTRON_MAX_TOKENS=2048
NEMO_EMBEDDER_URL=http://nemo-embedder:8000/v1
NEMO_EMBEDDER_MODEL=nvidia/llama-3.2-nv-embedqa-1b-v2
EMBEDDING_DIMENSION=1024
NEMO_RERANKER_URL=http://nemo-reranker:8000/v1
NEMO_RERANKER_MODEL=nvidia/llama-3.2-nv-rerankqa-1b-v2
RERANKER_TOP_K=5
# Database (passwords are auto-generated by the deployment script)
POSTGRES_USER=postgres
POSTGRES_DB=ai_coscientist
POSTGRES_PORT=5432
CHROMADB_HOST=chromadb
CHROMADB_PORT=8000
CHROMA_TELEMETRY=FALSE
REDIS_HOST=redis
REDIS_PORT=6379
# Application
APP_NAME=AI-CoScientist
APP_VERSION=1.0.0
API_PORT=8080
# Performance
UVICORN_WORKERS=4
CELERY_CONCURRENCY=4
# Monitoring
PROMETHEUS_PORT=9090
GRAFANA_USER=admin
GRAFANA_PORT=3000
# Paths
PAPERS_COLLECTION_DIR=./papers_collection
LOGS_DIR=./logs
CORS_ORIGINS=http://localhost,http://127.0.0.1
EOF
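
Before locking down the file, it may be worth confirming that the ensemble weights written above still add up to 1.0 (here 0.60 + 0.0 + 0.40). The following is a minimal sketch, not part of the deployment script: `ensemble_weight_sum` and `weights_ok` are hypothetical helper names, and the check assumes every weight lives on its own `ENSEMBLE_WEIGHT_*=` line.

```shell
#!/usr/bin/env bash
# Sketch: sum the ENSEMBLE_WEIGHT_* values in an env file.
# Helper names are hypothetical; they do not exist in the repository.

ensemble_weight_sum() {
  # $1: path to the env file (for example .env.production)
  # Prints the total rounded to two decimals.
  awk -F= '/^ENSEMBLE_WEIGHT_[A-Z0-9_]+=/ { sum += $2 }
           END { printf "%.2f\n", sum }' "$1"
}

weights_ok() {
  # Succeeds only when the rounded total is exactly 1.00.
  [ "$(ensemble_weight_sum "$1")" = "1.00" ]
}

# Usage on the server:
#   weights_ok .env.production && echo "ensemble weights sum to 1.0"
```

Comparing the rounded string rather than the raw float keeps the check tolerant of tiny floating-point error in the sum.
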
# Secure the file
chmod 600 .env.production

# On Connectome server
cd ~/AI-CoScientist
# Make sure script is executable (should already be)
chmod +x scripts/deploy_to_connectome_hybrid.sh
# Run deployment (takes 10-15 minutes)
./scripts/deploy_to_connectome_hybrid.sh

- [1/9] Prerequisites Check (~30 seconds)
  - Verifies 8x RTX 3090 GPUs
  - Checks Docker GPU runtime
  - Validates NGC API key
- [2/9] Password Generation (~5 seconds)
  - Auto-generates secure PostgreSQL password
  - Auto-generates Redis password
  - Auto-generates Grafana admin password
  - Auto-generates app secret key
- [3/9] Docker Image Pull (~10 minutes)
  - Pulls Nemotron NIM containers (large downloads)
  - Pulls infrastructure images (postgres, redis, etc.)
- [4/9] Infrastructure Start (~60 seconds)
  - Starts postgres, redis, chromadb
  - Waits for health checks
- [5/9] Nemotron GPU Services (~3-5 minutes)
  - Starts nemotron-llm on GPU 1
  - Starts nemo-embedder on GPU 5
  - Starts nemo-reranker on GPU 6
  - Models download and load into VRAM
- [6/9] Database Migrations (~30 seconds)
  - Creates database schema
  - Sets up initial tables
- [7/9] Strategic Monitoring (~20 seconds)
  - Creates 8 literature sources
  - Sets up 4 alert rules
- [8/9] Application Services (~60 seconds)
  - Starts API, celery-worker, celery-beat
  - Starts prometheus, grafana
- [9/9] Health Verification (~30 seconds)
  - Checks all 11 services
  - Validates API endpoints
  - Verifies GPU allocations
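
Steps 4 and 9 both depend on waiting for services to report healthy. This guide does not show how the deployment script does that, so the following is only a minimal sketch of a retry loop under that assumption. `wait_for` is a hypothetical helper, and the attempt count and delay in the usage line are illustrative values, not the script's actual settings.

```shell
#!/usr/bin/env bash
# Sketch: retry a command until it succeeds or the attempts run out.
# wait_for is a hypothetical helper, not a function from the deployment script.

wait_for() {
  # $1: maximum attempts, $2: seconds between attempts, rest: command to run
  local attempts="$1" delay="$2"
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "$@" >/dev/null 2>&1; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    # Only pause when another attempt is still to come.
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  echo "gave up after $attempts attempt(s)" >&2
  return 1
}

# Usage on the server (health endpoint as listed in the verification steps):
#   wait_for 30 5 curl -fsS http://localhost:8080/api/v1/health
```

Passing the probe as a command keeps the loop reusable for any service that exposes a check, whether that is `curl` against an HTTP endpoint or a database client ping.
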
# On Connectome server
docker-compose -f docker-compose.connectome.yml ps
# Expected: All 11 services showing "Up" and "healthy"

curl http://localhost:8080/api/v1/health
# Expected: {"status":"healthy","services":{...}}

curl http://localhost:8080/api/v1/hybrid-rag/status
# Expected: Shows GPT-4, Nemotron status, GPU assignments

nvidia-smi -l 1
# Expected:
# GPU 1: ~18GB VRAM used (nemotron-llm)
# GPU 5: ~4GB VRAM used (nemo-embedder)
# GPU 6: ~4GB VRAM used (nemo-reranker)

curl -X POST http://localhost:8080/api/v1/hybrid-rag/evaluate \
  -H "Content-Type: application/json" \
  -d '{
    "paper_text": "Recent advances in deep learning have revolutionized natural language processing. Our novel transformer architecture achieves state-of-the-art results on multiple benchmarks.",
    "section": "abstract",
    "use_ensemble": true
  }'
# Expected: JSON response with scores from GPT-4 and Nemotron

# Visit: https://org.ngc.nvidia.com/setup/api-key
# Generate new key
# Update .env.production: NGC_API_KEY=YOUR_NGC_API_KEY_HERE

# Visit: https://platform.openai.com/api-keys
# Create new key, delete old one
# Update .env.production: OPENAI_API_KEY=YOUR_OPENAI_API_KEY_HERE

# Visit: https://console.anthropic.com/settings/keys
# Create new key
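# Update .env.production: ANTHROPIC_API_KEY=YOUR_ANTHROPIC_API_KEY_HERE

# Recreate the services so they load the new keys
# (a plain "restart" keeps the environment of the existing containers)
docker-compose -f docker-compose.connectome.yml up -d --force-recreate --no-deps api celery-worker

Grafana URL: http://connectome-server:3000
Username: admin
Password: <check .env.production for GRAFANA_PASSWORD>
Prometheus URL: http://connectome-server:9090
API docs URL: http://connectome-server:8080/docs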
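
Since the Grafana password is generated at deploy time, it has to be read back from `.env.production`. A minimal sketch of doing that without sourcing the whole file is shown here. `env_value` is a hypothetical helper, and it assumes plain `KEY=VALUE` lines with no quoting or `export` prefixes.

```shell
#!/usr/bin/env bash
# Sketch: print the value of one KEY=VALUE entry from an env file.
# env_value is a hypothetical helper; it assumes unquoted KEY=VALUE lines.

env_value() {
  # $1: variable name, $2: path to the env file
  # Prints the value of the last matching line; fails if the key is absent.
  awk -v key="$1" '
    index($0, key "=") == 1 { val = substr($0, length(key) + 2); found = 1 }
    END { if (found) print val; exit !found }
  ' "$2"
}

# Usage on the server:
#   env_value GRAFANA_PASSWORD .env.production
```

Using `substr` instead of splitting on `=` keeps values that themselves contain `=` intact, which is common for generated secrets.
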
# Check nvidia-smi
nvidia-smi
# Test Docker GPU runtime
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
# If it fails, install nvidia-container-toolkit and register it with Docker
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Check logs
docker-compose -f docker-compose.connectome.yml logs nemotron-llm
# Common causes:
# 1. Invalid NGC_API_KEY
# 2. Insufficient GPU memory (need 20GB+ on GPU 1)
# 3. Model download timeout (retry)
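# Verify the NGC key is set (it lives in .env.production, not necessarily in your shell)
grep -q '^NGC_API_KEY=.' .env.production && echo "NGC_API_KEY is set"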
# Restart service
docker-compose -f docker-compose.connectome.yml restart nemotron-llm

# Check what's using ports
sudo lsof -i :8080 # API
sudo lsof -i :8000 # Nemotron LLM
sudo lsof -i :8001 # Embedder
sudo lsof -i :8002 # Reranker
# Stop conflicting service or change port in .env.production

# Check disk usage
df -h
# Clean up Docker
docker system prune -a --volumes # ⚠️ Will remove all unused data
# Remove old images
docker image prune -a

- GPU 1: ~18GB / 24GB (75% - Nemotron LLM)
- GPU 5: ~4GB / 24GB (17% - NeMo Embedder)
- GPU 6: ~4GB / 24GB (17% - NeMo Reranker)
- GPUs 0, 2, 3, 4, 7: Free for other workloads
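
The percentages above are simply used memory divided by the 24GB total. To compare them against a live server, something like the following sketch could be used. `gpu_usage_report` is a hypothetical helper; it reads the CSV that `nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits` prints (values in MiB) and reports only the GPUs this deployment uses.

```shell
#!/usr/bin/env bash
# Sketch: summarize VRAM usage for selected GPUs from nvidia-smi CSV output.
# gpu_usage_report is a hypothetical helper, not part of the repository.
# Expected input lines look like: "1, 18432, 24576" (index, used MiB, total MiB).

gpu_usage_report() {
  # $1: space-separated GPU indices to report (defaults to this guide's 1 5 6)
  local wanted="${1:-1 5 6}"
  awk -F', *' -v wanted="$wanted" '
    BEGIN { n = split(wanted, w, " "); for (i = 1; i <= n; i++) keep[w[i]] = 1 }
    ($1 in keep) && $3 > 0 {
      printf "GPU %d: %d MiB / %d MiB (%.0f%%)\n", $1, $2, $3, 100 * $2 / $3
    }
  '
}

# Usage on the server (requires nvidia-smi):
#   nvidia-smi --query-gpu=index,memory.used,memory.total \
#     --format=csv,noheader,nounits | gpu_usage_report "1 5 6"
```
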
- Docker images: ~15GB
- Model cache: ~20GB
- Database: ~5GB (grows with papers)
- Total: ~40GB + paper storage
- Initial deployment: ~10GB download (NIM containers)
- Ongoing: ~1-2GB/day (paper downloads)
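
The disk estimate above (~40GB plus paper storage) can be checked against the target filesystem before deploying. The following is a minimal sketch with a hypothetical helper, `has_free_space`. It reads the POSIX-format output of `df -Pk <path>` on stdin, and the 40GB default is taken from the estimate above rather than from any hard limit in the deployment script.

```shell
#!/usr/bin/env bash
# Sketch: check that a filesystem has enough free space for the deployment.
# has_free_space is a hypothetical helper; feed it the output of `df -Pk <path>`.

has_free_space() {
  # $1: required free space in GB (assumed default: 40, from the estimate above)
  local need_gb="${1:-40}"
  awk -v need_gb="$need_gb" '
    NR == 2 {
      avail_gb = $4 / 1024 / 1024   # column 4 of `df -Pk` is available KiB
      printf "available: %.1f GB, required: %d GB\n", avail_gb, need_gb
      ok = (avail_gb >= need_gb)
    }
    END { exit ok ? 0 : 1 }
  '
}

# Usage on the server:
#   df -Pk ~/AI-CoScientist | has_free_space 40 || echo "not enough disk space"
```
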
- All 11 services running (docker-compose ps)
- API health check passes (curl localhost:8080/api/v1/health)
- Hybrid RAG status shows GPU assignments
- GPUs 1, 5, 6 show memory usage in nvidia-smi
- Grafana accessible at port 3000
- Test evaluation returns ensemble scores
- API keys rotated (NGC, OpenAI, Anthropic)
- .env.production secured (chmod 600)
- Monitoring dashboards configured
- Backup strategy planned
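
The API health item in the checklist can be turned into a pass/fail check for scripting. The sketch below assumes the response body has the shape shown in the verification steps, `{"status":"healthy","services":{...}}`. `is_healthy` is a hypothetical helper that pattern-matches the text rather than parsing JSON, so it would also match a nested `"status":"healthy"` field.

```shell
#!/usr/bin/env bash
# Sketch: succeed when a health response body reports "status": "healthy".
# is_healthy is a hypothetical helper; it is a text match, not a JSON parser.

is_healthy() {
  # Reads the response body on stdin; tolerates whitespace around the colon.
  grep -Eq '"status"[[:space:]]*:[[:space:]]*"healthy"'
}

# Usage on the server:
#   curl -fsS http://localhost:8080/api/v1/health | is_healthy \
#     && echo "API healthy" || echo "API not healthy"
```
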
- Full Guide: DEPLOY_TO_CONNECTOME_NOW.md
- Technical Details: claudedocs/NEMOTRON_HYBRID_GUIDE.md
- Deployment Summary: DEPLOYMENT_COMPLETE_SUMMARY.md
- API Documentation: http://localhost:8080/docs (after deployment)

All pre-flight checks passed. You can now:
- SSH to Connectome server
- Navigate to AI-CoScientist directory
- Run: ./scripts/deploy_to_connectome_hybrid.sh
- Wait 10-15 minutes for deployment
- Verify all services are healthy
- Rotate API keys immediately
Questions or issues? Check DEPLOY_TO_CONNECTOME_NOW.md for detailed troubleshooting.