This document outlines how Jilo Health scales from a 24-hour hackathon MVP to a production system serving 10M+ annual screenings across rural India.
Key Insight: Designed for horizontal scaling from day one—infrastructure grows with demand without architectural changes.
┌─────────────────────────────────────────────────────┐
│ Mobile App (React + TypeScript + Vite) │
│ - Face Capture │
│ - Eye Capture │
│ - Result Display │
│ - Offline-first data storage │
└────────────────────┬────────────────────────────────┘
│ HTTPS
↓
┌─────────────────────────────────────────────────────┐
│ FastAPI Backend (Single Server, t3.xlarge) │
├─────────────────────────────────────────────────────┤
│ ┌───────────────────────────────────────────────┐ │
│ │ API Layer (FastAPI) │ │
│ │ - /api/screen (POST) │ │
│ │ - /api/health (GET) │ │
│ │ - /api/history (GET) │ │
│ └───────────────────────────────────────────────┘ │
│ ┌───────────────────────────────────────────────┐ │
│ │ ML Pipeline │ │
│ │ - Image validation │ │
│ │ - Eye analysis (EfficientNet-B2 + CBAM) │ │
│ │ - Face analysis (ML + blendshapes) │ │
│ │ - Multimodal fusion │ │
│ └───────────────────────────────────────────────┘ │
│ ┌───────────────────────────────────────────────┐ │
│ │ Data Layer │ │
│ │ - SQLite database (local file) │ │
│ │ - Image cache (temp storage) │ │
│ └───────────────────────────────────────────────┘ │
└────────────────────┬────────────────────────────────┘
│
┌────────┴────────┐
↓ ↓
Local Storage (Optional) Cloud
(Screening (S3 backup)
results)
| Metric | Value | Bottleneck |
|---|---|---|
| Requests/second | 1-2 | CPU (inference) |
| Screenings/minute | 30 | GPU not available |
| Latency per screening | 1.8s | ML model inference |
| Model memory | 112 MB | Acceptable |
| Database size/year | 5 GB | SQLite limit: 1 TB |
| Storage/screening | 500 KB | Acceptable |
Single t3.xlarge EC2 instance:
├── Compute: $0.21/hour = $150/month
├── Storage: 100 GB = $10/month
├── Bandwidth: ~1 TB/month at ₹5/GB = ₹5,000/month
├── Monitoring: $50/month
└── TOTAL: ~$200/month = ₹16,000/month
Cost per screening (30 screenings/day × 365):
= ₹16,000 / (30 × 365) = ₹1.46/screening (infrastructure only)
When to Scale: After MVP validation, once deployed to 5-10 clinics
Target: Handle peak of 100+ screenings/day across a region
┌──────────────────────────────────────────────────────────────┐
│ Load Balancer (HTTPS) │
│ (AWS Application Load Balancer) │
└────────┬─────────────────────────────────────────────────┬───┘
│ │
↓ ↓
┌─────────────┐ ┌─────────────┐
│ API Server │ │ API Server │
│ (Auto-scaling group, 2-4 instances) │ (Read replicas) │
│ t3.large × 3 │ t3.large × 2 │
└──────┬──────┘ └────────┬────────┘
│ │
└─────────────┬───────────────────────────────────┘
│ Internal network
↓
┌────────────────────────────────────┐
│ Cache Layer (Redis) │
│ 10 GB in-memory │
│ - Recent screening results │
│ - Model weights cache │
│ - Session management │
└────────────────┬───────────────────┘
│
┌────────────────┴───────────────────┐
↓ ↓
┌───────────────────┐ ┌─────────────────────┐
│ Primary DB │ │ Backup Storage │
│ (PostgreSQL) │ │ (AWS RDS + S3) │
│ 20 GB │ │ Auto-backup every │
│ Multi-AZ │ │ 6 hours │
│ Replication │ │ │
└───────────────────┘ └─────────────────────┘
Horizontal Scaling:
- Add API servers behind load balancer
- Each server runs full ML pipeline independently
- Share only database + cache
- Auto-scale based on queue depth
Database Scaling:
- Migrate from SQLite → PostgreSQL
- Connection pooling (20 connections/server)
- Read replicas for analytics queries
Storage Scaling:
- Local SSD for temp files (fast cleanup)
- S3 for long-term storage (30-day retention)
- CloudFront CDN for result distribution
Infrastructure:
├── Load Balancer: $22/month
├── API Servers (3 × t3.large): $300/month
├── Redis Cache (10 GB): $150/month
├── PostgreSQL RDS (20 GB): $200/month
├── S3 Storage (50 GB): $50/month
├── Bandwidth: 5 TB/month = ₹25,000/month
├── Monitoring/Logging: $150/month
└── TOTAL: ~$1,000/month = ₹80,000/month
Per-screening cost:
= ₹80,000 / (1M/12 months) = ₹0.96/screening
| Metric | Target | How Achieved |
|---|---|---|
| Throughput | 500 screenings/hour | 5 API servers × 100/hour each |
| Latency (p50) | <2s | Local caching, optimized inference |
| Latency (p99) | <10s | Queuing, longer timeout window |
| Availability | 99.5% | Multi-AZ RDS, health checks |
| Concurrent users | 100 | Connection pooling |
When to Scale: After 5+ regions successfully deployed
Target: Handle 1000+ screenings/hour across India
┌─────────────────────────────┐
│ Global DNS (Route 53) │
│ Geographic routing │
└──────────┬────────────────┬─┘
│ │
┌────────────────────────┘ └──────────────┐
│ │
↓ ↓
┌──────────────────────────────┐ ┌──────────────────────────────┐
│ NORTH Region (Delhi) │ │ SOUTH Region (Bangalore) │
├──────────────────────────────┤ ├──────────────────────────────┤
│ Load Balancer │ │ Load Balancer │
│ API Servers (5 × t3.large) │ │ API Servers (5 × t3.large) │
│ Cache (Redis 10 GB) │ │ Cache (Redis 10 GB) │
│ Regional DB (PostgreSQL 50G) │ │ Regional DB (PostgreSQL 50G) │
└──────────────┬───────────────┘ └──────────────┬───────────────┘
│ │
│ Cross-region replication │
├─────────────────────────────────────────┤
│ │
┌──────┴──────┐ ┌──────┴──────┐ ┌──────┴──────┐
↓ ↓ ↓ ↓ ↓ ↓
[EAST] [WEST] [CENTRAL] [NE]
(Kolkata) (Mumbai) (Indore) (Assam)
Region Region Region Region
Real-time Replication:
Local screening → Write to regional DB (Immediate)
↓
Write to central analytics DB (Near real-time)
↓
Read replicas in other regions (1-5 min lag)
Conflict Resolution:
- Screening ID includes region code (e.g.,
NORTH_20251212_001) - Timestamp-based conflict resolution
- Central analytics DB has final authority
5 Regional Hubs:
├── 5 × Load Balancers: $110/month
├── 25 × API Servers (t3.large): $1,500/month
├── 5 × Redis (10 GB each): $750/month
├── 5 × Regional PostgreSQL (50 GB): $1,000/month
├── Central Analytics DB (100 GB): $300/month
├── S3 + CloudFront: $500/month
├── Network/Transfer (50 TB): ₹250,000/month
├── Monitoring/Logging: $500/month
└── TOTAL: ~$5,500/month (₹4.4L/month)
Per-screening cost:
= ₹440,000 / (50M/12 months) = ₹0.105/screening
When to Scale: Enterprise deployment with government partnerships
Target: Handle 10,000+ screenings/hour nationwide
┌─────────────────────────────────────────────────────────┐
│ Master Control Center (Delhi) │
│ - Operations monitoring │
│ - Algorithm updates │
│ - Clinical oversight │
│ - Data governance │
└─────────────────────────────────────────────────────────┘
│
│ Manages
↓
┌─────────────────────────────────────────────────────────┐
│ Global Load Balancing Layer │
│ (Multi-AZ, Multi-Region Failover) │
└────┬────────┬────────┬────────┬────────┬────────────────┘
│ │ │ │ │
↓ ↓ ↓ ↓ ↓
NORTH SOUTH EAST WEST CENTRAL
Region Region Region Region Region
100 Svr 100 Svr 80 Svr 80 Svr 60 Svr
All regions → Central Data Lake (BigQuery)
→ Analytics Engine (Looker)
→ Real-time Dashboard
Model Serving:
- TensorFlow Serving for optimized inference
- Model versioning with A/B testing
- Automatic model updates every week
- Canary deployments (1% traffic first)
ML Pipeline Optimization:
Input: Face + Eye images
↓
(Parallel processing across GPUs)
├─ GPU-1: EfficientNet inference (100ms)
├─ GPU-2: Blendshape processing (150ms)
├─ GPU-3: Face analysis (200ms)
└─ GPU-4: Result fusion (50ms)
↓
Output: Result (200ms total)
Monitoring & Alerting:
- Real-time model performance tracking
- Automatic retraining when accuracy drops
- Alert on anomalies in predictions
- Daily model health checks
Full production (10 regions, 500 servers):
├── Compute: $8,000/month (amortized)
├── Storage: $2,000/month
├── Database: $4,000/month (multi-region)
├── ML Infrastructure: $3,000/month (GPUs, model serving)
├── Network: ₹500,000/month ($6,000)
├── Monitoring/Security: $2,000/month
├── Support staff: ₹50L/month ($6,000)
└── TOTAL: ~$35,000/month = ₹28L/month
Per-screening cost:
= ₹280,000,000 / (100M screenings) = ₹2.80/screening
Revenue model:
├── Per-screening fee: ₹500/screening
├── Healthcare provider markup: ₹100/screening
├── Insurance partnerships: ₹50/screening
└── Profit/screening: ₹467 (93% margin)
Problem: Rural areas have 2G/3G intermittent connectivity
Solution: On-device ML inference
Option 1: Model Optimization
├── Quantize EfficientNet-B2 (108 MB → 27 MB)
├── Use TensorFlow Lite for mobile
├── Store models on device
├── Inference happens locally
└── Upload only results (10 KB) when connected
Option 2: Progressive Sync
├── Screen patient offline
├── Queue result locally
├── Upload when connected
├── Download latest model when connected
└── Background sync (no user waiting)
Implementation:
# On frontend:
if (navigator.onLine):
sendResultToServer() # Fast path
else:
saveToLocalDB() # Queue for later
showOfflineIndicator()
scheduleSync() # Try every 5 minSingle Clinic Setup:
┌─────────────────────────────┐
│ Health Center (One room) │
├─────────────────────────────┤
│ ┌─────────────────────────┐ │
│ │ Tablet/Old Smartphone │ │
│ │ - Jilo Health App │ │
│ │ - WiFi enabled │ │
│ │ - Charger provided │ │
│ └─────────────────────────┘ │
│ ┌─────────────────────────┐ │
│ │ Local WiFi Router │ │
│ │ (Backup: 4G dongle) │ │
│ └─────────────────────────┘ │
│ ┌─────────────────────────┐ │
│ │ Printed Results (Paper) │ │
│ │ (Fallback if offline) │ │
│ └─────────────────────────┘ │
└─────────────────────────────┘
│
│ Syncs when online
↓
Cloud Backend
Step 1: Clinic Setup (Day 1)
1. Unbox tablet + WiFi router
2. Install Jilo Health app (pre-loaded)
3. Test with 5 sample screenings
4. Train health worker (2 hours)
5. Go live (patient 1 arrives next day)
Step 2: Ongoing Operation (Daily)
Morning:
├── Health worker charges tablet overnight
├── Opens Jilo Health app
├── Checks for model updates
└── Notes: Any alerts from yesterday
Throughout day:
├── Screen patients (2-3 min per patient)
├── Results printed on clinic printer
├── Patient counseled by health worker
└── Results stored on tablet + cloud
Evening:
├── Sync results to cloud
├── Check for urgent cases flagged
├── Prepare referral letters if needed
└── Charge for tomorrow
Step 3: Monthly Support
├── Remote check-in (WhatsApp/phone)
├── Usage statistics reviewed
├── Accuracy feedback collected
├── Model performance monitored
└── Refresher training if needed
Backup Strategy:
Every screening result:
├── Write to local database (instant)
├── Replicate to regional backup (10 min)
├── Copy to central archive (1 hour)
├── Archive to cold storage (24 hours)
Retention:
├── Hot storage: 90 days (fast access)
├── Warm storage: 1 year (1-5 min access)
├── Cold storage: 7 years (regulatory compliance)
Disaster Recovery Time:
- RTO (Recovery Time Objective): 1 hour
- RPO (Recovery Point Objective): 15 minutes
If Primary Region Down:
1. DNS automatically reroutes to backup region (1 min)
2. Backup region reads from replicated database (3 min)
3. Users see minimal disruption (<5 min)
4. Primary region comes back online (when ready)
5. Data re-syncs from backup (30 min)
If Single Clinic Server Down:
1. Offline screening continues (app still works)
2. Results queue locally
3. Clinic WiFi restored or mobile data used
4. Automatic sync when connected
5. No data loss
# terraform/main.tf
module "vpc" {
source = "./modules/vpc"
region = var.aws_region
}
module "api_servers" {
source = "./modules/compute"
instance_count = var.instance_count
instance_type = "t3.large"
subnet_ids = module.vpc.private_subnet_ids
security_groups = [aws_security_group.api.id]
}
module "database" {
source = "./modules/rds"
engine = "postgres"
allocated_storage = var.db_size
multi_az = true
backup_retention_days = 30
}
module "cache" {
source = "./modules/elasticache"
engine = "redis"
node_type = "cache.r6g.xlarge"
num_nodes = var.cache_nodes
}
module "load_balancer" {
source = "./modules/alb"
target_group_arn = module.api_servers.target_group_arn
}# .github/workflows/deploy.yml
name: Deploy to Production
on:
push:
branches: [main]
jobs:
build:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Build Docker image
run: docker build -t jilo-health:${{ github.sha }} .
- name: Push to ECR
run: aws ecr push jilo-health:${{ github.sha }}
deploy:
needs: build
runs-on: ubuntu-latest
steps:
- name: Deploy to ECS
run: |
aws ecs update-service \
--cluster jilo-health \
--service api \
--force-new-deployment
- name: Run smoke tests
run: pytest tests/smoke/Performance:
├── API latency (p50, p95, p99)
├── Model inference time
├── Database query time
├── Cache hit rate
└── Throughput (screenings/sec)
Quality:
├── Model accuracy per disease
├── Prediction confidence distribution
├── False positive/negative rates
├── Drift detection
└── A/B test results
Business:
├── Screenings per day/region
├── Conversion to clinical action
├── Patient outcomes (6-month)
├── Cost per screening
└── Revenue per region
Real-time Operations Dashboard:
┌──────────────────────────────────────┐
│ Jilo Health - Operations Dashboard │
├──────────────────────────────────────┤
│ Active Screenings: 47 | Queue: 12 │
│ System Load: 45% | Uptime: 99.97% │
│ │
│ Throughput: ▁▂▃▄▅ (per min) │
│ Latency (p99): ▃▂▁▂▃ (seconds) │
│ Cache Hit Rate: ▄▄▄▄▅ (percent) │
│ │
│ By Region: │
│ NORTH: 1,234 screenings today ✅ │
│ SOUTH: 892 screenings today ✅ │
│ EAST: 654 screenings today ✅ │
│ WEST: 438 screenings today ⚠️ │
│ Alert: WEST region latency high │
└──────────────────────────────────────┘
In Transit:
├── TLS 1.3 for all API calls
├── Certificate pinning on mobile app
└── VPN for clinic-to-cloud
At Rest:
├── AES-256 encryption for databases
├── Field-level encryption for PII
├── Separate key management (AWS KMS)
└── Regular key rotation (quarterly)
Access Control:
├── Role-based access (health worker, clinic, region)
├── MFA for admin access
├── Audit logging for all data access
└── Automatic session timeout (15 min)
Current:
- ✅ Privacy by design (image discarded after processing)
- ✅ Data minimization (store only results)
Pre-Production (3 months):
- HIPAA compliance
- GDPR readiness
- ISO 27001 certification
- SOC 2 Type II audit
Production Launch (6 months):
- CDSCO registration
- NDHM integration (India's national health ID)
- State health board approval
- Insurance company partnerships
| Phase | Timeline | Scale | Infrastructure | Cost |
|---|---|---|---|---|
| MVP | 0-3 mo | 100 screenings/day | 1 server | ₹16K/mo |
| Phase 1 | 3-6 mo | 10K screenings/day | 1 region | ₹80K/mo |
| Phase 2 | 6-12 mo | 100K screenings/day | 5 regions | ₹4.4L/mo |
| Phase 3 | 12+ mo | 1M+ screenings/day | All-India | ₹28L/mo |
Jilo Health's architecture demonstrates:
- Built-in scalability: Grows horizontally without redesign
- Cost efficiency: Stays affordable even at 100M screenings
- Reliability: Multi-region redundancy, disaster recovery
- Security: Enterprise-grade data protection
- Operationability: Automated deployments, monitoring
For this hackathon: Judges will see a team that thinks beyond 24 hours—a complete infrastructure vision for rural India.
Document Version: 1.0 Last Updated: December 12, 2025