- Overview
- Components
- Data Flow
- Security Architecture
- Scalability
- High Availability
- Cost Optimization
- Disaster Recovery
- Shared Storage (EFS)
- Scale Potential
GCO (Global Capacity Orchestrator on AWS) is a multi-region Kubernetes platform built on AWS EKS Auto Mode, designed for AI/ML workload orchestration with GPU support.
AWS Global Accelerator
- Single global endpoint for all regions
- Automatic health-based routing
- DDoS protection via AWS Shield
- Reduces latency by routing to nearest healthy region
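The routing behavior described above can be modeled as "pick the lowest-latency region that is still healthy". The sketch below is only an illustration of that idea, not how Global Accelerator is implemented. The `RegionEndpoint` type, the latency figures, and `pick_region` are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RegionEndpoint:
    name: str
    latency_ms: float  # hypothetical client-to-region latency
    healthy: bool      # result of the most recent health check


def pick_region(endpoints: list[RegionEndpoint]) -> Optional[str]:
    """Return the lowest-latency healthy region, or None if none are healthy."""
    healthy = [e for e in endpoints if e.healthy]
    if not healthy:
        return None
    return min(healthy, key=lambda e: e.latency_ms).name


# The nearest region is unhealthy, so traffic falls through to the next closest.
endpoints = [
    RegionEndpoint("us-east-1", 20.0, healthy=False),
    RegionEndpoint("us-west-2", 65.0, healthy=True),
    RegionEndpoint("eu-west-1", 90.0, healthy=True),
]
selected = pick_region(endpoints)
```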
Each region contains:
VPC Configuration
- 3 Availability Zones
- Public subnets (/24 CIDR) for ALB
- Private subnets (/24 CIDR) for EKS nodes
- 2 NAT Gateways for high availability
- VPC endpoints for AWS services
- VPC Flow Logs enabled (CloudWatch Logs, 30-day retention)
EKS Auto Mode Cluster
- Kubernetes 1.35
- Managed control plane
- Control plane logging enabled (API, Audit, Authenticator, Controller Manager, Scheduler)
- Auto-scaling compute via nodepools:
  - system: Core Kubernetes components
  - general-purpose: Standard workloads
  - gpu-x86: NVIDIA GPU instances (g4dn, g5)
  - gpu-arm: ARM64 GPU instances (g5g)
  - inference: Long-running inference pods (WhenEmpty consolidation)
  - gpu-efa-pool: EFA-enabled instances for distributed training (p4d, p5)
Application Load Balancer
- Internet-facing
- Security group restricts to Global Accelerator IPs only
- Routes traffic to Kubernetes services via Ingress
Regional API Gateway (created by regional stack)
- REST API with regional endpoint
- IAM authentication (SigV4)
- VPC Link to internal NLB for direct service access
- Endpoints:
  - POST /api/v1/manifests - Submit manifest
  - GET /api/v1/manifests - List manifests
  - GET /api/v1/health - Health check
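Because these endpoints require SigV4, every request must be signed with a key derived from the caller's credentials. In practice an AWS SDK or a signing library handles this. The sketch below shows only the standard SigV4 signing-key derivation step, using the API Gateway service name `execute-api`. The secret key, date, and region values are placeholders, and the function names are introduced here for illustration.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, message: str) -> bytes:
    """HMAC-SHA256 of a UTF-8 message under the given key."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key: str, date_stamp: str, region: str, service: str) -> bytes:
    """Derive the SigV4 signing key by chaining date, region, and service."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


# Placeholder inputs; real code would read credentials from the environment.
signing_key = derive_signing_key(
    "EXAMPLE-SECRET-KEY", "20250101", "us-east-1", "execute-api"
)
```

The derived key is scoped to one day, one region, and one service, so a key derived for one region cannot sign requests for another.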
Network Load Balancer (Internal)
- Private subnets only
- Routes regional API Gateway → Kubernetes services
- Cross-zone load balancing enabled
Amazon EFS (Elastic File System)
- Shared storage accessible by all pods in the cluster
- Encrypted at rest (AWS KMS) and in transit (TLS)
- Dynamic provisioning via EFS CSI Driver with basePath: "/dynamic"
- Each PVC automatically gets its own access point (UID/GID: 1000, permissions: 755)
- EFS CSI Driver add-on with IRSA for secure access
- PersistentVolumeClaim gco-shared-storage available in default, gco-jobs, and gco-system namespaces
Amazon FSx for Lustre (Optional)
- High-performance parallel file system for ML training workloads
- Encrypted at rest by default (AWS-managed keys)
- Enable via: gco stacks fsx enable
- Static provisioning with pre-created PersistentVolumes bound to each namespace
- PersistentVolumeClaim gco-fsx-storage available in default, gco-jobs, and gco-system namespaces when enabled
- Supports S3 data repository integration for seamless data import/export
Global API Gateway (gco-api-gateway stack)
- Single authenticated entry point for all regions
- IAM authentication (SigV4) required for all requests
- Lambda proxy adds secret header for backend validation
- Forwards requests to Global Accelerator
Lambda Proxy
- Retrieves auth secret from Secrets Manager
- Adds X-GCO-Auth-Token header to requests
- Forwards to Global Accelerator endpoint
Namespaces:
- gco-system: All platform services (health monitor, manifest processor) run here
- gco-jobs: User workloads submitted via the API are deployed here
Health Monitor Service
- 2 replicas for high availability
- Pod anti-affinity spreads replicas across nodes/AZs
- PodDisruptionBudget ensures at least 1 replica during disruptions
- Monitors cluster and workload health
- Exposes /healthz and /readyz endpoints
- Reports metrics to CloudWatch
Manifest Processor Service
- 3 replicas for high throughput
- Pod anti-affinity spreads replicas across nodes/AZs
- PodDisruptionBudget ensures at least 2 replicas during disruptions
- Validates and processes manifest submissions
- Queues manifests for application
- Tracks manifest lifecycle
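The validation step can be pictured as a function that returns a list of problems for a submitted manifest. This document does not list the Manifest Processor's actual rules, so the checks below (required top-level fields, a name, and a restriction to the gco-jobs namespace) are assumptions chosen for illustration, and `validate_manifest` is a hypothetical name.

```python
ALLOWED_NAMESPACE = "gco-jobs"  # user workloads are deployed here
REQUIRED_FIELDS = ("apiVersion", "kind", "metadata")


def validate_manifest(manifest: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the manifest passed."""
    errors = [f"missing field: {field}" for field in REQUIRED_FIELDS if field not in manifest]
    metadata = manifest.get("metadata") or {}
    if "metadata" in manifest and not metadata.get("name"):
        errors.append("metadata.name is required")
    # Assumption: manifests without a namespace default to the jobs namespace.
    namespace = metadata.get("namespace", ALLOWED_NAMESPACE)
    if namespace != ALLOWED_NAMESPACE:
        errors.append(f"namespace {namespace!r} is not allowed")
    return errors


good = {"apiVersion": "batch/v1", "kind": "Job",
        "metadata": {"name": "train", "namespace": "gco-jobs"}}
bad = {"kind": "Job", "metadata": {"name": "x", "namespace": "kube-system"}}
good_errors = validate_manifest(good)
bad_errors = validate_manifest(bad)
```

Returning every error at once, rather than failing on the first, lets a caller fix a rejected manifest in one pass.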
Service Account & RBAC
- gco-service-account: Used by all platform services
- gco-cluster-role: Cluster-wide permissions
- Least-privilege access model
kubectl Applier Lambda
- Python 3.14 runtime
- Runs in VPC private subnets
- Security group allows access to EKS cluster
- IAM role with EKS cluster admin access
- Applies Kubernetes manifests during stack deployment
Function Flow:
- CloudFormation triggers Lambda via Custom Resource
- Lambda generates EKS authentication token
- Connects to EKS private endpoint
- Applies manifests from embedded directory
- Reports success/failure to CloudFormation
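The token-generation step is not detailed here. A common approach, and the one used by aws-iam-authenticator, is to presign an STS GetCallerIdentity request and wrap the URL in a `k8s-aws-v1.` bearer token. The sketch below shows only the encoding step, assuming that format is what the Lambda uses. The URL is a placeholder and is not actually presigned, and the function names are introduced for this example.

```python
import base64

TOKEN_PREFIX = "k8s-aws-v1."


def encode_eks_token(presigned_url: str) -> str:
    """Wrap a presigned STS URL as a bearer token (base64url, padding stripped)."""
    encoded = base64.urlsafe_b64encode(presigned_url.encode("utf-8")).decode("utf-8")
    return TOKEN_PREFIX + encoded.rstrip("=")


def decode_eks_token(token: str) -> str:
    """Recover the presigned URL from a token, restoring the stripped padding."""
    body = token[len(TOKEN_PREFIX):]
    padded = body + "=" * (-len(body) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")


# Placeholder URL; a real one carries SigV4 query parameters and must be signed
# with the cluster name in the x-k8s-aws-id header.
example_url = "https://sts.us-east-1.amazonaws.com/?Action=GetCallerIdentity&Version=2011-06-15"
token = encode_eks_token(example_url)
```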
User → API Gateway (IAM Auth) → Lambda Proxy → Global Accelerator
→ Regional ALB → Kubernetes Ingress → Manifest Processor Pod
→ Kubernetes API → Workload Scheduled → Node Provisioned
User Request (SigV4 signed) → API Gateway (IAM Auth)
→ Validates SigV4 signature and IAM permissions
→ Lambda Proxy retrieves secret from Secrets Manager
→ Lambda adds X-GCO-Auth-Token header
→ Request forwarded to Global Accelerator
→ Regional ALB validates secret header
→ Manifest Processor processes request
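The header-injection step in the flow above can be sketched as a pure function. The secret lookup is passed in as a callable so the logic can be exercised without AWS access; in the real proxy it would call Secrets Manager. Dropping any client-supplied copy of the header is an assumption made in this sketch, not something this document confirms, and `build_forward_headers` is a hypothetical name.

```python
from typing import Callable, Mapping

AUTH_HEADER = "X-GCO-Auth-Token"


def build_forward_headers(
    incoming: Mapping[str, str],
    fetch_secret: Callable[[], str],
) -> dict[str, str]:
    """Copy the incoming headers and attach the backend auth token."""
    # Assumption: strip any client-supplied token so callers cannot spoof it.
    headers = {k: v for k, v in incoming.items() if k.lower() != AUTH_HEADER.lower()}
    headers[AUTH_HEADER] = fetch_secret()
    return headers


def fake_fetch_secret() -> str:
    """Stand-in for the Secrets Manager lookup."""
    return "example-secret-value"


forwarded = build_forward_headers(
    {"Content-Type": "application/json", "x-gco-auth-token": "spoofed"},
    fake_fetch_secret,
)
```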
Pod Pending → Karpenter detects unschedulable pod
→ Evaluates nodepool requirements
→ Provisions EC2 instance matching requirements
→ Joins instance to cluster
→ Pod scheduled on new node
GCO is validated against multiple compliance frameworks using CDK-nag:
- AWS Solutions: Best practices for AWS architectures
- HIPAA Security: Healthcare compliance requirements
- NIST 800-53 Rev 5: Federal security controls
- PCI DSS 3.2.1: Payment card industry standards
- Serverless: Best practices for serverless architectures
All compliance checks run during cdk synth and deployment. Suppressions are documented in gco/stacks/nag_suppressions.py with justifications for each exception.
Layers of Defense:
- Global Accelerator (DDoS protection)
- ALB Security Group (Global Accelerator IPs only)
- VPC isolation (private subnets)
- Security groups (least-privilege)
EKS Cluster Security:
- Private endpoint enabled
- Public endpoint enabled (for kubectl access)
- Cluster security group controls access
- Pod security standards enforced
Principle of Least Privilege:
- Lambda Role: EKS describe + cluster admin access entry
- Service Account: Kubernetes RBAC-controlled
- API Gateway: IAM authentication required
- Users: Explicit access entries required
Access Entry Model:
- No aws-auth ConfigMap
- IAM principals explicitly granted access
- Policy-based permissions (AmazonEKSClusterAdminPolicy)
- Audit trail via CloudTrail
- At Rest: EBS volumes and EFS encrypted with AWS KMS
- In Transit: TLS 1.2+ for all connections (including EFS mounts)
- Secrets: Kubernetes secrets encrypted in etcd
- Logs: CloudWatch Logs encrypted
Application Layer:
- Health Monitor: 2-10 replicas (HPA)
- Manifest Processor: 3-20 replicas (HPA)
- User workloads: Unlimited (within nodepool limits)
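The replica ranges above are enforced by the Horizontal Pod Autoscaler, whose core rule is desired = ceil(current × currentMetric / targetMetric), clamped to the configured bounds. The sketch below applies that rule to the two ranges. The metric values and the 60% target are hypothetical, and the real HPA also applies a tolerance band and stabilization windows that are omitted here.

```python
import math


def desired_replicas(
    current: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int,
    max_replicas: int,
) -> int:
    """Core HPA scaling rule, clamped to the [min, max] replica range."""
    raw = math.ceil(current * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


# Manifest Processor (3-20 replicas): 90% utilization against a 60% target.
processor_scaled = desired_replicas(3, 90.0, 60.0, 3, 20)
# Same service under extreme load: the result is capped at the maximum.
processor_capped = desired_replicas(3, 500.0, 60.0, 3, 20)
# Health Monitor (2-10 replicas): low load never drops below the minimum.
monitor_floor = desired_replicas(2, 10.0, 50.0, 2, 10)
```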
Compute Layer:
- EKS Auto Mode automatically provisions nodes
- Nodepool limits configurable per instance type
- Supports thousands of pods per cluster
Cluster Limits:
- Control plane: Fully managed by AWS
- Nodes: Up to 100,000 per cluster (EKS limit)
- Pods: 110 per node (default)
- Deploy to additional regions independently
- Global Accelerator automatically includes new regions
- No cross-region dependencies
- Multi-AZ: All components span 3 AZs
- NAT Gateways: 2 for redundancy
- ALB: Multi-AZ by default
- EKS Control Plane: Multi-AZ managed by AWS
- Multiple Replicas: All services have 2+ replicas
- Pod Anti-Affinity: Spreads pods across nodes (preferred scheduling)
- Topology Spread Constraints: Distributes pods across availability zones
- Pod Disruption Budgets: Ensures minimum availability during voluntary disruptions
- Health Monitor: minAvailable=1
- Manifest Processor: minAvailable=2
- Health Checks: Liveness, readiness, and startup probes
- Graceful Shutdown: preStop hooks allow in-flight requests to complete
- Rolling Updates: Zero-downtime deployments with maxUnavailable=0
- Auto-Healing: Kubernetes restarts failed pods
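The Pod Disruption Budget settings above determine how many pods a voluntary disruption, such as a node drain, may evict at once: the number of healthy replicas minus minAvailable. A minimal sketch using the default replica counts from this document follows. The service keys are labels chosen for this example, not resource names from the manifests.

```python
def allowed_disruptions(healthy_replicas: int, min_available: int) -> int:
    """Pods that may be evicted voluntarily without violating the budget."""
    return max(0, healthy_replicas - min_available)


# (default replicas, minAvailable) for each platform service
services = {
    "health-monitor": (2, 1),
    "manifest-processor": (3, 2),
}
budget = {
    name: allowed_disruptions(replicas, min_avail)
    for name, (replicas, min_avail) in services.items()
}
```

With these defaults each service tolerates one eviction at a time, so a node drain proceeds one pod per service while the rest keep serving.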
- Multi-Region: Deploy to 2+ regions
- Global Accelerator: Automatic failover
- Health-Based Routing: Routes away from unhealthy regions
- EKS Auto Mode: Pay only for provisioned nodes
- Karpenter: Efficient bin-packing
- Spot Instances: Supported for fault-tolerant workloads
- ARM Instances: 20% cost savings for compatible workloads
- VPC Endpoints: Reduce NAT Gateway costs
- Private Subnets: Minimize data transfer
- Regional Deployment: Keep traffic within region
- EBS: gp3 volumes (cost-effective)
- EFS: Pay-per-use elastic storage (no pre-provisioning)
- ECR: Lifecycle policies for image cleanup
- Logs: Retention policies to control costs
- EKS: Control plane backed up by AWS
- Manifests: Stored in Lambda package (version controlled)
- Application State: User responsibility
Regional Failure:
- Global Accelerator routes to healthy region
- No manual intervention required
- RTO: < 1 minute
Cluster Failure:
- Redeploy stack: cdk deploy gco-REGION
- Manifests automatically reapplied
- RTO: under 1 hour
Complete Failure:
- Deploy to new region
- Update Global Accelerator
- RTO: under 1 hour
Amazon EFS provides shared, persistent storage for all pods in the cluster. This enables:
- Job outputs that persist after pod termination
- Data sharing between pods and jobs
- Checkpoint storage for ML training workloads
┌─────────────────────────────────────────────────────────┐
│ EFS File System │
│ (Encrypted at rest) │
│ ┌─────────────────────────────────────────────────┐ │
│ │ Access Point: /gco-jobs │ │
│ │ - UID/GID: 1000 │ │
│ │ - Permissions: 755 │ │
│ └─────────────────────────────────────────────────┘ │
└────────────────────┬────────────────────────────────────┘
│ TLS (encryption in transit)
│
┌────────────────────▼────────────────────────────────────┐
│ EFS CSI Driver (IRSA) │
│ - Runs in kube-system namespace │
│ - Uses IAM role for secure access │
└────────────────────┬────────────────────────────────────┘
│
┌────────────────────▼────────────────────────────────────┐
│ PersistentVolumeClaim │
│ - Name: gco-shared-storage │
│ - Available in: default, gco-jobs, gco-system │
│ - Access Mode: ReadWriteMany │
└────────────────────┬────────────────────────────────────┘
│
┌────────────────────▼────────────────────────────────────┐
│ Pods │
│ - Mount at /outputs or custom path │
│ - Read/write access for all pods │
└─────────────────────────────────────────────────────────┘
Jobs can mount the shared storage to persist outputs:
spec:
  containers:
    - name: worker
      volumeMounts:
        - name: shared-storage
          mountPath: /outputs
  volumes:
    - name: shared-storage
      persistentVolumeClaim:
        claimName: gco-shared-storage

See examples/efs-output-job.yaml for a complete example.
- Encryption at Rest: AWS KMS managed key
- Encryption in Transit: TLS via EFS CSI driver
- Access Control: File system policy restricts to VPC
- IRSA: EFS CSI driver uses IAM role (no static credentials)
This section provides a theoretical upper bound for GCO's throughput when deployed globally. These numbers illustrate why a multi-region orchestration platform matters for large-scale AI/ML workloads.
| Parameter | Value | Notes |
|---|---|---|
| AWS Regions | 34 | Commercial regions (38 total minus 2 for GovCloud and 2 for Sovereign) |
| Nodes per cluster | 100,000 | EKS hard limit |
| GPUs per g5.xlarge | 1 | Single A10G GPU |
| GPUs per g5.48xlarge | 8 | Eight A10G GPUs |
Conservative estimate (g5.xlarge, 1 GPU each):
34 regions × 100,000 nodes × 1 GPU = 3,400,000 concurrent GPU jobs
High-density estimate (g5.48xlarge, 8 GPUs each):
34 regions × 100,000 nodes × 8 GPUs = 27,200,000 concurrent GPUs
Job submission rate (1,000 manifests/sec per region):
34 regions × 1,000 manifests/sec = 34,000 job submissions/second
| Metric | Single Region | Full Global (34 Regions) |
|---|---|---|
| Max GPU nodes | 100,000 | 3,400,000 |
| Job submission rate | 1,000/sec | 34,000/sec |
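The upper bounds above are straightforward products of the assumptions table. The short sketch below reproduces them so the inputs can be varied. It assumes one cluster per region, and the constant and function names are introduced here for illustration.

```python
REGIONS = 34
NODES_PER_CLUSTER = 100_000            # EKS limit from the assumptions table
MANIFESTS_PER_SEC_PER_REGION = 1_000


def global_gpus(gpus_per_node: int) -> int:
    """Theoretical GPU count with one full cluster in every region."""
    return REGIONS * NODES_PER_CLUSTER * gpus_per_node


conservative = global_gpus(1)   # g5.xlarge: 1 GPU per node
high_density = global_gpus(8)   # g5.48xlarge: 8 GPUs per node
submissions_per_sec = REGIONS * MANIFESTS_PER_SEC_PER_REGION
```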
These are theoretical maximums. Actual limits depend on:
- AWS Service Quotas: Default limits are much lower; requires quota increases
- EC2 Capacity: GPU instance availability varies by region and time
- Cost: Running at full scale would cost millions per hour
- Nodepool Limits: Current config limits GPU pools to 1,000-1,500 vCPUs per region
Current nodepool limits (per region):
- gpu-x86-pool: 1,000 vCPUs / 4,000Gi memory (~250 g5.xlarge nodes at 4 vCPUs / 16 GiB each)
- gpu-arm-pool: 500 vCPUs / 2,000Gi memory (125 g5g.xlarge nodes at 4 vCPUs / 8 GiB each)
To increase throughput, increase nodepool limits in the manifests and request AWS quota increases.
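To see how a change to a nodepool limit translates into node count, divide each limit by the per-node resource size and take the smaller result. The sketch below does this for the two GPU pools. The instance sizes (g5.xlarge: 4 vCPUs / 16 GiB, g5g.xlarge: 4 vCPUs / 8 GiB) are assumptions taken from the published EC2 specifications, not from the GCO manifests, and `max_nodes` is a hypothetical helper.

```python
def max_nodes(
    vcpu_limit: int,
    memory_limit_gib: int,
    vcpus_per_node: int,
    memory_gib_per_node: int,
) -> int:
    """Nodes that fit under both the vCPU limit and the memory limit."""
    by_cpu = vcpu_limit // vcpus_per_node
    by_memory = memory_limit_gib // memory_gib_per_node
    return min(by_cpu, by_memory)


# gpu-x86-pool filled with g5.xlarge (assumed 4 vCPUs / 16 GiB)
x86_nodes = max_nodes(1_000, 4_000, 4, 16)
# gpu-arm-pool filled with g5g.xlarge (assumed 4 vCPUs / 8 GiB)
arm_nodes = max_nodes(500, 2_000, 4, 8)
```

Larger instance types consume the vCPU limit faster, so the same pool holds fewer nodes when it is filled with multi-GPU instances.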
Traditional single-cluster approaches hit scaling walls quickly. GCO's multi-region architecture means:
- No single point of failure - One region's issues don't affect others
- Linear horizontal scaling - Add regions to add capacity
- Geographic distribution - Run jobs closer to data sources
- Capacity arbitrage - Route to regions with available GPU capacity