One API. Every Accelerator. Any Region.
Multi-region compute orchestration for AWS — NVIDIA GPUs, Trainium, Inferentia, and CPU (amd64 + arm64 / Graviton) — with capacity-aware scheduling, spot fallback, and multi-region autoscaling inference endpoints with automatic failover and latency-aware routing, all from a single REST API and CLI.
Deploy everything and tear it all down with one command each:
gco stacks deploy-all -y # stand up every region defined in cdk.json
gco stacks destroy-all -y # destroy every stack across every region, leaving no orphaned resources
What it does. Spins up EKS Auto Mode clusters across AWS regions, wired together with Global Accelerator for latency-aware anycast routing and automatic failover. Submit Kubernetes manifests through a single REST API or CLI, and GCO handles capacity-aware scheduling, spot fallback, multi-region autoscaling inference endpoints, and output persistence.
Who it's for. Teams running accelerated workloads — LLM training and inference, batch ML, HPC, and general CPU jobs — that need multi-region redundancy, automatic capacity discovery, and IAM-based access without per-cluster kubeconfig distribution. Pre-wired nodepools for NVIDIA GPUs (g4dn, g5, and ARM64 g5g), AWS Trainium, AWS Inferentia, and general-purpose CPU on both amd64 and arm64 / Graviton.
Why it's different. Capacity-aware routing across regions out of the box, full-stack observability (CloudWatch dashboards, alarms, SNS), and a CDK app validated across 20+ config matrix combinations in CI.
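The following minimal Python sketch makes "capacity-aware scheduling with spot fallback" concrete. It is not GCO's scheduler. The RegionCapacity record, its fields, and the selection rule (prefer spot, fall back to on-demand, break ties by latency) are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RegionCapacity:
    """Hypothetical capacity snapshot for one region and instance type."""
    region: str
    spot_available: int       # free spot accelerators (illustrative)
    on_demand_available: int  # free on-demand accelerators (illustrative)
    latency_ms: float         # client-to-region latency (illustrative)


def pick_region(
    snapshots: list[RegionCapacity], gpus_needed: int, allow_spot: bool = True
) -> Optional[tuple[str, str]]:
    """Return (region, capacity_type), or None if no region can fit the job.

    Illustrative policy: prefer spot when allowed, fall back to on-demand,
    and break ties by the lowest latency.
    """
    if allow_spot:
        spot = [s for s in snapshots if s.spot_available >= gpus_needed]
        if spot:
            best = min(spot, key=lambda s: s.latency_ms)
            return best.region, "spot"
    on_demand = [s for s in snapshots if s.on_demand_available >= gpus_needed]
    if on_demand:
        best = min(on_demand, key=lambda s: s.latency_ms)
        return best.region, "on-demand"
    return None


if __name__ == "__main__":
    snapshots = [
        RegionCapacity("us-east-1", spot_available=0, on_demand_available=4, latency_ms=20.0),
        RegionCapacity("us-west-2", spot_available=2, on_demand_available=8, latency_ms=70.0),
    ]
    print(pick_region(snapshots, gpus_needed=1))  # ('us-west-2', 'spot')
```

A real implementation would pull these numbers from live capacity signals rather than a static list. The sketch only demonstrates the fallback order.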
Recommended: run everything from the dev container. GCO pins exact versions of many Python packages (CDK, AWS SDKs, FastAPI, mypy, Ruff, etc.), and installing them on top of an existing Python environment is the most common source of "it doesn't install" reports. The dev container ships a fully resolved environment (Python 3.14, Node.js 24, CDK, kubectl, AWS CLI, all Python deps), so you skip the problem entirely.
git clone git@github.com:awslabs/global-capacity-orchestrator-on-aws.git
cd global-capacity-orchestrator-on-aws
docker build -f Dockerfile.dev -t gco-dev .
docker run -it --rm \
-v ~/.aws:/root/.aws:ro \
-v $(pwd):/workspace \
-v /var/run/docker.sock:/var/run/docker.sock \
-w /workspace \
gco-dev
The docker.sock mount lets gco stacks deploy-all bundle Lambda assets through your host Docker daemon. See Prerequisites for Colima/Finch socket paths and the security note about host-socket pass-through.
Prefer to install on your host? (advanced)
This path requires Python 3.10+ and works best in a fresh virtual environment. Because GCO pins many dependencies, installing it into an existing environment frequently produces resolver conflicts. Use a clean venv or pipx.
git clone git@github.com:awslabs/global-capacity-orchestrator-on-aws.git
cd global-capacity-orchestrator-on-aws && pipx install -e .
See the Quick Start for the full install and first-job walkthrough, or docs/CLI.md for every CLI command.
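Before a host install, it helps to confirm that the interpreter meets these requirements. This hypothetical helper is not part of GCO and uses only the standard sys module. It flags Python versions older than 3.10, and it detects a virtual environment by comparing sys.prefix with sys.base_prefix, which differ inside a venv. pipx creates its own isolated environment, so the venv check only matters for plain pip installs.

```python
import sys


def host_install_problems(
    version_info=sys.version_info,  # any (major, minor, ...) tuple
    prefix: str = sys.prefix,
    base_prefix: str = sys.base_prefix,
) -> list[str]:
    """Return human-readable problems with the interpreter (empty list if none)."""
    problems = []
    if tuple(version_info[:2]) < (3, 10):
        problems.append(
            f"Python {version_info[0]}.{version_info[1]} found; GCO needs 3.10+"
        )
    # Inside a venv, sys.prefix points at the venv while sys.base_prefix
    # still points at the base interpreter; outside one they are equal.
    if prefix == base_prefix:
        problems.append("not inside a virtual environment; create a clean venv first")
    return problems


if __name__ == "__main__":
    for issue in host_install_problems():
        print("WARNING:", issue)
```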
💡 New to the codebase? GCO ships with an MCP server exposing 44 tools that index the whole project — docs, examples, source code, K8s manifests, scripts. Connect it to an AI-powered IDE (like Kiro) and ask in natural language: "How does region recommendation work?", "Walk me through the inference deployment flow". See mcp/README.md.
Running GPU workloads at scale is hard. You need to find regions with available capacity, provision clusters, handle authentication, deal with failover, and persist outputs after pods terminate. GCO solves all of this with a single deployable platform.
| Challenge | Traditional Approach | With GCO |
|---|---|---|
| GPU availability | Manually check each region | Auto-routes to available capacity |
| Node provisioning | Pre-provision or wait for scaling | EKS Auto Mode provisions on-demand |
| Multi-region ops | Manage clusters separately | Single API, automatic routing |
| Authentication | Configure per-cluster access | IAM-based, uses existing AWS credentials |
| Job outputs | Lost when pods terminate | Persisted to EFS/FSx storage |
| Inference serving | Deploy and manage per-region | Deploy once, serve globally |
| Failover | Manual intervention required | Automatic via Global Accelerator |
When to use GCO:
- You need to run GPU workloads (training, inference, batch processing)
- You want to deploy inference endpoints across multiple regions with a single command
- You want multi-region redundancy without managing multiple clusters
- You prefer IAM authentication over kubeconfig management
- You need job outputs to persist after completion
The fastest, most reliable path is the dev container — it sidesteps the dependency-conflict issues that come with installing GCO's pinned Python packages on top of your existing Python environment.
# Build the dev container (Python, Node.js, CDK, kubectl, AWS CLI all pinned & pre-installed)
docker build -f Dockerfile.dev -t gco-dev .
# Drop into a shell with the gco CLI already installed
docker run -it --rm \
-v ~/.aws:/root/.aws:ro \
-v $(pwd):/workspace \
-v /var/run/docker.sock:/var/run/docker.sock \
-w /workspace \
gco-dev
# From inside the container — deploy everything (CDK bootstrap runs automatically)
gco stacks deploy-all -y
If you'd rather install on your host, use a clean virtual environment or pipx. See Prerequisites and QUICKSTART.md for details and known caveats.
Optional: configure kubectl access (requires PUBLIC_AND_PRIVATE endpoint mode). The default endpoint mode is PRIVATE (see docs/CUSTOMIZATION.md for details). Most users don't need this and can submit jobs via SQS or API Gateway instead.
# Check GPU capacity
gco capacity check --instance-type g4dn.xlarge --region us-east-1
# Submit a job (pick your preferred method)
gco jobs submit-sqs examples/simple-job.yaml --region us-east-1 # via SQS (recommended)
gco queue submit examples/simple-job.yaml --region us-east-1 # via Global DynamoDB queue
gco jobs submit examples/simple-job.yaml -n gco-jobs # via API Gateway
gco jobs submit-direct examples/simple-job.yaml -r us-east-1 # via kubectl
# Check status and get logs
gco jobs list --all-regions
gco jobs logs hello-gco -n gco-jobs -r us-east-1
# Deploy, inspect, and scale an inference endpoint
gco inference deploy my-llm -i vllm/vllm-openai:v0.20.1 --gpu-count 1
gco inference status my-llm
gco inference scale my-llm --replicas 3
See the Quick Start Guide for the full step-by-step walkthrough, or the CLI Reference for all available commands.
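If you script around the CLI, a thin Python wrapper can run the gco capacity check command shown above across several regions. The sketch uses only the flags that appear in this README (--instance-type, --region). The command's output format isn't documented here, so the wrapper returns raw stdout instead of parsing it. The function names are hypothetical.

```python
import shutil
import subprocess


def capacity_check_argv(instance_type: str, region: str) -> list[str]:
    """Build the argv for the `gco capacity check` command shown in this README."""
    return ["gco", "capacity", "check", "--instance-type", instance_type, "--region", region]


def check_regions(instance_type: str, regions: list[str]) -> dict[str, str]:
    """Run `gco capacity check` once per region and collect raw stdout per region.

    Raises FileNotFoundError if the gco CLI is not on PATH.
    """
    if shutil.which("gco") is None:
        raise FileNotFoundError("gco CLI not found; run this inside the dev container")
    results: dict[str, str] = {}
    for region in regions:
        proc = subprocess.run(
            capacity_check_argv(instance_type, region),
            capture_output=True,
            text=True,
            check=False,  # keep going if one region fails
        )
        results[region] = (
            proc.stdout if proc.returncode == 0 else f"error: {proc.stderr.strip()}"
        )
    return results
```

For example, check_regions("g4dn.xlarge", ["us-east-1", "us-west-2"]) runs the command twice inside the dev container and returns a dict keyed by region.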
Regenerate this diagram and every per-stack view on demand with python diagrams/infra_diagrams/generate.py — it synthesises the current CDK app through AWS PDK cdk-graph so the diagrams never drift from the source. See diagrams/infra_diagrams/README.md for per-stack flags (--stack global|api-gateway|regional|regional-api|monitoring|analytics|all). Flowcharts of the code itself (Lambda handlers, CLI commands) live alongside them under diagrams/code_diagrams/.
The regional stack can be deployed to any AWS region. Add or remove regions by editing the deployment_regions.regional array in cdk.json.
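To check which regions a deploy will target, you can read that array directly from cdk.json. This sketch assumes the key sits under the standard CDK context object and falls back to the top level when there is no context key. The region-name regex is a loose illustrative check, not an authoritative list of AWS regions.

```python
import json
import re
from pathlib import Path

# Loose, illustrative pattern for names like us-east-1, ap-southeast-2, us-gov-west-1.
REGION_RE = re.compile(r"^[a-z]{2}(-gov)?-[a-z]+-\d+$")


def regional_regions(cdk_config: dict) -> list[str]:
    """Return deployment_regions.regional from a parsed cdk.json (assumed layout)."""
    section = cdk_config.get("context", cdk_config)
    regions = section.get("deployment_regions", {}).get("regional", [])
    bad = [r for r in regions if not REGION_RE.match(r)]
    if bad:
        raise ValueError(f"unexpected region names in cdk.json: {bad}")
    return list(regions)


def load_regions(path: str = "cdk.json") -> list[str]:
    """Parse cdk.json from disk and return the regional deployment list."""
    return regional_regions(json.loads(Path(path).read_text()))
```

Run from the repository root, load_regions() returns the list, which is handy for looping over regions in your own tooling.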
┌───────────────────────────────────────────────────┐
│ User Request │
│ (AWS SigV4 Authentication) │
└────────────────────┬──────────────────────────────┘
│
▼
┌───────────────────────────────────────────────────┐
│ API Gateway (Edge-Optimized, Global) │
│ ✓ IAM Authentication Required │
│ ✓ CloudFront Edge Caching │
└────────────────────┬──────────────────────────────┘
│
▼
┌───────────────────────────────────────────────────┐
│ AWS Global Accelerator │
│ Routes to nearest healthy region │
└────────────────────┬──────────────────────────────┘
│
┌────────────┼────────────┬────────────┐
│ │ │ │
┌────▼────┐ ┌────▼────┐ ┌────▼────┐ ┌────▼────┐
│us-east-1│ │us-west-2│ │eu-west-1│ │ More │
│ ALB │ │ ALB │ │ ALB │ │ Regions │
│(GA IPs │ │(GA IPs │ │(GA IPs │ │(GA IPs │
│ only) │ │ only) │ │ only) │ │ only) │
└────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘
│ │ │ │
┌────▼────────────▼────────────▼────────────▼────┐
│ EKS Auto Mode Cluster (per region) │
│ ┌─────────────────────────────────────────┐ │
│ │ Nodepools: System, General, GPU (x86 │ │
│ │ + ARM), Inference │ │
│ ├─────────────────────────────────────────┤ │
│ │ Services: Health Monitor, Manifest │ │
│ │ Processor, Inference Monitor │ │
│ ├─────────────────────────────────────────┤ │
│ │ Storage: EFS (shared) + FSx (optional) │ │
│ └─────────────────────────────────────────┘ │
└────────────────────────────────────────────────┘
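Global Accelerator makes this routing decision inside AWS, so there is nothing to implement here. The sketch below only illustrates the policy the diagram describes. It sends traffic to the lowest-latency region whose health check passes, and fails over to the next-closest healthy region when that check fails. It ignores Global Accelerator's traffic dials and endpoint weights, and the region names and latencies in the usage are made up.

```python
from typing import Mapping, Optional


def route(latency_ms: Mapping[str, float], healthy: Mapping[str, bool]) -> Optional[str]:
    """Return the lowest-latency healthy region, or None if every region is down."""
    candidates = [region for region in latency_ms if healthy.get(region, False)]
    if not candidates:
        return None
    return min(candidates, key=lambda region: latency_ms[region])


if __name__ == "__main__":
    latency = {"us-east-1": 12.0, "us-west-2": 65.0, "eu-west-1": 80.0}
    # us-east-1 fails its health check, so traffic fails over to us-west-2.
    print(route(latency, {"us-east-1": False, "us-west-2": True, "eu-west-1": True}))
```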
Five layers protect every request:
- IAM Authentication — API Gateway validates AWS credentials (SigV4)
- Secret Header — Lambda injects a rotating token from Secrets Manager
- IP Restriction — ALBs only accept Global Accelerator IPs
- Header Validation — Backend services verify the secret token
- IRSA — Pods use IAM roles for AWS access (no static credentials)
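The first layer is standard AWS Signature Version 4 (SigV4). In practice you would let the gco CLI, an AWS SDK, or botocore's SigV4Auth sign requests for you. The standard-library sketch below only shows what gets signed, under these simplifying assumptions:

- The request has no query string.
- The path needs no URI encoding.
- Only the host and x-amz-date headers are signed.
- The credentials are long-term. Temporary credentials also need an X-Amz-Security-Token header.

execute-api is the service name API Gateway uses in SigV4 credential scopes. The host and path you pass in are your own.

```python
import hashlib
import hmac
from datetime import datetime, timezone
from typing import Optional


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def sigv4_headers(
    method: str,
    host: str,
    path: str,
    region: str,
    access_key: str,
    secret_key: str,
    service: str = "execute-api",
    body: bytes = b"",
    now: Optional[datetime] = None,
) -> dict[str, str]:
    """Return headers that SigV4-sign a request with no query string (simplified)."""
    now = now or datetime.now(timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")

    # 1. Canonical request: method, path, query, headers, signed headers, payload hash.
    canonical_headers = f"host:{host}\nx-amz-date:{amz_date}\n"
    signed_headers = "host;x-amz-date"
    payload_hash = hashlib.sha256(body).hexdigest()
    canonical_request = "\n".join(
        [method, path, "", canonical_headers, signed_headers, payload_hash]
    )

    # 2. String to sign, scoped to date/region/service.
    scope = f"{date_stamp}/{region}/{service}/aws4_request"
    string_to_sign = "\n".join(
        [
            "AWS4-HMAC-SHA256",
            amz_date,
            scope,
            hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
        ]
    )

    # 3. Derive the signing key from the secret, then sign the string.
    k_date = _hmac(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac(k_date, region)
    k_service = _hmac(k_region, service)
    k_signing = _hmac(k_service, "aws4_request")
    signature = hmac.new(
        k_signing, string_to_sign.encode("utf-8"), hashlib.sha256
    ).hexdigest()

    return {
        "Host": host,
        "X-Amz-Date": amz_date,
        "Authorization": (
            f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
            f"SignedHeaders={signed_headers}, Signature={signature}"
        ),
    }
```

API Gateway recomputes the same signature from your stored credentials and rejects the request if it differs or if the timestamp is too old.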
Request flow: User → API Gateway (SigV4) → Lambda (adds secret) → Global Accelerator
→ ALB (GA IPs only) → Services (validate secret)
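Layers two and four amount to a shared-secret check on the backend. The sketch below assumes a hypothetical header name, x-gco-auth-token. It also assumes that both the current and the previous secret are accepted for a short window after the daily rotation, so in-flight requests don't fail. This README specifies neither detail. hmac.compare_digest keeps each comparison constant-time.

```python
import hmac
from typing import Mapping, Sequence

# Hypothetical header name; GCO's real header is not documented in this README.
SECRET_HEADER = "x-gco-auth-token"


def is_authorized(headers: Mapping[str, str], valid_secrets: Sequence[str]) -> bool:
    """Return True if the request carries one of the currently valid secrets.

    valid_secrets would typically hold the current secret and, briefly after a
    rotation, the previous one (an assumption made for this sketch).
    """
    # HTTP header names are case-insensitive, so normalise before lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    presented = normalised.get(SECRET_HEADER)
    if not presented:
        return False
    # compare_digest avoids leaking how many leading characters matched.
    return any(
        hmac.compare_digest(presented.encode(), secret.encode())
        for secret in valid_secrets
    )
```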
For private clusters, Regional API Gateways provide direct VPC access without public ALB exposure.
See Architecture Details for the full deep dive.
- EKS Auto Mode with automatic node provisioning — no pre-scaling needed
- GPU support for x86_64 (g4dn, g5) and ARM64 (g5g) via Karpenter nodepools
- Multiple submission methods: API Gateway, SQS queues, DynamoDB job queue, or direct kubectl
- Job pipelines (DAGs): Multi-step ML pipelines with dependency ordering and failure handling
- Helm-managed ecosystem: KEDA, Volcano, KubeRay, Kueue, GPU Operator, DRA, and more, configurable via cdk.json
- Multi-region inference: Deploy endpoints (vLLM, TGI, Triton, TorchServe, SGLang) across regions with a single command
- Canary deployments: A/B test new model versions with weighted traffic routing
- Model weight management: Central S3 bucket with KMS encryption, automatic sync to each region
- Spot instance support: Run inference on spot GPUs for significant cost savings
- Autoscaling: HPA-based scaling with CPU/memory metrics
- Global Accelerator: Single anycast endpoint with automatic failover
- IAM authentication: SigV4 at the API Gateway — no kubeconfig distribution
- Compliance validated: CDK-nag checks for AWS Solutions, HIPAA, NIST 800-53, PCI DSS
- Network policies: Default-deny with explicit allow rules for all service communication
- EFA support: Optional Elastic Fabric Adapter for high-bandwidth distributed training and NIXL-based inference (toggle on/off)
- EFS: Shared elastic storage for job outputs that persist after pod termination
- FSx for Lustre: Optional high-performance parallel file system for ML training (toggle on/off)
- Valkey cache: Optional serverless key-value cache for prompt caching and session state
- Aurora pgvector: Optional serverless vector database for RAG, semantic search, and embedding storage
- Cost visibility: Track spend by service, region, and workload via Cost Explorer integration
- Auto-bootstrap: CDK bootstrap runs automatically for new regions during deploy
- Multi-region monitoring: CloudWatch dashboards, alarms, and SNS alerts across all regions
- ML & Analytics Environment: Optional SageMaker Studio domain + EMR Serverless + Cognito user pool for interactive notebook analytics, with an always-on Cluster_Shared_Bucket that all cluster jobs can read and write. Off by default; enable with gco analytics enable. See Analytics Guide.
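Job pipelines (DAGs) run steps in dependency order and stop downstream work when a step fails. The following minimal sketch uses the standard-library graphlib.TopologicalSorter (Python 3.9+). The step names, the run_step callable, and the skip-all-descendants policy are illustrative assumptions, not GCO's pipeline format.

```python
from graphlib import TopologicalSorter
from typing import Callable, Mapping


def run_pipeline(
    deps: Mapping[str, set[str]], run_step: Callable[[str], bool]
) -> dict[str, str]:
    """Run steps in dependency order; skip any step whose prerequisite did not succeed.

    deps maps each step to the set of steps it depends on. Returns a map of
    step -> "succeeded" | "failed" | "skipped".
    """
    status: dict[str, str] = {}
    for step in TopologicalSorter(deps).static_order():
        if any(status.get(dep) != "succeeded" for dep in deps.get(step, ())):
            status[step] = "skipped"
        else:
            status[step] = "succeeded" if run_step(step) else "failed"
    return status
```

Consider deps = {"preprocess": set(), "train": {"preprocess"}, "evaluate": {"train"}} with a run_step that fails on "train". The result marks preprocess as succeeded, train as failed, and evaluate as skipped. A dependency cycle raises graphlib.CycleError before any step runs.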
New to GCO? Start here:
| Your Goal | Read This |
|---|---|
| Understand what GCO does | Core Concepts |
| Get running in under 60 minutes | Quick Start Guide |
| Learn the architecture | Architecture Details |
Day-to-day operations:
| Your Goal | Read This |
|---|---|
| CLI commands and usage | CLI Reference |
| Deploy inference endpoints | Inference Guide |
| Use the REST API directly | API Reference |
| Fix issues | Troubleshooting |
| Respond to incidents | Operational Runbooks |
| Run interactive notebook analytics | Analytics Guide |
Customization and development:
| Your Goal | Read This |
|---|---|
| Add regions, tune nodepools, enable FSx | Customization Guide |
| Choose a scheduler for your workload | Schedulers & Orchestrators |
| Configure the SQS queue processor | Queue Processor Config |
| Contribute to the project | Contributing |
| API client examples (Python, curl, AWS CLI) | Client Examples |
| IAM policy templates | IAM Policies |
| Presentation slides and demo scripts | Demo Starter Kit |
Recommended path — dev container only:
- AWS CLI configured with appropriate credentials (or a ~/.aws directory to mount into the container)
- Docker (or Finch / Colima). That's it: the container ships Python 3.14, Node.js 24, CDK, kubectl, and AWS CLI at pinned versions.
docker build -f Dockerfile.dev -t gco-dev .
docker run -it --rm -v ~/.aws:/root/.aws:ro -v $(pwd):/workspace -w /workspace gco-dev
For gco stacks deploy-all, cdk deploy needs to run Docker to bundle Lambda assets. Mount the host Docker socket so the container's CLI talks to your host daemon. This works with Docker Desktop on macOS/Windows, with Docker on Linux, and with Colima on macOS (see Dockerfile.dev for Colima-specific socket paths):
docker run --rm -it \
-v ~/.aws:/root/.aws:ro \
-v $(pwd):/workspace \
-v /var/run/docker.sock:/var/run/docker.sock \
-w /workspace \
gco-dev gco stacks deploy-all -y
This is host-socket pass-through, not true Docker-in-Docker. Anyone with access to the container has root-equivalent access to the host Docker daemon, so keep the container on a trusted host.
Host install path (advanced):
- AWS CLI configured with appropriate credentials
- Python 3.10+ and Node.js LTS (v24)
- AWS CDK CLI (npm install -g aws-cdk)
- Docker or Finch (for building container images)
- A clean Python virtual environment or pipx. GCO pins exact versions of many packages, so installing it into an existing environment commonly fails with dependency-resolver errors. If you hit ResolutionImpossible, switch to the dev container instead of debugging your local environment.
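Before installing, you can confirm that these host prerequisites are on your PATH. This hypothetical helper uses shutil.which and only checks that the executables exist (aws, node, cdk, and docker or finch). It does not check versions such as Node.js 24.

```python
import shutil
from typing import Callable, Optional

# Each requirement is satisfied if any one of its executables is on PATH.
REQUIREMENTS: dict[str, tuple[str, ...]] = {
    "AWS CLI": ("aws",),
    "Node.js": ("node",),
    "AWS CDK CLI": ("cdk",),
    "Docker or Finch": ("docker", "finch"),
}


def missing_tools(which: Callable[[str], Optional[str]] = shutil.which) -> list[str]:
    """Return the names of requirements with no matching executable on PATH."""
    return [
        name
        for name, executables in REQUIREMENTS.items()
        if not any(which(exe) for exe in executables)
    ]


if __name__ == "__main__":
    for name in missing_tools():
        print(f"missing: {name}")
```

The which parameter exists so the check can be exercised without touching the real PATH, for example in a unit test.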
.
├── app.py # CDK app entry point
├── cdk.json # CDK configuration (regions, features, thresholds)
├── pyproject.toml # Project metadata, dependencies, and CLI installation
│
├── cli/ # GCO CLI (jobs, stacks, capacity, inference, costs, DAGs)
├── diagrams/ # Auto-generated architecture diagrams (infra_diagrams/) and code flowcharts (code_diagrams/)
├── docs/ # Documentation (architecture, CLI, API, inference, customization, analytics)
├── examples/ # Example manifests (jobs, inference, Ray, Volcano, Kueue, Slurm, YuniKorn)
├── gco/
│ ├── config/ # Configuration loader with validation
│ ├── models/ # Data models for k8s clusters, health monitor, inference monitor and manifest processor
│ ├── services/ # K8s services (health monitor, inference monitor, manifest processor, queue processor)
│ └── stacks/ # CDK stacks (global, regional, API gateway, monitoring)
│ └── constants.py # Pinned versions: EKS addons, Lambda runtime, Aurora engine
│
├── lambda/ # Lambda functions
│ ├── alb-header-validator/ # ALB header validation for auth tokens
│ ├── api-gateway-proxy/ # API Gateway → Global Accelerator proxy
│ ├── cross-region-aggregator/ # Cross-region job/health aggregation
│ ├── ga-registration/ # Global Accelerator endpoint registration
│ ├── helm-installer/ # Installs Helm charts (schedulers, GPU operators, cert-manager)
│ │ └── charts.yaml # Helm chart configuration (schedulers, GPU operators, cert-manager)
│ ├── kubectl-applier-simple/ # Applies K8s manifests during deployment
│ │ └── manifests/ # Kubernetes manifests (nodepools, RBAC, services, storage)
│ ├── proxy-shared/ # Shared utilities for proxy Lambdas
│ ├── regional-api-proxy/ # Regional API Gateway → internal ALB proxy
│ └── secret-rotation/ # Daily secret rotation
│
├── mcp/ # MCP server for LLM interaction (44 tools wrapping the CLI)
├── scripts/ # Utility scripts (version bump, cluster access setup)
└── tests/ # PyTest + BATS test suites (counts tracked via badges)
See CONTRIBUTING.md for development setup, testing, the GitHub Actions CI/CD layout, release process, and dependency scanning schedules.
Quick start for contributors (dev container — recommended):
docker build -f Dockerfile.dev -t gco-dev .
docker run --rm -v $(pwd):/workspace -w /workspace gco-dev pytest tests/ -v --cov=gco --cov=cli
Or, in a clean virtual environment on your host:
python3 -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
pytest tests/ -v --cov=gco --cov=cli
If pip install -e ".[dev]" fails with dependency-resolver errors, that's the pinned-versions issue described in Prerequisites. Use the dev container instead, because it ships everything at the exact versions CI uses.
See the LICENSE file for details.
- Check Troubleshooting for common issues
- Review CloudWatch logs for Lambda and EKS errors
- Open an issue on GitHub
For security issues, do not open a public GitHub issue. See SECURITY.md for the disclosure process.



