Global Capacity Orchestrator (GCO)

One API. Every Accelerator. Any Region.

Multi-region compute orchestration for AWS — NVIDIA GPUs, Trainium, Inferentia, and CPU (amd64 + arm64 / Graviton) — with capacity-aware scheduling, spot fallback, and multi-region autoscaling inference endpoints with automatic failover and latency-aware routing, all from a single REST API and CLI.

🎬 Live demo recording

gco CLI demo: capacity discovery, cost visibility, 5 schedulers (Volcano, Kueue, YuniKorn, Slurm, KEDA), FSx, Valkey, live LLM inference, and EFS, all against one already-deployed cluster.

📦 Deploy recording

Fresh gco stacks deploy-all -y from a clean account

🗑️ Destroy recording

Full teardown with gco stacks destroy-all -y

Deploy everything and tear it all down with one command each:

gco stacks deploy-all -y      # stand up every region defined in cdk.json
gco stacks destroy-all -y     # destroy every stack across every region — no orphaned resources

What it does. Spins up EKS Auto Mode clusters across AWS regions, wired together with Global Accelerator for latency-aware anycast routing and automatic failover. Submit Kubernetes manifests via a single REST API or CLI — GCO handles capacity-aware scheduling, spot fallback, multi-region autoscaling inference endpoints, and output persistence.

Who it's for. Teams running accelerated workloads (LLM training and inference, batch ML, HPC, and general CPU jobs) that need multi-region redundancy, automatic capacity discovery, and IAM-based access without per-cluster kubeconfig distribution. GCO ships pre-wired nodepools for NVIDIA GPUs (g4dn, g5, and ARM64 g5g), AWS Trainium, AWS Inferentia, and general-purpose CPU on both amd64 and arm64 / Graviton.

Why it's different. Capacity-aware routing across regions out of the box, full-stack observability (CloudWatch dashboards, alarms, SNS), and a CDK app validated across 20+ config matrix combinations in CI.

Recommended: run everything from the dev container. GCO pins exact versions of many Python packages (CDK, AWS SDKs, FastAPI, mypy, Ruff, etc.), and installing them on top of an existing Python environment is the most common source of "it doesn't install" reports. The dev container ships a fully resolved environment (Python 3.14, Node.js 24, CDK, kubectl, AWS CLI, all Python deps), so you avoid the problem entirely.

git clone git@github.com:awslabs/global-capacity-orchestrator-on-aws.git
cd global-capacity-orchestrator-on-aws

docker build -f Dockerfile.dev -t gco-dev .
docker run -it --rm \
  -v ~/.aws:/root/.aws:ro \
  -v "$(pwd)":/workspace \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -w /workspace \
  gco-dev

The docker.sock mount lets gco stacks deploy-all bundle Lambda assets through your host Docker daemon. See Prerequisites for Colima/Finch socket paths and the security note about host-socket pass-through.

Prefer to install on your host? (advanced)

This path requires Python 3.10+ and works best in a fresh virtual environment. Because GCO pins many dependencies, installing it into an existing environment frequently produces resolver conflicts, so use a clean venv or pipx.

git clone git@github.com:awslabs/global-capacity-orchestrator-on-aws.git
cd global-capacity-orchestrator-on-aws && pipx install -e .

See the Quick Start for the full install + first-job walkthrough, or docs/CLI.md for every CLI command.

💡 New to the codebase? GCO ships with an MCP server exposing 44 tools that index the whole project — docs, examples, source code, K8s manifests, scripts. Connect it to an AI-powered IDE (like Kiro) and ask in natural language: "How does region recommendation work?", "Walk me through the inference deployment flow". See mcp/README.md.

Table of contents

Why GCO?
Quick Start
Architecture Overview
Key Features
Documentation
Project Structure
Contributing
Support

Why GCO?

Running GPU workloads at scale is hard. You need to find regions with available capacity, provision clusters, handle authentication, deal with failover, and persist outputs after pods terminate. GCO handles each of these in a single deployable platform.

Challenge	Traditional Approach	With GCO
GPU availability	Manually check each region	Auto-routes to available capacity
Node provisioning	Pre-provision or wait for scaling	EKS Auto Mode provisions on-demand
Multi-region ops	Manage clusters separately	Single API, automatic routing
Authentication	Configure per-cluster access	IAM-based, uses existing AWS credentials
Job outputs	Lost when pods terminate	Persisted to EFS/FSx storage
Inference serving	Deploy and manage per-region	Deploy once, serve globally
Failover	Manual intervention required	Automatic via Global Accelerator
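
To make "auto-routes to available capacity" concrete, here is a minimal sketch of one way such a placement decision could be made. It is not GCO's actual scheduler: the RegionCapacity record, the pick_placement function, and the sample numbers are all hypothetical, and the policy shown (try spot everywhere first, then fall back to on-demand) is an assumption chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RegionCapacity:
    """Hypothetical snapshot of free instances of one type in one region."""
    region: str
    spot_available: int
    on_demand_available: int


def pick_placement(
    regions: list[RegionCapacity], needed: int
) -> Optional[tuple[str, str]]:
    """Return (region, purchase_option) for the first pool that fits the job.

    Assumed policy: try spot in every region first, then fall back to
    on-demand; return None when nothing fits.
    """
    for option in ("spot", "on-demand"):
        for r in regions:
            free = r.spot_available if option == "spot" else r.on_demand_available
            if free >= needed:
                return r.region, option
    return None


snapshot = [
    RegionCapacity("us-east-1", spot_available=0, on_demand_available=4),
    RegionCapacity("us-west-2", spot_available=2, on_demand_available=8),
]
print(pick_placement(snapshot, needed=2))   # spot fits in us-west-2
print(pick_placement(snapshot, needed=4))   # no spot pool fits; on-demand in us-east-1
print(pick_placement(snapshot, needed=16))  # nothing fits, so None
```

A real implementation would also have to weigh latency, price, and how stale the capacity data is; this sketch only shows the ordering of the fallback.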

When to use GCO:

You need to run GPU workloads (training, inference, batch processing)
You want to deploy inference endpoints across multiple regions with a single command
You want multi-region redundancy without managing multiple clusters
You prefer IAM authentication over kubeconfig management
You need job outputs to persist after completion

Quick Start

Install and Deploy

The fastest, most reliable path is the dev container — it sidesteps the dependency-conflict issues that come with installing GCO's pinned Python packages on top of your existing Python environment.

# Build the dev container (Python, Node.js, CDK, kubectl, AWS CLI all pinned & pre-installed)
docker build -f Dockerfile.dev -t gco-dev .

# Drop into a shell with the gco CLI already installed
docker run -it --rm \
  -v ~/.aws:/root/.aws:ro \
  -v "$(pwd)":/workspace \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -w /workspace \
  gco-dev

# From inside the container — deploy everything (CDK bootstrap runs automatically)
gco stacks deploy-all -y

If you'd rather install on your host, use a clean virtual environment or pipx — see the Prerequisites and QUICKSTART.md for the details and known caveats.

Optional: configure kubectl access (requires PUBLIC_AND_PRIVATE endpoint mode). The default endpoint mode is PRIVATE — see docs/CUSTOMIZATION.md for details. Most users don't need this; submit jobs via SQS or API Gateway instead.

Submit Your First Job

# Check GPU capacity
gco capacity check --instance-type g4dn.xlarge --region us-east-1

# Submit a job (pick your preferred method)
gco jobs submit-sqs examples/simple-job.yaml --region us-east-1    # via SQS (recommended)
gco queue submit examples/simple-job.yaml --region us-east-1       # via Global DynamoDB queue
gco jobs submit examples/simple-job.yaml -n gco-jobs               # via API Gateway
gco jobs submit-direct examples/simple-job.yaml -r us-east-1       # via kubectl

# Check status and get logs
gco jobs list --all-regions
gco jobs logs hello-gco -n gco-jobs -r us-east-1
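
The contents of examples/simple-job.yaml are not reproduced here. As a rough idea of what a minimal job manifest could look like, the sketch below builds a standard Kubernetes batch/v1 Job in Python and prints it as JSON. The job name hello-gco and the namespace gco-jobs mirror the commands above; the helper name build_job_manifest, the container image, the command, and the backoffLimit value are placeholder assumptions, not the contents of the real example file.

```python
import json


def build_job_manifest(name: str, namespace: str, image: str, command: list[str]) -> dict:
    """Build a minimal Kubernetes batch/v1 Job manifest as a plain dict."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "backoffLimit": 0,  # assumption: do not retry failed pods
            "template": {
                "spec": {
                    "restartPolicy": "Never",  # Jobs require Never or OnFailure
                    "containers": [
                        {"name": name, "image": image, "command": command}
                    ],
                }
            },
        },
    }


manifest = build_job_manifest(
    name="hello-gco",
    namespace="gco-jobs",
    image="busybox:1.36",  # placeholder image
    command=["sh", "-c", "echo hello from GCO"],
)
print(json.dumps(manifest, indent=2))
```

Whether the gco submit commands accept JSON as well as YAML is not stated here, so treat the output as an illustration of the manifest's shape rather than a file to submit as-is.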

Deploy an Inference Endpoint

gco inference deploy my-llm -i vllm/vllm-openai:v0.20.1 --gpu-count 1
gco inference status my-llm
gco inference scale my-llm --replicas 3

See the Quick Start Guide for the full step-by-step walkthrough, or the CLI Reference for all available commands.

Architecture Overview

📊 Full Architecture Diagram

Regenerate this diagram and every per-stack view on demand with python diagrams/infra_diagrams/generate.py — it synthesises the current CDK app through AWS PDK cdk-graph so the diagrams never drift from the source. See diagrams/infra_diagrams/README.md for per-stack flags (--stack global|api-gateway|regional|regional-api|monitoring|analytics|all). Flowcharts of the code itself (Lambda handlers, CLI commands) live alongside them under diagrams/code_diagrams/.

The regional stack can be deployed to any AWS region. Add or remove regions by editing the deployment_regions.regional array in cdk.json.
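
As a sketch of that edit, the helper below adds a region to a parsed cdk.json document. It assumes deployment_regions sits under the top-level context key, which is where CDK apps conventionally keep custom settings; check your own cdk.json before relying on that. The function name add_regional_region is hypothetical, and the example works on an in-memory dict so it never touches a real file.

```python
import json


def add_regional_region(cdk_config: dict, region: str) -> dict:
    """Return a copy of the config with `region` in deployment_regions.regional.

    Assumption: deployment_regions lives under the top-level "context" key.
    """
    updated = json.loads(json.dumps(cdk_config))  # cheap deep copy for JSON data
    regional = (
        updated.setdefault("context", {})
        .setdefault("deployment_regions", {})
        .setdefault("regional", [])
    )
    if region not in regional:  # keep the list free of duplicates
        regional.append(region)
    return updated


config = {"context": {"deployment_regions": {"regional": ["us-east-1"]}}}
config = add_regional_region(config, "eu-west-1")
config = add_regional_region(config, "eu-west-1")  # second call is a no-op
print(config["context"]["deployment_regions"]["regional"])
```

After saving a change like this to cdk.json, gco stacks deploy-all -y stands up every region defined there.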

┌───────────────────────────────────────────────────┐
│              User Request                         │
│        (AWS SigV4 Authentication)                 │
└────────────────────┬──────────────────────────────┘
                     │
                     ▼
┌───────────────────────────────────────────────────┐
│      API Gateway (Edge-Optimized, Global)         │
│      ✓ IAM Authentication Required                │
│      ✓ CloudFront Edge Caching                    │
└────────────────────┬──────────────────────────────┘
                     │
                     ▼
┌───────────────────────────────────────────────────┐
│              AWS Global Accelerator               │
│         Routes to nearest healthy region          │
└────────────────────┬──────────────────────────────┘
                     │
        ┌────────────┼────────────┬────────────┐
        │            │            │            │
   ┌────▼────┐  ┌────▼────┐  ┌────▼────┐  ┌────▼────┐
   │us-east-1│  │us-west-2│  │eu-west-1│  │  More   │
   │   ALB   │  │   ALB   │  │   ALB   │  │ Regions │
   │(GA IPs  │  │(GA IPs  │  │(GA IPs  │  │(GA IPs  │
   │  only)  │  │  only)  │  │  only)  │  │  only)  │
   └────┬────┘  └────┬────┘  └────┬────┘  └────┬────┘
        │            │            │            │
   ┌────▼────────────▼────────────▼────────────▼────┐
   │    EKS Auto Mode Cluster (per region)          │
   │  ┌─────────────────────────────────────────┐   │
   │  │  Nodepools: System, General, GPU (x86   │   │
   │  │  + ARM), Inference                      │   │
   │  ├─────────────────────────────────────────┤   │
   │  │  Services: Health Monitor, Manifest     │   │
   │  │  Processor, Inference Monitor           │   │
   │  ├─────────────────────────────────────────┤   │
   │  │  Storage: EFS (shared) + FSx (optional) │   │
   │  └─────────────────────────────────────────┘   │
   └────────────────────────────────────────────────┘

Security Model

Five layers protect every request:

IAM Authentication — API Gateway validates AWS credentials (SigV4)
Secret Header — Lambda injects a rotating token from Secrets Manager
IP Restriction — ALBs only accept Global Accelerator IPs
Header Validation — Backend services verify the secret token
IRSA — Pods use IAM roles for AWS access (no static credentials)

Request flow: User → API Gateway (SigV4) → Lambda (adds secret) → Global Accelerator
  → ALB (GA IPs only) → Services (validate secret)
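
The header-validation layer can be pictured with a short sketch. This is not GCO's implementation: the header name x-gco-auth-token and the function is_request_authorized are hypothetical, and accepting both a current and a previous token is an assumed way of tolerating requests that arrive while the secret is being rotated.

```python
import hmac

# Hypothetical header name; the real header GCO uses is not documented here.
SECRET_HEADER = "x-gco-auth-token"


def is_request_authorized(headers: dict[str, str], valid_tokens: list[str]) -> bool:
    """Check the secret header against every currently valid token.

    Accepting more than one token (for example current and previous) is an
    assumed way to avoid rejecting requests while a rotation is in progress.
    """
    # HTTP header names are case-insensitive, so normalise before lookup.
    normalised = {k.lower(): v for k, v in headers.items()}
    presented = normalised.get(SECRET_HEADER, "")
    if not presented:
        return False
    # compare_digest runs in constant time, which avoids timing side channels.
    return any(
        hmac.compare_digest(presented.encode(), token.encode())
        for token in valid_tokens
    )


tokens = ["token-current", "token-previous"]
print(is_request_authorized({"X-GCO-Auth-Token": "token-current"}, tokens))   # True
print(is_request_authorized({"X-GCO-Auth-Token": "token-previous"}, tokens))  # True
print(is_request_authorized({"X-GCO-Auth-Token": "stale"}, tokens))           # False
print(is_request_authorized({}, tokens))                                      # False
```

The point of the check is defence in depth: even a caller that reaches a backend service without passing through API Gateway and the proxy Lambda cannot present a valid token.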

For private clusters, Regional API Gateways provide direct VPC access without public ALB exposure.

See Architecture Details for the full deep dive.

Key Features

Compute & Orchestration

EKS Auto Mode with automatic node provisioning — no pre-scaling needed
GPU support for x86_64 (g4dn, g5) and ARM64 (g5g) via Karpenter nodepools
Multiple submission methods: API Gateway, SQS queues, DynamoDB job queue, or direct kubectl
Job pipelines (DAGs): Multi-step ML pipelines with dependency ordering and failure handling
Helm-managed ecosystem: KEDA, Volcano, KubeRay, Kueue, GPU Operator, DRA, and more — configurable via cdk.json
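
Dependency ordering in a job pipeline amounts to a topological sort of the step graph. The sketch below uses Python's standard graphlib module to group a made-up pipeline into waves of steps that could run in parallel. The step names and the execution_waves helper are hypothetical; GCO's own pipeline format and its failure handling are not shown here.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each step maps to the set of steps it depends on.
pipeline = {
    "preprocess": set(),
    "train": {"preprocess"},
    "evaluate": {"train"},
    "export": {"train"},
    "publish": {"evaluate", "export"},
}


def execution_waves(dag: dict[str, set[str]]) -> list[list[str]]:
    """Group steps into waves; steps in one wave have no unmet dependencies."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises CycleError if the graph is not a DAG
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for deterministic output
        waves.append(ready)
        sorter.done(*ready)
    return waves


print(execution_waves(pipeline))
# [['preprocess'], ['train'], ['evaluate', 'export'], ['publish']]

try:
    execution_waves({"a": {"b"}, "b": {"a"}})
except CycleError:
    print("cycle detected")  # a pipeline with circular dependencies is rejected
```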

Inference Serving

Multi-region inference: Deploy endpoints (vLLM, TGI, Triton, TorchServe, SGLang) across regions with a single command
Canary deployments: A/B test new model versions with weighted traffic routing
Model weight management: Central S3 bucket with KMS encryption, automatic sync to each region
Spot instance support: Run inference on spot GPUs for significant cost savings
Autoscaling: HPA-based scaling with CPU/memory metrics
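
For readers unfamiliar with how an HPA arrives at a replica count, the sketch below reproduces the scaling rule from the Kubernetes documentation: desired = ceil(current_replicas * current_metric / target_metric), skipped when the ratio is within a tolerance of 1.0 (0.1 by default) and then clamped to the configured bounds. The function name and the min/max defaults are illustrative assumptions, not values GCO sets.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,   # illustrative default
    max_replicas: int = 10,  # illustrative default
    tolerance: float = 0.1,  # Kubernetes' default HPA tolerance
) -> int:
    """Apply the HPA scaling rule to a single metric."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: no scaling
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# 3 replicas at 90% CPU against a 60% target: ceil(3 * 1.5) = 5
print(desired_replicas(3, current_metric=90, target_metric=60))   # 5
# 4 replicas at 30% against a 60% target: ceil(4 * 0.5) = 2
print(desired_replicas(4, current_metric=30, target_metric=60))   # 2
# 8 replicas at 120% would want 16, but is clamped to max_replicas
print(desired_replicas(8, current_metric=120, target_metric=60))  # 10
```

When several metrics are configured (for example CPU and memory), the HPA evaluates each one and uses the largest result.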

Networking & Security

Global Accelerator: Single anycast endpoint with automatic failover
IAM authentication: SigV4 at the API Gateway — no kubeconfig distribution
Compliance validated: CDK-nag checks for AWS Solutions, HIPAA, NIST 800-53, PCI DSS
Network policies: Default-deny with explicit allow rules for all service communication
EFA support: Optional Elastic Fabric Adapter for high-bandwidth distributed training and NIXL-based inference (toggle on/off)
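
As an illustration of the default-deny pattern, the sketch below builds the standard Kubernetes NetworkPolicy that selects every pod in a namespace and lists no allow rules, so all ingress and egress is blocked until further policies open specific paths. The helper name, the policy name default-deny-all, and the use of the gco-jobs namespace are assumptions for the example; GCO's actual policies are not shown here.

```python
import json


def default_deny_policy(namespace: str) -> dict:
    """NetworkPolicy that selects every pod and allows no traffic."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {
            "podSelector": {},  # empty selector matches all pods in the namespace
            # Both directions are governed, and no rules are listed,
            # so nothing is allowed in or out.
            "policyTypes": ["Ingress", "Egress"],
        },
    }


policy = default_deny_policy("gco-jobs")
print(json.dumps(policy, indent=2))
```

Explicit allow rules are then added as separate NetworkPolicy objects; because policies are additive, each one widens access only for the pods and ports it names.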

Storage & Data

EFS: Shared elastic storage for job outputs that persist after pod termination
FSx for Lustre: Optional high-performance parallel file system for ML training (toggle on/off)
Valkey cache: Optional serverless key-value cache for prompt caching and session state
Aurora pgvector: Optional serverless vector database for RAG, semantic search, and embedding storage

Operations

Cost visibility: Track spend by service, region, and workload via Cost Explorer integration
Auto-bootstrap: CDK bootstrap runs automatically for new regions during deploy
Multi-region monitoring: CloudWatch dashboards, alarms, and SNS alerts across all regions

ML & Analytics Environment

ML & Analytics Environment: Optional SageMaker Studio domain + EMR Serverless + Cognito user pool for interactive notebook analytics, with an always-on Cluster_Shared_Bucket that all cluster jobs can read and write. Off by default — enable with gco analytics enable. See Analytics Guide.

Documentation

New to GCO? Start here:

Your Goal	Read This
Understand what GCO does	Core Concepts
Get running in under 60 minutes	Quick Start Guide
Learn the architecture	Architecture Details

Day-to-day operations:

Your Goal	Read This
CLI commands and usage	CLI Reference
Deploy inference endpoints	Inference Guide
Use the REST API directly	API Reference
Fix issues	Troubleshooting
Respond to incidents	Operational Runbooks
Run interactive notebook analytics	Analytics Guide

Customization and development:

Your Goal	Read This
Add regions, tune nodepools, enable FSx	Customization Guide
Choose a scheduler for your workload	Schedulers & Orchestrators
Configure the SQS queue processor	Queue Processor Config
Contribute to the project	Contributing
API client examples (Python, curl, AWS CLI)	Client Examples
IAM policy templates	IAM Policies
Presentation slides and demo scripts	Demo Starter Kit

Prerequisites

Recommended path — dev container only:

AWS CLI configured with appropriate credentials (or ~/.aws to mount in)
Docker (or Finch / Colima) — that's it. The container ships Python 3.14, Node.js 24, CDK, kubectl, and AWS CLI at pinned versions.

docker build -f Dockerfile.dev -t gco-dev .
docker run -it --rm -v ~/.aws:/root/.aws:ro -v "$(pwd)":/workspace -w /workspace gco-dev

For gco stacks deploy-all, cdk deploy needs to run Docker to bundle Lambda assets. Mount the host Docker socket so the container's CLI talks to your host daemon (works with Docker Desktop on macOS/Windows, with Docker on Linux, and with Colima on macOS — see Dockerfile.dev for Colima-specific socket paths):

docker run --rm -it \
  -v ~/.aws:/root/.aws:ro \
  -v "$(pwd)":/workspace \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -w /workspace \
  gco-dev gco stacks deploy-all -y

This is host-socket pass-through, not true Docker-in-Docker. Anyone with access to the container has root-equivalent access to the host Docker daemon, so keep the container on a trusted host.

Host install path (advanced):

AWS CLI configured with appropriate credentials
Python 3.10+ and Node.js LTS (v24)
AWS CDK CLI (npm install -g aws-cdk)
Docker or Finch (for building container images)
A clean Python virtual environment or pipx — GCO pins exact versions of many packages, so installing it into an existing environment will commonly fail with dependency-resolver errors. If you hit ResolutionImpossible, switch to the dev container instead of debugging your local env.

Project Structure

.
├── app.py                               # CDK app entry point
├── cdk.json                             # CDK configuration (regions, features, thresholds)
├── pyproject.toml                       # Project metadata, dependencies, and CLI installation
│
├── cli/                                 # GCO CLI (jobs, stacks, capacity, inference, costs, DAGs)
├── diagrams/                            # Auto-generated architecture diagrams (infra_diagrams/) and code flowcharts (code_diagrams/)
├── docs/                                # Documentation (architecture, CLI, API, inference, customization, analytics)
├── examples/                            # Example manifests (jobs, inference, Ray, Volcano, Kueue, Slurm, YuniKorn)
├── gco/
│   ├── config/                          # Configuration loader with validation
│   ├── models/                          # Data models for k8s clusters, health monitor, inference monitor and manifest processor
│   ├── services/                        # K8s services (health monitor, inference monitor, manifest processor, queue processor)
│   └── stacks/                          # CDK stacks (global, regional, API gateway, monitoring)
│       └── constants.py                 # Pinned versions: EKS addons, Lambda runtime, Aurora engine
│
├── lambda/                              # Lambda functions
│   ├── alb-header-validator/            # ALB header validation for auth tokens
│   ├── api-gateway-proxy/               # API Gateway → Global Accelerator proxy
│   ├── cross-region-aggregator/         # Cross-region job/health aggregation
│   ├── ga-registration/                 # Global Accelerator endpoint registration
│   ├── helm-installer/                  # Installs Helm charts (schedulers, GPU operators, cert-manager)
│   │   └── charts.yaml                  # Helm chart configuration (schedulers, GPU operators, cert-manager)
│   ├── kubectl-applier-simple/          # Applies K8s manifests during deployment
│   │   └── manifests/                   # Kubernetes manifests (nodepools, RBAC, services, storage)
│   ├── proxy-shared/                    # Shared utilities for proxy Lambdas
│   ├── regional-api-proxy/              # Regional API Gateway → internal ALB proxy
│   └── secret-rotation/                 # Daily secret rotation
│
├── mcp/                                 # MCP server for LLM interaction (44 tools wrapping the CLI)
├── scripts/                             # Utility scripts (version bump, cluster access setup)
└── tests/                               # pytest + BATS test suites (counts tracked via badges)

Contributing

See CONTRIBUTING.md for development setup, testing, the GitHub Actions CI/CD layout, release process, and dependency scanning schedules.

Quick start for contributors (dev container — recommended):

docker build -f Dockerfile.dev -t gco-dev .
docker run --rm -v "$(pwd)":/workspace -w /workspace gco-dev pytest tests/ -v --cov=gco --cov=cli

Or, in a clean virtual environment on your host:

python3 -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
pytest tests/ -v --cov=gco --cov=cli

If pip install -e ".[dev]" fails with dependency-resolver errors, that's the pinned-versions issue mentioned in Prerequisites. Use the dev container instead — it ships everything at the exact versions CI uses.

License

See the LICENSE file for details.

Support

Check Troubleshooting for common issues
Review CloudWatch logs for Lambda and EKS errors
Open an issue on GitHub

Security

For security issues, do not open a public GitHub issue. See SECURITY.md for the disclosure process.
