Complete command-line interface documentation for GCO (Global Capacity Orchestrator on AWS).
# Install pipx if not already installed
brew install pipx && pipx ensurepath # macOS
# or
pip install pipx && pipx ensurepath # Linux/Windows
# Install GCO CLI
pipx install -e .
# or
pip install -e .
gco --version
gco --help

These options are available for all commands:

| Option | Short | Description |
|---|---|---|
| --config | -c | Path to config file |
| --region | -r | Default AWS region |
| --output | -o | Output format: table, json, yaml |
| --verbose | -v | Enable verbose output |
| --regional-api | | Use regional API endpoints (for private access) |
| --help | | Show help message |
| --version | | Show version |

When --regional-api is enabled (or the GCO_REGIONAL_API=true environment variable is set), the CLI routes requests through regional API Gateways instead of the global API Gateway. This is required when:
- The ALB is internal-only (no public exposure)
- Public access is disabled on the EKS cluster
- Maximum security posture is required
# Use regional API for a single command
gco --regional-api jobs list --region us-east-1
# Or set environment variable for all commands
export GCO_REGIONAL_API=true
gco jobs list --region us-east-1

Manage jobs across GCO clusters.

Submit a job via API Gateway (SigV4 authenticated).

gco jobs submit MANIFEST_PATH [OPTIONS]

Arguments:
- MANIFEST_PATH - Path to YAML manifest file

Options:

| Option | Short | Description |
|---|---|---|
| --namespace | -n | Fallback namespace for manifests that don't declare their own (manifest metadata.namespace takes precedence) |
| --region | -r | Target specific region |
| --dry-run | | Validate without applying |
| --label | -l | Add labels (key=value), can be repeated |
| --wait | -w | Wait for job completion |
| --timeout | | Wait timeout in seconds (default: 3600) |

Example:
gco jobs submit examples/simple-job.yaml -n gco-jobs
gco jobs submit job.yaml --dry-run
gco jobs submit job.yaml -l team=ml -l priority=high

Submit a job via SQS queue (recommended for production).

gco jobs submit-sqs MANIFEST_PATH [OPTIONS]

Options:

| Option | Short | Description |
|---|---|---|
| --region | -r | Target region for SQS queue |
| --auto-region | | Auto-select optimal region based on capacity |
| --priority | -p | Job priority (0-100, higher = more important) |
| --namespace | -n | Fallback namespace for manifests that don't declare their own (manifest metadata.namespace takes precedence) |

Example:
gco jobs submit-sqs examples/simple-job.yaml --region us-east-1
gco jobs submit-sqs job.yaml --auto-region --priority 10

Submit a job directly via kubectl (requires EKS access).

If a job with the same name already exists:
- Completed or failed jobs are silently deleted and replaced
- Active (running/pending) jobs are preserved, and the new submission is auto-renamed with a -{5char} suffix
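
The collision rule above can be sketched as a small shell function. This is only an illustration: the function name `resolve_job_name` is hypothetical, and the way the 5-character suffix is generated here (random hex from `/dev/urandom`) is an assumption, not necessarily how GCO does it.

```shell
# Sketch of the name-collision rule for direct submission.
# $1 = requested job name
# $2 = status of an existing job with that name (empty if there is none)
# Prints the name the new submission would end up with.
resolve_job_name() {
  name=$1
  existing_status=$2
  case "$existing_status" in
    running|pending)
      # An active job is preserved; the new submission gets a suffix.
      # Assumption: 5 random hex characters stand in for GCO's suffix.
      suffix=$(od -An -N8 -tx1 /dev/urandom | tr -d ' \n' | cut -c1-5)
      printf '%s-%s\n' "$name" "$suffix"
      ;;
    *)
      # No existing job, or a completed/failed one that is replaced.
      printf '%s\n' "$name"
      ;;
  esac
}
```

For example, `resolve_job_name train failed` prints `train`, while `resolve_job_name train running` prints `train-` followed by five characters.
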
gco jobs submit-direct MANIFEST_PATH [OPTIONS]

Options:

| Option | Short | Description |
|---|---|---|
| --region | -r | Target region |
| --namespace | -n | Fallback namespace for manifests that don't declare their own (manifest metadata.namespace takes precedence) |

Example:
gco jobs submit-direct examples/simple-job.yaml --region us-east-1 -n gco-jobs

Submit a job to the global DynamoDB queue for regional pickup.

gco jobs submit-queue MANIFEST_PATH [OPTIONS]

Jobs are stored in DynamoDB and picked up by the target region's manifest processor. This enables global job submission with centralized tracking and status history.

Options:

| Option | Short | Description |
|---|---|---|
| --region | -r | Target region for job execution (required) |
| --namespace | -n | Kubernetes namespace |
| --priority | -p | Job priority (0-100, higher = more important) |
| --label | -l | Add labels (key=value), can be repeated |

Example:
gco jobs submit-queue examples/simple-job.yaml --region us-east-1
gco jobs submit-queue job.yaml -r us-west-2 --priority 50
gco jobs submit-queue job.yaml -r us-east-1 -l team=ml -l project=training

Note: Use gco queue list or gco queue get <job_id> to track job status.

List jobs in GCO clusters.

gco jobs list [OPTIONS]

Options:

| Option | Short | Description |
|---|---|---|
| --region | -r | Target region (required unless --all-regions) |
| --all-regions | -a | Query all regions via global API |
| --namespace | -n | Filter by namespace |
| --status | -s | Filter by status |
| --limit | -l | Maximum results (default: 50) |

Example:
gco jobs list --region us-east-1
gco jobs list --all-regions
gco jobs list -r us-west-2 -n gco-jobs --status running

Get details of a specific job.

gco jobs get JOB_NAME [OPTIONS]

Options:

| Option | Short | Description |
|---|---|---|
| --region | -r | Job region (required) |
| --namespace | -n | Job namespace |

Example:
gco jobs get my-job --region us-east-1
gco jobs get training-job -r us-west-2 -n ml-jobs

Get logs from a job.

gco jobs logs JOB_NAME [OPTIONS]

Options:

| Option | Short | Description |
|---|---|---|
| --region | -r | Job region (required) |
| --namespace | -n | Job namespace |
| --tail | -t | Number of lines to show |
| --container | -c | Container name (for multi-container pods) |

Example:
gco jobs logs my-job --region us-east-1
gco jobs logs my-job -r us-east-1 --tail 500
gco jobs logs multi-container-job -r us-east-1 --container sidecar

Delete a job.

gco jobs delete JOB_NAME [OPTIONS]

Options:

| Option | Short | Description |
|---|---|---|
| --region | -r | Job region (required) |
| --namespace | -n | Job namespace |
| --yes | -y | Skip confirmation |

Example:
gco jobs delete my-job --region us-east-1
gco jobs delete old-job -r us-west-2 -n ml-jobs -y

Get Kubernetes events for a job.

gco jobs events JOB_NAME [OPTIONS]

Options:

| Option | Short | Description |
|---|---|---|
| --region | -r | Job region (required) |
| --namespace | -n | Job namespace |

Example:
gco jobs events my-job --region us-east-1
gco jobs events training-job -r us-west-2 -n ml-jobs

Get pod details for a job.

gco jobs pods JOB_NAME [OPTIONS]

Options:

| Option | Short | Description |
|---|---|---|
| --region | -r | Job region (required) |
| --namespace | -n | Job namespace |

Example:
gco jobs pods my-job --region us-east-1
gco jobs pods training-job -r us-west-2 -n ml-jobs

Get resource usage metrics for a job.

gco jobs metrics JOB_NAME [OPTIONS]

Options:

| Option | Short | Description |
|---|---|---|
| --region | -r | Job region (required) |
| --namespace | -n | Job namespace |

Example:
gco jobs metrics my-job --region us-east-1
gco jobs metrics training-job -r us-west-2 -n ml-jobs

Retry a failed job.

gco jobs retry JOB_NAME [OPTIONS]

Options:

| Option | Short | Description |
|---|---|---|
| --region | -r | Job region (required) |
| --namespace | -n | Job namespace |
| --yes | -y | Skip confirmation |

Example:
gco jobs retry failed-job --region us-east-1
gco jobs retry training-job -r us-west-2 -n ml-jobs -y

Bulk delete jobs based on filters.

gco jobs bulk-delete [OPTIONS]

Options:

| Option | Short | Description |
|---|---|---|
| --region | -r | Target region (required unless --all-regions) |
| --all-regions | -a | Delete across all regions |
| --namespace | -n | Filter by namespace |
| --status | -s | Filter by status |
| --older-than-days | -d | Delete jobs older than N days |
| --label-selector | -l | Kubernetes label selector |
| --dry-run | | Only show what would be deleted (default) |
| --execute | | Actually delete (disables dry-run) |
| --yes | -y | Skip confirmation |

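
Because dry-run is the default and --execute is needed to actually delete, a two-pass workflow falls out naturally: preview first, then repeat the same filters with --execute. A minimal sketch of that workflow follows; the wrapper name `bulk_delete_reviewed` is hypothetical, and it uses only the flags listed in the table above.

```shell
# Hypothetical wrapper: preview a bulk delete, then rerun it with --execute
# only after the user types "yes". All arguments are passed through to gco.
bulk_delete_reviewed() {
  # First pass: no --execute, so gco only reports what would be deleted.
  gco jobs bulk-delete "$@" || return 1

  # Prompt on stderr so stdout carries only gco output and the final status.
  printf 'Proceed with deletion? [yes/N] ' >&2
  read -r answer

  if [ "$answer" = "yes" ]; then
    # Second pass: same filters, this time deleting without a further prompt.
    gco jobs bulk-delete "$@" --execute --yes
  else
    echo "Aborted; nothing was deleted."
  fi
}
```

Usage: `bulk_delete_reviewed --region us-east-1 --status failed --older-than-days 7`.
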
Example:
gco jobs bulk-delete --region us-east-1 --status completed --older-than-days 7
gco jobs bulk-delete -r us-west-2 -n gco-jobs -s failed --execute -y
gco jobs bulk-delete --all-regions --status failed --older-than-days 30 --execute

Get health status of GCO clusters.

gco jobs health [OPTIONS]

Options:

| Option | Short | Description |
|---|---|---|
| --region | -r | Target region (required unless --all-regions) |
| --all-regions | -a | Get health across all regions |

Example:
gco jobs health --region us-east-1
gco jobs health --all-regions

View SQS queue status across regions.

gco jobs queue-status [OPTIONS]

Options:

| Option | Short | Description |
|---|---|---|
| --region | -r | Filter by region |
| --all-regions | | Show all regions |

Example:
gco jobs queue-status --all-regions
gco jobs queue-status -r us-east-1

Manage the global job queue (DynamoDB-backed). The job queue provides centralized job submission and tracking across all regions.

Submit a job to the global queue for regional pickup.

gco queue submit MANIFEST_PATH [OPTIONS]

Options:

| Option | Short | Description |
|---|---|---|
| --region | -r | Target region for job execution (required) |
| --namespace | -n | Kubernetes namespace |
| --priority | -p | Job priority (0-100, higher = more important) |
| --label | -l | Add labels (key=value), can be repeated |

Example:
gco queue submit job.yaml --region us-east-1
gco queue submit job.yaml -r us-west-2 --priority 50
gco queue submit job.yaml -r us-east-1 -l team=ml -l project=training

List jobs in the global queue.

gco queue list [OPTIONS]

Options:

| Option | Short | Description |
|---|---|---|
| --region | -r | Filter by target region |
| --status | -s | Filter by status (queued, claimed, running, succeeded, failed, cancelled) |
| --namespace | -n | Filter by namespace |
| --limit | -l | Maximum results (default: 50) |

Example:
gco queue list
gco queue list --region us-east-1 --status queued
gco queue list -s running

Get details of a queued job including status history.

gco queue get JOB_ID [OPTIONS]

Options:

| Option | Short | Description |
|---|---|---|
| --region | -r | Region to query (any region works) |

Example:
gco queue get abc123-def456
gco queue get abc123-def456 --region us-east-1

Cancel a queued job (only works for jobs not yet running).

gco queue cancel JOB_ID [OPTIONS]

Options:

| Option | Short | Description |
|---|---|---|
| --reason | | Cancellation reason |
| --region | -r | Region to query (any region works) |
| --yes | -y | Skip confirmation |

Example:
gco queue cancel abc123-def456
gco queue cancel abc123-def456 --reason "No longer needed" -y

Get job queue statistics by region and status.

gco queue stats [OPTIONS]

Options:

| Option | Short | Description |
|---|---|---|
| --region | -r | Region to query (any region works) |

Example:
gco queue stats

Manage job templates. Templates are reusable job configurations stored in DynamoDB with parameter substitution support.

List all job templates.

gco templates list [OPTIONS]

Options:

| Option | Short | Description |
|---|---|---|
| --region | -r | Region to query |

Example:
gco templates list

Get details of a specific template.

gco templates get TEMPLATE_NAME [OPTIONS]

Options:

| Option | Short | Description |
|---|---|---|
| --region | -r | Region to query |

Example:
gco templates get gpu-training-template

Create a new job template from a manifest file.

gco templates create MANIFEST_PATH [OPTIONS]

Options:

| Option | Short | Description |
|---|---|---|
| --name | -n | Template name (required) |
| --description | -d | Template description |
| --param | -p | Default parameter (key=value), can be repeated |
| --region | -r | Region to create in |

Example:
gco templates create job.yaml --name gpu-training -d "GPU training template"
gco templates create job.yaml -n my-template -p image=pytorch:latest -p gpus=4

Delete a job template.

gco templates delete TEMPLATE_NAME [OPTIONS]

Options:

| Option | Short | Description |
|---|---|---|
| --region | -r | Region |
| --yes | -y | Skip confirmation |

Example:
gco templates delete old-template -y

Create and run a job from a template.

gco templates run TEMPLATE_NAME [OPTIONS]

Options:

| Option | Short | Description |
|---|---|---|
| --name | -n | Job name (required) |
| --region | -r | Target region (required) |
| --namespace | | Kubernetes namespace |
| --param | -p | Parameter override (key=value), can be repeated |

Example:
gco templates run gpu-training --name my-job --region us-east-1
gco templates run gpu-template -n my-job -r us-east-1 -p image=custom:v1 -p gpus=8

Manage webhooks for job event notifications. Webhooks receive HTTP POST notifications when job events occur.

List all registered webhooks.

gco webhooks list [OPTIONS]

Options:

| Option | Short | Description |
|---|---|---|
| --namespace | -n | Filter by namespace |
| --region | -r | Region to query |

Example:
gco webhooks list
gco webhooks list --namespace gco-jobs

Register a new webhook for job events.

gco webhooks create [OPTIONS]

Options:

| Option | Short | Description |
|---|---|---|
| --url | -u | Webhook URL (required) |
| --event | -e | Event type (job.started, job.completed, job.failed), can be repeated |
| --namespace | -n | Filter events by namespace |
| --secret | -s | HMAC secret for signature verification |
| --region | -r | Region to create in |

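
On the receiving side, a --secret lets the webhook consumer check that a POST really came from GCO. This document does not say which HMAC algorithm GCO uses or which header carries the signature, so the following is only a sketch under the assumption of a hex-encoded HMAC-SHA256 over the raw request body. The function names are hypothetical, and it relies on the `openssl` command-line tool being installed.

```shell
# Assumption: the signature is a hex HMAC-SHA256 of the raw request body.
# GCO's actual algorithm and header name are not stated in this document.

# $1 = shared secret, $2 = raw request body; prints the hex digest.
sign_body() {
  printf '%s' "$2" | openssl dgst -sha256 -hmac "$1" | awk '{print $NF}'
}

# $1 = secret, $2 = body, $3 = signature received with the request.
# Returns 0 when the recomputed signature matches, non-zero otherwise.
verify_signature() {
  [ "$(sign_body "$1" "$2")" = "$3" ]
}
```

A plain string comparison like this is not constant-time; a production receiver would normally use its HTTP framework's constant-time comparison helper instead.
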
Example:
gco webhooks create --url https://example.com/webhook -e job.completed -e job.failed
gco webhooks create -u https://slack.com/webhook -e job.failed -n gco-jobs

Delete a webhook.

gco webhooks delete WEBHOOK_ID [OPTIONS]

Options:

| Option | Short | Description |
|---|---|---|
| --region | -r | Region |
| --yes | -y | Skip confirmation |

Example:
gco webhooks delete abc12345 -y

Manage CDK infrastructure stacks.

List all GCO stacks.

gco stacks list [OPTIONS]

Options:

| Option | Short | Description |
|---|---|---|
| --region | -r | Filter by region |
| --all-regions | | List from all regions |

Get detailed status of a stack.
gco stacks status STACK_NAME [OPTIONS]

Options:

| Option | Short | Description |
|---|---|---|
| --region | -r | Stack region |

Example:
gco stacks status gco-us-east-1 --region us-east-1

Deploy a single stack. Automatically bootstraps CDK in the target region if needed.

gco stacks deploy STACK_NAME [OPTIONS]

Options:

| Option | Short | Description |
|---|---|---|
| --region | -r | Stack region |
| --yes | -y | Skip confirmation |

Example:
gco stacks deploy gco-us-east-1 -y

Deploy all stacks in correct order. Automatically bootstraps CDK in any un-bootstrapped regions before deploying.

gco stacks deploy-all [OPTIONS]

Options:

| Option | Short | Description |
|---|---|---|
| --yes | -y | Skip confirmation |
| --parallel | -p | Deploy regional stacks in parallel |
| --max-workers | -w | Max parallel workers (default: 4) |

Example:
gco stacks deploy-all -y
gco stacks deploy-all -y --parallel --max-workers 8

Destroy a single stack.

gco stacks destroy STACK_NAME [OPTIONS]

Options:

| Option | Short | Description |
|---|---|---|
| --region | -r | Stack region |
| --yes | -y | Skip confirmation |

Destroy all stacks in correct order.
gco stacks destroy-all [OPTIONS]

Options:

| Option | Short | Description |
|---|---|---|
| --yes | -y | Skip confirmation |
| --parallel | -p | Destroy regional stacks in parallel |
| --max-workers | -w | Max parallel workers (default: 4) |

Bootstrap CDK in a region. This is run automatically by deploy and deploy-all when needed, so manual bootstrapping is optional.
gco stacks bootstrap [OPTIONS]

Options:

| Option | Short | Description |
|---|---|---|
| --region | -r | Region to bootstrap |

Configure kubectl access to a GCO EKS cluster. Updates kubeconfig, creates an EKS access entry for your IAM principal, and associates the cluster admin policy. Handles assumed roles automatically.
gco stacks access [OPTIONS]

Options:

| Option | Short | Description |
|---|---|---|
| --cluster | -c | Cluster name (default: gco-{region}) |
| --region | -r | AWS region (default: first deployment region) |

Examples:
gco stacks access # Auto-detect region from cdk.json
gco stacks access -r us-west-2 # Specific region
gco stacks access -c my-cluster -r eu-west-1 # Custom cluster name

Manage FSx for Lustre storage.

gco stacks fsx COMMAND [OPTIONS]

Subcommands:
- status - Show FSx status
- enable - Enable FSx for Lustre
- disable - Disable FSx for Lustre

Example:
gco stacks fsx status
gco stacks fsx enable --storage-capacity 1200 -y
gco stacks fsx disable -y

Manage Valkey Serverless cache.

gco stacks valkey COMMAND [OPTIONS]

Subcommands:
- status - Show Valkey configuration status
- enable - Enable Valkey Serverless cache
- disable - Disable Valkey Serverless cache

Example:
gco stacks valkey status
gco stacks valkey enable --max-storage 10 --max-ecpu 10000 -y
gco stacks valkey disable -y

Manage Aurora PostgreSQL (pgvector) database.

gco stacks aurora COMMAND [OPTIONS]

Subcommands:
- status - Show Aurora pgvector configuration status
- enable - Enable Aurora Serverless v2 with pgvector
- disable - Disable Aurora pgvector

Example:
gco stacks aurora status
gco stacks aurora enable --min-acu 2 --max-acu 32 --deletion-protection -y
gco stacks aurora disable -y

Run multi-step job pipelines with dependencies. Define a DAG in YAML, and GCO runs steps in dependency order, skipping downstream steps if a dependency fails.

Execute a DAG pipeline.

gco dag run DAG_FILE [OPTIONS]

Options:

| Option | Short | Description |
|---|---|---|
| --region | -r | Region to run in (default: from DAG file or first deployed) |
| --timeout | -t | Timeout per step in seconds (default: 3600) |
| --dry-run | | Validate and show execution order without running |

Examples:
# Run a pipeline
gco dag run pipeline.yaml -r us-east-1
# Preview execution order
gco dag run pipeline.yaml --dry-run

Validate a DAG definition without running it. Checks for cycles, missing dependencies, and missing manifest files.

gco dag validate DAG_FILE

Example:

gco dag validate examples/pipeline-dag.yaml

name: my-pipeline
region: us-east-1 # optional, auto-detects if omitted
namespace: gco-jobs # optional, defaults to gco-jobs
steps:
  - name: preprocess
    manifest: examples/preprocess-job.yaml
  - name: train
    manifest: examples/train-job.yaml
    depends_on: [preprocess]
  - name: evaluate
    manifest: examples/evaluate-job.yaml
    depends_on: [train]

Steps without depends_on run first. Steps with dependencies wait until all dependencies succeed. If a step fails, all downstream steps are automatically skipped.
Use shared EFS storage (/mnt/shared) to pass data between steps.
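
To see what "dependency order" means for the sample pipeline, the same ordering can be reproduced with the standard `tsort` utility. This is only an illustration of the rule, not a description of how GCO schedules steps internally; the function name `pipeline_order` is hypothetical.

```shell
# Print the sample pipeline's steps in an order that respects depends_on.
# Each input line is "dependency dependent": the left step must succeed
# before the right step may start.
pipeline_order() {
  printf '%s\n' \
    'preprocess train' \
    'train evaluate' | tsort
}
```

Running `pipeline_order` prints preprocess, train, and evaluate on separate lines, which is the order `gco dag run --dry-run` would be expected to preview for this file.
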
View cost breakdowns and estimates for GCO resources. Uses AWS Cost Explorer filtered by the Project: GCO tag applied to all resources.
Setup (one-time): To filter costs by the Project tag, you must activate cost allocation tags in your AWS account:
- Go to the AWS Billing Console → Cost Allocation Tags
- Search for the Project tag under "User-defined cost allocation tags"
- Select it and click "Activate"
- Wait ~24 hours for tag data to appear in Cost Explorer
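
The console steps above can also be scripted. The sketch below assumes the AWS CLI is installed and that your credentials may call the Cost Explorer `UpdateCostAllocationTagsStatus` API; the function name `activate_cost_tag` is hypothetical. A tag can only be activated once it has appeared in your billing data.

```shell
# Activate a user-defined cost allocation tag through the Cost Explorer API.
# $1 = tag key (defaults to Project, the tag GCO applies to its resources)
activate_cost_tag() {
  tag_key=${1:-Project}
  aws ce update-cost-allocation-tags-status \
    --cost-allocation-tags-status "TagKey=${tag_key},Status=Active"
}
```

Call `activate_cost_tag` for the Project tag, or `activate_cost_tag Environment` and `activate_cost_tag Owner` for the additional tags. The wait of roughly 24 hours still applies.
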
Until the tag is activated, use --all to see total account costs:
gco costs summary --all

You can also activate the Environment and Owner tags for more granular filtering in the AWS Cost Explorer console.

Show total GCO spend broken down by AWS service.

gco costs summary [OPTIONS]

Options:

| Option | Short | Description |
|---|---|---|
| --days | -d | Number of days to look back (default: 30) |
| --all | | Show all account costs, not filtered by GCO tag |

Examples:
# Last 30 days (default)
gco costs summary
# Last 7 days
gco costs summary --days 7
# All account costs (before tags are activated)
gco costs summary --all
# JSON output
gco --output json costs summary

Show cost breakdown by AWS region.

gco costs regions [OPTIONS]

Options:

| Option | Short | Description |
|---|---|---|
| --days | -d | Number of days to look back (default: 30) |

Examples:
gco costs regions
gco costs regions --days 7

Show daily cost trend with a visual bar chart.

gco costs trend [OPTIONS]

Options:

| Option | Short | Description |
|---|---|---|
| --days | -d | Number of days to show (default: 14) |
| --all | | Show all account costs, not filtered by GCO tag |

Examples:
gco costs trend
gco costs trend --days 7
gco costs trend --all

Estimate costs for currently running workloads (jobs and inference endpoints) based on instance pricing and runtime.

gco costs workloads [OPTIONS]

Options:

| Option | Short | Description |
|---|---|---|
| --region | -r | Region to check (default: all deployment regions) |

Examples:
# All regions
gco costs workloads
# Specific region
gco costs workloads -r us-east-1

Forecast GCO costs for the next N days based on historical spending patterns.

gco costs forecast [OPTIONS]

Options:

| Option | Short | Description |
|---|---|---|
| --days | -d | Days to forecast ahead (default: 30) |

Examples:
gco costs forecast
gco costs forecast --days 60

Note: Cost Explorer needs at least 14 days of historical data to generate forecasts.

Check and manage cluster capacity.

Check capacity for a specific instance type.

gco capacity check [OPTIONS]

Options:

| Option | Short | Description |
|---|---|---|
| --instance-type | -i | Instance type to check |
| --region | -r | Region to check |
| --type | -t | Capacity type: spot, on-demand, or both |

Example:
gco capacity check --instance-type g4dn.xlarge --region us-east-1
gco capacity check -i g5.xlarge -r us-west-2 -t spot

View capacity status across regions.

gco capacity status [OPTIONS]

Options:

| Option | Short | Description |
|---|---|---|
| --region | -r | Filter by region |

Get capacity recommendation for an instance type.
gco capacity recommend [OPTIONS]

Options:

| Option | Short | Description |
|---|---|---|
| --instance-type | -i | Instance type |
| --region | -r | Region |

Get optimal region recommendation.
gco capacity recommend-region [OPTIONS]

Options:

| Option | Short | Description |
|---|---|---|
| --gpu | | Recommend for GPU workloads |
| --instance-type | -i | Specific instance type (enables weighted scoring) |
| --gpu-count | | Number of GPUs required |
| --min-gpus | | Minimum GPUs required |

When --instance-type is provided, the recommendation uses weighted multi-signal
scoring that combines spot placement scores, spot-vs-on-demand pricing, queue depth,
GPU utilization, and running job counts. Without it, a simpler composite score is used.
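
To make "weighted multi-signal scoring" concrete, here is a minimal sketch of a weighted sum over five signals. The weights, the normalisation of each signal to the range 0 to 1 (higher is better), and the function name `region_score` are all assumptions for illustration; this document does not state the weights or formula GCO actually uses.

```shell
# Hypothetical weighted score for one region.
# Arguments are five signals, each already normalised to 0..1 where higher
# is better: spot placement, price advantage of spot over on-demand,
# queue headroom, GPU headroom, and running-job headroom.
region_score() {
  LC_ALL=C awk -v s="$1" -v p="$2" -v q="$3" -v g="$4" -v j="$5" \
    'BEGIN { printf "%.2f\n", 0.4*s + 0.2*p + 0.2*q + 0.1*g + 0.1*j }'
}
```

With these assumed weights, `region_score 0.8 0.5 1 0.5 0` prints 0.67; the region with the highest score would be the one recommended.
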
Example:
gco capacity recommend-region --gpu
gco capacity recommend-region -i g5.xlarge
gco capacity recommend-region -i p4d.24xlarge --gpu-count 8

Get AI-powered capacity recommendation using Amazon Bedrock.

gco capacity ai-recommend [OPTIONS]

This command gathers comprehensive capacity data including:
- Spot placement scores and pricing across regions
- On-demand availability and pricing
- Current cluster utilization (queue depth, GPU/CPU usage)
- Running and pending job counts
The data is analyzed by an LLM (Claude by default) to provide intelligent recommendations.
Requirements:
- AWS credentials with bedrock:InvokeModel permission
- The specified Bedrock model must be enabled in your account

Options:

| Option | Short | Description |
|---|---|---|
| --workload | -w | Description of your workload |
| --instance-type | -i | Instance types to consider (can specify multiple) |
| --region | -r | Regions to consider (can specify multiple) |
| --gpu | | Workload requires GPUs |
| --min-gpus | | Minimum GPUs required |
| --min-memory-gb | | Minimum memory in GB |
| --fault-tolerance | -f | Fault tolerance level: high, medium, low |
| --max-cost | | Maximum cost per hour in USD |
| --model | -m | Bedrock model ID to use |
| --raw | | Show raw AI response |

Example:
# Basic recommendation
gco capacity ai-recommend --workload "Training a large language model"
# GPU workload with specific requirements
gco capacity ai-recommend -w "Inference workload" --gpu --min-gpus 4
# Compare specific instance types and regions
gco capacity ai-recommend -i g5.xlarge -i g5.2xlarge -r us-east-1 -r us-west-2
# Cost-constrained recommendation
gco capacity ai-recommend --fault-tolerance high --max-cost 5.00
# Use a different model
gco capacity ai-recommend -w "ML training" --model us.anthropic.claude-3-haiku-20240307-v1:0

List On-Demand Capacity Reservations (ODCRs) across deployed regions.

gco capacity reservations [OPTIONS]

| Option | Description |
|---|---|
| -i, --instance-type | Filter by instance type |
| -r, --region | Specific region (default: all deployed regions) |

# List all active reservations
gco capacity reservations
# Filter by instance type
gco capacity reservations -i p5.48xlarge
# Check a specific region
gco capacity reservations -r us-east-1

Check reservation availability and Capacity Block offerings for ML workloads. Checks both existing ODCRs and purchasable Capacity Blocks (guaranteed GPU capacity for a fixed duration at a known price).

gco capacity reservation-check [OPTIONS]

| Option | Description |
|---|---|
| -i, --instance-type | Instance type to check (required) |
| -r, --region | Specific region (default: all deployed regions) |
| -c, --count | Minimum instances needed (default: 1) |
| --include-blocks/--no-blocks | Include Capacity Block offerings (default: yes) |
| --block-duration | Capacity Block duration in hours (default: 24) |

# Check for p5.48xlarge reservations and block offerings
gco capacity reservation-check -i p5.48xlarge
# Check with specific count and duration
gco capacity reservation-check -i p4d.24xlarge -c 2 --block-duration 48
# ODCRs only, no block offerings
gco capacity reservation-check -i g5.48xlarge -r us-east-1 --no-blocks

Manage multi-region inference endpoints. Endpoints are stored in DynamoDB and reconciled by the inference_monitor in each target region.

See Inference Guide for architecture details and workflows.

Deploy an inference endpoint to one or more regions.

gco inference deploy ENDPOINT_NAME [OPTIONS]

Arguments:
- ENDPOINT_NAME - Unique name for the endpoint

Options:

| Option | Short | Description |
|---|---|---|
| --image | -i | Container image (required) |
| --region | -r | Target region(s), repeatable (default: all deployed regions) |
| --replicas | | Replicas per region (default: 1) |
| --gpu-count | | GPUs per replica (default: 1) |
| --gpu-type | | GPU instance type hint (e.g. g5.xlarge) |
| --port | | Container port (default: 8000) |
| --model-path | | EFS path for model weights |
| --model-source | | S3 URI for model weights (auto-synced via init container) |
| --health-path | | Health check endpoint path (default: /health) |
| --env | -e | Environment variable (KEY=VALUE), repeatable |
| --namespace | -n | Kubernetes namespace (default: gco-inference) |
| --label | -l | Label (key=value), repeatable |
| --min-replicas | | Autoscaling: minimum replicas |
| --max-replicas | | Autoscaling: maximum replicas |
| --autoscale-metric | | Autoscaling metric (e.g. cpu:70, memory:80), repeatable. Enables HPA. |
| --capacity-type | | Node capacity type: on-demand (default) or spot |
| --accelerator | | Accelerator type: nvidia for GPU instances, neuron for Trainium/Inferentia (default: nvidia) |
| --node-selector | | Node selector (key=value), repeatable. E.g. eks.amazonaws.com/instance-family=inf2 |
| --extra-args | | Extra arguments passed to the container (e.g. --kv-transfer-config {...}). Repeatable |

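
The --autoscale-metric values follow a name:number shape (cpu:70, memory:80). When such values are generated by a script, a small pre-flight check can catch typos before the deploy call. The function below is a hypothetical sketch; it assumes the number is a whole-number target and does not reflect GCO's own validation.

```shell
# Split a metric spec such as "cpu:70" into its name and target value.
# Prints "name target" on success; returns 1 if the spec is malformed.
parse_autoscale_metric() {
  spec=$1
  name=${spec%%:*}     # text before the first colon
  target=${spec#*:}    # text after the first colon (whole spec if no colon)
  case "$target" in
    ''|*[!0-9]*)
      echo "invalid metric spec: $spec" >&2
      return 1
      ;;
  esac
  [ -n "$name" ] || { echo "invalid metric spec: $spec" >&2; return 1; }
  printf '%s %s\n' "$name" "$target"
}
```

For example, `parse_autoscale_metric cpu:70` prints `cpu 70`, while `parse_autoscale_metric cpu` fails because no numeric target is present.
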
Example:
gco inference deploy my-llm -i vllm/vllm-openai:v0.20.1
gco inference deploy llama3-70b \
-i vllm/vllm-openai:v0.20.1 \
-r us-east-1 -r eu-west-1 \
--replicas 2 --gpu-count 4 \
--model-source s3://bucket/models/llama3-70b \
-e MODEL=/models/llama3-70b
# Deploy with autoscaling (creates a Kubernetes HPA)
gco inference deploy my-llm \
-i vllm/vllm-openai:v0.20.1 \
--replicas 2 --gpu-count 1 \
--min-replicas 1 --max-replicas 8 \
--autoscale-metric cpu:70 --autoscale-metric memory:80

List inference endpoints.

gco inference list [OPTIONS]

Options:

| Option | Short | Description |
|---|---|---|
| --state | -s | Filter by state (deploying, running, stopped, deleted) |
| --region | -r | Filter by target region |

Example:
gco inference list
gco inference list --state running
gco inference list -r us-east-1

Show detailed status of an inference endpoint including per-region sync state.

gco inference status ENDPOINT_NAME

Example:

gco inference status my-llm

Scale an inference endpoint to a new replica count (applied across all target regions).

gco inference scale ENDPOINT_NAME [OPTIONS]

Options:

| Option | Short | Description |
|---|---|---|
| --replicas | -r | New replica count (required) |

Example:
gco inference scale my-llm --replicas 4

Stop an inference endpoint (scales to zero, keeps configuration).

gco inference stop ENDPOINT_NAME [OPTIONS]

Options:

| Option | Short | Description |
|---|---|---|
| --yes | -y | Skip confirmation |

Example:
gco inference stop my-llm -y

Start a stopped inference endpoint.

gco inference start ENDPOINT_NAME

Example:

gco inference start my-llm

Delete an inference endpoint from all regions. The inference_monitor in each region cleans up K8s resources.

gco inference delete ENDPOINT_NAME [OPTIONS]

Options:

| Option | Short | Description |
|---|---|---|
| --yes | -y | Skip confirmation |

Example:
gco inference delete my-llm -y

Update the container image for an endpoint. Triggers a rolling update across all target regions.

gco inference update-image ENDPOINT_NAME [OPTIONS]

Options:

| Option | Short | Description |
|---|---|---|
| --image | -i | New container image (required) |

Example:
gco inference update-image my-llm -i vllm/vllm-openai:v0.20.1

Send a request to an inference endpoint via the API Gateway. Auto-detects the framework (vLLM, TGI, Triton) and builds the appropriate request body.

gco inference invoke ENDPOINT_NAME [OPTIONS]

Arguments:
- ENDPOINT_NAME - Name of the inference endpoint

Options:

| Option | Short | Description |
|---|---|---|
| --prompt | -p | Text prompt to send |
| --data | -d | Raw JSON body (overrides --prompt) |
| --path | | API sub-path (default: auto-detected from image) |
| --region | -r | Target region for the request |
| --max-tokens | | Max tokens to generate (default: 100) |
| --stream/--no-stream | | Stream the response |

Example:
# Simple prompt (auto-detects vLLM OpenAI-compatible format)
gco inference invoke my-llm -p "What is GPU orchestration?"
# With max tokens
gco inference invoke my-llm -p "Explain Kubernetes" --max-tokens 200
# Raw JSON body
gco inference invoke my-llm -d '{"prompt": "Hello", "max_tokens": 50}'
# Explicit API path
gco inference invoke my-llm -p "Hello" --path /v1/chat/completions

Check if an inference endpoint is healthy and ready to serve requests. Hits the endpoint's health check path and reports HTTP status and round-trip latency.

gco inference health ENDPOINT_NAME [OPTIONS]

Arguments:
- ENDPOINT_NAME - Name of the inference endpoint

Options:

| Option | Short | Description |
|---|---|---|
| --region | -r | Target region to check |

Example:
# Check health (nearest region via Global Accelerator)
gco inference health my-llm
# Check health in a specific region
gco inference health my-llm -r us-east-1

List models loaded on an inference endpoint. Queries the /v1/models path (OpenAI-compatible) to discover which models are available.

gco inference models ENDPOINT_NAME [OPTIONS]

Arguments:
- ENDPOINT_NAME - Name of the inference endpoint

Options:

| Option | Short | Description |
|---|---|---|
| --region | -r | Target region to query |

Example:
# List loaded models
gco inference models my-llm
# Query a specific region
gco inference models my-llm -r eu-west-1

Start a canary deployment with a new image. Routes a percentage of traffic to the canary while the primary continues serving the rest.

gco inference canary ENDPOINT_NAME [OPTIONS]

Options:

| Option | Short | Description |
|---|---|---|
| --image | -i | New container image for canary (required) |
| --weight | -w | Percentage of traffic to canary, 1-99 (default: 10) |
| --replicas | -r | Number of canary replicas (default: 1) |

Examples:
# 10% traffic to new version
gco inference canary my-llm -i vllm/vllm-openai:v0.20.1
# 25% traffic with 2 canary replicas
gco inference canary my-llm -i vllm/vllm-openai:v0.20.1 -w 25 -r 2

Promote the canary to primary. Replaces the primary image with the canary image and removes the canary deployment. All traffic goes to the new image.

gco inference promote ENDPOINT_NAME [OPTIONS]

Options:

| Option | Short | Description |
|---|---|---|
| --yes | -y | Skip confirmation |

Example:
gco inference promote my-llm -y

Remove the canary deployment, keeping the primary unchanged. All traffic returns to the primary.

gco inference rollback ENDPOINT_NAME [OPTIONS]

Options:

| Option | Short | Description |
|---|---|---|
| --yes | -y | Skip confirmation |

Example:
gco inference rollback my-llm -y

Manage model weights in the central S3 bucket. Models uploaded here are automatically available to inference endpoints across all regions via init container sync.

See Inference Guide for details on model weight management.

Upload model weights to the central S3 bucket.

gco models upload LOCAL_PATH [OPTIONS]

Arguments:
- LOCAL_PATH - Local file or directory path

Options:

| Option | Short | Description |
|---|---|---|
| --name | -n | Model name in the registry (required) |

Example:
gco models upload ./my-model-weights/ --name llama3-8b
gco models upload ./weights.safetensors --name my-model

List models in the central S3 bucket.

gco models list

Example:
gco models list

Delete a model and all its files from the S3 bucket.

gco models delete MODEL_NAME [OPTIONS]

Options:
| Option | Short | Description |
|---|---|---|
| --yes | -y | Skip confirmation |
Example:
gco models delete llama3-8b -y

Get the S3 URI for a model (for use with --model-source in inference deploy).

gco models uri MODEL_NAME

Example:
gco models uri llama3-8b
# Output: s3://gco-models-xxx/models/llama3-8b

Manage file systems and download job outputs.

List files on shared storage.

gco files list [OPTIONS]
gco files ls [OPTIONS]

Options:
| Option | Short | Description |
|---|---|---|
| --region | -r | Region |
| --type | -t | Storage type: efs or fsx |
| --path | -p | Path to list |
Example:
gco files ls -r us-east-1
gco files list -r us-east-1 -t fsx -p /scratch

Download files from shared storage.

gco files download REMOTE_PATH LOCAL_PATH [OPTIONS]

Options:
| Option | Short | Description |
|---|---|---|
| --region | -r | Region |
| --type | -t | Storage type: efs or fsx |
Example:
gco files download my-job/outputs ./results -r us-east-1
gco files download training-run ./checkpoints -r us-west-2 -t fsx

Manage Karpenter NodePools.

List NodePools in a cluster.

gco nodepools list [OPTIONS]

Options:
| Option | Short | Description |
|---|---|---|
| --region | -r | Region |
Describe a specific NodePool.
gco nodepools describe NODEPOOL_NAME [OPTIONS]

Options:
| Option | Short | Description |
|---|---|---|
| --region | -r | Region |
Generate NodePool manifest for ODCR (On-Demand Capacity Reservation).
gco nodepools create-odcr [OPTIONS]

Options:
| Option | Short | Description |
|---|---|---|
| --name | -n | NodePool name |
| --capacity-reservation-id | | ODCR ID |
| --instance-type | -i | Instance type |
| --output | -o | Output file path |
Example:
gco nodepools create-odcr \
--name gpu-reserved \
--capacity-reservation-id cr-0123456789abcdef0 \
--instance-type g5.xlarge \
--output nodepool.yaml

Manage the optional GCO analytics environment (SageMaker Studio + EMR Serverless + Cognito). The feature is off by default; enable it only when you want interactive notebook analytics. See the Analytics Guide for end-to-end workflows.
All gco analytics * commands auto-discover the Cognito user-pool ID
and API Gateway endpoint from the gco-analytics and gco-api-gateway
CloudFormation outputs, so no manual ID wiring is needed.
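Auto-discovery of this kind usually amounts to reading the `Outputs` list of a CloudFormation DescribeStacks response. The sketch below flattens that list into a dictionary; it works on a plain dict so it needs no AWS access, and the output key name in the sample is an assumption, not the key GCO actually uses.

```python
def stack_outputs(describe_stacks_response: dict) -> dict[str, str]:
    """Flatten the Outputs of the first stack into a {key: value} mapping."""
    stacks = describe_stacks_response.get("Stacks", [])
    if not stacks:
        return {}
    # Each output is a dict with "OutputKey" and "OutputValue" fields.
    return {o["OutputKey"]: o["OutputValue"] for o in stacks[0].get("Outputs", [])}


# Hypothetical response; "UserPoolId" is an assumed output key.
SAMPLE_RESPONSE = {
    "Stacks": [
        {
            "StackName": "gco-analytics",
            "Outputs": [
                {"OutputKey": "UserPoolId", "OutputValue": "us-east-2_EXAMPLE"}
            ],
        }
    ]
}
```

In a live environment the input dict would come from a DescribeStacks call for the gco-analytics or gco-api-gateway stack.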
Flip analytics_environment.enabled to true in cdk.json. Prints
the follow-up gco stacks deploy gco-analytics command but does not
deploy automatically.

gco analytics enable [OPTIONS]

Options:
| Option | Short | Description |
|---|---|---|
| --hyperpod | | Also set analytics_environment.hyperpod.enabled=true (adds HyperPod training-job permissions to the SageMaker execution role). |
| --canvas | | Also set analytics_environment.canvas.enabled=true (attaches AmazonSageMakerCanvasFullAccess to the SageMaker execution role and enables the Canvas app on the Studio domain; artifacts land under Cluster_Shared_Bucket/analytics-canvas/). |
| --yes | -y | Skip the confirmation prompt. |
Example:
gco analytics enable
gco analytics enable --hyperpod
gco analytics enable --canvas
gco analytics enable --hyperpod --canvas -y
# Follow-up to actually deploy the stack:
gco stacks deploy gco-analytics

Flip analytics_environment.enabled to false in cdk.json. Leaves
the hyperpod, canvas, cognito, and efs sub-blocks untouched so
a later enable preserves your preferences. Run gco stacks destroy gco-analytics afterward to tear down the deployed resources.

gco analytics disable [OPTIONS]

Options:
| Option | Short | Description |
|---|---|---|
| --yes | -y | Skip the confirmation prompt. |
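The "flip one flag, leave the sub-blocks alone" behavior can be pictured as a small dictionary update on the parsed cdk.json. This is a sketch under the assumption that analytics_environment lives under the top-level "context" key; `set_analytics_enabled` is a hypothetical name, not the CLI's own function.

```python
import copy


def set_analytics_enabled(cdk: dict, enabled: bool) -> dict:
    """Return a copy of the cdk.json data with only the enabled flag changed."""
    updated = copy.deepcopy(cdk)
    # Assumption: analytics_environment sits under the top-level "context" key.
    block = updated.setdefault("context", {}).setdefault("analytics_environment", {})
    block["enabled"] = enabled
    return updated
```

Because only `enabled` is written, a hyperpod or canvas sub-block that was set earlier keeps its value across a disable and a later enable.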
Example:
gco analytics disable
gco analytics disable -y
gco stacks destroy gco-analytics

Show the current analytics_environment.* toggle state from cdk.json
plus the deployment state of gco-analytics.

gco analytics status

Example:
gco analytics status

Create a Cognito user in the analytics user pool. Calls
cognito-idp:AdminCreateUser and prints the temporary password to
stdout exactly once. Optionally sets a permanent password via
cognito-idp:AdminSetUserPassword so the user can sign in without the
NEW_PASSWORD_REQUIRED challenge on first login.

gco analytics users add [OPTIONS]

Options:
| Option | Description |
|---|---|
| --username | Cognito username to create (required). |
| --email | Email address for the new user. |
| --no-email | Suppress the Cognito welcome email (MessageAction=SUPPRESS). |
| --password | Set a permanent password on the new user (also read from $GCO_STUDIO_PASSWORD). Mutually exclusive with --generate-password. |
| --generate-password | Generate a strong random password, set it permanent, and print it once. Mutually exclusive with --password. |
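For readers curious what "generate a strong random password" can look like, here is one way to do it with Python's `secrets` module. It is a sketch, not the CLI's actual generator: the length, the symbol set, and the rule of one character per class are all assumptions chosen to satisfy a policy that requires lowercase, uppercase, digits, and symbols.

```python
import secrets
import string

# Assumed symbol set; a real user-pool policy may accept a different one.
SYMBOLS = "!@#$%^&*"


def generate_password(length: int = 20) -> str:
    """Return a random password with at least one character from each class."""
    classes = [string.ascii_lowercase, string.ascii_uppercase, string.digits, SYMBOLS]
    if length < len(classes):
        raise ValueError("length must be at least 4")
    # One guaranteed character per class, then fill from the full alphabet.
    chars = [secrets.choice(c) for c in classes]
    alphabet = "".join(classes)
    chars += [secrets.choice(alphabet) for _ in range(length - len(classes))]
    # Shuffle so the guaranteed characters are not always at the front.
    secrets.SystemRandom().shuffle(chars)
    return "".join(chars)
```

The `secrets` module draws from the operating system's cryptographic random source, which is the reason to prefer it over `random` for credentials.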
Example:
gco analytics users add --username alice --email alice@example.com
gco analytics users add --username bob --email bob@example.com --no-email
# Set a permanent password so first-time login doesn't hit NEW_PASSWORD_REQUIRED
gco analytics users add --username carol --no-email --generate-password
# Permanent password taken from the environment (the CLI reads $GCO_STUDIO_PASSWORD itself)
GCO_STUDIO_PASSWORD='StrongP@ssw0rd!' gco analytics users add --username dave --no-email

List Cognito users in the analytics user pool. Default output is a
formatted table rendered by OutputFormatter.

gco analytics users list [OPTIONS]

Options:
| Option | Description |
|---|---|
| --as-json | Emit JSON instead of a table. |
Example:
gco analytics users list
gco analytics users list --as-json

Delete a Cognito user from the analytics user pool. Does not delete
the user's Studio user profile or EFS home folder; use
aws sagemaker delete-user-profile for that.

gco analytics users remove [OPTIONS]

Options:
| Option | Description |
|---|---|
| --username | Cognito username to remove (required). |
| --yes | Skip the confirmation prompt. |
Example:
gco analytics users remove --username alice
gco analytics users remove --username alice --yes

Change a Cognito user's password via AdminSetUserPassword. By
default the new password is marked permanent so the user can sign in
directly with gco analytics studio login without hitting the
NEW_PASSWORD_REQUIRED challenge. Pass --temporary to require
the user to choose their own password on first sign-in.

gco analytics users set-password [OPTIONS]

Options:
| Option | Description |
|---|---|
| --username | Cognito username whose password to change (required). |
| --password | New password (also read from $GCO_STUDIO_PASSWORD; prompted otherwise). Mutually exclusive with --generate-password. |
| --generate-password | Generate a strong random password, set it, and print it once. Mutually exclusive with --password. |
| --temporary | Set the password as temporary (Permanent=false). Default is permanent. |
| --yes, -y | Skip the confirmation prompt. |
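The table implies a lookup order for the new password: an explicit flag, then the environment variable, then an interactive prompt, with --generate-password excluded from being combined with --password. A sketch of that resolution logic follows. The precedence of the flag over the environment variable, and of --generate-password over the variable, are assumptions; `resolve_password` is a hypothetical helper.

```python
from typing import Callable, Mapping, Optional


def resolve_password(
    flag_value: Optional[str],
    generate: bool,
    env: Mapping[str, str],
    prompt: Callable[[], str],
    generator: Callable[[], str],
) -> str:
    """Pick the password source following the documented options."""
    if flag_value is not None and generate:
        raise ValueError("--password and --generate-password are mutually exclusive")
    if generate:
        return generator()  # assumed to win over the environment variable
    if flag_value is not None:
        return flag_value  # assumed to win over the environment variable
    if env.get("GCO_STUDIO_PASSWORD"):
        return env["GCO_STUDIO_PASSWORD"]
    return prompt()  # last resort: ask interactively
```

Passing the environment, prompt, and generator in as arguments keeps the function free of side effects, so each branch can be exercised without touching a terminal.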
Examples:
# Interactive — prompts twice for the new password
gco analytics users set-password --username alice
# Non-interactive via env var (keeps the password out of the command's arguments)
GCO_STUDIO_PASSWORD='StrongP@ssw0rd!' \
gco analytics users set-password --username alice --yes
# Generate and print a new password
gco analytics users set-password --username alice --generate-password --yes
# Force the user to reset on next login
gco analytics users set-password --username alice \
--password 'Temp!Reset123$' --temporary --yes

Sign in to SageMaker Studio via Cognito SRP and print a presigned
Studio URL on its own line on stdout (pipe-friendly). The password,
IdToken, and URL are never written to disk.

gco analytics studio login [OPTIONS]

Options:
| Option | Description |
|---|---|
| --username | Cognito username (required). |
| --password | Password. Defaults to prompt (click.prompt(..., hide_input=True)). Also read from $GCO_STUDIO_PASSWORD if set. |
| --api-url | Override the API Gateway base URL (otherwise auto-discovered from CloudFormation). |
| --open | Launch the default browser on the presigned URL after printing it. |
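Since the URL is printed on its own line, a wrapper script can pick it out of captured stdout without parsing anything else. A minimal sketch, assuming the presigned URL starts with `https://`; `extract_studio_url` is a hypothetical helper and the sample output is made up.

```python
def extract_studio_url(stdout: str) -> str:
    """Return the last line of captured stdout that looks like an HTTPS URL."""
    for line in reversed(stdout.splitlines()):
        candidate = line.strip()
        # Assumption: the presigned Studio URL is an https:// URL on its own line.
        if candidate.startswith("https://"):
            return candidate
    raise ValueError("no URL found in command output")


# Hypothetical captured output from a login run.
SAMPLE_STDOUT = "https://example.invalid/studio?token=abc\n"
```

The captured text would typically come from running the login command with stdout redirected to a pipe. Keep it in memory rather than in a file, in line with the command's own no-disk guarantee.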
Example:
# Interactive (prompts for password)
gco analytics studio login --username alice
# Non-interactive
export GCO_STUDIO_PASSWORD='...'
gco analytics studio login --username alice
# Open browser automatically
gco analytics studio login --username alice --open
# Custom API endpoint
gco analytics studio login \
--username alice \
--api-url https://abc123.execute-api.us-east-2.amazonaws.com

Run pre-flight checks before gco stacks deploy gco-analytics. Each
check prints ✓/✗ plus a short remediation line. Exits 1 on any
failing check.

Checks performed:
- cdk.json is present and parses as JSON
- gco-global, gco-api-gateway, and every regional stack are CREATE_COMPLETE
- The three /gco/cluster-shared-bucket/* SSM parameters are present in the global region
- No orphaned retained analytics resources are left from a previous retain-policy destroy
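The reporting pattern described above (one ✓/✗ line per check, a remediation hint on failure, exit code 1 if anything fails) can be sketched as a small runner. The check names and probes are placeholders; the real doctor command's internals are not shown in this reference, and the choice to print hints only for failed checks is an assumption.

```python
from typing import Callable, Iterable

# (name, probe returning True on success, remediation hint)
Check = tuple[str, Callable[[], bool], str]


def format_report(checks: Iterable[Check]) -> tuple[list[str], int]:
    """Return the report lines and the exit code (1 if any check failed)."""
    lines: list[str] = []
    failed = False
    for name, probe, hint in checks:
        ok = probe()
        lines.append(f"{'✓' if ok else '✗'} {name}")
        if not ok:
            lines.append(f"  {hint}")  # hint shown only for failures (assumption)
            failed = True
    return lines, 1 if failed else 0
```

A caller would print each returned line and pass the code to `sys.exit`, so a CI job can gate the deploy on the result.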
gco analytics doctor

Example:
gco analytics doctor

Create ~/.gco/config.yaml:
default_region: us-east-1
output_format: table
verbose: false
regions:
- us-east-1
- us-west-2
- eu-west-1

Project configuration in cdk.json:
{
"context": {
"project_name": "gco",
"deployment_regions": {
"global": "us-east-2",
"api_gateway": "us-east-2",
"monitoring": "us-east-2",
"regional": ["us-east-1", "us-west-2"]
},
"resource_thresholds": {
"cpu_threshold": 60,
"memory_threshold": 60,
"gpu_threshold": -1,
"pending_pods_threshold": 10,
"pending_requested_cpu_vcpus": 100,
"pending_requested_memory_gb": 200,
"pending_requested_gpus": -1
},
"fsx_lustre": {
"enabled": false,
"storage_capacity_gib": 1200
}
}
}

Set any threshold to -1 to disable that health check. This is useful when running GPU inference endpoints that naturally saturate GPU resources.
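The effect of the -1 sentinel can be illustrated with a small evaluation function. This is a sketch only: the strict "greater than" comparison and the idea of passing current usage keyed by threshold name are assumptions, not the health monitor's actual logic.

```python
def breached_thresholds(
    usage: dict[str, float], thresholds: dict[str, float]
) -> list[str]:
    """Return the names of enabled thresholds that the current usage exceeds."""
    breached = []
    for name, limit in thresholds.items():
        if limit == -1:
            continue  # -1 disables this health check entirely
        if usage.get(name, 0) > limit:
            breached.append(name)
    return breached
```

With `gpu_threshold` set to -1, a cluster at 100% GPU utilization is not flagged, while CPU usage of 75 against a limit of 60 still is.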
| Variable | Description |
|---|---|
| AWS_REGION | Default AWS region |
| AWS_PROFILE | AWS credentials profile |
| GCO_CONFIG | Path to config file |
| GCO_REGIONAL_API | Use regional API endpoints (true/false) |
| CDK_DOCKER | Docker command (docker or finch) |
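A script that wraps the CLI may want to interpret these variables the same way before invoking it. The sketch below shows one plausible reading: it assumes that only a case-insensitive "true" enables GCO_REGIONAL_API and that ~/.gco/config.yaml is the fallback when GCO_CONFIG is unset. Both helpers are hypothetical.

```python
from typing import Mapping


def regional_api_enabled(env: Mapping[str, str]) -> bool:
    """Interpret GCO_REGIONAL_API as a boolean flag."""
    # Assumption: only a case-insensitive "true" turns the feature on.
    return env.get("GCO_REGIONAL_API", "").strip().lower() == "true"


def config_path(env: Mapping[str, str], home: str) -> str:
    """Pick the config file: GCO_CONFIG if set, else <home>/.gco/config.yaml."""
    return env.get("GCO_CONFIG") or f"{home}/.gco/config.yaml"
```

Passing the environment as a mapping (for example `os.environ`) rather than reading it globally makes both helpers easy to test.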
# 1. Deploy (bootstrap runs automatically if needed)
export CDK_DOCKER=finch
gco stacks deploy-all -y
# 2. Check capacity
gco capacity status
gco capacity recommend-region --gpu
# 3. Submit jobs
gco jobs submit-sqs examples/simple-job.yaml --region us-east-1
gco jobs queue-status --all-regions
# 4. Monitor jobs
gco jobs list --all-regions
gco jobs logs my-job -r us-east-1 -n gco-jobs
# 5. Download outputs
gco files ls -r us-east-1
gco files download my-job/outputs ./results -r us-east-1
# 6. Cleanup
gco stacks destroy-all -y

# 1. Upload model weights
gco models upload ./llama3-weights/ --name llama3-8b
# 2. Deploy inference endpoint
gco inference deploy my-llm \
-i vllm/vllm-openai:v0.20.1 \
--gpu-count 1 \
--model-source "$(gco models uri llama3-8b)" \
-e MODEL=/models/my-llm \
-r us-east-1
# 3. Monitor deployment
gco inference status my-llm
# 4. Scale for production
gco inference scale my-llm --replicas 3
# Or enable autoscaling
gco inference deploy my-llm \
-i vllm/vllm-openai:v0.20.1 \
--replicas 2 --gpu-count 1 \
--min-replicas 1 --max-replicas 8 \
--autoscale-metric cpu:70
# 5. Rolling update
gco inference update-image my-llm -i vllm/vllm-openai:v0.20.1
# 6. Cleanup
gco inference delete my-llm -y
gco models delete llama3-8b -y

# Check GPU capacity
gco capacity check -i g5.xlarge -r us-east-1
# Submit GPU job
gco jobs submit-sqs examples/gpu-job.yaml --auto-region
# Monitor
gco jobs list --all-regions
gco jobs logs gpu-test-job -r us-east-1 -n gco-jobs

# Deploy to multiple regions
gco stacks deploy-all -y --parallel --max-workers 4
# Check status across regions
gco stacks list --all-regions
gco capacity status

"No credentials found"
# Ensure AWS credentials are configured
aws sts get-caller-identity

"Endpoint request timed out"
- Wait 1-2 minutes after deployment for ALB targets to become healthy
- Use submit-sqs or submit-direct instead of submit
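When a script has to ride out that warm-up window, a retry loop with increasing delays is a common approach. The sketch below is generic and not part of the GCO CLI: `retry`, its defaults, and the use of `TimeoutError` as the failure signal are assumptions. The default delays of 15, 30, 60, and 120 seconds comfortably cover the 1-2 minutes mentioned above.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(
    action: Callable[[], T],
    attempts: int = 5,
    first_delay: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call action until it succeeds, doubling the delay after each timeout."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = first_delay
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except TimeoutError:
            if attempt == attempts:
                raise  # out of attempts: surface the last timeout
            sleep(delay)
            delay *= 2
    raise AssertionError("unreachable")
```

The `sleep` parameter exists so tests can substitute a recorder for `time.sleep` and run instantly.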
"kubectl access denied"
- Add your IAM principal to EKS access entries:
aws eks create-access-entry \
--cluster-name gco-us-east-1 \
--principal-arn arn:aws:iam::ACCOUNT:user/YOUR-USER \
--region us-east-1
aws eks associate-access-policy \
--cluster-name gco-us-east-1 \
--principal-arn arn:aws:iam::ACCOUNT:user/YOUR-USER \
--policy-arn arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy \
--access-scope type=cluster \
--region us-east-1

"CDK bootstrap required"
This should resolve automatically: deploy and deploy-all auto-bootstrap un-bootstrapped regions. If it persists:
gco stacks bootstrap --region us-east-1

# Enable verbose output
gco -v jobs list --all-regions
# Check AWS configuration
aws sts get-caller-identity
aws eks list-clusters --region us-east-1

For more help, see: