Skip to content

Latest commit

 

History

History
2422 lines (1681 loc) · 54.5 KB

File metadata and controls

2422 lines (1681 loc) · 54.5 KB

GCO CLI Reference

Complete command-line interface documentation for GCO (Global Capacity Orchestrator on AWS).

Table of Contents

Installation

Using pipx (Recommended)

# Install pipx if not already installed
brew install pipx && pipx ensurepath  # macOS
# or
pip install pipx && pipx ensurepath   # Linux/Windows

# Install GCO CLI
pipx install -e .

Using pip

pip install -e .

Verify Installation

gco --version
gco --help

Global Options

These options are available for all commands:

Option Short Description
--config -c Path to config file
--region -r Default AWS region
--output -o Output format: table, json, yaml
--verbose -v Enable verbose output
--regional-api Use regional API endpoints (for private access)
--help Show help message
--version Show version

Regional API Mode

When --regional-api is enabled (or GCO_REGIONAL_API=true environment variable is set), the CLI routes requests through regional API Gateways instead of the global API Gateway. This is required when:

  • The ALB is internal-only (no public exposure)
  • Public access is disabled on the EKS cluster
  • Maximum security posture is required
# Use regional API for a single command
gco --regional-api jobs list --region us-east-1

# Or set environment variable for all commands
export GCO_REGIONAL_API=true
gco jobs list --region us-east-1

Commands

Jobs Commands

Manage jobs across GCO clusters.

gco jobs submit

Submit a job via API Gateway (SigV4 authenticated).

gco jobs submit MANIFEST_PATH [OPTIONS]

Arguments:

  • MANIFEST_PATH - Path to YAML manifest file

Options:

Option Short Description
--namespace -n Fallback namespace for manifests that don't declare their own (manifest metadata.namespace takes precedence)
--region -r Target specific region
--dry-run Validate without applying
--label -l Add labels (key=value), can be repeated
--wait -w Wait for job completion
--timeout Wait timeout in seconds (default: 3600)

Example:

gco jobs submit examples/simple-job.yaml -n gco-jobs
gco jobs submit job.yaml --dry-run
gco jobs submit job.yaml -l team=ml -l priority=high

gco jobs submit-sqs

Submit a job via SQS queue (recommended for production).

gco jobs submit-sqs MANIFEST_PATH [OPTIONS]

Options:

Option Short Description
--region -r Target region for SQS queue
--auto-region Auto-select optimal region based on capacity
--priority -p Job priority (0-100, higher = more important)
--namespace -n Fallback namespace for manifests that don't declare their own (manifest metadata.namespace takes precedence)

Example:

gco jobs submit-sqs examples/simple-job.yaml --region us-east-1
gco jobs submit-sqs job.yaml --auto-region --priority 10

gco jobs submit-direct

Submit a job directly via kubectl (requires EKS access).

If a job with the same name already exists:

  • Completed or failed jobs are silently deleted and replaced
  • Active (running/pending) jobs are preserved, and the new submission is auto-renamed with a -{5char} suffix
gco jobs submit-direct MANIFEST_PATH [OPTIONS]

Options:

Option Short Description
--region -r Target region
--namespace -n Fallback namespace for manifests that don't declare their own (manifest metadata.namespace takes precedence)

Example:

gco jobs submit-direct examples/simple-job.yaml --region us-east-1 -n gco-jobs

gco jobs submit-queue

Submit a job to the global DynamoDB queue for regional pickup.

gco jobs submit-queue MANIFEST_PATH [OPTIONS]

Jobs are stored in DynamoDB and picked up by the target region's manifest processor. This enables global job submission with centralized tracking and status history.

Options:

Option Short Description
--region -r Target region for job execution (required)
--namespace -n Kubernetes namespace
--priority -p Job priority (0-100, higher = more important)
--label -l Add labels (key=value), can be repeated

Example:

gco jobs submit-queue examples/simple-job.yaml --region us-east-1
gco jobs submit-queue job.yaml -r us-west-2 --priority 50
gco jobs submit-queue job.yaml -r us-east-1 -l team=ml -l project=training

Note: Use gco queue list or gco queue get <job_id> to track job status.

gco jobs list

List jobs in GCO clusters.

gco jobs list [OPTIONS]

Options:

Option Short Description
--region -r Target region (required unless --all-regions)
--all-regions -a Query all regions via global API
--namespace -n Filter by namespace
--status -s Filter by status
--limit -l Maximum results (default: 50)

Example:

gco jobs list --region us-east-1
gco jobs list --all-regions
gco jobs list -r us-west-2 -n gco-jobs --status running

gco jobs get

Get details of a specific job.

gco jobs get JOB_NAME [OPTIONS]

Options:

Option Short Description
--region -r Job region (required)
--namespace -n Job namespace

Example:

gco jobs get my-job --region us-east-1
gco jobs get training-job -r us-west-2 -n ml-jobs

gco jobs logs

Get logs from a job.

gco jobs logs JOB_NAME [OPTIONS]

Options:

Option Short Description
--region -r Job region (required)
--namespace -n Job namespace
--tail -t Number of lines to show
--container -c Container name (for multi-container pods)

Example:

gco jobs logs my-job --region us-east-1
gco jobs logs my-job -r us-east-1 --tail 500
gco jobs logs multi-container-job -r us-east-1 --container sidecar

gco jobs delete

Delete a job.

gco jobs delete JOB_NAME [OPTIONS]

Options:

Option Short Description
--region -r Job region (required)
--namespace -n Job namespace
--yes -y Skip confirmation

Example:

gco jobs delete my-job --region us-east-1
gco jobs delete old-job -r us-west-2 -n ml-jobs -y

gco jobs events

Get Kubernetes events for a job.

gco jobs events JOB_NAME [OPTIONS]

Options:

Option Short Description
--region -r Job region (required)
--namespace -n Job namespace

Example:

gco jobs events my-job --region us-east-1
gco jobs events training-job -r us-west-2 -n ml-jobs

gco jobs pods

Get pod details for a job.

gco jobs pods JOB_NAME [OPTIONS]

Options:

Option Short Description
--region -r Job region (required)
--namespace -n Job namespace

Example:

gco jobs pods my-job --region us-east-1
gco jobs pods training-job -r us-west-2 -n ml-jobs

gco jobs metrics

Get resource usage metrics for a job.

gco jobs metrics JOB_NAME [OPTIONS]

Options:

Option Short Description
--region -r Job region (required)
--namespace -n Job namespace

Example:

gco jobs metrics my-job --region us-east-1
gco jobs metrics training-job -r us-west-2 -n ml-jobs

gco jobs retry

Retry a failed job.

gco jobs retry JOB_NAME [OPTIONS]

Options:

Option Short Description
--region -r Job region (required)
--namespace -n Job namespace
--yes -y Skip confirmation

Example:

gco jobs retry failed-job --region us-east-1
gco jobs retry training-job -r us-west-2 -n ml-jobs -y

gco jobs bulk-delete

Bulk delete jobs based on filters.

gco jobs bulk-delete [OPTIONS]

Options:

Option Short Description
--region -r Target region (required unless --all-regions)
--all-regions -a Delete across all regions
--namespace -n Filter by namespace
--status -s Filter by status
--older-than-days -d Delete jobs older than N days
--label-selector -l Kubernetes label selector
--dry-run Only show what would be deleted (default)
--execute Actually delete (disables dry-run)
--yes -y Skip confirmation

Example:

gco jobs bulk-delete --region us-east-1 --status completed --older-than-days 7
gco jobs bulk-delete -r us-west-2 -n gco-jobs -s failed --execute -y
gco jobs bulk-delete --all-regions --status failed --older-than-days 30 --execute

gco jobs health

Get health status of GCO clusters.

gco jobs health [OPTIONS]

Options:

Option Short Description
--region -r Target region (required unless --all-regions)
--all-regions -a Get health across all regions

Example:

gco jobs health --region us-east-1
gco jobs health --all-regions

gco jobs queue-status

View SQS queue status across regions.

gco jobs queue-status [OPTIONS]

Options:

Option Short Description
--region -r Filter by region
--all-regions Show all regions

Example:

gco jobs queue-status --all-regions
gco jobs queue-status -r us-east-1

Queue Commands

Manage the global job queue (DynamoDB-backed). The job queue provides centralized job submission and tracking across all regions.

gco queue submit

Submit a job to the global queue for regional pickup.

gco queue submit MANIFEST_PATH [OPTIONS]

Options:

Option Short Description
--region -r Target region for job execution (required)
--namespace -n Kubernetes namespace
--priority -p Job priority (0-100, higher = more important)
--label -l Add labels (key=value), can be repeated

Example:

gco queue submit job.yaml --region us-east-1
gco queue submit job.yaml -r us-west-2 --priority 50
gco queue submit job.yaml -r us-east-1 -l team=ml -l project=training

gco queue list

List jobs in the global queue.

gco queue list [OPTIONS]

Options:

Option Short Description
--region -r Filter by target region
--status -s Filter by status (queued, claimed, running, succeeded, failed, cancelled)
--namespace -n Filter by namespace
--limit -l Maximum results (default: 50)

Example:

gco queue list
gco queue list --region us-east-1 --status queued
gco queue list -s running

gco queue get

Get details of a queued job including status history.

gco queue get JOB_ID [OPTIONS]

Options:

Option Short Description
--region -r Region to query (any region works)

Example:

gco queue get abc123-def456
gco queue get abc123-def456 --region us-east-1

gco queue cancel

Cancel a queued job (only works for jobs not yet running).

gco queue cancel JOB_ID [OPTIONS]

Options:

Option Short Description
--reason Cancellation reason
--region -r Region to query (any region works)
--yes -y Skip confirmation

Example:

gco queue cancel abc123-def456
gco queue cancel abc123-def456 --reason "No longer needed" -y

gco queue stats

Get job queue statistics by region and status.

gco queue stats [OPTIONS]

Options:

Option Short Description
--region -r Region to query (any region works)

Example:

gco queue stats

Templates Commands

Manage job templates. Templates are reusable job configurations stored in DynamoDB with parameter substitution support.

gco templates list

List all job templates.

gco templates list [OPTIONS]

Options:

Option Short Description
--region -r Region to query

Example:

gco templates list

gco templates get

Get details of a specific template.

gco templates get TEMPLATE_NAME [OPTIONS]

Options:

Option Short Description
--region -r Region to query

Example:

gco templates get gpu-training-template

gco templates create

Create a new job template from a manifest file.

gco templates create MANIFEST_PATH [OPTIONS]

Options:

Option Short Description
--name -n Template name (required)
--description -d Template description
--param -p Default parameter (key=value), can be repeated
--region -r Region to create in

Example:

gco templates create job.yaml --name gpu-training -d "GPU training template"
gco templates create job.yaml -n my-template -p image=pytorch:latest -p gpus=4

gco templates delete

Delete a job template.

gco templates delete TEMPLATE_NAME [OPTIONS]

Options:

Option Short Description
--region -r Region
--yes -y Skip confirmation

Example:

gco templates delete old-template -y

gco templates run

Create and run a job from a template.

gco templates run TEMPLATE_NAME [OPTIONS]

Options:

Option Short Description
--name -n Job name (required)
--region -r Target region (required)
--namespace Kubernetes namespace
--param -p Parameter override (key=value), can be repeated

Example:

gco templates run gpu-training --name my-job --region us-east-1
gco templates run gpu-template -n my-job -r us-east-1 -p image=custom:v1 -p gpus=8

Webhooks Commands

Manage webhooks for job event notifications. Webhooks receive HTTP POST notifications when job events occur.

gco webhooks list

List all registered webhooks.

gco webhooks list [OPTIONS]

Options:

Option Short Description
--namespace -n Filter by namespace
--region -r Region to query

Example:

gco webhooks list
gco webhooks list --namespace gco-jobs

gco webhooks create

Register a new webhook for job events.

gco webhooks create [OPTIONS]

Options:

Option Short Description
--url -u Webhook URL (required)
--event -e Event type (job.started, job.completed, job.failed), can be repeated
--namespace -n Filter events by namespace
--secret -s HMAC secret for signature verification
--region -r Region to create in

Example:

gco webhooks create --url https://example.com/webhook -e job.completed -e job.failed
gco webhooks create -u https://slack.com/webhook -e job.failed -n gco-jobs

gco webhooks delete

Delete a webhook.

gco webhooks delete WEBHOOK_ID [OPTIONS]

Options:

Option Short Description
--region -r Region
--yes -y Skip confirmation

Example:

gco webhooks delete abc12345 -y

Stacks Commands

Manage CDK infrastructure stacks.

gco stacks list

List all GCO stacks.

gco stacks list [OPTIONS]

Options:

Option Short Description
--region -r Filter by region
--all-regions List from all regions

gco stacks status

Get detailed status of a stack.

gco stacks status STACK_NAME [OPTIONS]

Options:

Option Short Description
--region -r Stack region

Example:

gco stacks status gco-us-east-1 --region us-east-1

gco stacks deploy

Deploy a single stack. Automatically bootstraps CDK in the target region if needed.

gco stacks deploy STACK_NAME [OPTIONS]

Options:

Option Short Description
--region -r Stack region
--yes -y Skip confirmation

Example:

gco stacks deploy gco-us-east-1 -y

gco stacks deploy-all

Deploy all stacks in correct order. Automatically bootstraps CDK in any un-bootstrapped regions before deploying.

gco stacks deploy-all [OPTIONS]

Options:

Option Short Description
--yes -y Skip confirmation
--parallel -p Deploy regional stacks in parallel
--max-workers -w Max parallel workers (default: 4)

Example:

gco stacks deploy-all -y
gco stacks deploy-all -y --parallel --max-workers 8

gco stacks destroy

Destroy a single stack.

gco stacks destroy STACK_NAME [OPTIONS]

Options:

Option Short Description
--region -r Stack region
--yes -y Skip confirmation

gco stacks destroy-all

Destroy all stacks in correct order.

gco stacks destroy-all [OPTIONS]

Options:

Option Short Description
--yes -y Skip confirmation
--parallel -p Destroy regional stacks in parallel
--max-workers -w Max parallel workers (default: 4)

gco stacks bootstrap

Bootstrap CDK in a region. This is run automatically by deploy and deploy-all when needed, so manual bootstrapping is optional.

gco stacks bootstrap [OPTIONS]

Options:

Option Short Description
--region -r Region to bootstrap

gco stacks access

Configure kubectl access to a GCO EKS cluster. Updates kubeconfig, creates an EKS access entry for your IAM principal, and associates the cluster admin policy. Handles assumed roles automatically.

gco stacks access [OPTIONS]

Options:

Option Short Description
--cluster -c Cluster name (default: gco-{region})
--region -r AWS region (default: first deployment region)

Examples:

gco stacks access                             # Auto-detect region from cdk.json
gco stacks access -r us-west-2                # Specific region
gco stacks access -c my-cluster -r eu-west-1  # Custom cluster name

gco stacks fsx

Manage FSx for Lustre storage.

gco stacks fsx COMMAND [OPTIONS]

Subcommands:

  • status - Show FSx status
  • enable - Enable FSx for Lustre
  • disable - Disable FSx for Lustre

Example:

gco stacks fsx status
gco stacks fsx enable --storage-capacity 1200 -y
gco stacks fsx disable -y

gco stacks valkey

Manage Valkey Serverless cache.

gco stacks valkey COMMAND [OPTIONS]

Subcommands:

  • status - Show Valkey configuration status
  • enable - Enable Valkey Serverless cache
  • disable - Disable Valkey Serverless cache

Example:

gco stacks valkey status
gco stacks valkey enable --max-storage 10 --max-ecpu 10000 -y
gco stacks valkey disable -y

gco stacks aurora

Manage Aurora PostgreSQL (pgvector) database.

gco stacks aurora COMMAND [OPTIONS]

Subcommands:

  • status - Show Aurora pgvector configuration status
  • enable - Enable Aurora Serverless v2 with pgvector
  • disable - Disable Aurora pgvector

Example:

gco stacks aurora status
gco stacks aurora enable --min-acu 2 --max-acu 32 --deletion-protection -y
gco stacks aurora disable -y

DAG Commands

Run multi-step job pipelines with dependencies. Define a DAG in YAML, and GCO runs steps in dependency order, skipping downstream steps if a dependency fails.

gco dag run

Execute a DAG pipeline.

gco dag run DAG_FILE [OPTIONS]

Options:

Option Short Description
--region -r Region to run in (default: from DAG file or first deployed)
--timeout -t Timeout per step in seconds (default: 3600)
--dry-run Validate and show execution order without running

Examples:

# Run a pipeline
gco dag run pipeline.yaml -r us-east-1

# Preview execution order
gco dag run pipeline.yaml --dry-run

gco dag validate

Validate a DAG definition without running it. Checks for cycles, missing dependencies, and missing manifest files.

gco dag validate DAG_FILE

Example:

gco dag validate examples/pipeline-dag.yaml

DAG File Format

name: my-pipeline
region: us-east-1          # optional, auto-detects if omitted
namespace: gco-jobs    # optional, defaults to gco-jobs

steps:
  - name: preprocess
    manifest: examples/preprocess-job.yaml

  - name: train
    manifest: examples/train-job.yaml
    depends_on: [preprocess]

  - name: evaluate
    manifest: examples/evaluate-job.yaml
    depends_on: [train]

Steps without depends_on run first. Steps with dependencies wait until all dependencies succeed. If a step fails, all downstream steps are automatically skipped.

Use shared EFS storage (/mnt/shared) to pass data between steps.


Costs Commands

View cost breakdowns and estimates for GCO resources. Uses AWS Cost Explorer filtered by the Project: GCO tag applied to all resources.

Setup (one-time): To filter costs by the Project tag, you must activate cost allocation tags in your AWS account:

  1. Go to the AWS Billing Console → Cost Allocation Tags
  2. Search for the Project tag under "User-defined cost allocation tags"
  3. Select it and click "Activate"
  4. Wait ~24 hours for tag data to appear in Cost Explorer

Until the tag is activated, use --all to see total account costs:

gco costs summary --all

You can also activate the Environment and Owner tags for more granular filtering in the AWS Cost Explorer console.

gco costs summary

Show total GCO spend broken down by AWS service.

gco costs summary [OPTIONS]

Options:

Option Short Description
--days -d Number of days to look back (default: 30)
--all Show all account costs, not filtered by GCO tag

Examples:

# Last 30 days (default)
gco costs summary

# Last 7 days
gco costs summary --days 7

# All account costs (before tags are activated)
gco costs summary --all

# JSON output
gco --output json costs summary

gco costs regions

Show cost breakdown by AWS region.

gco costs regions [OPTIONS]

Options:

Option Short Description
--days -d Number of days to look back (default: 30)

Examples:

gco costs regions
gco costs regions --days 7

gco costs trend

Show daily cost trend with a visual bar chart.

gco costs trend [OPTIONS]

Options:

Option Short Description
--days -d Number of days to show (default: 14)
--all Show all account costs, not filtered by GCO tag

Examples:

gco costs trend
gco costs trend --days 7
gco costs trend --all

gco costs workloads

Estimate costs for currently running workloads (jobs and inference endpoints) based on instance pricing and runtime.

gco costs workloads [OPTIONS]

Options:

Option Short Description
--region -r Region to check (default: all deployment regions)

Examples:

# All regions
gco costs workloads

# Specific region
gco costs workloads -r us-east-1

gco costs forecast

Forecast GCO costs for the next N days based on historical spending patterns.

gco costs forecast [OPTIONS]

Options:

Option Short Description
--days -d Days to forecast ahead (default: 30)

Examples:

gco costs forecast
gco costs forecast --days 60

Note: Cost Explorer needs at least 14 days of historical data to generate forecasts.


Capacity Commands

Check and manage cluster capacity.

gco capacity check

Check capacity for a specific instance type.

gco capacity check [OPTIONS]

Options:

Option Short Description
--instance-type -i Instance type to check
--region -r Region to check
--type -t Capacity type: spot, on-demand, or both

Example:

gco capacity check --instance-type g4dn.xlarge --region us-east-1
gco capacity check -i g5.xlarge -r us-west-2 -t spot

gco capacity status

View capacity status across regions.

gco capacity status [OPTIONS]

Options:

Option Short Description
--region -r Filter by region

gco capacity recommend

Get capacity recommendation for an instance type.

gco capacity recommend [OPTIONS]

Options:

Option Short Description
--instance-type -i Instance type
--region -r Region

gco capacity recommend-region

Get optimal region recommendation.

gco capacity recommend-region [OPTIONS]

Options:

Option Short Description
--gpu Recommend for GPU workloads
--instance-type -i Specific instance type (enables weighted scoring)
--gpu-count Number of GPUs required
--min-gpus Minimum GPUs required

When --instance-type is provided, the recommendation uses weighted multi-signal scoring that combines spot placement scores, spot-vs-on-demand pricing, queue depth, GPU utilization, and running job counts. Without it, a simpler composite score is used.

Example:

gco capacity recommend-region --gpu
gco capacity recommend-region -i g5.xlarge
gco capacity recommend-region -i p4d.24xlarge --gpu-count 8

gco capacity ai-recommend

Get AI-powered capacity recommendation using Amazon Bedrock.

⚠️ DISCLAIMER: Recommendations are AI-generated and should be validated before making production decisions. Capacity availability and pricing can change rapidly.

gco capacity ai-recommend [OPTIONS]

This command gathers comprehensive capacity data including:

  • Spot placement scores and pricing across regions
  • On-demand availability and pricing
  • Current cluster utilization (queue depth, GPU/CPU usage)
  • Running and pending job counts

The data is analyzed by an LLM (Claude by default) to provide intelligent recommendations.

Requirements:

  • AWS credentials with bedrock:InvokeModel permission
  • The specified Bedrock model must be enabled in your account

Options:

Option Short Description
--workload -w Description of your workload
--instance-type -i Instance types to consider (can specify multiple)
--region -r Regions to consider (can specify multiple)
--gpu Workload requires GPUs
--min-gpus Minimum GPUs required
--min-memory-gb Minimum memory in GB
--fault-tolerance -f Fault tolerance level: high, medium, low
--max-cost Maximum cost per hour in USD
--model -m Bedrock model ID to use
--raw Show raw AI response

Example:

# Basic recommendation
gco capacity ai-recommend --workload "Training a large language model"

# GPU workload with specific requirements
gco capacity ai-recommend -w "Inference workload" --gpu --min-gpus 4

# Compare specific instance types and regions
gco capacity ai-recommend -i g5.xlarge -i g5.2xlarge -r us-east-1 -r us-west-2

# Cost-constrained recommendation
gco capacity ai-recommend --fault-tolerance high --max-cost 5.00

# Use a different model
gco capacity ai-recommend -w "ML training" --model us.anthropic.claude-3-haiku-20240307-v1:0

gco capacity reservations

List On-Demand Capacity Reservations (ODCRs) across deployed regions.

gco capacity reservations [OPTIONS]
Option Description
-i, --instance-type Filter by instance type
-r, --region Specific region (default: all deployed regions)
# List all active reservations
gco capacity reservations

# Filter by instance type
gco capacity reservations -i p5.48xlarge

# Check a specific region
gco capacity reservations -r us-east-1

gco capacity reservation-check

Check reservation availability and Capacity Block offerings for ML workloads. Checks both existing ODCRs and purchasable Capacity Blocks (guaranteed GPU capacity for a fixed duration at a known price).

gco capacity reservation-check [OPTIONS]
Option Description
-i, --instance-type Instance type to check (required)
-r, --region Specific region (default: all deployed regions)
-c, --count Minimum instances needed (default: 1)
--include-blocks/--no-blocks Include Capacity Block offerings (default: yes)
--block-duration Capacity Block duration in hours (default: 24)
# Check for p5.48xlarge reservations and block offerings
gco capacity reservation-check -i p5.48xlarge

# Check with specific count and duration
gco capacity reservation-check -i p4d.24xlarge -c 2 --block-duration 48

# ODCRs only, no block offerings
gco capacity reservation-check -i g5.48xlarge -r us-east-1 --no-blocks

Inference Commands

Manage multi-region inference endpoints. Endpoints are stored in DynamoDB and reconciled by the inference_monitor in each target region.

See Inference Guide for architecture details and workflows.

gco inference deploy

Deploy an inference endpoint to one or more regions.

gco inference deploy ENDPOINT_NAME [OPTIONS]

Arguments:

  • ENDPOINT_NAME - Unique name for the endpoint

Options:

Option Short Description
--image -i Container image (required)
--region -r Target region(s), repeatable (default: all deployed regions)
--replicas Replicas per region (default: 1)
--gpu-count GPUs per replica (default: 1)
--gpu-type GPU instance type hint (e.g. g5.xlarge)
--port Container port (default: 8000)
--model-path EFS path for model weights
--model-source S3 URI for model weights (auto-synced via init container)
--health-path Health check endpoint path (default: /health)
--env -e Environment variable (KEY=VALUE), repeatable
--namespace -n Kubernetes namespace (default: gco-inference)
--label -l Label (key=value), repeatable
--min-replicas Autoscaling: minimum replicas
--max-replicas Autoscaling: maximum replicas
--autoscale-metric Autoscaling metric (e.g. cpu:70, memory:80), repeatable. Enables HPA.
--capacity-type Node capacity type: on-demand (default) or spot
--accelerator nvidia Accelerator type: nvidia for GPU instances, neuron for Trainium/Inferentia
--node-selector Node selector (key=value), repeatable. E.g. eks.amazonaws.com/instance-family=inf2
--extra-args Extra arguments passed to the container (e.g. --kv-transfer-config {...}). Repeatable

Example:

gco inference deploy my-llm -i vllm/vllm-openai:v0.20.1
gco inference deploy llama3-70b \
  -i vllm/vllm-openai:v0.20.1 \
  -r us-east-1 -r eu-west-1 \
  --replicas 2 --gpu-count 4 \
  --model-source s3://bucket/models/llama3-70b \
  -e MODEL=/models/llama3-70b

# Deploy with autoscaling (creates a Kubernetes HPA)
gco inference deploy my-llm \
  -i vllm/vllm-openai:v0.20.1 \
  --replicas 2 --gpu-count 1 \
  --min-replicas 1 --max-replicas 8 \
  --autoscale-metric cpu:70 --autoscale-metric memory:80

gco inference list

List inference endpoints.

gco inference list [OPTIONS]

Options:

Option Short Description
--state -s Filter by state (deploying, running, stopped, deleted)
--region -r Filter by target region

Example:

gco inference list
gco inference list --state running
gco inference list -r us-east-1

gco inference status

Show detailed status of an inference endpoint including per-region sync state.

gco inference status ENDPOINT_NAME

Example:

gco inference status my-llm

gco inference scale

Scale an inference endpoint to a new replica count (applied across all target regions).

gco inference scale ENDPOINT_NAME [OPTIONS]

Options:

Option Short Description
--replicas -r New replica count (required)

Example:

gco inference scale my-llm --replicas 4

gco inference stop

Stop an inference endpoint (scales to zero, keeps configuration).

gco inference stop ENDPOINT_NAME [OPTIONS]

Options:

Option Short Description
--yes -y Skip confirmation

Example:

gco inference stop my-llm -y

gco inference start

Start a stopped inference endpoint.

gco inference start ENDPOINT_NAME

Example:

gco inference start my-llm

gco inference delete

Delete an inference endpoint from all regions. The inference_monitor in each region cleans up K8s resources.

gco inference delete ENDPOINT_NAME [OPTIONS]

Options:

Option Short Description
--yes -y Skip confirmation

Example:

gco inference delete my-llm -y

gco inference update-image

Update the container image for an endpoint. Triggers a rolling update across all target regions.

gco inference update-image ENDPOINT_NAME [OPTIONS]

Options:

Option Short Description
--image -i New container image (required)

Example:

gco inference update-image my-llm -i vllm/vllm-openai:v0.20.1

gco inference invoke

Send a request to an inference endpoint via the API Gateway. Auto-detects the framework (vLLM, TGI, Triton) and builds the appropriate request body.

gco inference invoke ENDPOINT_NAME [OPTIONS]

Arguments:

  • ENDPOINT_NAME - Name of the inference endpoint

Options:

Option Short Description
--prompt -p Text prompt to send
--data -d Raw JSON body (overrides --prompt)
--path API sub-path (default: auto-detected from image)
--region -r Target region for the request
--max-tokens Max tokens to generate (default: 100)
--stream/--no-stream Stream the response

Example:

# Simple prompt (auto-detects vLLM OpenAI-compatible format)
gco inference invoke my-llm -p "What is GPU orchestration?"

# With max tokens
gco inference invoke my-llm -p "Explain Kubernetes" --max-tokens 200

# Raw JSON body
gco inference invoke my-llm -d '{"prompt": "Hello", "max_tokens": 50}'

# Explicit API path
gco inference invoke my-llm -p "Hello" --path /v1/chat/completions

gco inference health

Check if an inference endpoint is healthy and ready to serve requests. Hits the endpoint's health check path and reports HTTP status and round-trip latency.

gco inference health ENDPOINT_NAME [OPTIONS]

Arguments:

  • ENDPOINT_NAME - Name of the inference endpoint

Options:

Option Short Description
--region -r Target region to check

Example:

# Check health (nearest region via Global Accelerator)
gco inference health my-llm

# Check health in a specific region
gco inference health my-llm -r us-east-1

gco inference models

List models loaded on an inference endpoint. Queries the /v1/models path (OpenAI-compatible) to discover which models are available.

gco inference models ENDPOINT_NAME [OPTIONS]

Arguments:

  • ENDPOINT_NAME - Name of the inference endpoint

Options:

Option Short Description
--region -r Target region to query

Example:

# List loaded models
gco inference models my-llm

# Query a specific region
gco inference models my-llm -r eu-west-1

gco inference canary

Start a canary deployment with a new image. Routes a percentage of traffic to the canary while the primary continues serving the rest.

gco inference canary ENDPOINT_NAME [OPTIONS]

Options:

Option Short Description
--image -i New container image for canary (required)
--weight -w Percentage of traffic to canary, 1-99 (default: 10)
--replicas -r Number of canary replicas (default: 1)

Examples:

# 10% traffic to new version
gco inference canary my-llm -i vllm/vllm-openai:v0.20.1

# 25% traffic with 2 canary replicas
gco inference canary my-llm -i vllm/vllm-openai:v0.20.1 -w 25 -r 2

gco inference promote

Promote the canary to primary. Replaces the primary image with the canary image and removes the canary deployment. All traffic goes to the new image.

gco inference promote ENDPOINT_NAME [OPTIONS]

Options:

Option Short Description
--yes -y Skip confirmation

Example:

gco inference promote my-llm -y

gco inference rollback

Remove the canary deployment, keeping the primary unchanged. All traffic returns to the primary.

gco inference rollback ENDPOINT_NAME [OPTIONS]

Options:

Option Short Description
--yes -y Skip confirmation

Example:

gco inference rollback my-llm -y

Models Commands

Manage model weights in the central S3 bucket. Models uploaded here are automatically available to inference endpoints across all regions via init container sync.

See Inference Guide for details on model weight management.

gco models upload

Upload model weights to the central S3 bucket.

gco models upload LOCAL_PATH [OPTIONS]

Arguments:

  • LOCAL_PATH - Local file or directory path

Options:

Option Short Description
--name -n Model name in the registry (required)

Example:

gco models upload ./my-model-weights/ --name llama3-8b
gco models upload ./weights.safetensors --name my-model

gco models list

List models in the central S3 bucket.

gco models list

Example:

gco models list

gco models delete

Delete a model and all its files from the S3 bucket.

gco models delete MODEL_NAME [OPTIONS]

Options:

Option Short Description
--yes -y Skip confirmation

Example:

gco models delete llama3-8b -y

gco models uri

Get the S3 URI for a model (for use with --model-source in inference deploy).

gco models uri MODEL_NAME

Example:

gco models uri llama3-8b
# Output: s3://gco-models-xxx/models/llama3-8b

Files Commands

Manage file systems and download job outputs.

gco files list / gco files ls

List files on shared storage.

gco files list [OPTIONS]
gco files ls [OPTIONS]

Options:

Option Short Description
--region -r Region
--type -t Storage type: efs or fsx
--path -p Path to list

Example:

gco files ls -r us-east-1
gco files list -r us-east-1 -t fsx -p /scratch

gco files download

Download files from shared storage.

gco files download REMOTE_PATH LOCAL_PATH [OPTIONS]

Options:

Option Short Description
--region -r Region
--type -t Storage type: efs or fsx

Example:

gco files download my-job/outputs ./results -r us-east-1
gco files download training-run ./checkpoints -r us-west-2 -t fsx

Nodepools Commands

Manage Karpenter NodePools.

gco nodepools list

List NodePools in a cluster.

gco nodepools list [OPTIONS]

Options:

Option Short Description
--region -r Region

gco nodepools describe

Describe a specific NodePool.

gco nodepools describe NODEPOOL_NAME [OPTIONS]

Options:

Option Short Description
--region -r Region

gco nodepools create-odcr

Generate NodePool manifest for ODCR (On-Demand Capacity Reservation).

gco nodepools create-odcr [OPTIONS]

Options:

Option Short Description
--name -n NodePool name
--capacity-reservation-id ODCR ID
--instance-type -i Instance type
--output -o Output file path

Example:

gco nodepools create-odcr \
  --name gpu-reserved \
  --capacity-reservation-id cr-0123456789abcdef0 \
  --instance-type g5.xlarge \
  --output nodepool.yaml

Analytics Commands

Manage the optional GCO analytics environment (SageMaker Studio + EMR Serverless + Cognito). The feature is off by default; enable it only when you want interactive notebook analytics. See the Analytics Guide for end-to-end workflows.

All gco analytics * commands auto-discover the Cognito user-pool ID and API Gateway endpoint from the gco-analytics and gco-api-gateway CloudFormation outputs, so no manual ID wiring is needed.

gco analytics enable

Flip analytics_environment.enabled to true in cdk.json. Prints the follow-up gco stacks deploy gco-analytics command — does not deploy automatically.

gco analytics enable [OPTIONS]

Options:

Option Short Description
--hyperpod Also set analytics_environment.hyperpod.enabled=true (adds HyperPod training-job permissions to the SageMaker execution role).
--canvas Also set analytics_environment.canvas.enabled=true (attaches AmazonSageMakerCanvasFullAccess to the SageMaker execution role and enables the Canvas app on the Studio domain; artifacts land under Cluster_Shared_Bucket/analytics-canvas/).
--yes -y Skip the confirmation prompt.

Example:

gco analytics enable
gco analytics enable --hyperpod
gco analytics enable --canvas
gco analytics enable --hyperpod --canvas -y

# Follow-up to actually deploy the stack:
gco stacks deploy gco-analytics

gco analytics disable

Flip analytics_environment.enabled to false in cdk.json. Leaves the hyperpod, canvas, cognito, and efs sub-blocks untouched so a later enable preserves your preferences. Run gco stacks destroy gco-analytics afterward to tear down the deployed resources.

gco analytics disable [OPTIONS]

Options:

Option Short Description
--yes -y Skip the confirmation prompt.

Example:

gco analytics disable
gco analytics disable -y
gco stacks destroy gco-analytics

gco analytics status

Show the current analytics_environment.* toggle state from cdk.json plus the deployment state of gco-analytics.

gco analytics status

Example:

gco analytics status

gco analytics users add

Create a Cognito user in the analytics user pool. Calls cognito-idp:AdminCreateUser and prints the temporary password to stdout exactly once. Optionally sets a permanent password via cognito-idp:AdminSetUserPassword so the user can sign in without the NEW_PASSWORD_REQUIRED challenge on first login.

gco analytics users add [OPTIONS]

Options:

Option Description
--username Cognito username to create (required).
--email Email address for the new user.
--no-email Suppress the Cognito welcome email (MessageAction=SUPPRESS).
--password Set a permanent password on the new user (also read from $GCO_STUDIO_PASSWORD). Mutually exclusive with --generate-password.
--generate-password Generate a strong random password, set it permanent, and print it once. Mutually exclusive with --password.

Example:

gco analytics users add --username alice --email alice@example.com
gco analytics users add --username bob --email bob@example.com --no-email

# Set a permanent password so first-time login doesn't hit NEW_PASSWORD_REQUIRED
gco analytics users add --username carol --no-email --generate-password
GCO_STUDIO_PASSWORD='StrongP@ssw0rd!' gco analytics users add --username dave --no-email --password "$GCO_STUDIO_PASSWORD"

gco analytics users list

List Cognito users in the analytics user pool. Default output is a formatted table via the existing OutputFormatter.

gco analytics users list [OPTIONS]

Options:

Option Description
--as-json Emit JSON instead of a table.

Example:

gco analytics users list
gco analytics users list --as-json

gco analytics users remove

Delete a Cognito user from the analytics user pool. Does not delete the user's Studio user profile or EFS home folder — use aws sagemaker delete-user-profile for that.

gco analytics users remove [OPTIONS]

Options:

Option Description
--username Cognito username to remove (required).
--yes Skip the confirmation prompt.

Example:

gco analytics users remove --username alice
gco analytics users remove --username alice --yes

gco analytics users set-password

Change a Cognito user's password via AdminSetUserPassword. By default the new password is marked permanent so the user can sign in directly with gco analytics studio login without hitting the NEW_PASSWORD_REQUIRED challenge. Pass --temporary to require the user to choose their own password on first sign-in.

gco analytics users set-password [OPTIONS]

Options:

Option Description
--username Cognito username whose password to change (required).
--password New password (also read from $GCO_STUDIO_PASSWORD; prompted otherwise). Mutually exclusive with --generate-password.
--generate-password Generate a strong random password, set it, and print it once. Mutually exclusive with --password.
--temporary Set the password as temporary (Permanent=false). Default is permanent.
--yes, -y Skip the confirmation prompt.

Examples:

# Interactive — prompts twice for the new password
gco analytics users set-password --username alice

# Non-interactive via env var (won't leak into shell history)
GCO_STUDIO_PASSWORD='StrongP@ssw0rd!' \
  gco analytics users set-password --username alice --yes

# Generate and print a new password
gco analytics users set-password --username alice --generate-password --yes

# Force the user to reset on next login
gco analytics users set-password --username alice \
  --password 'Temp!Reset123$' --temporary --yes

gco analytics studio login

Sign in to SageMaker Studio via Cognito SRP and print a presigned Studio URL on its own line on stdout (pipe-friendly). The password, IdToken, and URL are never written to disk.

gco analytics studio login [OPTIONS]

Options:

Option Description
--username Cognito username (required).
--password Password. Defaults to prompt (click.prompt(..., hide_input=True)). Also read from $GCO_STUDIO_PASSWORD if set.
--api-url Override the API Gateway base URL (otherwise auto-discovered from CloudFormation).
--open Launch the default browser on the presigned URL after printing it.

Example:

# Interactive (prompts for password)
gco analytics studio login --username alice

# Non-interactive
export GCO_STUDIO_PASSWORD='...'
gco analytics studio login --username alice

# Open browser automatically
gco analytics studio login --username alice --open

# Custom API endpoint
gco analytics studio login \
  --username alice \
  --api-url https://abc123.execute-api.us-east-2.amazonaws.com

gco analytics doctor

Run pre-flight checks before gco stacks deploy gco-analytics. Each check prints / plus a short remediation line. Exits 1 on any failing check.

Checks performed:

  • cdk.json is present and parses as JSON
  • gco-global, gco-api-gateway, and every regional stack are CREATE_COMPLETE
  • The three /gco/cluster-shared-bucket/* SSM parameters are present in the global region
  • No orphaned retained analytics resources are left from a previous retain-policy destroy
gco analytics doctor

Example:

gco analytics doctor

Configuration

Config File

Create ~/.gco/config.yaml:

default_region: us-east-1
output_format: table
verbose: false
regions:
  - us-east-1
  - us-west-2
  - eu-west-1

cdk.json

Project configuration in cdk.json:

{
  "context": {
    "project_name": "gco",
    "deployment_regions": {
      "global": "us-east-2",
      "api_gateway": "us-east-2",
      "monitoring": "us-east-2",
      "regional": ["us-east-1", "us-west-2"]
    },
    "resource_thresholds": {
      "cpu_threshold": 60,
      "memory_threshold": 60,
      "gpu_threshold": -1,
      "pending_pods_threshold": 10,
      "pending_requested_cpu_vcpus": 100,
      "pending_requested_memory_gb": 200,
      "pending_requested_gpus": -1
    },
    "fsx_lustre": {
      "enabled": false,
      "storage_capacity_gib": 1200
    }
  }
}

Set any threshold to -1 to disable that health check. This is useful when running GPU inference endpoints that naturally saturate GPU resources.

Environment Variables

Variable Description
AWS_REGION Default AWS region
AWS_PROFILE AWS credentials profile
GCO_CONFIG Path to config file
GCO_REGIONAL_API Use regional API endpoints (true/false)
CDK_DOCKER Docker command (docker or finch)

Examples

Complete Workflow

# 1. Deploy (bootstrap runs automatically if needed)
export CDK_DOCKER=finch
gco stacks deploy-all -y

# 2. Check capacity
gco capacity status
gco capacity recommend-region --gpu

# 3. Submit jobs
gco jobs submit-sqs examples/simple-job.yaml --region us-east-1
gco jobs queue-status --all-regions

# 4. Monitor jobs
gco jobs list --all-regions
gco jobs logs my-job -r us-east-1 -n gco-jobs

# 5. Download outputs
gco files ls -r us-east-1
gco files download my-job/outputs ./results -r us-east-1

# 6. Cleanup
gco stacks destroy-all -y

Inference Endpoint Workflow

# 1. Upload model weights
gco models upload ./llama3-weights/ --name llama3-8b

# 2. Deploy inference endpoint
gco inference deploy my-llm \
  -i vllm/vllm-openai:v0.20.1 \
  --gpu-count 1 \
  --model-source $(gco models uri llama3-8b) \
  -e MODEL=/models/my-llm \
  -r us-east-1

# 3. Monitor deployment
gco inference status my-llm

# 4. Scale for production
gco inference scale my-llm --replicas 3

# Or enable autoscaling
gco inference deploy my-llm \
  -i vllm/vllm-openai:v0.20.1 \
  --replicas 2 --gpu-count 1 \
  --min-replicas 1 --max-replicas 8 \
  --autoscale-metric cpu:70

# 5. Rolling update
gco inference update-image my-llm -i vllm/vllm-openai:v0.20.1

# 6. Cleanup
gco inference delete my-llm -y
gco models delete llama3-8b -y

GPU Job Submission

# Check GPU capacity
gco capacity check -i g5.xlarge -r us-east-1

# Submit GPU job
gco jobs submit-sqs examples/gpu-job.yaml --auto-region

# Monitor
gco jobs list --all-regions
gco jobs logs gpu-test-job -r us-east-1 -n gco-jobs

Multi-Region Deployment

# Deploy to multiple regions
gco stacks deploy-all -y --parallel --max-workers 4

# Check status across regions
gco stacks list --all-regions
gco capacity status

Troubleshooting

Common Issues

"No credentials found"

# Ensure AWS credentials are configured
aws sts get-caller-identity

"Endpoint request timed out"

  • Wait 1-2 minutes after deployment for ALB targets to become healthy
  • Use submit-sqs or submit-direct instead of submit

"kubectl access denied"

  • Add your IAM principal to EKS access entries:
aws eks create-access-entry \
  --cluster-name gco-us-east-1 \
  --principal-arn arn:aws:iam::ACCOUNT:user/YOUR-USER \
  --region us-east-1

aws eks associate-access-policy \
  --cluster-name gco-us-east-1 \
  --principal-arn arn:aws:iam::ACCOUNT:user/YOUR-USER \
  --policy-arn arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy \
  --access-scope type=cluster \
  --region us-east-1

"CDK bootstrap required"

This should resolve automatically — deploy and deploy-all auto-bootstrap un-bootstrapped regions. If it persists:

gco stacks bootstrap --region us-east-1

Debug Mode

# Enable verbose output
gco -v jobs list --all-regions

# Check AWS configuration
aws sts get-caller-identity
aws eks list-clusters --region us-east-1

For more help, see: