Thank you for contributing to GCO (Global Capacity Orchestrator on AWS)! This guide will help you get started.
- Development Setup
- Development Workflow
- Code Organization
- Testing
- Documentation
- Code Review Guidelines
- Release Process
- Best Practices
- Common Tasks
- Getting Help
- Code of Conduct
Recommended path — the dev container only needs:
- AWS account with appropriate permissions
- Docker (or Finch / Colima) and Git
The container itself ships Python 3.14, Node.js 24, CDK, kubectl, AWS CLI, and every Python dependency at the exact versions CI uses, so you don't need any of them on your host.
Host development path additionally needs:
- Python 3.10+ (required for the type union syntax str | None)
- Node.js 24+ (for CDK)
- kubectl
- A clean virtualenv (or pipx) for the GCO Python deps — see the warning under Local Development Environment (Advanced).
Strong recommendation: use the dev container. GCO pins many exact package versions (FastAPI, mypy, Ruff, AWS SDKs, CDK, etc.) so CI is reproducible. Installing those on top of an existing Python environment frequently triggers ResolutionImpossible resolver errors. The dev container sidesteps this entirely and matches CI bit-for-bit.
The dev container includes all dependencies pre-installed (Python 3.14, Node.js 24, CDK, kubectl, AWS CLI). This avoids "works on my machine" issues and is the supported path for everything from running tests to deploying stacks.
The image is multi-arch — Apple Silicon (linux/arm64), Intel/x86_64 hosts, and CI all build natively from the same Dockerfile.dev because every baked-in binary (kubectl, AWS CLI v2, Docker static client) is selected by $TARGETARCH. Native builds on Apple Silicon take ~2 min; emulated cross-builds (e.g. --platform linux/amd64 on an arm64 host) take ~7-8 min and are only needed when you specifically want to test the amd64 image.
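To tell whether a given build will be native or emulated, it can help to compare the host machine type with the requested platform. The sketch below is an illustration only: the function names are hypothetical, and the mapping covers just the two architectures mentioned above.

```python
import platform

# Map machine names (as reported by `uname -m` or platform.machine())
# to the architecture names Docker uses in --platform values.
_MACHINE_TO_DOCKER_ARCH = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "arm64": "arm64",
    "aarch64": "arm64",
}


def docker_arch(machine: str) -> str:
    """Translate a machine name into a Docker architecture name."""
    try:
        return _MACHINE_TO_DOCKER_ARCH[machine.lower()]
    except KeyError:
        raise ValueError(f"unrecognized machine type: {machine!r}") from None


def is_emulated_build(host_machine: str, target_platform: str) -> bool:
    """True when the target (e.g. 'linux/amd64') differs from the host arch."""
    target_arch = target_platform.split("/")[1]
    return docker_arch(host_machine) != target_arch


if __name__ == "__main__":
    host = platform.machine()
    print(f"host arch: {docker_arch(host)}")
    print(f"linux/amd64 emulated here: {is_emulated_build(host, 'linux/amd64')}")
```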
# Build the container (cached on subsequent runs; ~2 min the first time)
docker build -f Dockerfile.dev -t gco-dev .
# Run an interactive shell
docker run -it --rm \
-v ~/.aws:/root/.aws:ro \
-v $(pwd):/workspace \
-w /workspace \
gco-dev
# Or run a single command
docker run --rm \
-v ~/.aws:/root/.aws:ro \
-v $(pwd):/workspace \
-w /workspace \
gco-dev gco stacks list
# Run CDK commands
docker run --rm \
-v ~/.aws:/root/.aws:ro \
-v $(pwd):/workspace \
-w /workspace \
-e CDK_DOCKER=docker \
gco-dev cdk synth
# Run tests
docker run --rm \
-v $(pwd):/workspace \
-w /workspace \
gco-dev pytest tests/ -v
Running gco stacks deploy-all from the container: cdk deploy invokes Docker
to bundle Lambda assets. The dev container ships only the Docker CLI (no
daemon), so mount the host Docker socket to give it a transport to your
host daemon:
docker run --rm -it \
-v ~/.aws:/root/.aws:ro \
-v $(pwd):/workspace \
-v /var/run/docker.sock:/var/run/docker.sock \
-w /workspace \
gco-dev gco stacks deploy-all -y
This pattern works on Linux, Docker Desktop for macOS and Windows, and
Colima for macOS. On Colima the host socket lives at
~/.colima/docker.sock (older Colima) or ~/.colima/default/docker.sock
(newer Colima). Adjust the left side of the -v flag accordingly or
symlink the Colima socket to /var/run/docker.sock. See
https://github.com/abiosoft/colima for the current default. This is
host-socket pass-through, not true Docker-in-Docker, so do not add
--privileged. The trade-off is that anyone inside the container has
root-equivalent access to the host Docker daemon through the mounted
socket, so only use this on trusted hosts.
Tip: Create a shell function for convenience. Using a function (rather than an alias that hardcodes $(pwd)) means it auto-resolves your GCO clone via git rev-parse, so gco stacks * and other source-tree-dependent commands work regardless of which subdirectory you call it from. Set GCO_HOME in your shell to use it from anywhere on disk:
gco-dev() {
local project_root="${GCO_HOME:-$(git rev-parse --show-toplevel 2>/dev/null)}"
# Check for both Dockerfile.dev *and* the gco/ namespace package
# so we don't accidentally bind-mount an unrelated repo that
# happens to have a Dockerfile.dev at its root.
if [[ -z "$project_root" \
|| ! -f "$project_root/Dockerfile.dev" \
|| ! -d "$project_root/gco" ]]; then
echo "gco-dev: not inside the GCO repo. cd into your clone, or set GCO_HOME." >&2
return 1
fi
docker run --rm \
-v ~/.aws:/root/.aws:ro \
-v "$project_root:/workspace" \
-w /workspace \
gco-dev "$@"
}
# Then use: gco-dev gco stacks list
Use the host development path only if you specifically want to develop on your host (e.g., for editor integrations like the Pyright/mypy LSP). It is not the supported path for one-off deploys or running tests; those should go through the dev container above.
# Clone repository
git clone <repository-url>
cd GCO
# Create a *fresh* virtual environment — do not reuse one that already has
# AWS CDK, FastAPI, mypy, or other commonly-pinned packages installed.
python3 -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install dependencies
pip install -e ".[dev]"
If pip install fails with ResolutionImpossible or "the conflict is caused by..." messages, your venv is not actually clean (or you're on a Python version older than 3.10). Recreate the venv from scratch or switch to the dev container. Please don't loosen the pins in pyproject.toml or requirements-lock.txt to make a local install work, since CI will reject the lockfile drift.
GCO uses exact-pinned dependencies in pyproject.toml and a committed lockfile (requirements-lock.txt) for reproducible builds.
| Group | Install command | What it includes |
|---|---|---|
| Core | pip install -e . | CLI runtime deps (boto3, click, requests, etc.) |
| CDK | pip install -e ".[cdk]" | AWS CDK, cdk-nag, constructs (for stack synthesis) |
| Dev | pip install -e ".[dev]" | Everything: CDK + lint + typecheck + test + security |
| MCP | pip install -e ".[mcp]" | FastMCP server |
CDK dependencies are in a separate [cdk] extras group so operators who only use the CLI don't need to install the full CDK toolchain.
After updating any dependency version in pyproject.toml, regenerate the
lockfile using Dockerfile.dev. This is the only supported workflow — it
produces a deterministic, Linux-targeted lockfile that matches CI, avoids
host-specific path leakage, and doesn't require pip-tools on your machine.
# Build the dev image once (cached between runs, ~2 minutes the first time)
docker build -f Dockerfile.dev -t gco-dev .
# Regenerate the lockfile and strip the project self-reference
docker run --rm -v "$(pwd):/workspace" -w /workspace gco-dev bash -c '
pip-compile --no-emit-index-url --strip-extras --all-extras \
-o requirements-lock.txt pyproject.toml &&
sed -i "/^gco-cli @ file:/,+1d" requirements-lock.txt
'
The sed step removes the gco-cli @ file:///workspace self-reference that
pip-compile always emits (two lines: the file:// URI and its # via
continuation). CI installs the project separately with pip install --no-deps,
and the staleness check strips ^gco-cli @ file anyway, but we keep it out of
the committed file for readability.
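For readers less familiar with sed address ranges, the same two-line deletion can be sketched in Python. This is an equivalent illustration, not part of the supported workflow; the function name is hypothetical and the sample lockfile text uses made-up pins.

```python
def strip_self_reference(lock_text: str, prefix: str = "gco-cli @ file:") -> str:
    """Drop every line starting with `prefix` plus the single line after it.

    Mirrors the sed expression "/^gco-cli @ file:/,+1d".
    """
    kept = []
    skip_next = False
    for line in lock_text.splitlines(keepends=True):
        if skip_next:
            # This is the continuation line that follows the self-reference.
            skip_next = False
            continue
        if line.startswith(prefix):
            skip_next = True
            continue
        kept.append(line)
    return "".join(kept)


# Illustrative input only: package names and versions are placeholders.
sample = (
    "boto3==0.0.0\n"
    "gco-cli @ file:///workspace\n"
    "    # via continuation line\n"
    "click==0.0.0\n"
)
print(strip_self_reference(sample), end="")
```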
Running pip-compile directly on Linux (native or WSL) matches the container's environment, but resolving on macOS produces a different lockfile that CI rejects, which is why the Docker path is the only supported workflow.
Commit the updated requirements-lock.txt alongside your pyproject.toml
changes. The lockfile pins all transitive dependencies to ensure reproducible
builds across environments.
For reproducible installs (CI, production containers):
pip install -r requirements-lock.txt
pip install -e . --no-deps
mypy runs across the entire codebase with --check-untyped-defs enabled. The CI pipeline has two type-checking jobs:
- lint:typecheck: Checks gco/config/, gco/models/, gco/services/, and cli/. Installs only mypy + type stubs (fast, no CDK needed).
- lint:typecheck-stacks: Checks gco/stacks/. Installs CDK dependencies since stack code uses CDK types.
To run locally:
# Check everything except stacks (fast, no CDK needed)
mypy gco/config/ gco/models/ gco/services/ cli/ --ignore-missing-imports --check-untyped-defs
# Check stacks (requires CDK: pip install -e ".[cdk,typecheck]")
mypy gco/stacks/ --ignore-missing-imports --check-untyped-defs
# Check everything at once (requires CDK installed)
mypy gco/ cli/ --ignore-missing-imports --check-untyped-defs
The in-cluster services use token-based authentication via AWS Secrets Manager. The auth middleware (gco/services/auth_middleware.py) validates an X-GCO-Auth-Token header on every request (except health checks).
Important: The middleware is fail-closed by default. If AUTH_SECRET_ARN is not set and GCO_DEV_MODE is not enabled, all authenticated requests return 503. To run services locally without Secrets Manager:
export GCO_DEV_MODE=true
This is intentional: a missing AUTH_SECRET_ARN in production should fail loudly rather than silently allowing unauthenticated access.
git checkout -b feature/your-feature-name
Follow these guidelines:
- Code Style: Follow PEP 8 for Python, use type hints
- Documentation: Update docs for any user-facing changes
- Tests: Add tests for new functionality
- Commits: Use clear, descriptive commit messages
# Synthesize CDK
cdk synth
# Deploy to dev account
export AWS_PROFILE=dev
gco stacks deploy-all -y
# Run tests
pytest tests/
# Verify deployment
kubectl get pods -n gco-system
gco jobs list -r us-east-1
# Commit changes
git add .
git commit -m "feat: add new feature"
# Push to remote
git push origin feature/your-feature-name
# Create pull request
# Follow your organization's PR process
gco/
├── stacks/ # CDK stack definitions
├── services/ # Kubernetes services (Python/FastAPI)
├── models/ # Data models
└── config/ # Configuration management
cli/ # CLI commands and utilities
├── commands/ # Per-group command modules (jobs, capacity, stacks, …)
├── main.py # Root CLI group and entry point
├── kubectl_helpers.py # Shared kubeconfig utilities
lambda/
├── kubectl-applier-simple/ # Lambda for kubectl operations
├── helm-installer/ # Lambda for Helm chart installation
├── api-gateway-proxy/ # API Gateway proxy Lambda
├── ga-registration/ # Global Accelerator registration
├── secret-rotation/ # Secret rotation Lambda
└── alb-header-validator/ # ALB header validation
dockerfiles/ # Dockerfiles for K8s services
docs/ # Documentation
examples/ # Example job manifests
tests/ # Test suites
scripts/ # Utility scripts
- Create file in gco/stacks/
- Import in app.py
- Add to deployment workflow
- Document in docs/ARCHITECTURE.md

- Create service code in gco/services/
- Create Dockerfile in dockerfiles/
- Add manifest to lambda/kubectl-applier-simple/manifests/
- Update regional_stack.py to build image
- Document in README

- Update cdk.json context
- Test deployment
- Update documentation
- Verify Global Accelerator integration
# Run all tests
pytest tests/
# Run specific test file
pytest tests/test_integration.py
# Run with coverage
pytest --cov=gco --cov=cli tests/
# Run with verbose output
pytest tests/ -v
# Run only unit tests
pytest tests/ -m unit
# Run only integration tests
pytest tests/ -m integration
The project uses GitHub Actions for automated testing. Every push and pull request runs four primary workflows in parallel; three satellite workflows run on a schedule or manual trigger.
| Workflow file | README row | Purpose |
|---|---|---|
| .github/workflows/unit-tests.yml | Unit Tests | pytest with coverage, BATS, CLI smoke, CDK synth + config matrix, lockfile freshness, fresh install, workload import checks |
| .github/workflows/integration-tests.yml | Integration Tests | Per-Dockerfile build + healthcheck, kind cluster E2E (with Calico for NetworkPolicy enforcement), K8s manifest schema, Lambda import validation, cross-module pytest, MCP server pytest |
| .github/workflows/security.yml | Security | bandit, pip-audit, trivy (filesystem + per-image), trufflehog, gitleaks, semgrep, checkov, KICS |
| .github/workflows/lint.yml | Linting | actionlint, hadolint, markdownlint, mypy (strict/stacks/lambda), ruff (format + check, imports included), shellcheck, yamllint |
Each workflow file has a comment header documenting triggers and per-job purpose — that is the single source of truth. Every job uses category:tool:test_name display names (e.g., unit:pytest:core, security:trivy:container-scan) and category-tool-test_name job IDs.
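The relationship between display names and job IDs is a plain separator swap. A minimal sketch, assuming nothing beyond the convention stated above; `job_id_from_display_name` is a hypothetical helper, not something the repository ships.

```python
def job_id_from_display_name(display_name: str) -> str:
    """Convert a display name such as 'unit:pytest:core' to its job ID."""
    parts = display_name.split(":")
    if not all(parts):
        # Reject empty segments such as 'unit::core' or a trailing colon.
        raise ValueError(f"empty segment in display name: {display_name!r}")
    return "-".join(parts)


print(job_id_from_display_name("unit:pytest:core"))
print(job_id_from_display_name("security:trivy:container-scan"))
```

This is handy when searching a workflow file for the job behind a name you saw in the Actions UI.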
| Workflow | Trigger | Purpose |
|---|---|---|
| .github/workflows/release.yml | Manual (workflow_dispatch) | Bump version, tag, and create a GitHub Release with auto-generated notes |
| .github/workflows/deps-scan.yml | cron: 0 9 1 * * (monthly) | Check Python/Docker/Helm/EKS-addon versions; open an issue if drift detected |
| .github/workflows/cve-scan.yml | cron: 0 9 * * 1 (weekly) | Re-run Trivy against current CVE databases |
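The two cron expressions read as minute, hour, day of month, month, day of week, evaluated in UTC. The sketch below spells out what each schedule matches; the function names are hypothetical and it is not a general cron parser.

```python
from datetime import datetime, timezone


def is_monthly_deps_scan(ts: datetime) -> bool:
    """True when ts matches '0 9 1 * *': 09:00 on the 1st of any month."""
    return ts.minute == 0 and ts.hour == 9 and ts.day == 1


def is_weekly_cve_scan(ts: datetime) -> bool:
    """True when ts matches '0 9 * * 1': 09:00 on any Monday."""
    # isoweekday() numbers Monday as 1, which lines up with cron's 1.
    return ts.minute == 0 and ts.hour == 9 and ts.isoweekday() == 1


first_of_feb = datetime(2024, 2, 1, 9, 0, tzinfo=timezone.utc)  # a Thursday
a_monday = datetime(2024, 1, 8, 9, 0, tzinfo=timezone.utc)
print(is_monthly_deps_scan(first_of_feb), is_weekly_cve_scan(first_of_feb))
print(is_monthly_deps_scan(a_monday), is_weekly_cve_scan(a_monday))
```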
Three README badges update automatically from runs triggered by pushes to main:
- unit:pytest:core test count
- unit:bats:count
- unit:coverage percentage
Values are published to the orphan badges branch as shields.io endpoint JSON and consumed via img.shields.io/endpoint?url=…. Fork PRs cannot write to this branch — the publish step is gated on push: main.
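For orientation, a shields.io endpoint badge is a small JSON document with a schemaVersion of 1 plus label, message, and an optional color. The sketch below builds one; the helper name and the example label and message are placeholders, and the exact files the workflow writes are not shown here.

```python
import json


def badge_payload(label: str, message: str, color: str = "blue") -> str:
    """Serialize one badge in the shields.io endpoint JSON schema."""
    return json.dumps(
        {
            "schemaVersion": 1,  # required by the endpoint schema
            "label": label,
            "message": message,
            "color": color,
        },
        sort_keys=True,
    )


# Placeholder values for illustration, not real project numbers.
print(badge_payload("coverage", "00%"))
```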
You can simulate the CI pipeline locally:
# Install dev dependencies
pip install -e ".[dev]"
# Run linters (matches lint.yml jobs)
ruff format --check gco/ cli/ mcp/ tests/ lambda/ scripts/ diagrams/
ruff check gco/ cli/ mcp/ tests/ lambda/ scripts/ diagrams/
yamllint --strict .
# Run markdownlint (requires Node; no Python install needed).
# Config lives in .markdownlint-cli2.yaml at the repo root.
npx markdownlint-cli2
# Run type checks (everything except stacks — fast, no CDK needed)
mypy gco/ cli/ mcp/ scripts/ --exclude 'gco/stacks/'
# Run type checks on stacks (requires CDK)
pip install -e ".[cdk,typecheck]"
mypy gco/stacks/ app.py
# Run security scans
bandit -r gco/ cli/ -c pyproject.toml --severity-level medium
# Run tests with coverage (matches unit:pytest:core)
pytest tests/ --cov=gco --cov=cli --cov-report=html --cov-fail-under=90 \
--ignore=tests/test_nag_compliance.py
# Run cdk-nag compliance matrix (matches unit:cdk:nag-compliance)
pytest tests/test_nag_compliance.py -n auto
# Run CDK config matrix (matches unit:cdk:config-matrix)
pytest tests/test_cdk_synthesis_matrix.py -n auto
# Regenerate the lockfile (after dependency changes — use the Docker workflow
# documented in Dependency Management above; pip-compile on the host produces
# a macOS-resolved lockfile that CI rejects)
docker run --rm -v "$(pwd):/workspace" -w /workspace gco-dev bash -c '
pip-compile --no-emit-index-url --strip-extras --all-extras \
-o requirements-lock.txt pyproject.toml &&
sed -i "/^gco-cli @ file:/,+1d" requirements-lock.txt
'
The README badge label tells you the workflow and job. For example, unit:pytest:core maps to:
- Workflow file: .github/workflows/unit-tests.yml
- Job ID: unit-pytest-core
- Actions UI: repo → Actions → "Unit Tests" → latest run → unit:pytest:core
Click any badge to land on the workflow page; the Actions UI lists every job.
.gitlab-ci.yml is kept as a frozen reference for anyone forking to GitLab. It is NOT maintained and may drift as tools evolve. GitHub Actions is authoritative.
# Deploy to test environment
export AWS_PROFILE=test
gco stacks deploy-all -y
# Run tests against deployed environment
pytest tests/ -v
# Clean up
gco stacks destroy-all -y
- New features or capabilities
- Changes to deployment process
- New configuration options
- Breaking changes
- Bug fixes that affect users
- README.md: Overview and quick start
- QUICKSTART.md: Step-by-step setup guide
- docs/ARCHITECTURE.md: Technical architecture
- docs/CLI.md: CLI reference
- docs/API.md: REST API reference
- docs/CONCEPTS.md: Core concepts for new users
- docs/CUSTOMIZATION.md: How to customize
- docs/TROUBLESHOOTING.md: Common issues
- docs/RUNBOOKS.md: Operational runbooks for incident response
- CONTRIBUTING.md: This file
- Use clear, concise language
- Include code examples
- Add diagrams where helpful. GCO has two auto-generated diagram catalogues you can lean on or extend:
  - diagrams/infra_diagrams/: per-stack and full-architecture views synthesized from the CDK app via AWS PDK cdk-graph. Run python diagrams/infra_diagrams/generate.py to refresh.
  - diagrams/code_diagrams/: per-function control-flow charts (Lambda handlers, CLI entry points) rendered with pyflowchart + Playwright. Run python diagrams/code_diagrams/generate.py to refresh; the script auto-inserts a # Flowchart: pointer comment at the top of every source file it charts so readers can navigate from code to diagram without guessing. Add new targets by editing diagrams/code_diagrams/_targets.py.
- Keep it up-to-date with code changes
- Keep PRs focused and reasonably sized
- Write clear PR descriptions
- Include tests
- Update documentation
- Respond to feedback promptly
- Be constructive and respectful
- Focus on code quality and maintainability
- Check for security issues
- Verify documentation is updated
- Test changes if possible
We use semantic versioning (MAJOR.MINOR.PATCH):
- MAJOR: Breaking changes
- MINOR: New features (backward compatible)
- PATCH: Bug fixes
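The bump rules above can be sketched in a few lines. This illustrates semantic versioning only; it is not the contents of scripts/bump_version.py, and the function name is hypothetical.

```python
def bump_version(version: str, part: str) -> str:
    """Return `version` (MAJOR.MINOR.PATCH) with the given part incremented."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"  # breaking change resets minor and patch
    if part == "minor":
        return f"{major}.{minor + 1}.0"  # new feature resets patch
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown bump type: {part!r}")


for kind in ("patch", "minor", "major"):
    print(kind, bump_version("1.2.3", kind))
```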
No long-lived tokens are required for the GitHub Actions pipeline. Both the release and dependency-scan workflows use the built-in GITHUB_TOKEN:
- release.yml needs contents: write to push the version commit, tag, and create the GitHub Release. The workflow declares this at the top of the file.
- deps-scan.yml needs issues: write to open a dependency-drift issue. Also declared at the top of the file.
If you fork and run your own copy, no setup is needed — the tokens are generated per-run by GitHub.
Releases are triggered from the Actions tab:
- Go to the repository on GitHub → Actions → Release.
- Click "Run workflow".
- Pick the bump type (patch, minor, or major) and click "Run workflow".
The workflow will:
- Run scripts/bump_version.py with the chosen bump type.
- Commit the version change to main (as github-actions[bot]).
- Create and push a v<new-version> git tag.
- Create a GitHub Release with auto-generated notes (categorized per .github/release.yml).
If you need to release manually:
# Bump version
python scripts/bump_version.py patch # or minor/major
# Commit and tag
git add VERSION gco/_version.py cli/__init__.py
git commit -m "Release v1.2.3"
git tag -a v1.2.3 -m "Release v1.2.3"
git push origin main
git push origin v1.2.3
# Create the GitHub Release with generated notes
gh release create v1.2.3 --generate-notes
After releasing, update CHANGELOG.md and deploy to production environments.
Dependency drift is tracked through three layers:
- Dependabot (weekly PRs): GitHub Actions and Docker only. See .github/dependabot.yml. Python packages are intentionally excluded because requirements-lock.txt is managed through pip-compile and bumped deliberately.
- deps-scan workflow (monthly issue): runs on the 1st of each month at 09:00 UTC. Checks Python packages, Docker images, Helm charts, EKS add-on versions, Aurora PostgreSQL engine versions, and pre-commit hook revisions. If anything is out of date, it opens a GitHub issue labeled dependencies, automated. The scan logic lives in .github/scripts/dependency-scan.sh; see .github/CI.md for the full reference (surfaces checked, inputs, outputs, extension points, failure modes). Pinned versions are centralised in gco/stacks/constants.py.
- cve-scan workflow (weekly job): runs Mondays at 09:00 UTC. Re-runs Trivy against the latest CVE databases. A red run is the signal; the per-push security.yml workflow will catch the same issue on the next PR.
- Python Packages: all packages resolved from pyproject.toml are checked against PyPI for newer versions
- Docker Images: semver-tagged images referenced in .github/workflows/*.yml, K8s manifests, examples, and Helm chart values
- Helm Charts: from lambda/helm-installer/charts.yaml
- EKS Add-ons: extracted from gco/stacks/regional_stack.py (requires AWS credentials via OIDC; falls back gracefully otherwise)
- Pre-commit Hooks: rev: pins in .pre-commit-config.yaml are compared against the latest tag published by the upstream GitHub repo
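Each of the checks above boils down to comparing a pinned version with the latest published one. The sketch below shows that comparison done numerically; it is not the logic in .github/scripts/dependency-scan.sh, and the function names and sample pins are hypothetical.

```python
def parse_version(tag: str) -> tuple[int, ...]:
    """Turn 'v1.10.0' or '1.10.0' into (1, 10, 0) for numeric comparison."""
    return tuple(int(part) for part in tag.removeprefix("v").split("."))


def find_drift(
    pinned: dict[str, str], latest: dict[str, str]
) -> dict[str, tuple[str, str]]:
    """Map each out-of-date name to its (pinned, latest) version pair."""
    drift = {}
    for name, current in pinned.items():
        newest = latest.get(name)
        if newest is not None and parse_version(newest) > parse_version(current):
            drift[name] = (current, newest)
    return drift


# Placeholder names and versions, for illustration only.
pinned = {"hook-a": "v1.2.0", "hook-b": "v2.0.0"}
latest = {"hook-a": "v1.10.0", "hook-b": "v2.0.0"}
print(find_drift(pinned, latest))
```

Comparing parsed tuples rather than raw strings matters here: as strings, "v1.10.0" sorts before "v1.2.0", which would hide the drift.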
The monthly scan is also wired to workflow_dispatch:
- Go to Actions → "Deps scan" → "Run workflow".
- Pick the main branch and click Run.
- On completion, either a new issue appears (if drift was found) or the workflow just turns green.
EKS addon versions are checked by deps-scan when AWS credentials are configured. Without credentials, the addon section is skipped silently. To check manually at any time:
# Check latest versions for all addons used by GCO
K8S_VERSION="1.35" # Match your configured Kubernetes version
for addon in metrics-server aws-efs-csi-driver amazon-cloudwatch-observability aws-fsx-csi-driver; do
echo "=== $addon ==="
aws eks describe-addon-versions \
--addon-name "$addon" \
--kubernetes-version "$K8S_VERSION" \
--query 'addons[0].addonVersions[0].addonVersion' \
--output text
done
Current addon versions are defined in gco/stacks/regional_stack.py. To update:
- Run the command above to get latest versions
- Update the addon_version parameter for each addon in regional_stack.py
- Test the deployment in a non-production environment first
- Review the EKS addon release notes for breaking changes
- Never commit secrets or credentials
- Use IAM roles, not access keys
- Follow least-privilege principle
- Encrypt sensitive data
- Review security groups and network ACLs
- Optimize Docker images (use slim base images)
- Set appropriate resource limits
- Use caching where possible
- Monitor and profile performance
- Use Spot instances for fault-tolerant workloads
- Right-size resources
- Clean up unused resources
- Set up cost alerts
- Implement health checks
- Use multiple replicas
- Test failure scenarios
- Monitor and alert on issues
# 1. Create manifest file
cat > lambda/kubectl-applier-simple/manifests/33-my-service.yaml << 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
name: my-service
namespace: gco-system
spec:
replicas: 2
selector:
matchLabels:
app: my-service
template:
metadata:
labels:
app: my-service
spec:
containers:
- name: my-service
image: {{MY_SERVICE_IMAGE}}
ports:
- containerPort: 8080
EOF
# 2. Update CDK stack to build image (if needed)
# Edit gco/stacks/regional_stack.py
# 3. Deploy
gco stacks deploy-all -y
# 1. Make changes to service
vim gco/services/health_monitor.py
# 2. Test locally (if possible)
python gco/services/health_monitor.py
# 3. Rebuild and deploy
gco stacks deploy-all -y
# 4. Verify deployment
kubectl get pods -n gco-system
gco jobs list -r us-east-1
# Check CloudFormation events
aws cloudformation describe-stack-events \
--stack-name gco-us-east-1 \
--region us-east-1 \
--max-items 20
# Check Lambda logs
aws logs tail /aws/lambda/gco-us-east-1-KubectlApplier* \
--region us-east-1 \
--since 30m
# Check pod logs (requires kubectl for detailed pod inspection)
kubectl logs -n gco-system deployment/health-monitor --tail=100
# Describe pod for events (requires kubectl)
kubectl describe pod POD-NAME -n gco-system
# Check job logs via CLI
gco jobs logs JOB-NAME -n gco-jobs -r us-east-1
- Check existing documentation
- Search for similar issues
- Open a GitHub issue
- Be respectful and professional
- Welcome newcomers
- Focus on constructive feedback
- Collaborate openly
Questions? Open an issue on the GCO GitHub repository.