Commit 764aa1c

docs+infra(dev-container): recommend dev container path; make Dockerf… (#49)
* docs+infra(dev-container): recommend dev container path; make Dockerfile.dev multi-arch

The dev container is the supported install path because GCO pins many exact Python package versions and host installs frequently fail with dependency-resolver errors. Update the docs to reflect that, and fix two things that were undermining the recommendation.

Documentation
- README, QUICKSTART, CONTRIBUTING now lead with `docker build -f Dockerfile.dev` and demote pip/pipx to a collapsed "advanced" path with explicit guidance on the resolver-conflict failure mode.
- New "Installation Issues" section in docs/TROUBLESHOOTING.md walks through `ResolutionImpossible` errors and points at the dev container as the canonical fix.
- mcp/README.md and QUICKSTART.md now use absolute-path-in-args for the MCP config so the same snippet works in Cursor, Kiro, Claude Desktop, etc. without relying on client-specific `cwd` handling.

Dockerfile.dev — multi-arch
- Add ARG TARGETARCH and derive a GNU arch tag (x86_64 / aarch64) for the AWS CLI v2 and Docker static-client tarballs; pass TARGETARCH straight through for the kubectl URL.
- Builds natively on linux/amd64 (CI on x86_64 GitHub runners, Intel Macs, Linux x86_64 hosts) and linux/arm64 (Apple Silicon, Graviton, arm64 Linux). Native arm64 build on Apple Silicon drops from ~7-8 min (qemu emulation under Finch/Colima) to ~2 min — matching the build-time claim in the docs.
- All baked-in deps verified to install from wheels on arm64 (no sdist fallback for any pinned package).

CI — integration:docker:dev-container
- New "Verify amd64 image actually contains amd64 binaries" step: asserts that `uname -m` and the AWS CLI v2 banner agree on x86_64.
- New QEMU + buildx setup followed by a linux/arm64 cross-build, then a four-part arch verification: `uname -m` == aarch64, the AWS CLI banner contains aarch64, `gco --version` loads, and an inline Python ELF e_machine inspection of kubectl and the Docker static client catches single-binary regressions that would slip past a run-time check under QEMU translation.
- Bumped job timeout 25→40 min to absorb the cross-build (~10× slower under qemu).

* updates

* Update integration-tests.yml
1 parent 46e9ad9 commit 764aa1c
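
The banner check described in the commit message can be sketched in Python as follows. This is a minimal illustration, not the code the workflow runs (CI uses `grep -q`). The helper name `banner_arch` is hypothetical, and the sample banners in the usage note assume the `exe/<arch>.<distro>` field sits inside a space-separated `aws --version` line; only the `exe/x86_64.debian.13` fragment itself comes from the diff.

```python
import re

# Hypothetical helper: pull the arch tag out of an `aws --version` banner.
# Assumes the banner carries a field such as "exe/x86_64.debian.13".
_EXE_FIELD = re.compile(r"\bexe/([A-Za-z0-9_]+)")


def banner_arch(banner: str) -> str:
    """Return the arch tag from the banner's exe/ field, e.g. 'aarch64'."""
    match = _EXE_FIELD.search(banner)
    if match is None:
        raise ValueError(f"no exe/<arch> field in banner: {banner!r}")
    return match.group(1)
```

A caller could then compare `banner_arch(output)` against the value reported by `uname -m` inside the same image, which is the agreement the amd64 and arm64 verification steps assert.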

7 files changed

Lines changed: 434 additions & 99 deletions

.github/workflows/integration-tests.yml

Lines changed: 101 additions & 2 deletions
@@ -7,7 +7,7 @@
 # Triggers: push: main, pull_request, workflow_dispatch.
 #
 # Jobs (alphabetical by display name):
-# - integration:docker:dev-container — Dockerfile.dev + `gco --help`
+# - integration:docker:dev-container — Dockerfile.dev + `gco --help` + multi-arch (amd64 native, arm64 via QEMU)
 # - integration:docker:health-monitor — build + /healthz + /readyz
 # - integration:docker:helm-installer — build + `helm version`/`kubectl version`
 # - integration:docker:inference-monitor — build + import-based liveness
@@ -360,10 +360,17 @@ jobs:
   integration-docker-dev-container:
     name: "integration:docker:dev-container"
     runs-on: ubuntu-latest
-    timeout-minutes: 25
+    # Bumped from 25→40 to accommodate the linux/arm64 cross-build, which
+    # runs every Dockerfile.dev RUN step under QEMU emulation on the
+    # amd64 runner (vs. ~20 min for the native amd64 build on a standard
+    # 2-vCPU runner — pip-install of all the pinned deps dominates).
+    timeout-minutes: 40
     steps:
       - uses: actions/checkout@v6
       - name: Build dev container
+        # Native amd64 build on the x86_64 GitHub runner — exercises the
+        # default code path operators hit on Intel/Linux and Docker Desktop
+        # / x86_64. The arm64 path is exercised in a later step.
         run: docker build -f Dockerfile.dev -t gco-dev .
       - name: Exercise CLI inside dev container
         run: |
@@ -383,6 +390,19 @@ jobs:
             aws --version
             docker --version
           '
+      - name: Verify amd64 image actually contains amd64 binaries
+        # The Dockerfile.dev is multi-arch via $TARGETARCH (see its header).
+        # On this x86_64 runner the default build above should produce an
+        # amd64 image — confirm the GNU-arch derivation picked the right
+        # tarballs by checking what the binaries themselves report.
+        run: |
+          set -euo pipefail
+          arch=$(docker run --rm gco-dev uname -m)
+          [ "$arch" = "x86_64" ] || { echo "Expected x86_64, got: $arch"; exit 1; }
+          # AWS CLI v2 prints its build identifier (e.g. "exe/x86_64.debian.13")
+          # in --version output — the arch tag here proves we downloaded the
+          # right tarball, not just that the binary happens to run.
+          docker run --rm gco-dev aws --version | grep -q "x86_64\|amd64"
       - name: Verify Docker-in-Docker via host socket pass-through
         # `gco stacks deploy-all` drives `cdk deploy`, which shells out to
         # Docker to bundle Lambda assets. The dev container ships only the
@@ -421,6 +441,85 @@ jobs:
           # Clean up the probe image so this job leaves the runner tidy.
           docker image rm dind-probe:ci alpine:3.21 >/dev/null 2>&1 || true
 
+      # ----------------------------------------------------------------------
+      # Multi-arch cross-build verification.
+      #
+      # Dockerfile.dev is parameterized on BuildKit's $TARGETARCH so the
+      # image builds natively on both linux/amd64 (this runner, CI default)
+      # and linux/arm64 (Apple Silicon, Graviton, arm64 Linux). The amd64
+      # path is fully exercised above. The steps below cross-build the
+      # arm64 image under QEMU emulation and confirm:
+      #
+      # 1. The arm64 build succeeds (catches Dockerfile regressions where
+      #    a tool URL gets pinned back to amd64 or the GNU-arch derivation
+      #    breaks).
+      # 2. The resulting image actually contains aarch64 binaries (catches
+      #    "wrong tarball downloaded but binary happens to run under
+      #    emulation" type bugs).
+      # 3. Python dependencies still install from wheels on arm64 — i.e.
+      #    no pinned package has silently become amd64-only since the
+      #    lockfile was regenerated.
+      #
+      # Cross-build under QEMU is meaningfully slower than the native
+      # amd64 build above; if this step becomes a bottleneck, consider
+      # gating it on `paths` for Dockerfile.dev and pyproject.toml only.
+      # ----------------------------------------------------------------------
+      - name: Set up QEMU (for linux/arm64 emulation)
+        uses: docker/setup-qemu-action@v4
+        with:
+          platforms: arm64
+      - name: Set up Buildx (enables --platform)
+        uses: docker/setup-buildx-action@v4
+      - name: Cross-build dev container for linux/arm64
+        run: |
+          docker buildx build \
+            --platform linux/arm64 \
+            --load \
+            -f Dockerfile.dev \
+            -t gco-dev:arm64 \
+            .
+      - name: Verify arm64 image contains native aarch64 binaries
+        run: |
+          set -euo pipefail
+          # Under QEMU's binfmt translation, `uname -m` reports the arch of
+          # the emulated `uname` binary, not the host kernel's. A mismatch
+          # here would mean the image was tagged arm64 but the base layers
+          # actually pulled amd64.
+          arch=$(docker run --rm gco-dev:arm64 uname -m)
+          [ "$arch" = "aarch64" ] || { echo "Expected aarch64, got: $arch"; exit 1; }
+          # AWS CLI v2 prints its build identifier (e.g. "exe/aarch64.debian.13")
+          # in --version output. The arch tag here proves we downloaded the
+          # awscli-exe-linux-aarch64-*.zip tarball, not the x86_64 one.
+          docker run --rm gco-dev:arm64 aws --version | grep -q "aarch64"
+          # `gco --version` actually loading proves every Python dependency
+          # installed and imports on arm64 (an sdist fallback or amd64-only
+          # wheel would typically fail at install or import time here).
+          docker run --rm gco-dev:arm64 gco --version
+          # Inspect the ELF header e_machine field for the kubectl and
+          # Docker static client binaries. We can't rely on QEMU+binfmt to
+          # surface arch mismatches by failing to run — Docker images can
+          # contain mixed-arch binaries (e.g. an amd64 kubectl alongside
+          # arm64 system binaries), and the kernel happily picks the right
+          # QEMU translator per-binary. The only durable check is the ELF
+          # header itself (e_machine = 0xB7 → AArch64; 0x3E → x86_64). We
+          # use Python's stdlib because `file`, `readelf`, and `objdump`
+          # aren't in the slim base image.
+          # `-i` attaches stdin so the heredoc reaches `python3 -` inside
+          # the container. Without it Docker drops stdin and the script
+          # runs with empty input → silent no-op (no failure, no output).
+          docker run --rm -i gco-dev:arm64 python3 - <<'PY'
+          import struct, sys, pathlib
+          EM_AARCH64 = 0xB7
+          for binary in ("/usr/local/bin/kubectl", "/usr/local/bin/docker"):
+              with pathlib.Path(binary).open("rb") as f:
+                  f.seek(0x12)  # ELF e_machine offset
+                  e_machine = struct.unpack("<H", f.read(2))[0]
+              if e_machine != EM_AARCH64:
+                  print(f"FAIL: {binary} e_machine=0x{e_machine:X} (expected 0x{EM_AARCH64:X} aarch64)", file=sys.stderr)
+                  sys.exit(1)
+              print(f"OK: {binary} is aarch64 (e_machine=0x{e_machine:X})")
+          PY
+
   # --------------------------------------------------------------------------
   # Kind cluster E2E — verifies the Kubernetes manifests apply cleanly to a
   # real API server with Calico installed so NetworkPolicy is enforced (not
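
The ELF inspection in the workflow reads two bytes at offset 0x12 and assumes a little-endian ELF file. A slightly more defensive variant, sketched below under stated assumptions, also checks the ELF magic and honours the byte-order flag in `e_ident`. The function name `elf_machine` and the error messages are illustrative and do not appear in the commit; the offsets and the `e_machine` values (0xB7 for AArch64, 0x3E for x86_64) are the ones quoted in the workflow comments.

```python
import struct

ELF_MAGIC = b"\x7fELF"
EM_X86_64 = 0x3E   # e_machine value for x86_64
EM_AARCH64 = 0xB7  # e_machine value for AArch64


def elf_machine(header: bytes) -> int:
    """Return e_machine from the first 20 bytes of an ELF file.

    Layout used here: 16-byte e_ident, 2-byte e_type at 0x10,
    2-byte e_machine at 0x12. e_ident[5] (EI_DATA) selects byte order.
    """
    if len(header) < 20 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file (bad magic or truncated header)")
    ei_data = header[5]
    if ei_data == 1:      # ELFDATA2LSB: little-endian
        fmt = "<H"
    elif ei_data == 2:    # ELFDATA2MSB: big-endian
        fmt = ">H"
    else:
        raise ValueError(f"unknown EI_DATA value: {ei_data}")
    return struct.unpack_from(fmt, header, 0x12)[0]
```

Inside the container this could be called as `elf_machine(open("/usr/local/bin/kubectl", "rb").read(20))` and compared against `EM_AARCH64`. The magic check makes a shell script or an empty file fail loudly instead of producing an arbitrary number.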

CONTRIBUTING.md

Lines changed: 37 additions & 22 deletions
@@ -6,7 +6,8 @@ Thank you for contributing to GCO (Global Capacity Orchestrator on AWS)! This gu
 
 - [Development Setup](#development-setup)
   - [Prerequisites](#prerequisites)
-  - [Local Development Environment](#local-development-environment)
+  - [Using the Dev Container (Recommended)](#using-the-dev-container-recommended)
+  - [Local Development Environment (Advanced)](#local-development-environment-advanced)
 - [Development Workflow](#development-workflow)
   - [Dependency Management](#dependency-management)
   - [Type Checking](#type-checking)
@@ -30,36 +31,30 @@ Thank you for contributing to GCO (Global Capacity Orchestrator on AWS)! This gu
 
 ### Prerequisites
 
-- AWS account with appropriate permissions
-- Python 3.10+ (required for type union syntax `str | None`)
-- Node.js 24+ (for CDK)
-- Docker or Finch
-- kubectl
-- Git
+**Recommended path — the dev container only needs:**
 
-**Alternative: Use the Dev Container** (Python 3.14, Node.js 24, CDK, kubectl, AWS CLI) to avoid local dependency issues (see below).
+- AWS account with appropriate permissions
+- Docker (or Finch / Colima) and Git
 
-### Local Development Environment
+The container itself ships Python 3.14, Node.js 24, CDK, kubectl, AWS CLI, and every Python dependency at the exact versions CI uses, so you don't need any of them on your host.
 
-```bash
-# Clone repository
-git clone <repository-url>
-cd GCO
+**Host development path additionally needs:**
 
-# Create virtual environment
-python3 -m venv .venv
-source .venv/bin/activate # On Windows: .venv\Scripts\activate
+- Python 3.10+ (required for type union syntax `str | None`)
+- Node.js 24+ (for CDK)
+- kubectl
+- A clean virtualenv (or pipx) for the GCO Python deps — see the warning under [Local Development Environment (Advanced)](#local-development-environment-advanced).
 
-# Install dependencies
-pip install -e ".[dev]"
-```
+> **Strong recommendation:** use the dev container. GCO pins many exact package versions (FastAPI, mypy, Ruff, AWS SDKs, CDK, etc.) so CI is reproducible. Installing those on top of an existing Python environment frequently triggers `ResolutionImpossible` / resolver errors. The dev container sidesteps this entirely and matches CI bit-for-bit.
 
 ### Using the Dev Container (Recommended)
 
-The dev container includes all dependencies pre-installed (Python 3.14, Node.js 24, CDK, kubectl, AWS CLI). This avoids "works on my machine" issues.
+The dev container includes all dependencies pre-installed (Python 3.14, Node.js 24, CDK, kubectl, AWS CLI). This avoids "works on my machine" issues and is the supported path for everything from running tests to deploying stacks.
+
+The image is **multi-arch** — Apple Silicon (`linux/arm64`), Intel/x86_64 hosts, and CI all build natively from the same `Dockerfile.dev` because every baked-in binary (kubectl, AWS CLI v2, Docker static client) is selected by `$TARGETARCH`. Native builds on Apple Silicon take ~2 min; emulated cross-builds (e.g. `--platform linux/amd64` on an arm64 host) take ~7-8 min and are only needed when you specifically want to test the amd64 image.
 
 ```bash
-# Build the container
+# Build the container (cached on subsequent runs; ~2 min the first time)
 docker build -f Dockerfile.dev -t gco-dev .
 
 # Run an interactive shell
@@ -123,6 +118,26 @@ alias gco-dev='docker run --rm -v ~/.aws:/root/.aws:ro -v $(pwd):/workspace -w /
 # Then use: gco-dev gco stacks list
 ```
 
+### Local Development Environment (Advanced)
+
+Use this path only if you specifically want to develop on your host (e.g., editor integrations like the Pyright/mypy LSP). It is not the supported path for one-off deploys or running tests — those should go through the dev container above.
+
+```bash
+# Clone repository
+git clone <repository-url>
+cd GCO
+
+# Create a *fresh* virtual environment — do not reuse one that already has
+# AWS CDK, FastAPI, mypy, or other commonly-pinned packages installed.
+python3 -m venv .venv
+source .venv/bin/activate # On Windows: .venv\Scripts\activate
+
+# Install dependencies
+pip install -e ".[dev]"
+```
+
+If `pip install` fails with `ResolutionImpossible` or "the conflict is caused by..." messages, your venv is not actually clean (or you're on a Python version older than 3.10). Recreate the venv from scratch or switch to the dev container — please don't loosen the pins in `pyproject.toml` or `requirements-lock.txt` to make local install work, since CI will reject the lockfile drift.
+
 ## Development Workflow
 
 ### Dependency Management
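
Two of the host-install failure causes named in the Advanced section (an interpreter older than 3.10, or installing outside a virtual environment) can be checked before running `pip install`. The sketch below is illustrative only and is not part of the repository; `host_install_problems` is a hypothetical helper. It cannot tell whether an existing venv is actually clean, which the guide says is the more common cause of `ResolutionImpossible`.

```python
import sys

MIN_PYTHON = (3, 10)  # the guide requires 3.10+ for `str | None` syntax


def host_install_problems(version_info=None, prefix=None, base_prefix=None):
    """Return a list of human-readable problems; an empty list means none found.

    Arguments default to the running interpreter's values and exist so the
    check can be exercised with synthetic inputs.
    """
    version_info = sys.version_info if version_info is None else version_info
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix

    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        found = ".".join(str(part) for part in version_info[:2])
        problems.append(f"Python {found} is older than 3.10")
    # In a venv created by `python3 -m venv`, sys.prefix differs from
    # sys.base_prefix; equality means the system interpreter is in use.
    if prefix == base_prefix:
        problems.append("not running inside a virtual environment")
    return problems
```

Running `host_install_problems()` with no arguments inside the activated `.venv` gives a quick sanity check. If it returns an empty list and the resolver still fails, recreating the venv or switching to the dev container remains the documented fix.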
@@ -148,7 +163,7 @@ produces a deterministic, Linux-targeted lockfile that matches CI, avoids
 host-specific path leakage, and doesn't require `pip-tools` on your machine.
 
 ```bash
-# Build the dev image once (cached between runs, ~5 minutes the first time)
+# Build the dev image once (cached between runs, ~2 minutes the first time)
 docker build -f Dockerfile.dev -t gco-dev .
 
 # Regenerate the lockfile and strip the project self-reference

Dockerfile.dev

Lines changed: 39 additions & 6 deletions
@@ -1,9 +1,18 @@
 # GCO Development Container
 # Use this container to run CDK and CLI commands with all dependencies pre-installed
 #
+# Multi-arch: builds natively on linux/amd64 (CI on x86_64 GitHub runners,
+# Docker Desktop on Intel Macs / Linux) and linux/arm64 (Apple Silicon
+# Macs running Docker Desktop, Finch, Colima, or any arm64 Linux host).
+# kubectl, the AWS CLI v2, and the Docker static client all publish both
+# arch builds; we pick the right one via BuildKit's $TARGETARCH.
+#
 # Build:
 # docker build -f Dockerfile.dev -t gco-dev .
 #
+# To cross-build, pass --platform explicitly:
+# docker build --platform linux/amd64 -f Dockerfile.dev -t gco-dev .
+#
 # Run (interactive):
 # docker run -it --rm \
 # -v ~/.aws:/root/.aws:ro \
@@ -57,6 +66,21 @@
 
 FROM python:3.14.4-slim
 
+# Multi-arch build args. BuildKit auto-populates TARGETARCH with the target
+# platform's Go-style arch name (`amd64` / `arm64`). We additionally derive
+# the GNU/Linux arch tag (`x86_64` / `aarch64`) for tools whose download
+# URLs use that convention (AWS CLI v2, Docker static binaries).
+#
+# Building natively for the host arch is the default (`finch build .` on
+# Apple Silicon → arm64; CI on x86_64 runners → amd64). To cross-build,
+# pass `--platform linux/amd64` or `--platform linux/arm64`.
+ARG TARGETARCH
+RUN case "${TARGETARCH}" in \
+        amd64) echo "x86_64" > /tmp/gnu_arch ;; \
+        arm64) echo "aarch64" > /tmp/gnu_arch ;; \
+        *) echo "Unsupported TARGETARCH: ${TARGETARCH}" >&2; exit 1 ;; \
+    esac
+
 # Install system dependencies
 RUN apt-get update && apt-get install -y --no-install-recommends \
     curl \
@@ -89,17 +113,22 @@ RUN npm install -g "aws-cdk@${CDK_VERSION}" \
 # Install kubectl — pinned to the EKS cluster's minor line (see
 # ``cdk.json::kubernetes_version``). The kubectl + apiserver skew policy
 # allows ±1 minor, so this pin can lag or lead the cluster by one.
+# kubectl publishes both linux/amd64 and linux/arm64 binaries, so we
+# pass ${TARGETARCH} straight through.
 ARG KUBECTL_VERSION=v1.35.4
-RUN curl -LO "https://dl.k8s.io/release/${KUBECTL_VERSION}/bin/linux/amd64/kubectl" \
+ARG TARGETARCH
+RUN curl -LO "https://dl.k8s.io/release/${KUBECTL_VERSION}/bin/linux/${TARGETARCH}/kubectl" \
     && chmod +x kubectl \
     && mv kubectl /usr/local/bin/ \
     && kubectl version --client=true
 
 # Install AWS CLI v2 — pinned to a specific release so reproducibility is
 # preserved across rebuilds. Bump when a new feature or bugfix is needed
 # (see https://github.com/aws/aws-cli/blob/v2/CHANGELOG.rst).
+# AWS CLI v2 uses GNU arch tags (x86_64 / aarch64) in the install URL.
 ARG AWSCLI_VERSION=2.34.42
-RUN curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64-${AWSCLI_VERSION}.zip" -o "awscliv2.zip" \
+RUN GNU_ARCH=$(cat /tmp/gnu_arch) \
+    && curl "https://awscli.amazonaws.com/awscli-exe-linux-${GNU_ARCH}-${AWSCLI_VERSION}.zip" -o "awscliv2.zip" \
     && unzip awscliv2.zip \
     && ./aws/install \
     && rm -rf awscliv2.zip aws \
@@ -111,14 +140,18 @@ RUN curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64-${AWSCLI_VERSION}
 # container talks to the host Docker daemon through the bind-mounted
 # socket documented in the header, so only the client binary is needed
 # here. Pinned to a specific version to keep the image reproducible —
-# bump intentionally when testing against a new Docker release.
-# Latest stable at pin time (see https://download.docker.com/linux/static/stable/x86_64/):
+# bump intentionally when testing against a new Docker release. Docker
+# static binaries also use GNU arch tags (x86_64 / aarch64) in the URL.
+# Latest stable at pin time:
+# https://download.docker.com/linux/static/stable/x86_64/
+# https://download.docker.com/linux/static/stable/aarch64/
 ARG DOCKER_VERSION=29.4.2
-RUN curl -fsSL "https://download.docker.com/linux/static/stable/x86_64/docker-${DOCKER_VERSION}.tgz" \
+RUN GNU_ARCH=$(cat /tmp/gnu_arch) \
+    && curl -fsSL "https://download.docker.com/linux/static/stable/${GNU_ARCH}/docker-${DOCKER_VERSION}.tgz" \
     -o /tmp/docker.tgz \
     && tar -xzf /tmp/docker.tgz -C /tmp \
     && mv /tmp/docker/docker /usr/local/bin/ \
-    && rm -rf /tmp/docker /tmp/docker.tgz \
+    && rm -rf /tmp/docker /tmp/docker.tgz /tmp/gnu_arch \
     && docker --version
 
 # Set working directory
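
The arch handling in this Dockerfile comes down to one mapping plus three URL templates. The Python sketch below restates it so the amd64 and arm64 outcomes can be compared side by side. It is an illustration rather than anything the build runs; the function names are hypothetical, and the URL templates and version pins are copied from the diff above.

```python
# Go-style TARGETARCH (as set by BuildKit) -> GNU arch tag, mirroring the
# `case` statement in Dockerfile.dev.
GNU_ARCH = {"amd64": "x86_64", "arm64": "aarch64"}


def gnu_arch(targetarch: str) -> str:
    """Translate a TARGETARCH value into its GNU tag or reject it."""
    try:
        return GNU_ARCH[targetarch]
    except KeyError:
        raise ValueError(f"Unsupported TARGETARCH: {targetarch}") from None


def download_urls(targetarch, kubectl_version, awscli_version, docker_version):
    """Build the three download URLs the Dockerfile fetches for one arch."""
    gnu = gnu_arch(targetarch)
    return {
        # kubectl uses the Go-style name directly.
        "kubectl": f"https://dl.k8s.io/release/{kubectl_version}/bin/linux/{targetarch}/kubectl",
        # AWS CLI v2 and the Docker static client use the GNU tag.
        "awscli": f"https://awscli.amazonaws.com/awscli-exe-linux-{gnu}-{awscli_version}.zip",
        "docker": f"https://download.docker.com/linux/static/stable/{gnu}/docker-{docker_version}.tgz",
    }
```

Calling `download_urls("arm64", "v1.35.4", "2.34.42", "29.4.2")` yields the arm64 kubectl path alongside `aarch64` AWS CLI and Docker URLs. An unknown value such as `riscv64` raises, matching the `exit 1` branch of the `case` statement.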
