GitOps monorepo deploying 50+ Kubernetes components across multiple clusters via Kustomize and ArgoCD ApplicationSets.
| Action | Command |
|---|---|
| Build overlay | `kustomize build --enable-helm components/<name>/<env>/` |
| Lint YAML | `yamllint .` |
| K8s lint | `kube-linter lint <path>` |
| Chainsaw tests | `./hack/chainsaw/chainsaw-prepare.sh` and `chainsaw test <path to .chainsaw-test folder>` |
| infra-tools | `cd infra-tools && make build test lint` |
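
The build command in the table can be looped over every overlay of a single component. This is a minimal sketch, assuming that each directory outside `base/` containing a `kustomization.yaml` is an overlay that builds on its own, which may not hold for every component. `overlay_dirs` and `build_overlays` are hypothetical helper names, not scripts from `hack/`.

```shell
# List overlay directories for a component, skipping base/.
# Assumption: every directory holding a kustomization.yaml is buildable by itself.
overlay_dirs() {
  local component="$1"
  find "components/${component}" -name kustomization.yaml \
    ! -path "components/${component}/base/*" \
    -exec dirname {} \; | sort
}

# Build each overlay and discard the rendered output; return non-zero if any build fails.
build_overlays() {
  local component="$1" dirs dir failed=0
  dirs=$(overlay_dirs "$component")
  while IFS= read -r dir; do
    [ -n "$dir" ] || continue
    echo "==> ${dir}"
    if ! kustomize build --enable-helm "${dir}/" > /dev/null; then
      failed=1
    fi
  done <<EOF
$dirs
EOF
  return "$failed"
}
```

Run from the repository root, for example `build_overlays <name>`; per-cluster staging and production directories are picked up because the search is recursive.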

- `components/<name>/{base,development,staging,production}/`: per-component Kustomize overlays; staging and production are often further split per cluster
- `argo-cd-apps/overlays/`: maps to deployment targets (development, staging-downstream, production-downstream, etc.)
- `configs/`: cluster-level configurations (etcd-defrag, kubelet settings)
- `hack/`: deployment and utility scripts
- `infra-tools/`: Go CLI tools (env-detector, render-diff) with their own Makefile

- Prefer using scripts in `hack/` over manual steps when available
- Promotion order: development/staging → production; changes must be validated in dev/staging before promoting to production
- Production has per-cluster overlay directories; rollouts must be split into rings (subsets of clusters), not applied to all at once
- All changes via PR; CODEOWNERS approval required
- Production PRs must include `## Risk Assessment` (level, description, rollback plan) and `## Validation` (staging evidence if applicable)
- Commits: Jira ID at start (e.g., `KFLUXINFRA-1234 description`). Interactive sessions: use the `-s` flag and an `Assisted-by:` trailer. Agentic workflow: use an `Authored-by:` trailer. Include agent name and tool.
- E2E tests are designed to validate in an isolated environment in GitHub Actions CI and should not be run locally
- E2E tests are conditional — they only run on dev/staging PRs when specific files change. Production PRs do not run E2E; rely on prior dev/staging validation
- E2E tests frequently fail due to intermittent infrastructure issues. If the PR looks correct and E2E logs show no relevant errors, comment `/retest` to re-trigger
- When updating component images, also update image references in `hack/new-cluster/templates/` as part of the production ring deployments; new clusters are bootstrapped from these and won't get ArgoCD-synced versions
- Before opening a PR, writing a PR description, or interpreting CI results, read `skills/pr-workflow.md`
- When a CI check fails on a PR, read `skills/ci-troubleshooting.md`
- When working interactively on new features or significant changes, read `skills/brainstorming-workflow.md` before making changes
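
The commit conventions in the list above (Jira ID first, `-s` sign-off, `Assisted-by:` trailer for interactive sessions) can be sketched as a small wrapper. This is only an illustration: `has_jira_prefix` and `commit_assisted` are hypothetical helper names, the Jira ID pattern (uppercase project key, hyphen, digits, space) is an assumption inferred from the `KFLUXINFRA-1234` example, and the trailer value format for agent name and tool is a placeholder. `git commit --trailer` needs Git 2.32 or newer.

```shell
# Return 0 when the first line of the message starts with a Jira ID such as KFLUXINFRA-1234.
# Assumption: IDs look like <UPPERCASE-KEY>-<digits> followed by a space and a description.
has_jira_prefix() {
  printf '%s\n' "$1" | head -n 1 | LC_ALL=C grep -Eq '^[A-Z][A-Z0-9]*-[0-9]+ .+'
}

# Interactive session: sign off (-s) and add an Assisted-by trailer.
# $2 is a free-form placeholder for the agent name and tool.
commit_assisted() {
  local subject="$1" assistant="$2"
  if ! has_jira_prefix "$subject"; then
    echo "commit subject must start with a Jira ID" >&2
    return 1
  fi
  git commit -s -m "$subject" --trailer "Assisted-by: ${assistant}"
}
```

For an agentic workflow, the same shape would apply with `Authored-by:` in place of `Assisted-by:`.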