We don't want to spend much on a dedicated staging cluster.
Possible solution:
- Create a staging nodegroup in the prod cluster with desiredSize=0, minSize=0, maxSize=2, running on EC2 spot instances
- Staging workloads will have tolerations + node affinity so they only deploy to staging nodes
- Create an ArgoCD ApplicationSet that generates an Application when a new PR is opened from staging --> main
- Testing can be done on the staging env.
- When the PR is merged, the ApplicationSet will automatically remove the generated Application and its staging resources
- With no staging pods left to schedule, the nodegroup will scale back to 0
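The nodegroup from the first step could be sketched as follows. This assumes the cluster is managed with eksctl, which the notes do not state; the cluster name and region are placeholders, and the instance types are only illustrative. The taint is an assumption too: it is what the staging tolerations would need to match so that prod workloads stay off these nodes.

```yaml
# Hypothetical eksctl config fragment; adapt to whatever tool manages the cluster.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: prod-cluster        # placeholder cluster name
  region: us-east-1         # placeholder region
managedNodeGroups:
  - name: staging-nodegroup # must match the nodegroup name used in the node affinity
    instanceTypes: ["t3.large", "t3a.large"] # illustrative; several types improve spot availability
    spot: true
    minSize: 0
    maxSize: 2
    desiredCapacity: 0      # eksctl's name for desiredSize
    labels:
      environment: staging
    taints:
      - key: environment
        value: staging
        effect: NoSchedule  # keeps pods without the staging toleration off these nodes
```

Scaling up from 0 and back down again also assumes something like Cluster Autoscaler is running in the cluster; a managed nodegroup does not resize itself based on pending pods.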
Considerations:
- Namespace isolation: the ApplicationSet should deploy to a staging namespace in the cluster.
- Secrets: should staging replicate the prod sealed secrets? Are there issues with this?
- External-DNS: will it provision records in the staging.hotosm.org domain zone?
- Add a TTL (e.g. 7 days) on staging namespaces/apps using a cleanup job or ArgoCD ApplicationSet pullRequest.requeueAfterSeconds?
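For the External-DNS question, one possible shape is an Ingress in the staging namespace whose host falls under staging.hotosm.org. This is only a sketch: the hostname, Service name, and port are hypothetical, and it assumes External-DNS runs with the ingress source enabled and is allowed to manage that zone.

```yaml
# Hypothetical Ingress; External-DNS (ingress source) would create a record for the rule host.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: field-tm
  namespace: staging
spec:
  rules:
    - host: field-tm.staging.hotosm.org # hypothetical hostname under the staging zone
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: field-tm          # hypothetical Service name
                port:
                  number: 80            # hypothetical port
```

Because the record is tied to the Ingress, it should disappear when the staging app is pruned, provided External-DNS is running with a policy that permits deletions.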
Example ApplicationSet:
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: staging-prs
  namespace: argocd
spec:
  generators:
    - pullRequest:
        github:
          owner: hotosm
          repo: field-tm
        requeueAfterSeconds: 60 # poll GitHub for PR state every 60s
        filters:
          # conditions inside one filter must all match; separate filters are OR'd
          - branchMatch: "^staging$" # only PRs coming from the staging branch
            targetBranchMatch: "^main$" # only PRs targeting main
  template:
    metadata:
      name: "field-tm-staging" # argocd app name
    spec:
      project: default
      source:
        repoURL: https://github.com/hotosm/field-tm.git
        targetRevision: "{{head_sha}}" # deploy the PR's head commit
        path: chart # path to helm chart
      destination:
        server: https://kubernetes.default.svc
        namespace: "staging" # all apps share one staging namespace, so be wary of conflicts
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
---
# A Kustomize patch so staging workloads tolerate the staging taint and are required to run on staging nodes
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
patches:
  - target:
      kind: Deployment
    patch: |-
      - op: add
        path: /spec/template/spec/tolerations
        value:
          - key: "environment"
            operator: "Equal"
            value: "staging"
            effect: "NoSchedule"
      - op: add
        path: /spec/template/spec/affinity
        value:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
                - matchExpressions:
                    - key: "eks.amazonaws.com/nodegroup"
                      operator: In
                      values:
                        - staging-nodegroup
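Since the ApplicationSet above points at a Helm chart (path: chart) rather than a Kustomize overlay, the same scheduling constraints could instead be passed as Helm values from the Application template. This is a sketch only: it assumes the chart exposes `tolerations` and `nodeSelector` values, which is common in scaffolded charts but is not confirmed for field-tm.

```yaml
# Fragment of the ApplicationSet template's spec.source (not a complete manifest).
source:
  repoURL: https://github.com/hotosm/field-tm.git
  path: chart
  helm:
    values: |
      # Assumed chart values; check the chart's values.yaml for the real keys.
      tolerations:
        - key: environment
          operator: Equal
          value: staging
          effect: NoSchedule
      nodeSelector:
        eks.amazonaws.com/nodegroup: staging-nodegroup
```

A nodeSelector is the simpler equivalent of the required node affinity in the patch; either keeps staging pods on the staging nodegroup.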
Example workflow
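- PR opened (staging → main): the ApplicationSet creates the ArgoCD app {app-name}-staging.
- PR merged/closed: the ApplicationSet removes the app, and the staging nodegroup scales back to 0.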