Use ArgoCD ApplicationSet to deploy staging environments on demand via PRs #32

Description

@spwoodcock

We don't want to spend much on a dedicated staging cluster.

Possible solution:

  1. Create a staging nodegroup in the prod cluster, with desiredSize=0, minSize=0, maxSize=2 on ec2 spot instances
  2. Staging workloads will have tolerations + node affinity so they only deploy to staging nodes
  3. Create an ArgoCD ApplicationSet with a pull request generator that creates an Application whenever a PR is opened from staging → main
  4. Testing can be done on the staging env.
  5. When the PR is merged or closed, the ApplicationSet automatically deletes the Application and its staging resources
  6. The nodegroup will have no more resources and will scale to 0
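
The nodegroup in step 1 could be sketched with eksctl (cluster name, region, and instance types below are placeholders, not from this repo). One gotcha worth flagging: for cluster-autoscaler to scale a tainted nodegroup up from zero, the taints and labels generally need to be mirrored as `node-template` ASG tags, since there are no live nodes to inspect.

```yaml
# Hypothetical eksctl sketch for the staging nodegroup (spot, scale-to-zero).
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: prod-cluster              # placeholder: existing prod cluster name
  region: us-east-1               # placeholder
managedNodeGroups:
  - name: staging-nodegroup
    spot: true                    # EC2 spot instances
    instanceTypes: ["t3.medium", "t3a.medium"]  # placeholders
    desiredCapacity: 0
    minSize: 0
    maxSize: 2
    labels:
      environment: staging
    taints:
      - key: environment
        value: staging
        effect: NoSchedule
    tags:
      # Needed for cluster-autoscaler scale-from-zero on tainted nodes
      k8s.io/cluster-autoscaler/node-template/taint/environment: "staging:NoSchedule"
      k8s.io/cluster-autoscaler/node-template/label/environment: "staging"
```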

Considerations:

  • Namespace isolation: the ApplicationSet should deploy to a staging namespace in the cluster.
  • Secrets: will staging replicate the prod sealed secrets? Are there issues with this?
  • External-dns: will it provision records in the staging.hotosm.org DNS zone?
  • Add a TTL (e.g. 7 days) on staging namespaces/apps using a cleanup job or ArgoCD ApplicationSet pullRequest.requeueAfterSeconds?
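
On the sealed secrets question: a likely issue is that SealedSecrets are, by default, bound to the namespace (and name) they were sealed for, so ciphertext sealed for a prod namespace will not decrypt in the staging namespace. Options are re-sealing for staging or sealing with a wider scope, e.g. (secret name below is a placeholder):

```yaml
# Hypothetical sketch: a SealedSecret sealed with cluster-wide scope so the
# same ciphertext unseals in the staging namespace. Trade-off: the blob is
# no longer bound to a single namespace.
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: db-credentials            # placeholder name
  namespace: staging
  annotations:
    sealedsecrets.bitnami.com/cluster-wide: "true"
spec:
  encryptedData:
    password: AgB...              # ciphertext from `kubeseal --scope cluster-wide`
```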
Example ApplicationSet:

apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: staging-prs
  namespace: argocd
spec:
  generators:
    - pullRequest:
        github:
          owner: hotosm
          repo: field-tm
        requeueAfterSeconds: 60       # refresh PR state every 60s
        filters:
          - branchMatch: "^main$"     # only PRs targeting main
          - branchMatchFrom: "^staging$"  # only PRs coming from staging branch
  template:
    metadata:
      name: "field-tm-staging" # argocd app name
    spec:
      project: default
      source:
        repoURL: https://github.com/hotosm/field-tm.git
        targetRevision: "{{head_sha}}" # deploy the PR's commit
        path: chart # path to helm chart
      destination:
        server: https://kubernetes.default.svc
        namespace: "staging" # staging namespace - all apps same staging namespace, be wary of conflicts
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
---
# A Kustomize patch so staging workloads tolerate and are pinned to staging nodes
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
patches:
  - target:
      kind: Deployment
    patch: |-
      - op: add
        path: /spec/template/spec/tolerations
        value:
          - key: "environment"
            operator: "Equal"
            value: "staging"
            effect: "NoSchedule"
      - op: add
        path: /spec/template/spec/affinity
        value:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
                - matchExpressions:
                    - key: "eks.amazonaws.com/nodegroup"
                      operator: In
                      values:
                        - staging-nodegroup
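
For the external-dns consideration, staging workloads would expose a hostname under the staging zone and let external-dns create the record — a minimal sketch, assuming external-dns runs in the cluster with a `--domain-filter` covering staging.hotosm.org (hostname and service name below are placeholders):

```yaml
# Hypothetical Ingress sketch: external-dns reads the host and provisions
# a record in the staging.hotosm.org zone.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: field-tm-staging
  namespace: staging
  annotations:
    external-dns.alpha.kubernetes.io/hostname: field-tm.staging.hotosm.org  # placeholder
spec:
  rules:
    - host: field-tm.staging.hotosm.org   # placeholder
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: field-tm            # placeholder service name
                port:
                  number: 80
```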

Example workflow

  • PR opened (staging → main)

    • ApplicationSet sees it and creates an Application named field-tm-staging.
    • Namespace staging is created if needed.
    • ArgoCD syncs → workloads deployed → pods pending → cluster-autoscaler spins up the staging nodegroup.
  • PR merged/closed

    • ApplicationSet deletes the Application.
    • ArgoCD prunes namespace + workloads.
    • No more pods → autoscaler scales the nodegroup back to 0.
