
Features to allow async drift / budgets (eg for Azure) #1861

Open
@charliedmcb

Description


Note:

This is a known BUG with a known cause. It also applies to other forms of Disruption that provision replacement nodes, not just Drift. Fixing it requires changes in both kubernetes-sigs/karpenter and the Azure/karpenter-provider-azure repo, so a BUG is being opened in both (the Azure/karpenter-provider-azure side is tracked in issue 600).
 
Observed Behavior:
Karpenter is only provisioning and replacing 1 Drifted node at a time, even with a Budget of "5".

All 10 nodeclaims were marked as Drifted:

kubectl get nodeclaims -o go-template='{{range $item := .items}}{{with $nodeclaimname := $item.metadata.name}}{{range $condition := $item.status.conditions}}{{if and (eq $condition.type "Drifted") (eq $condition.status "True")}}{{printf "%s\n" $nodeclaimname}}{{end}}{{end}}{{end}}{{end}}'
general-purpose-4d8rn
general-purpose-brt2r
general-purpose-l9pv6
general-purpose-n6xv7
general-purpose-s9rnn
general-purpose-v76nm
general-purpose-vrnbl
general-purpose-vt4mv
general-purpose-wkk62
general-purpose-x6b47

Even after more than a minute, only one new nodeclaim was being created:

kubectl get karpenter
NAME                                       AGE
aksnodeclass.karpenter.azure.com/default   99m

NAME                                           TYPE               CAPACITY    ZONE        NODE                        READY     AGE
nodeclaim.karpenter.sh/general-purpose-4d8rn   Standard_D2ls_v5   on-demand   westus2-3   aks-general-purpose-4d8rn   True      78m
nodeclaim.karpenter.sh/general-purpose-brt2r   Standard_D2ls_v5   on-demand   westus2-1   aks-general-purpose-brt2r   True      70m
nodeclaim.karpenter.sh/general-purpose-l9pv6   Standard_D2ls_v5   on-demand   westus2-3   aks-general-purpose-l9pv6   True      52m
nodeclaim.karpenter.sh/general-purpose-n6xv7   Standard_D2as_v5   on-demand   westus2-1   aks-general-purpose-n6xv7   True      85m
nodeclaim.karpenter.sh/general-purpose-s9rnn   Standard_D2as_v5   on-demand   westus2-2   aks-general-purpose-s9rnn   True      48m
nodeclaim.karpenter.sh/general-purpose-v76nm   Standard_D2ls_v5   on-demand   westus2-1   aks-general-purpose-v76nm   True      68m
nodeclaim.karpenter.sh/general-purpose-vrnbl   Standard_D2as_v5   on-demand   westus2-3   aks-general-purpose-vrnbl   True      82m
nodeclaim.karpenter.sh/general-purpose-vt4mv   Standard_D2ls_v5   on-demand   westus2-3   aks-general-purpose-vt4mv   True      76m
nodeclaim.karpenter.sh/general-purpose-wkk62   Standard_D2ls_v5   on-demand   westus2-1   aks-general-purpose-wkk62   True      67m
nodeclaim.karpenter.sh/general-purpose-x6b47   Standard_D2ls_v5   on-demand   westus2-2   aks-general-purpose-x6b47   True      50m
nodeclaim.karpenter.sh/general-purpose-xf2ss                                                                          Unknown   82s

NAME                                    NODECLASS   NODES   READY   AGE
nodepool.karpenter.sh/general-purpose   default     10      True    98m

And only one node tainted as disrupted:

kubectl get nodes -o go-template='{{range $item := .items}}{{with $nodename := $item.metadata.name}}{{range $taint := $item.spec.taints}}{{if and (eq $taint.key "karpenter.sh/disrupted") (eq $taint.effect "NoSchedule")}}{{printf "%s\n" $nodename}}{{end}}{{end}}{{end}}{{end}}'
aks-general-purpose-4d8rn

Looking at the logs, we can see that the disruption controller is stalled waiting on cluster sync:

{
  "level": "DEBUG",
  "time": "2024-12-03T00:20:35.932Z",
  "logger": "controller",
  "caller": "singleton/controller.go:26",
  "message": "waiting on cluster sync",
  "commit": "f1b33a7",
  "controller": "disruption",
  "namespace": "",
  "name": "",
  "reconcileID": "c11d69f3-97eb-4677-8c48-54bccc32a1bf"
}

This is due to the interaction pattern between kubernetes-sigs/karpenter and Azure/karpenter-provider-azure, which stalls NodeClaim creation at the CloudProvider Create call:

created, err := l.cloudProvider.Create(ctx, nodeClaim)

Internally, the Azure provider runs a few LROs (long-running operations) as part of this creation call, polling Azure to ensure the resources have been created correctly and reporting back any error otherwise. If this polling were skipped, certain types of node-creation errors would be missed, and those errors would only surface after the registrationTTL of 15 min, which is unacceptable in these cases.
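
To make the shape of the request concrete, below is a minimal, self-contained Go sketch of the general "start the LRO, return immediately, keep polling in the background" pattern, so that LRO errors still surface promptly while several replacements are in flight at once. This is not the provider's actual code or a concrete design proposal; fakePoller, beginCreate, and createAsync are invented names used only for illustration.

package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// fakePoller is a stand-in for an Azure LRO poller; the real provider would
// use the SDK's poller types instead.
type fakePoller struct{ done chan error }

func (p *fakePoller) PollUntilDone(ctx context.Context) error {
	select {
	case err := <-p.done:
		return err
	case <-ctx.Done():
		return ctx.Err()
	}
}

// beginCreate simulates starting VM creation: it returns a poller immediately,
// and the simulated LRO completes (or fails) a little later.
func beginCreate(name string) *fakePoller {
	p := &fakePoller{done: make(chan error, 1)}
	go func() {
		time.Sleep(2 * time.Second) // pretend the LRO takes a while
		if name == "nodeclaim-bad" {
			p.done <- errors.New("simulated provisioning error")
			return
		}
		p.done <- nil
	}()
	return p
}

// createAsync returns as soon as the LRO has started, so multiple replacements
// can proceed in parallel, while a background goroutine still reports LRO
// failures long before a 15 minute registration TTL would.
func createAsync(ctx context.Context, name string, onFailure func(string, error)) {
	poller := beginCreate(name)
	go func() {
		if err := poller.PollUntilDone(ctx); err != nil {
			onFailure(name, err)
		}
	}()
}

func main() {
	ctx := context.Background()
	onFailure := func(name string, err error) { fmt.Printf("%s failed early: %v\n", name, err) }
	for _, n := range []string{"nodeclaim-a", "nodeclaim-b", "nodeclaim-bad"} {
		createAsync(ctx, n, onFailure) // none of these block on the others
	}
	time.Sleep(3 * time.Second) // give the background pollers time to report
}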

Expected Behavior:
Karpenter should provision replacement nodes asynchronously, up to the Drift Budget (in this case 5), to replace the Drifted nodes in accordance with the Budget.

Reproduction Steps (Please include YAML):

Deployed a dev version of Azure Karpenter from the HEAD of main, using make az-all:
- de7dee7

Ran make az-taintsystemnodes to ensure workload pods will only land on Karpenter nodes.
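
As a quick sanity check (not part of the original steps), the taints currently on the nodes can be listed to confirm the workload pods cannot land on the system pool; this simply prints whatever taints are present:

kubectl get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'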

Updated and applied examples/v1/general-purpose.yaml, setting karpenter.azure.com/sku-cpu to less than 3, budgets.nodes to "5", and consolidateAfter to Never:

# This example NodePool will provision general purpose instances
---
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general-purpose
  annotations:
    kubernetes.io/description: "General purpose NodePool"
spec:
  disruption:
    consolidateAfter: Never
    budgets:
    - nodes: "5"
  # Optional: Uncomment if you want to put a cap on the max resources available for provisioning
  # limits:
  #   cpu: "30"
  template:
    metadata:
      labels:
        # required for Karpenter to predict overhead from cilium DaemonSet
        kubernetes.azure.com/ebpf-dataplane: cilium
    spec:
      nodeClassRef:
        group: karpenter.azure.com
        kind: AKSNodeClass
        name: default
      startupTaints:
      # https://karpenter.sh/docs/concepts/nodepools/#cilium-startup-taint
      - key: node.cilium.io/agent-not-ready
        effect: NoExecute
        value: "true"
      expireAfter: Never
      requirements:
      - key: kubernetes.io/arch
        operator: In
        values: ["amd64"]
      - key: kubernetes.io/os
        operator: In
        values: ["linux"]
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["on-demand"]
      - key: karpenter.azure.com/sku-family
        operator: In
        values: [D]
      # Optional: Uncomment if you want to add a restriction on max sku cpus.
      #           Useful for ensuring karpenter provisions multiple nodes for feature testing.
      - key: karpenter.azure.com/sku-cpu
        operator: Lt
        values: ["3"]
---
apiVersion: karpenter.azure.com/v1alpha2
kind: AKSNodeClass
metadata:
  name: default
  annotations:
    kubernetes.io/description: "General purpose AKSNodeClass for running Ubuntu2204 nodes"
spec:
  imageFamily: Ubuntu2204

Updated and applied examples/workloads/inflate.yaml, setting replicas to 10 and cpu requests to 1.1:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 10
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      containers:
      - image: mcr.microsoft.com/oss/kubernetes/pause:3.6
        name: inflate
        resources:
          requests:
            cpu: "1.1"
            memory: 256M
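
As another optional check (not part of the original steps), the inflate pods can be listed along with their nodes to confirm they schedule onto the Karpenter-provisioned nodes; the app=inflate label comes from the Deployment above:

kubectl get pods -l app=inflate -o wide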

Updated imageFamily to AzureLinux in the AKSNodeClass to trigger Drift.
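
One way to apply that change (assuming the AKSNodeClass above is still named default, and that kubectl resolves the singular resource name aksnodeclass) is a merge patch; editing and re-applying the YAML works just as well:

kubectl patch aksnodeclass default --type merge -p '{"spec":{"imageFamily":"AzureLinux"}}'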

Versions:

  • Chart Version: Using Dev Build of Azure provider on HEAD of main with commit d84e7c8
    • sigs.k8s.io/karpenter: v1.0.5
  • Kubernetes Version (kubectl version): v1.29.9

Labels

area/provider/azure: Issues or PRs related to azure provider
kind/feature: Categorizes issue or PR as related to a new feature.
triage/accepted: Indicates an issue or PR is ready to be actively worked on.
