Cannot disrupt NodeClaim: nodeclaim does not have an associated node #7631

Open
@IgorKurylo1988

Description

Observed Behavior:
The instance with GPU taints does not start, and the node does not connect to the cluster.
We use a GPU AMI based on amazon-eks-gpu-node-1.30-*.
This is a new install of Karpenter. We have another cluster running Karpenter v0.32+, and GPU nodes work there.

Expected Behavior:
The instance starts and the node connects to the cluster.

Reproduction Steps (Please include YAML):

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu
spec:
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 24h
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: gpu
      expireAfter: Never
      requirements:
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values:
            - g5
        - key: karpenter.k8s.aws/instance-size
          operator: In
          values:
            - 2xlarge
            - 4xlarge
        - key: kubernetes.io/os
          operator: In
          values:
            - linux
        - key: kubernetes.io/arch
          operator: In
          values:
            - amd64
        - key: karpenter.sh/capacity-type
          operator: In
          values:
            - on-demand
        - key: karpenter.k8s.aws/instance-gpu-manufacturer
          operator: In
          values: ["nvidia"]
      taints:
        - key: nvidia.com/gpu
          effect: "NoSchedule"
          value: "true"
---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: gpu
spec:
  amiFamily: AL2 # Amazon Linux 2
  instanceProfile: "sfly-aws-apc-dev-svc-eks-node-group-InstanceProfile"
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery/standard-app-dev-common: "standard-app-dev-common"
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery/standard-app-dev-common: "standard-app-dev-common"
  amiSelectorTerms:
    - id: "ami-080bac37fb480fa75" # GPU AMI based on amazon-eks-gpu-node-1.30-v20250116
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 100Gi
        volumeType: gp3
        iops: 10000
        deleteOnTermination: true
        throughput: 125
  tags:
    Name: standard-app-dev-common-eks-gpu
    Environment: "dev"
    Provisioner: Karpenter
    ManagedBy: APC
    BusinessUnit: Consumer
    App: EKS
    Role: GPU Compute Node
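
The manifests above do not include the workload that triggers provisioning. A minimal sketch of a pod that would target this NodePool is shown here, assuming the NVIDIA device plugin exposes the `nvidia.com/gpu` resource. The pod name and the image are hypothetical placeholders and are not taken from the original report.

```yaml
# Hypothetical test workload, not part of the original report.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test                  # hypothetical name
spec:
  restartPolicy: Never
  nodeSelector:
    karpenter.sh/nodepool: gpu          # label Karpenter sets on nodes launched from the "gpu" NodePool
  tolerations:
    - key: nvidia.com/gpu               # matches the taint declared in the NodePool
      operator: Exists
      effect: NoSchedule
  containers:
    - name: cuda
      image: example.com/gpu-test:latest   # placeholder; substitute a real CUDA-capable image
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1             # GPU request that makes the pod need a g5 instance
```

While a pod like this is pending, Karpenter should create a NodeClaim for it. The error in the issue title indicates that such a NodeClaim exists but no node has registered for it.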

Versions:

  • Chart Version: v1.1.1
  • Kubernetes Version (kubectl version): 1.30 - EKS AWS


Labels: bug, triage/needs-information
