Description
Observed Behavior:
The instance with GPU taints does not start, and the node does not join the cluster.
We use a GPU AMI based on amazon-eks-gpu-node-1.30-*.
This is a new install of Karpenter. We have another cluster running Karpenter v0.32+, and GPU nodes work there.
Expected Behavior:
The instance joins the cluster.
Reproduction Steps (Please include YAML):
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu
spec:
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 24h
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: gpu
      expireAfter: Never
      requirements:
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values:
            - g5
        - key: karpenter.k8s.aws/instance-size
          operator: In
          values:
            - 2xlarge
            - 4xlarge
        - key: kubernetes.io/os
          operator: In
          values:
            - linux
        - key: kubernetes.io/arch
          operator: In
          values:
            - amd64
        - key: karpenter.sh/capacity-type
          operator: In
          values:
            - on-demand
        - key: karpenter.k8s.aws/instance-gpu-manufacturer
          operator: In
          values: ["nvidia"]
      taints:
        - key: nvidia.com/gpu
          effect: "NoSchedule"
          value: "true"
---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: gpu
spec:
  amiFamily: AL2 # Amazon Linux 2
  instanceProfile: "sfly-aws-apc-dev-svc-eks-node-group-InstanceProfile"
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery/standard-app-dev-common: "standard-app-dev-common"
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery/standard-app-dev-common: "standard-app-dev-common"
  amiSelectorTerms:
    - id: "ami-080bac37fb480fa75" # GPU AMI based on amazon-eks-gpu-node-1.30-v20250116
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 100Gi
        volumeType: gp3
        iops: 10000
        deleteOnTermination: true
        throughput: 125
  tags:
    Name: standard-app-dev-common-eks-gpu
    Environment: "dev"
    Provisioner: Karpenter
    ManagedBy: APC
    BusinessUnit: Consumer
    App: EKS
    Role: GPU Compute Node
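For anyone trying to reproduce this, a workload along the following lines should be enough to make Karpenter provision a node from the gpu NodePool. This is only a sketch and was not part of the original manifests: the pod name, container name, and image are placeholders, and it assumes the NVIDIA device plugin is what exposes the nvidia.com/gpu resource on the node. The toleration mirrors the taint defined in the NodePool above.

```yaml
# Hypothetical test pod; name, image, and command are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  nodeSelector:
    karpenter.sh/nodepool: gpu   # pin the pod to the NodePool named "gpu"
  tolerations:
    - key: nvidia.com/gpu        # matches the NodePool taint key
      operator: Equal
      value: "true"
      effect: NoSchedule
  containers:
    - name: cuda
      image: nvidia/cuda:12.4.1-base-ubuntu22.04   # placeholder image
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1      # assumes the NVIDIA device plugin is installed
```

While the node is not registered, the pod would be expected to stay Pending, which makes it easy to tell when the instance finally joins.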
Versions:
- Chart Version: v1.1.1
- Kubernetes Version (kubectl version): 1.30 (AWS EKS)
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment