Description
Observed Behavior:
kubectl get pods -n kube-system
NAME                           READY   STATUS    RESTARTS   AGE
aws-node-dc9hb                 2/2     Running   0          109m
aws-node-pzbww                 2/2     Running   0          109m
coredns-789f8477df-8r5zd       1/1     Running   0          114m
coredns-789f8477df-tc5pt       1/1     Running   0          114m
eks-pod-identity-agent-gqwrz   1/1     Running   0          109m
eks-pod-identity-agent-sbng9   1/1     Running   0          109m
karpenter-df9d8f6dd-xbz9d      0/1     Running   0          118s
karpenter-df9d8f6dd-znnjw      0/1     Pending   0          118s
kube-proxy-l8bcp               1/1     Running   0          109m
kube-proxy-mnw6n               1/1     Running   0          109m
kubectl describe pod karpenter-df9d8f6dd-xbz9d -n kube-system
Conditions:
  Type                        Status
  PodReadyToStartContainers   True
  Initialized                 True
  Ready                       False
  ContainersReady             False
  PodScheduled                True
Volumes:
  aws-iam-token:
    Type:                     Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:   86400
  kube-api-access-n9sbj:
    Type:                     Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:   3607
    ConfigMapName:            kube-root-ca.crt
    ConfigMapOptional:
    DownwardAPI:              true
QoS Class:                    Guaranteed
Node-Selectors:               kubernetes.io/os=linux
Tolerations:                  CriticalAddonsOnly op=Exists
                              node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                              node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
Normal Scheduled 3m15s default-scheduler Successfully assigned kube-system/karpenter-df9d8f6dd-xbz9d to ip-10-110-164-199.ec2.internal
Normal Pulled 75s (x2 over 3m15s) kubelet Container image "public.ecr.aws/karpenter/controller:1.0.5@sha256:f2df98735b232b143d37f0c6819a6cae2be4740e3c8b38297bceb365cf3f668b" already present on machine
Normal Created 75s (x2 over 3m15s) kubelet Created container controller
Normal Killing 75s kubelet Container controller failed liveness probe, will be restarted
Warning Unhealthy 75s kubelet Readiness probe failed: Get "http://10.xxx.1x5.153:8081/readyz": read tcp 10.xxx.1x4.1x9:33238->10.xxx.1x5.1x3:8081: read: connection reset by peer
Warning Unhealthy 75s (x2 over 75s) kubelet Readiness probe failed: Get "http://10.xxx.1x5.153:8081/readyz": dial tcp 10.xxx.1x5.153:8081: connect: connection refused
Normal Started 74s (x2 over 3m14s) kubelet Started container controller
Warning Unhealthy 5s (x5 over 2m35s) kubelet Readiness probe failed: Get "http://10.xxx.1x5.153:8081/readyz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Warning Unhealthy 5s (x4 over 2m15s) kubelet Liveness probe failed: Get "http://10.xxx.1x5.153:8081/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
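Both probes target port 8081 on the pod IP. As a quick sanity check, the health endpoints can be hit directly through a port-forward (a sketch; it assumes the chart's default deployment name karpenter and the probe port 8081 shown in the events above):
# Forward the controller's health port locally and probe the same endpoints the kubelet uses.
# Sketch only: deployment name "karpenter" and port 8081 are taken from the output above.
kubectl -n kube-system port-forward deploy/karpenter 8081:8081 &
sleep 2
curl -sv http://localhost:8081/healthz
curl -sv http://localhost:8081/readyz
kill %1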
Expected Behavior:
The Karpenter pods should become Ready and stay in the Running state.
Reproduction Steps (Please include YAML):
EKS cluster version 1.31, created using the eksctl config below.
Followed both https://karpenter.sh/docs/getting-started/getting-started-with-karpenter/
and https://karpenter.sh/docs/getting-started/migrating-from-cas/, but neither worked.
eksctl create cluster -f - <<EOF
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: xxxxx
  region: us-east-1
  version: "1.31"
  tags:
    karpenter.sh/discovery: xxxxx
privateCluster:
  enabled: true
  skipEndpointCreation: true
iam:
  withOIDC: true
  podIdentityAssociations:
  - namespace: "kube-system"
    serviceAccountName: karpenter
    roleName: xxxx-karpenter
    permissionPolicyARNs:
    - arn:aws:iam::xxxxxxxx:policy/KarpenterControllerPolicy-xxxx
iamIdentityMappings:
- arn: "arn:aws:iam::xxxxxxx:role/KarpenterNodeRole-xxxx"
  username: system:node:{{EC2PrivateDNSName}}
  groups:
  - system:bootstrappers
  - system:nodes
managedNodeGroups:
- instanceType: m5d.large
  amiFamily: AmazonLinux2
  name: xxxxx-ng
  desiredCapacity: 2
  minSize: 1
  maxSize: 10
  privateNetworking: true
addons:
- name: eks-pod-identity-agent
- name: coredns
- name: vpc-cni
- name: kube-proxy
EOF
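Since this is a fully private cluster (skipEndpointCreation: true) and the chart is installed with settings.isolatedVPC=true, the controller can only reach AWS services through VPC endpoints. A sketch of how to list what exists in the VPC (the vpc-id value below is a placeholder, and the endpoint set named in the comment is an assumption):
# Sketch: list the endpoints in the cluster VPC (vpc-xxxxxxxx is a placeholder).
# An isolated-VPC Karpenter setup typically relies on endpoints such as sts, ec2, ecr.api,
# ecr.dkr, s3, sqs, and eks-auth (for Pod Identity) -- this list is an assumption.
aws ec2 describe-vpc-endpoints \
  --filters Name=vpc-id,Values=vpc-xxxxxxxx \
  --query 'VpcEndpoints[].ServiceName' \
  --output table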
KARPENTER_VERSION=1.0.5 (also tried 1.0.6, which didn't work either)
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --version "${KARPENTER_VERSION}" \
  --namespace "${KARPENTER_NAMESPACE}" --create-namespace \
  --set "settings.clusterName=${CLUSTER_NAME}" \
  --set "settings.interruptionQueue=${CLUSTER_NAME}" \
  --set "settings.isolatedVPC=true" \
  --set controller.resources.requests.cpu=1 \
  --set controller.resources.requests.memory=1Gi \
  --set controller.resources.limits.cpu=1 \
  --set controller.resources.limits.memory=1Gi \
  --wait
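Credentials are wired up through EKS Pod Identity. As a sanity check (sketch only; not confirmed to be related to the probe failures), the association created by eksctl can be verified with:
# Sketch: confirm the Pod Identity association for the karpenter service account exists.
aws eks list-pod-identity-associations \
  --cluster-name "${CLUSTER_NAME}" \
  --namespace kube-system \
  --service-account karpenter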
Tried dnsPolicy=Default, but that didn't help.
kubectl logs karpenter-df9d8f6dd-xbz9d -n kube-system
{"level":"DEBUG","time":"2024-10-21T00:04:42.255Z","logger":"controller","caller":"operator/operator.go:149","message":"discovered karpenter version","commit":"652e6aa","version":"1.0.5"}
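Only the startup DEBUG line above was captured; logs from the previous container instance (the one killed by the liveness probe) can be pulled like this (sketch):
# Sketch: fetch logs from the container instance that was restarted after the failed liveness probe.
kubectl logs karpenter-df9d8f6dd-xbz9d -n kube-system --previous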
kubectl get events -A --field-selector source=karpenter --sort-by='.lastTimestamp' -n 100
No resources found
Tried DISABLE_WEBHOOK=true as well, but that didn't work either.
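For completeness, this is roughly how the DISABLE_WEBHOOK attempt was wired into the install (a sketch; the value names webhook.enabled and controller.env are assumptions, so check the chart's values.yaml for the version in use):
# Sketch only: extra flags layered on top of the install command above to disable the webhook.
# webhook.enabled and controller.env are assumed chart values -- verify against values.yaml.
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --version "${KARPENTER_VERSION}" --namespace "${KARPENTER_NAMESPACE}" \
  --set "settings.clusterName=${CLUSTER_NAME}" \
  --set "settings.isolatedVPC=true" \
  --set webhook.enabled=false \
  --set "controller.env[0].name=DISABLE_WEBHOOK" \
  --set-string "controller.env[0].value=true" \
  --wait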
Versions:
- Chart Version: 1.0.5 and 1.0.6 (neither works)
- Kubernetes Version (kubectl version): 1.31
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment