/kind bug
What happened?
As discussed in #1665, @torredil said this was fixed in v1.27 (#1665 (comment)), but we still see the problem with v1.28:
- The pod using volume pv-A is running on node N1
- Karpenter terminates the pod and then terminates node N1
- K8s starts a new pod and tries to attach volume pv-A, but the pod has to wait 6 minutes for the volume to be released and attached to it
What you expected to happen?
- After the old pod has been terminated, pv-A should be released and attachable to the new pod
How to reproduce it (as minimally and precisely as possible)?
- I'm using https://www.elastic.co/guide/en/cloud-on-k8s/current/index.html to set up a cluster with 3 nodes:
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: dev
spec:
  version: 8.12.2
  volumeClaimDeletePolicy: DeleteOnScaledownAndClusterDeletion
  updateStrategy:
    changeBudget:
      maxSurge: 2
      maxUnavailable: 1
  nodeSets:
  - name: default
    count: 3
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data # Do not change this name unless you set up a volume mount for the data path.
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 25Gi
    podTemplate:
      spec:
        nodeSelector:
          kubernetes.io/arch: arm64
          topology.kubernetes.io/zone: eu-central-1a
        containers:
        - name: elasticsearch
          env:
          - name: ES_JAVA_OPTS
            value: -Xms4g -Xmx4g
          resources:
            requests:
              memory: 5Gi
              cpu: 1
            limits:
              memory: 5Gi
              cpu: 2
    config:
      node.store.allow_mmap: false
- Trigger a spot instance termination, or just delete one EC2 instance
- The node is removed from k8s very quickly, the old pod is terminated, and k8s starts a new pod
- The new pod is stuck for 6 minutes with this error:
Multi-Attach error for volume "pvc-xxxxx-xxxxx-xxx" Volume is already exclusively attached to one node and can't be attached to another
- After 6 minutes, the new pod can attach the volume
- Here are the logs of ebs-csi-controller:
I0302 06:12:10.305080 1 controller.go:430] "ControllerPublishVolume: attached" volumeID="vol-02b33186429105461" nodeID="i-0715ec90e486bb8a1" devicePath="/dev/xvdaa"
<< at 06:14 the node has been terminated but no logs here >>
I0302 06:20:18.486042 1 controller.go:471] "ControllerUnpublishVolume: detaching" volumeID="vol-02b33186429105461" nodeID="i-0715ec90e486bb8a1"
I0302 06:20:18.584737 1 cloud.go:792] "DetachDisk: called on non-attached volume" volumeID="vol-02b33186429105461"
I0302 06:20:18.807752 1 controller.go:474] "ControllerUnpublishVolume: attachment not found" volumeID="vol-02b33186429105461" nodeID="i-0715ec90e486bb8a1"
I0302 06:20:19.124534 1 controller.go:421] "ControllerPublishVolume: attaching" volumeID="vol-02b33186429105461" nodeID="i-0ee2a470112401ffb"
I0302 06:20:20.635493 1 controller.go:430] "ControllerPublishVolume: attached" volumeID="vol-02b33186429105461" nodeID="i-0ee2a470112401ffb" devicePath="/dev/xvdaa"
Anything else we need to know?:
I set up the CSI driver using the EKS add-on.
Environment
- Kubernetes version (use kubectl version):
Client Version: v1.29.1
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.0-eks-c417bb3
- Driver version:
v1.28.0-eksbuild.1