Commit 88957e7

WIP: pkg/agent: wait for all volumes to be detached before rebooting
This commit provides a proof-of-concept implementation in which the agent, after draining the node, waits for all volumes attached to the node to be detached before rebooting. Shutting down a Pod does not mean its volume has been detached: usually a CSI agent runs as a DaemonSet on the node and detaches the volume from the node after the Pod shuts down.

This improves the reboot experience. Currently, if the CSI agent does not have enough time to detach the volumes, the node is rebooted while they are still attached, and the Pods using those volumes cannot get them attached on other nodes. This increases downtime for stateful workloads.

This commit still requires tests and a better interface for users. If someone wants to try this feature on their own cluster, I've published the image I've been testing with:

quay.io/invidian/flatcar-linux-update-operator:97c0dee50c807dbba7d2debc59b369f84002797e

Closes #30

Signed-off-by: Mateusz Gozdek <[email protected]>
1 parent f45ff7c commit 88957e7

2 files changed (+30, -0 lines)

examples/deploy/rbac/cluster-role.yaml

Lines changed: 6 additions & 0 deletions
@@ -47,3 +47,9 @@ rules:
   - daemonsets
   verbs:
   - get
+- apiGroups:
+  - storage.k8s.io
+  resources:
+  - volumeattachments
+  verbs:
+  - list

pkg/agent/agent.go

Lines changed: 24 additions & 0 deletions
@@ -290,6 +290,30 @@ func (k *klocksmith) process(ctx context.Context) error {
 
 	klog.Info("Node drained, rebooting")
 
+	for {
+		attachments, err := k.clientset.StorageV1().VolumeAttachments().List(ctx, metav1.ListOptions{})
+		if err != nil {
+			klog.Errorf("Listing volume attachments: %v", err)
+			continue
+		}
+
+		anyVolumeAttached := false
+
+		for _, attachment := range attachments.Items {
+			if attachment.Status.Attached && attachment.Spec.NodeName == k.nodeName {
+				anyVolumeAttached = true
+				klog.Infof("Volume %q is still attached, waiting for detach", attachment.Name)
+			}
+		}
+
+		if !anyVolumeAttached {
+			klog.Info("All volumes are detached from node, rebooting.")
+			break
+		}
+
+		time.Sleep(5 * time.Second)
+	}
+
 	// Reboot.
 	k.lc.Reboot(false)
 
