Enhance distributed deployment with snapshot and resize support for multi-node testing #651

@mnencia

Description

Problem

The standard StatefulSet deployment (replicas=1) binds all volumes to a single node, which makes multi-node test scenarios impossible. This has been a recurring request.

The existing deploy/kubernetes-distributed/ DaemonSet deployment partially addresses this: it runs the driver on every node with csi-provisioner --node-deployment=true, enabling per-node volume provisioning with topology and capacity tracking. However, it only includes the provisioner sidecar, so snapshots and volume expansion are not supported in the distributed deployment.

Proposal

Enhance the distributed deployment to include the csi-snapshotter and csi-resizer sidecars, using their respective --node-deployment modes so each instance only handles operations for volumes local to its node.

Snapshotter: upstream support exists

The external-snapshotter already supports distributed snapshotting (kubernetes-csi/external-snapshotter#585, merged Dec 2021). It requires two coordinating flags on two separate components:

  1. The common snapshot-controller (a central Deployment) must be deployed with --enable-distributed-snapshotting=true. This makes it check PV node affinity and label each VolumeSnapshotContent with snapshot.storage.kubernetes.io/managed-by=<nodeName>.
  2. The csi-snapshotter sidecar (per-node, in the DaemonSet) must be deployed with --node-deployment=true. This makes it only reconcile VolumeSnapshotContent objects whose label matches its node.

Without the snapshot-controller flag, the labeling never happens and the per-node sidecars find nothing to process.
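As a rough sketch of how the two flags pair up, the fragments below show the relevant args on each component. Only --enable-distributed-snapshotting=true, --node-deployment=true and the csi-snapshotter:v8.4.0 tag come from the text above. The registry prefix, the socket path, the socket-dir volume name and the NODE_NAME environment variable are assumptions modeled on how the provisioner sidecar is usually wired in a node deployment, and should be checked against the actual manifests and the sidecar's documentation.

```yaml
# Fragment 1: common snapshot-controller Deployment (central, one per cluster).
# Only the non-default flag is shown; the other args stay as deployed.
containers:
  - name: snapshot-controller
    args:
      - --enable-distributed-snapshotting=true
---
# Fragment 2: extra container in the distributed DaemonSet pod spec.
containers:
  - name: csi-snapshotter
    image: registry.k8s.io/sig-storage/csi-snapshotter:v8.4.0  # registry prefix assumed
    args:
      - --csi-address=/csi/csi.sock   # assumed: same socket the provisioner uses
      - --node-deployment=true
    env:
      # Assumption: the sidecar learns which node it serves from NODE_NAME,
      # so it can match the managed-by label on VolumeSnapshotContent objects.
      - name: NODE_NAME
        valueFrom:
          fieldRef:
            fieldPath: spec.nodeName
    volumeMounts:
      - name: socket-dir              # assumed volume name
        mountPath: /csi
```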

The integration still needs to be built and tested against current sidecar versions (the standard deployment uses csi-snapshotter:v8.4.0). The distributed deploy.sh would need to deploy (or patch) the common snapshot-controller with the non-default flag, add a VolumeSnapshotClass, and handle RBAC bindings. Notably, the DaemonSet currently uses serviceAccountName: csi-provisioner, so upstream snapshotter RBAC (which binds to a csi-snapshotter service account) would need to be adapted to bind to csi-provisioner instead.
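One possible shape for the adapted RBAC is a binding that attaches the upstream snapshotter ClusterRole to the existing csi-provisioner service account. This is a sketch under assumptions: the binding name is hypothetical, the default namespace is assumed, and the ClusterRole name external-snapshotter-runner is what I recall the upstream RBAC manifest using, so it should be verified against the fetched file. Any namespaced leader-election Role shipped upstream would need the same treatment.

```yaml
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: csi-hostpath-distributed-snapshotter   # hypothetical name
subjects:
  - kind: ServiceAccount
    name: csi-provisioner      # the service account the DaemonSet already uses
    namespace: default         # assumed namespace
roleRef:
  kind: ClusterRole
  name: external-snapshotter-runner   # assumed upstream ClusterRole name; verify
  apiGroup: rbac.authorization.k8s.io
```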

Resizer: upstream work needed

The external-resizer does not yet support --node-deployment. A PR existed (kubernetes-csi/external-resizer#195) but was closed by the triage bot after going rotten. The tracking issue kubernetes-csi/external-resizer#142 is still open. A fresh PR on external-resizer would be needed to enable distributed resize support.

Changes required in this repo

Deployment-level only, no Go code changes to the driver:

File | Change
deploy/kubernetes-distributed/hostpath/csi-hostpath-plugin.yaml | Add csi-snapshotter container (and csi-resizer once upstream supports it)
deploy/kubernetes-distributed/hostpath/csi-hostpath-snapshotclass.yaml (new) | VolumeSnapshotClass
deploy/kubernetes-distributed/test-driver.yaml | Add snapshotDataSource capability and SnapshotClass: FromName: true
deploy/kubernetes-distributed/deploy.sh | Fetch and adapt snapshotter RBAC (bind to csi-provisioner SA), deploy common snapshot-controller with --enable-distributed-snapshotting=true, install snapshot CRDs if needed
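For illustration, the two smaller manifest changes might look roughly like this. The class name csi-hostpath-snapclass, the driver name hostpath.csi.k8s.io and the Delete policy are assumptions carried over from what the standard deployment typically uses; the test-driver.yaml fragment shows only the keys named above, and the rest of that file is left out.

```yaml
# deploy/kubernetes-distributed/hostpath/csi-hostpath-snapshotclass.yaml (new)
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-hostpath-snapclass   # assumed name
driver: hostpath.csi.k8s.io      # assumed driver name
deletionPolicy: Delete           # assumed policy
---
# deploy/kubernetes-distributed/test-driver.yaml (fragment, existing keys omitted)
SnapshotClass:
  FromName: true
DriverInfo:
  Name: hostpath.csi.k8s.io      # assumed driver name
  Capabilities:
    snapshotDataSource: true
```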

Known limitations

These are inherent to a node-local hostpath driver and should be documented:

  1. Cross-node snapshot restore: creating a volume from a snapshot only works if the new volume lands on the same node as the snapshot, because loadFromSnapshot operates on local filesystem paths.
  2. Cross-node volume cloning: the same constraint applies, because loadFromVolume runs cp -a on local paths.
  3. Immediate binding mode: the distributed provisioner uses --immediate-topology=false, so only WaitForFirstConsumer works.
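To make the third limitation concrete, a StorageClass for the distributed deployment would have to defer binding until a pod is scheduled. The sketch below uses a hypothetical class name and assumes the driver name hostpath.csi.k8s.io; only the WaitForFirstConsumer requirement comes from the list above.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-hostpath-distributed   # hypothetical name
provisioner: hostpath.csi.k8s.io   # assumed driver name
# Binding is delayed until a consuming pod is scheduled, so the volume
# is provisioned by the driver instance on that pod's node.
volumeBindingMode: WaitForFirstConsumer
```

With this mode, a PVC stays Pending until a pod that uses it is scheduled.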

CI

The existing CI already uses a 3-node Kind cluster (1 control-plane + 2 workers). The distributed deployment is never exercised in CI today. Setting CSI_PROW_DEPLOYMENT=kubernetes-distributed would select it, though the prow framework's snapshot-controller deployment would also need to be configured with --enable-distributed-snapshotting=true (either via the distributed deploy.sh or by patching the prow-managed manifests).

Next steps

I'd like to start by adding the snapshotter sidecar to the distributed DaemonSet, since the upstream support is already there. Meanwhile, I'm planning to open a fresh PR on kubernetes-csi/external-resizer for --node-deployment (the previous attempt in #195 went rotten, and the tracking issue kubernetes-csi/external-resizer#142 is still open). The resizer can be added to the distributed deployment once that lands.
