## Problem

The standard StatefulSet deployment (`replicas=1`) binds ALL volumes to a single node, making multi-node test scenarios impossible. This has been a recurring request.
The existing `deploy/kubernetes-distributed/` DaemonSet deployment partially addresses this: it runs the driver on every node with `csi-provisioner --node-deployment=true`, enabling per-node volume provisioning with topology and capacity tracking. However, it only includes the provisioner sidecar, so snapshots and volume expansion are not supported in the distributed deployment.
## Proposal

Enhance the distributed deployment to include the csi-snapshotter and csi-resizer sidecars, using their respective `--node-deployment` modes so each instance only handles operations for volumes local to its node.
### Snapshotter: upstream support exists
The external-snapshotter already supports distributed snapshotting (kubernetes-csi/external-snapshotter#585, merged Dec 2021). It requires two coordinating flags on two separate components:
- The common snapshot-controller (a central Deployment) must be deployed with `--enable-distributed-snapshotting=true`. This makes it check PV node affinity and label each VolumeSnapshotContent with `snapshot.storage.kubernetes.io/managed-by=<nodeName>`.
- The csi-snapshotter sidecar (per-node, in the DaemonSet) must be deployed with `--node-deployment=true`. This makes it reconcile only the VolumeSnapshotContent objects whose label matches its node.

Without the snapshot-controller flag, the labeling never activates and the per-node sidecars would see nothing to process.
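The coordination is visible on the content object itself. A sketch of what the central controller produces when the flag is enabled (object abridged; name and node name are illustrative):

```yaml
# Abridged VolumeSnapshotContent as labeled by a snapshot-controller
# running with --enable-distributed-snapshotting=true. Only the
# csi-snapshotter instance on worker-node-1 would reconcile it.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotContent
metadata:
  name: snapcontent-example
  labels:
    snapshot.storage.kubernetes.io/managed-by: worker-node-1
```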
The integration still needs to be built and tested against current sidecar versions (the standard deployment uses `csi-snapshotter:v8.4.0`). The distributed `deploy.sh` would need to deploy (or patch) the common snapshot-controller with the non-default flag, add a VolumeSnapshotClass, and handle RBAC bindings. Notably, the DaemonSet currently uses `serviceAccountName: csi-provisioner`, so upstream snapshotter RBAC (which binds to a `csi-snapshotter` service account) would need to be adapted to bind to `csi-provisioner` instead.
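The RBAC adaptation in `deploy.sh` could be a simple text rewrite of the upstream manifest before applying it. A hypothetical sketch (file path and manifest content are illustrative, not the real upstream RBAC file):

```shell
# Sketch: rewrite the subject of the upstream snapshotter RBAC bindings
# from the csi-snapshotter service account to the DaemonSet's existing
# csi-provisioner one, before kubectl apply.
cat > /tmp/snapshotter-rbac.yaml <<'EOF'
subjects:
  - kind: ServiceAccount
    name: csi-snapshotter
    namespace: default
EOF
sed -i 's/name: csi-snapshotter/name: csi-provisioner/' /tmp/snapshotter-rbac.yaml
cat /tmp/snapshotter-rbac.yaml
```

In the real script the manifest would be fetched from the pinned external-snapshotter release rather than written inline.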
### Resizer: upstream work needed
The external-resizer does not yet support `--node-deployment`. A PR existed (kubernetes-csi/external-resizer#195) but was closed by the triage bot after going rotten. The tracking issue kubernetes-csi/external-resizer#142 is still open. A fresh PR on external-resizer would be needed to enable distributed resize support.
## Changes required in this repo
Deployment-level only, no Go code changes to the driver:
| File | Change |
| --- | --- |
| `deploy/kubernetes-distributed/hostpath/csi-hostpath-plugin.yaml` | Add csi-snapshotter container (and csi-resizer once upstream supports it) |
| `deploy/kubernetes-distributed/hostpath/csi-hostpath-snapshotclass.yaml` (new) | VolumeSnapshotClass |
| `deploy/kubernetes-distributed/test-driver.yaml` | Add `snapshotDataSource` capability and `SnapshotClass: FromName: true` |
| `deploy/kubernetes-distributed/deploy.sh` | Fetch and adapt snapshotter RBAC (bind to `csi-provisioner` SA), deploy common snapshot-controller with `--enable-distributed-snapshotting=true`, install snapshot CRDs if needed |
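For the first two rows, the new manifests could look roughly like this. The sidecar wiring is assumed to mirror the existing provisioner container (socket path, `NODE_NAME` env for `--node-deployment`); the snapshot class name is illustrative:

```yaml
# Sketch: csi-snapshotter container to add to the distributed DaemonSet.
# Image tag taken from the standard deployment; args abridged.
- name: csi-snapshotter
  image: registry.k8s.io/sig-storage/csi-snapshotter:v8.4.0
  args:
    - "--csi-address=$(ADDRESS)"
    - "--node-deployment=true"
  env:
    - name: ADDRESS
      value: /csi/csi.sock
    - name: NODE_NAME
      valueFrom:
        fieldRef:
          fieldPath: spec.nodeName
  volumeMounts:
    - mountPath: /csi
      name: socket-dir
---
# Sketch: minimal VolumeSnapshotClass for the hostpath driver.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-hostpath-snapclass
driver: hostpath.csi.k8s.io
deletionPolicy: Delete
```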
## Known limitations

These are inherent to a node-local hostpath driver and should be documented:

- Cross-node snapshot restore: creating a volume from a snapshot only works if the new volume lands on the same node as the snapshot; `loadFromSnapshot` operates on local filesystem paths.
- Cross-node volume cloning: same constraint; `loadFromVolume` runs `cp -a` on local paths.
- Immediate binding mode: the distributed provisioner uses `--immediate-topology=false`, so only `WaitForFirstConsumer` works.
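The last point also constrains how users consume the driver: any StorageClass used with the distributed deployment has to opt into delayed binding. A sketch (class name illustrative):

```yaml
# Sketch: StorageClass compatible with the distributed deployment.
# With --immediate-topology=false on the per-node provisioners, only
# WaitForFirstConsumer binding works; Immediate binding would never
# pick a node.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-hostpath-distributed
provisioner: hostpath.csi.k8s.io
volumeBindingMode: WaitForFirstConsumer
```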
## CI

The existing CI already uses a 3-node Kind cluster (1 control-plane + 2 workers), but the distributed deployment is never exercised in CI today. Setting `CSI_PROW_DEPLOYMENT=kubernetes-distributed` would select it, though the prow framework's snapshot-controller deployment would also need to be configured with `--enable-distributed-snapshotting=true` (either via the distributed `deploy.sh` or by patching the prow-managed manifests).
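Selecting the deployment for a local prow run would then come down to setting the variable before invoking the prow script (invocation of the script itself omitted here):

```shell
# Select the distributed deployment for the csi-release-tools prow
# framework; the variable name comes from the existing CI setup.
export CSI_PROW_DEPLOYMENT=kubernetes-distributed
echo "deployment: ${CSI_PROW_DEPLOYMENT}"
```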
## Next steps

I'd like to start by adding the snapshotter sidecar to the distributed DaemonSet, since the upstream support is already there. Meanwhile, I'm planning to open a fresh PR on kubernetes-csi/external-resizer for `--node-deployment` (the previous attempt in #195 went rotten; tracking issue kubernetes-csi/external-resizer#142 is still open). The resizer can be added to the distributed deployment once that lands.