Add distributed snapshotting support to kubernetes-distributed deployment #653

Open
mnencia wants to merge 1 commit into kubernetes-csi:master from mnencia:dev/651

@mnencia mnencia commented Apr 12, 2026

What type of PR is this?

/kind feature

What this PR does / why we need it:

The distributed DaemonSet deployment (deploy/kubernetes-distributed/) only includes the csi-provisioner sidecar. This adds the csi-snapshotter with --node-deployment=true, so snapshots work on each node's local volumes.
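
As an illustration, the sidecar entry in the DaemonSet pod spec could look roughly like the fragment below. This is a sketch, not the manifest from this PR: the image tag placeholder, the socket path, and the `socket-dir` volume name are assumptions, and the `NODE_NAME` variable is assumed from the external-snapshotter documentation for node deployments.

```yaml
# Fragment of spec.template.spec.containers in the DaemonSet (illustrative only)
- name: csi-snapshotter
  # Placeholder tag: pin to whichever release the deployment targets
  image: registry.k8s.io/sig-storage/csi-snapshotter:<version>
  args:
    - --v=5
    - --csi-address=/csi/csi.sock   # assumed socket path
    - --node-deployment=true        # handle only snapshots of this node's volumes
  env:
    # The sidecar needs to know which node it runs on
    - name: NODE_NAME
      valueFrom:
        fieldRef:
          fieldPath: spec.nodeName
  volumeMounts:
    - name: socket-dir              # assumed volume name
      mountPath: /csi
```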

To support multiple sidecar RBAC roles, this introduces a unified ServiceAccount (csi-hostpathplugin-sa) with explicit ClusterRoleBindings for both the provisioner and snapshotter roles, following the same pattern used by the kubernetes-latest deployment.
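
A minimal sketch of that pattern follows. Only the ServiceAccount name comes from this description. The namespace, the binding names, and the referenced ClusterRole names (assumed to match the upstream external-provisioner and external-snapshotter RBAC files) are assumptions for illustration.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: csi-hostpathplugin-sa
  namespace: default                            # assumed namespace
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: csi-hostpathplugin-provisioner-binding  # hypothetical binding name
subjects:
  - kind: ServiceAccount
    name: csi-hostpathplugin-sa
    namespace: default
roleRef:
  kind: ClusterRole
  name: external-provisioner-runner             # assumed upstream ClusterRole name
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: csi-hostpathplugin-snapshotter-binding  # hypothetical binding name
subjects:
  - kind: ServiceAccount
    name: csi-hostpathplugin-sa
    namespace: default
roleRef:
  kind: ClusterRole
  name: external-snapshotter-runner             # assumed upstream ClusterRole name
  apiGroup: rbac.authorization.k8s.io
```

Binding one ServiceAccount to each sidecar's existing ClusterRole keeps the upstream role definitions untouched while letting a single pod run several sidecars.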

The deploy script now handles the full snapshot infrastructure: CRD installation, snapshot-controller deployment with --enable-distributed-snapshotting=true, and node-reader RBAC (required for distributed snapshotting but commented out in the upstream snapshot-controller RBAC). If the snapshot-controller was already deployed (e.g., by prow.sh) without the flag, the script patches it. The destroy script cleans up all of these resources.

Snapshot E2E tests are enabled via snapshotDataSource and SnapshotClass in test-driver.yaml.
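
For orientation, the relevant part of a test driver definition might look like the fragment below. The field names follow the Kubernetes external storage E2E driver definition format as understood here. The driver name and the `FromName` setting are assumptions, not values copied from this PR.

```yaml
# Fragment of test-driver.yaml (illustrative only)
SnapshotClass:
  FromName: true               # assumed: let the test suite create a class named after the driver
DriverInfo:
  Name: hostpath.csi.k8s.io    # assumed driver name
  Capabilities:
    snapshotDataSource: true   # enables the snapshot test cases
```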

Which issue(s) this PR fixes:

Part of #651

Special notes for your reviewer:

Builds on prior work in #392 by @denisok, which went stale before merging.

The distributed snapshotting feature in external-snapshotter (kubernetes-csi/external-snapshotter#585) requires coordination between two components:

  1. The common snapshot-controller must run with --enable-distributed-snapshotting=true to label VolumeSnapshotContent objects with node affinity
  2. The per-node csi-snapshotter sidecar must run with --node-deployment=true to filter by those labels
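
On the controller side, the first requirement amounts to one extra argument on the snapshot-controller container. The fragment below is a sketch under assumptions: the container name and the other arguments are taken from memory of the upstream deployment and may differ from what the deploy script actually applies or patches.

```yaml
# Fragment of the snapshot-controller Deployment (illustrative only)
spec:
  template:
    spec:
      containers:
        - name: snapshot-controller   # assumed container name
          args:
            - --v=5
            - --leader-election=true
            # Label VolumeSnapshotContent objects with node affinity
            - --enable-distributed-snapshotting=true
```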

The upstream snapshot-controller RBAC has Node read permissions commented out. The deploy script applies them via a separate snapshot-controller-node-reader ClusterRole.
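
A sketch of what such a ClusterRole and its binding could contain is shown below. The ClusterRole name comes from this description. The rule mirrors the Node read permissions that are commented out upstream, while the binding name and the `snapshot-controller` ServiceAccount in `kube-system` are assumptions.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: snapshot-controller-node-reader
rules:
  # Read-only access to Node objects, needed for distributed snapshotting
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: snapshot-controller-node-reader   # hypothetical binding name
subjects:
  - kind: ServiceAccount
    name: snapshot-controller             # assumed ServiceAccount
    namespace: kube-system                # assumed namespace
roleRef:
  kind: ClusterRole
  name: snapshot-controller-node-reader
  apiGroup: rbac.authorization.k8s.io
```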

Tested on a 3-node Kind cluster: provisioning on different workers, snapshot creation and deletion all work correctly.

Does this PR introduce a user-facing change?:

The kubernetes-distributed deployment now supports volume snapshots on per-node volumes.

@k8s-ci-robot k8s-ci-robot added the release-note, do-not-merge/work-in-progress, kind/feature, and cncf-cla: yes labels Apr 12, 2026
@k8s-ci-robot k8s-ci-robot requested a review from jingxu97 April 12, 2026 10:42
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mnencia
Once this PR has been reviewed and has the lgtm label, please assign gnufied for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot
Contributor

Welcome @mnencia!

It looks like this is your first PR to kubernetes-csi/csi-driver-host-path 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-csi/csi-driver-host-path has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot
Contributor

Hi @mnencia. Thanks for your PR.

I'm waiting for a kubernetes-csi member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test and size/L labels Apr 12, 2026
@mnencia mnencia marked this pull request as ready for review April 12, 2026 10:43
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress label Apr 12, 2026
mnencia added a commit to cloudnative-pg/cloudnative-pg that referenced this pull request Apr 13, 2026
Replace the single-node StatefulSet csi-hostpath deployment with the
distributed DaemonSet variant, which runs the CSI driver on every node
with per-node provisioning, snapshotting, and resizing via
--node-deployment sidecars.

This enables multi-node test scenarios (pod failover, node drain, etc.)
where PVCs can be created on different nodes instead of all landing on
the same one.

The resizer sidecar uses a custom image
(ghcr.io/mnencia/csi-resizer:node-deployment) built from
kubernetes-csi/external-resizer#573, pending upstream merge.
The distributed deployment manifests are sourced from
mnencia/csi-driver-host-path#dev/651-with-resizer, pending
kubernetes-csi/csi-driver-host-path#653.

Signed-off-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com>
…ment

Add the csi-snapshotter sidecar with --node-deployment=true to the
distributed DaemonSet deployment, enabling per-node snapshot operations
for node-local volumes.

To support multiple sidecar RBAC roles, introduce a unified
ServiceAccount (csi-hostpathplugin-sa) with explicit ClusterRoleBindings
and RoleBindings for the provisioner and snapshotter roles, following
the same pattern used by the kubernetes-latest deployment.

The deploy script now installs the snapshot CRDs and snapshot-controller
(with --enable-distributed-snapshotting=true) if not already present,
and patches an existing snapshot-controller if it lacks the flag.
The upstream snapshot-controller RBAC has node read permissions
commented out, so the script uncomments them before applying, as
documented in the external-snapshotter README.

The destroy script cleans up the snapshot-controller and its RBAC
resources, as well as the VolumeSnapshotClass.

Snapshot E2E tests are enabled via snapshotDataSource in test-driver.yaml.

Signed-off-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com>