PVC restore from vcluster snapshot fails #3951

@ad-marad

What happened?

We are using vCluster OSS and want to be able to back up and restore vclusters properly. We also use Rancher as our Kubernetes management tool.

Earlier tests with Velero failed because of the Rancher integration.

The snapshot runs fine and we get a file in our S3 target. During the snapshot we see the VolumeSnapshots in the host cluster.

A snapshot restore without first deleting the deployments or PVCs inside the vcluster comes up, but the old data is not restored; the workload keeps using the existing PVC.

When we delete the deployment and PVCs in the vcluster and then restore, the vcluster comes up but the PVCs are stuck in the "Pending" state.

Error from kubectl describe on an affected PVC:
Warning SyncError 7m51s persistent-volume-claim-syncer Error syncing to host cluster: update object status: persistentvolumeclaims "mysql-pv-claim-x-test-x-kw-test123" is forbidden: User "system:serviceaccount:kw-test123:vc-kw-test123" cannot update resource "persistentvolumeclaims/status" in API group "" in the namespace "kw-test123"
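
One way to test whether the missing permission is the only blocker would be to grant it manually in the host namespace. The manifest below is a minimal sketch and not a confirmed fix: the Role and RoleBinding name vc-pvc-status-debug is hypothetical, the namespace and service account are taken from the error message above, and it assumes nothing else (for example Rancher-managed RBAC) reverts or overrides the binding.

```yaml
# Hypothetical debugging Role: grants only the permission named in the SyncError.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: vc-pvc-status-debug   # hypothetical name, not created by vCluster
  namespace: kw-test123       # host namespace from the error message
rules:
  - apiGroups: [""]
    resources: ["persistentvolumeclaims/status"]
    verbs: ["update", "patch"]
---
# Binds the Role to the vCluster service account from the error message.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: vc-pvc-status-debug   # hypothetical name
  namespace: kw-test123
subjects:
  - kind: ServiceAccount
    name: vc-kw-test123
    namespace: kw-test123
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: vc-pvc-status-debug
```

If the PVCs bind after this is applied with kubectl apply on the host cluster and the restore is repeated, the problem would point to the Role that vCluster creates for its service account rather than to the CSI driver.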

What did you expect to happen?

The PVCs are restored successfully.

How can we reproduce it (as minimally and precisely as possible)?

  • Create a vcluster
  • Create a stateful deployment (with a ceph-rbd CSI driver StorageClass synced from the host cluster)
  • Run vcluster snapshot create --include-volumes
  • Verify the snapshot archive is written to S3
  • Delete the workload and its PVCs
  • Run vcluster snapshot restore
  • Observe that PVCs are created but remain Pending with no dataSource

Anything else we need to know?

  • Host-Cluster OS is Talos OS
  • Rancher Environment
  • Ceph for CSI and S3-Endpoint

Host cluster Kubernetes version

$ kubectl version
Server Version: v1.33.4

vcluster version

$ vcluster --version
vcluster version 0.34.0

VCluster Config

sync:
  toHost:
    ingresses:
      enabled: true
    secrets:
      enabled: true
      all: true


  fromHost:
    nodes:
      enabled: true
    storageClasses:
      enabled: true

controlPlane:
  backingStore:
    etcd:
      deploy:
        enabled: true
        statefulSet:
          highAvailability:
            replicas: 3
          resources:
            requests:
              cpu: 20m
              memory: 150Mi


  service:
    annotations:
      "loft.sh/uninstall-on-cluster-delete": "true"
  statefulSet:
    highAvailability:
      replicas: 3
    security:
      podSecurityContext:
        fsGroup: 65532
      containerSecurityContext:
        runAsUser: 65532
        runAsNonRoot: true
