Issue with Migrating CephFS Volumes Between Clusters #5339
Description
We are in the process of migrating our Kubernetes workloads from an existing Ceph cluster ("origin") to a new Ceph cluster ("remote"). Our goal is to transition the CephFS-based Persistent Volumes (PVs) with minimal changes, primarily by updating the CSI configuration in Kubernetes.
Migration Approach:
- Utilize a Ceph client that has access to both the origin and remote clusters.
- Create a CephFS volume in the origin cluster using the CSI driver connected to the origin monitors.
- Terminate the associated pod to unmount the volume.
- Employ rsync with the -avpPX options to copy the volume data from the origin to the remote cluster.
- Update the CSI configuration in Kubernetes to point to the remote cluster's monitors.
- Deploy the pod, expecting it to mount the volume from the remote cluster.
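For reference, the copy step above can be sketched as follows. Note that rsync's -X only transfers the xattr namespaces the calling user can read (only user.* when unprivileged), and -a includes neither ACLs nor hard links, so -A and -H are worth adding. The mount paths and volume name below are assumptions for illustration, and the command is echoed rather than executed:

```shell
# Hypothetical kernel/FUSE mounts of the two clusters (not from the issue):
SRC=/mnt/cephfs-origin/volumes/csi/csi-vol-XXXX
DST=/mnt/cephfs-remote/volumes/csi/csi-vol-XXXX

# -a implies -rlptgoD; -A copies ACLs, -X copies xattrs, -H preserves hard
# links, --numeric-ids avoids uid/gid remapping between hosts. Run as root so
# -X can read namespaces beyond user.*. CephFS virtual xattrs (ceph.dir.*,
# ceph.file.*) are derived values and are not copied by rsync in any case.
RSYNC_CMD="rsync -aAXH --numeric-ids ${SRC}/ ${DST}/"
echo "${RSYNC_CMD}"
```

Even with these flags, rsync only carries what the filesystem exposes through the mount; it cannot transfer anything ceph-csi stores outside the file tree.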
After performing the above steps, when the pod is deployed with the updated CSI configuration pointing to the remote cluster, it fails to start. The error observed relates to extended attributes (xattrs), specifically a getxattr failure.
This suggests that the metadata associated with the files, such as extended attributes and other per-file metadata, is not being preserved during the rsync operation.
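One quick way to check which attributes survived the copy is to dump the xattrs of a sample file on both mounts and diff the output. The path below is a placeholder, and the command is echoed rather than run:

```shell
# getfattr's default match pattern is ^user\., so pass "-m -" to request every
# xattr namespace the caller is allowed to read. Path is hypothetical.
CHECK_CMD='getfattr -d -m - /mnt/cephfs-origin/data/somefile'
echo "${CHECK_CMD}"
```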
Actions Taken:
- CephFS Snapshot Mirroring: We considered using CephFS snapshot mirroring to replicate the data between clusters. However, this approach requires the fsid of the origin and remote clusters to differ. In our case, due to constraints with RBD mirroring, we have set the same fsid for both clusters, making this approach infeasible.
- RADOS Object Export/Import: We attempted to export and import the RADOS objects at the pool level for both the data and metadata pools. While the objects appear to be present in the remote cluster, the CephFS filesystem on the remote cluster does not recognize them, indicating mismatched or missing metadata.
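The pool-level attempt above can be sketched with the rados CLI's export/import subcommands (pool names are assumptions; the commands are echoed, not executed). A raw pool copy alone is not expected to yield a working filesystem, because the remote MDS has its own journal, inode tables, and FSMap that do not reference the imported objects:

```shell
# Assumed default CephFS pool names; adjust to the actual filesystem layout.
DATA_POOL=cephfs_data
META_POOL=cephfs_metadata

for pool in "$DATA_POOL" "$META_POOL"; do
  # Serialize the pool on the origin cluster, then load it on the remote one.
  EXPORT_CMD="rados -p ${pool} export /backup/${pool}.bin"
  IMPORT_CMD="rados -p ${pool} import /backup/${pool}.bin"
  echo "origin: ${EXPORT_CMD}"
  echo "remote: ${IMPORT_CMD}"
done
```

Wiring imported objects into a usable filesystem would additionally require recovery tooling such as cephfs-journal-tool / cephfs-data-scan, which is intrusive and easy to get wrong.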
Request for Guidance:
Given the constraints:
- The need to maintain the same fsid across clusters due to RBD mirroring requirements.
- The desire to minimize changes in Kubernetes, ideally only updating the CSI configuration.
- The necessity to preserve file metadata and extended attributes during migration.
I also get this error:
```
rpc error: code = Internal desc = rpc error: code = Internal desc = rados: ret=-61, No data available: "error in getxattr"
unable to read omap keys: pool or key missing
```
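The omap error points at ceph-csi's volume journal rather than the file data itself: the driver keeps a per-PV bookkeeping object (csi.volume.&lt;uuid&gt;) plus a directory object (csi.volumes.default) as RADOS omap data in the CephFS metadata pool, and rsync over a mount never copies these. A hedged sketch for inspecting that journal on both clusters follows; the pool name and the "csi" namespace are assumptions about a default ceph-csi deployment, and the commands are echoed, not executed:

```shell
META_POOL=cephfs_metadata   # assumption: default metadata pool name
NS=csi                      # assumption: default ceph-csi RADOS namespace

# List the journal objects, then the per-volume keys tracked by the driver.
LIST_OBJECTS="rados -p ${META_POOL} --namespace ${NS} ls"
LIST_KEYS="rados -p ${META_POOL} --namespace ${NS} listomapkeys csi.volumes.default"

echo "${LIST_OBJECTS}"
echo "${LIST_KEYS}"
# Per-volume details (UUID placeholder left as-is):
echo "rados -p ${META_POOL} --namespace ${NS} listomapvals csi.volume.<uuid>"
```

If these objects exist on the origin cluster but not on the remote one, that alone would explain the getxattr/omap failures; statically provisioned PVs (pre-created with the volume's rootPath, bypassing the journal) are one commonly suggested way to adopt copied volumes without recreating that metadata.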
What would be the recommended approach to migrate CephFS volumes between clusters under these conditions? Are there tools or methods that can facilitate this migration while ensuring data integrity and minimal disruption?
Any insights or suggestions would be greatly appreciated.