Add Support for Restoring OCS Operator CRs in ODF CLI #114

Open

OdedViner wants to merge 1 commit into main from restore_ocs_cr

Conversation

OdedViner
Contributor

@OdedViner OdedViner commented Feb 4, 2025

This PR adds support for restoring deleted Custom Resources (CRs) for the OCS operator via the ODF CLI. It includes logic for handling storageclusters CRs by dynamically setting groupName and versionResource, ensuring compatibility and reliability during the restore process.

Test Procedure:

Setup:

OCP Version: 4.19.0-0.nightly-2025-04-24-005837
ODF Version: odf-operator.v4.19.0-49.stable
Platform: IBM Cloud

1. Check storagecluster status:

$ oc get storagecluster
NAME                 AGE   PHASE   EXTERNAL   CREATED AT             VERSION
ocs-storagecluster   15h   Ready              2025-04-28T17:43:24Z   4.19.0

2. Check pods status:

$ oc get pods
NAME                                                              READY   STATUS      RESTARTS      AGE
ceph-csi-controller-manager-5f8d767cf5-l5bf8                      1/1     Running     2 (46m ago)   48m
cnpg-controller-manager-79b7968c9d-trcxh                          1/1     Running     0             44m
csi-addons-controller-manager-f958445d-gwqjv                      1/1     Running     0             48m
noobaa-core-0                                                     2/2     Running     0             41m
noobaa-db-pg-cluster-1                                            1/1     Running     0             42m
noobaa-db-pg-cluster-2                                            1/1     Running     0             41m
noobaa-endpoint-559fbc6c6f-zbzsm                                  1/1     Running     0             40m
noobaa-operator-54ff97cbbf-qcg98                                  1/1     Running     0             44m
ocs-client-operator-console-7db4497c77-v5s6f                      1/1     Running     0             48m
ocs-client-operator-controller-manager-59848c8988-wwprd           1/1     Running     0             48m
ocs-metrics-exporter-c4b49dcdf-c7tdj                              3/3     Running     2 (46m ago)   47m
ocs-operator-5d7b4b4547-5s8hj                                     1/1     Running     2 (46m ago)   48m
ocs-provider-server-65d9fd4647-zczjj                              1/1     Running     0             48m
odf-console-57b9488fb6-dfzvp                                      1/1     Running     0             52m
odf-operator-controller-manager-6f545b7469-ht9dp                  1/1     Running     1 (47m ago)   52m
openshift-storage.cephfs.csi.ceph.com-ctrlplugin-7465d64bdkxhlf   7/7     Running     0             47m
openshift-storage.cephfs.csi.ceph.com-ctrlplugin-7465d64bdmqxgq   7/7     Running     0             47m
openshift-storage.cephfs.csi.ceph.com-nodeplugin-27242            3/3     Running     0             47m
openshift-storage.cephfs.csi.ceph.com-nodeplugin-782g8            3/3     Running     0             47m
openshift-storage.cephfs.csi.ceph.com-nodeplugin-8nbrg            3/3     Running     0             47m
openshift-storage.cephfs.csi.ceph.com-nodeplugin-96fn8            3/3     Running     0             47m
openshift-storage.cephfs.csi.ceph.com-nodeplugin-f8bf5            3/3     Running     0             47m
openshift-storage.cephfs.csi.ceph.com-nodeplugin-r2nsz            3/3     Running     0             47m
openshift-storage.nfs.csi.ceph.com-ctrlplugin-cccd975f-9zmlw      6/6     Running     0             47m
openshift-storage.nfs.csi.ceph.com-ctrlplugin-cccd975f-lbbrm      6/6     Running     0             47m
openshift-storage.nfs.csi.ceph.com-nodeplugin-7kbj4               3/3     Running     0             47m
openshift-storage.nfs.csi.ceph.com-nodeplugin-7nnhk               3/3     Running     0             47m
openshift-storage.nfs.csi.ceph.com-nodeplugin-jjggr               3/3     Running     0             47m
openshift-storage.nfs.csi.ceph.com-nodeplugin-mzjpt               3/3     Running     0             47m
openshift-storage.nfs.csi.ceph.com-nodeplugin-pp7rp               3/3     Running     0             47m
openshift-storage.nfs.csi.ceph.com-nodeplugin-rs5s8               3/3     Running     0             47m
openshift-storage.rbd.csi.ceph.com-ctrlplugin-59bc4cb654-hd994    8/8     Running     0             47m
openshift-storage.rbd.csi.ceph.com-ctrlplugin-59bc4cb654-n2t5m    8/8     Running     1 (46m ago)   47m
openshift-storage.rbd.csi.ceph.com-nodeplugin-fjq5l               4/4     Running     0             47m
openshift-storage.rbd.csi.ceph.com-nodeplugin-hx7m2               4/4     Running     0             47m
openshift-storage.rbd.csi.ceph.com-nodeplugin-kw886               4/4     Running     0             47m
openshift-storage.rbd.csi.ceph.com-nodeplugin-lfjcl               4/4     Running     0             47m
openshift-storage.rbd.csi.ceph.com-nodeplugin-rxp47               4/4     Running     0             47m
openshift-storage.rbd.csi.ceph.com-nodeplugin-z4gtj               4/4     Running     0             47m
rook-ceph-crashcollector-oviner4-ibm-gqgn5-worker-1-d69d6-f724f   1/1     Running     0             45m
rook-ceph-crashcollector-oviner4-ibm-gqgn5-worker-2-fcn4g-rfp9z   1/1     Running     0             45m
rook-ceph-crashcollector-oviner4-ibm-gqgn5-worker-3-sg8mc-w6psg   1/1     Running     0             44m
rook-ceph-exporter-oviner4-ibm-gqgn5-worker-1-d69d6-66bd99kmkw4   1/1     Running     0             45m
rook-ceph-exporter-oviner4-ibm-gqgn5-worker-2-fcn4g-754bffbcjd9   1/1     Running     0             45m
rook-ceph-exporter-oviner4-ibm-gqgn5-worker-3-sg8mc-64f94bhvqgp   1/1     Running     0             44m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-78686bb7sl4mw   2/2     Running     0             44m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-797875bff952q   2/2     Running     0             44m
rook-ceph-mgr-a-65785b66cd-xlv2s                                  3/3     Running     0             45m
rook-ceph-mgr-b-d98ccb4bc-2j8m2                                   3/3     Running     0             45m
rook-ceph-mon-a-7fc5b7775d-96ckx                                  2/2     Running     0             46m
rook-ceph-mon-b-6dd694669b-zh2k5                                  2/2     Running     0             45m
rook-ceph-mon-c-8575f9999b-stz89                                  2/2     Running     0             45m
rook-ceph-operator-796f877956-t7nck                               1/1     Running     0             44m
rook-ceph-osd-0-764bc5c6f-mdjvk                                   2/2     Running     0             43m
rook-ceph-osd-1-76b984db9b-t4m97                                  2/2     Running     0             42m
rook-ceph-osd-2-5684958cc5-mzcqc                                  2/2     Running     0             42m
rook-ceph-osd-prepare-ocs-deviceset-0-data-0l26wm-lvqcm           0/1     Completed   0             44m
rook-ceph-osd-prepare-ocs-deviceset-1-data-02fnch-gb6dn           0/1     Completed   0             44m
rook-ceph-osd-prepare-ocs-deviceset-2-data-0sjrxv-gjpdb           0/1     Completed   0             44m
rook-ceph-tools-6549cd6784-4v9vt                                  1/1     Running     0             44m
storageclient-737342087af10580-status-reporter-29106052-2fckf     0/1     Completed   0             21s
ux-backend-server-64dd9b84d7-9kgwg                                2/2     Running     0             48m

3. Attempted to restore the StorageCluster CR (storageclusters.ocs.openshift.io) while it still existed → the restore is skipped because the CR is not marked as deleted in the status:
if cr.GetDeletionTimestamp() != nil && (crName == "" || crName == cr.GetName()) {

https://github.com/rook/kubectl-rook-ceph/blob/dbfe77cc57a16be39fc043f04baff26c5c94bba8/pkg/restore/crd.go#L63

$ ./bin/odf restore deleted storageclusters.ocs.openshift.io
Info: Detecting which resources to restore for crd "storageclusters"
Info: Nothing to do here, no "storageclusters" resources in deleted state

4. Delete the storagecluster CR:

$ oc delete storagecluster ocs-storagecluster
storagecluster.ocs.openshift.io "ocs-storagecluster" deleted

5. Restore the storagecluster CR:

oviner~/DEV_REPOS/odf-cli(restore_ocs_cr)$ ./bin/odf restore deleted storageclusters.ocs.openshift.io
Info: Detecting which resources to restore for crd "storageclusters"
Info: Restoring CR ocs-storagecluster
Warning: The resource ocs-storagecluster was found deleted. Do you want to restore it? yes | no

yes
Info: Proceeding with restoring deleting CR
Info: Scaling down the operator
Info: Deleting validating webhook rook-ceph-webhook if present
Info: Removing ownerreferences from resources with matching uid 9b8e7561-dea7-405f-8492-8a43353e48a0
Info: Removing ownerReference for cephblockpools/builtin-mgr
Info: Removing ownerReference for cephblockpools/ocs-storagecluster-cephblockpool
Info: Removing ownerReference for cephclusters/ocs-storagecluster-cephcluster
Info: Removing ownerReference for cephfilesystems/ocs-storagecluster-cephfilesystem
Info: Removing ownerReference for storageconsumers/internal
Info: Removing owner references for secret onboarding-private-key
Info: Removed ownerReference for Secret: onboarding-private-key

Info: Removing owner references for secret onboarding-ticket-key
Info: Removed ownerReference for Secret: onboarding-ticket-key

Info: Removing owner references for configmaps rook-config-override
Info: Removed ownerReference for configmap: rook-config-override

Info: Removing owner references for service ocs-metrics-exporter
Info: Removed ownerReference for service: ocs-metrics-exporter

Info: Removing owner references for service ocs-provider-server
Info: Removed ownerReference for service: ocs-provider-server

Info: Removing owner references for deployment ocs-metrics-exporter
Info: Removed ownerReference for deployment: ocs-metrics-exporter

Info: Removing owner references for deployment ocs-provider-server
Info: Removed ownerReference for deployment: ocs-provider-server

Info: Removing owner references for deployment rook-ceph-tools
Info: Removed ownerReference for deployment: rook-ceph-tools

Info: Removing finalizers from storageclusters/ocs-storagecluster
Info: Re-creating the CR storageclusters from dynamic resource
W0504 15:54:52.451197  147706 warnings.go:70] metadata.finalizers: "storagecluster.ocs.openshift.io": prefer a domain-qualified finalizer name including a path (/) to avoid accidental conflicts with other finalizer writers
Info: Scaling up the operator
Info: CR is successfully restored. Please watch the operator logs and check the crd

6. Check storagecluster CR status:

$ oc get storagecluster
NAME                 AGE   PHASE   EXTERNAL   CREATED AT             VERSION
ocs-storagecluster   39s   Ready              2025-04-29T08:58:16Z   4.19.0

7. Check pods status:

$ oc get pods 
NAME                                                              READY   STATUS      RESTARTS      AGE
ceph-csi-controller-manager-5f8d767cf5-l5bf8                      1/1     Running     2 (53m ago)   55m
cnpg-controller-manager-79b7968c9d-trcxh                          1/1     Running     0             51m
csi-addons-controller-manager-f958445d-gwqjv                      1/1     Running     0             55m
noobaa-operator-54ff97cbbf-qcg98                                  1/1     Running     0             51m
ocs-client-operator-console-7db4497c77-v5s6f                      1/1     Running     0             55m
ocs-client-operator-controller-manager-59848c8988-wwprd           1/1     Running     0             55m
ocs-metrics-exporter-c4b49dcdf-c7tdj                              3/3     Running     2 (53m ago)   54m
ocs-operator-5d7b4b4547-n44wp                                     1/1     Running     0             4m36s
ocs-provider-server-65d9fd4647-zczjj                              1/1     Running     0             55m
odf-console-57b9488fb6-dfzvp                                      1/1     Running     0             60m
odf-operator-controller-manager-6f545b7469-ht9dp                  1/1     Running     1 (54m ago)   60m
openshift-storage.cephfs.csi.ceph.com-ctrlplugin-7465d64bdkxhlf   7/7     Running     0             55m
openshift-storage.cephfs.csi.ceph.com-ctrlplugin-7465d64bdmqxgq   7/7     Running     0             55m
openshift-storage.cephfs.csi.ceph.com-nodeplugin-27242            3/3     Running     0             55m
openshift-storage.cephfs.csi.ceph.com-nodeplugin-782g8            3/3     Running     0             55m
openshift-storage.cephfs.csi.ceph.com-nodeplugin-8nbrg            3/3     Running     0             55m
openshift-storage.cephfs.csi.ceph.com-nodeplugin-96fn8            3/3     Running     0             55m
openshift-storage.cephfs.csi.ceph.com-nodeplugin-f8bf5            3/3     Running     0             55m
openshift-storage.cephfs.csi.ceph.com-nodeplugin-r2nsz            3/3     Running     0             55m
openshift-storage.nfs.csi.ceph.com-ctrlplugin-cccd975f-9zmlw      6/6     Running     0             55m
openshift-storage.nfs.csi.ceph.com-ctrlplugin-cccd975f-lbbrm      6/6     Running     0             55m
openshift-storage.nfs.csi.ceph.com-nodeplugin-7kbj4               3/3     Running     0             55m
openshift-storage.nfs.csi.ceph.com-nodeplugin-7nnhk               3/3     Running     0             55m
openshift-storage.nfs.csi.ceph.com-nodeplugin-jjggr               3/3     Running     0             55m
openshift-storage.nfs.csi.ceph.com-nodeplugin-mzjpt               3/3     Running     0             55m
openshift-storage.nfs.csi.ceph.com-nodeplugin-pp7rp               3/3     Running     0             55m
openshift-storage.nfs.csi.ceph.com-nodeplugin-rs5s8               3/3     Running     0             55m
openshift-storage.rbd.csi.ceph.com-ctrlplugin-59bc4cb654-hd994    8/8     Running     0             55m
openshift-storage.rbd.csi.ceph.com-ctrlplugin-59bc4cb654-n2t5m    8/8     Running     1 (54m ago)   55m
openshift-storage.rbd.csi.ceph.com-nodeplugin-fjq5l               4/4     Running     0             55m
openshift-storage.rbd.csi.ceph.com-nodeplugin-hx7m2               4/4     Running     0             55m
openshift-storage.rbd.csi.ceph.com-nodeplugin-kw886               4/4     Running     0             55m
openshift-storage.rbd.csi.ceph.com-nodeplugin-lfjcl               4/4     Running     0             55m
openshift-storage.rbd.csi.ceph.com-nodeplugin-rxp47               4/4     Running     0             55m
openshift-storage.rbd.csi.ceph.com-nodeplugin-z4gtj               4/4     Running     0             55m
rook-ceph-crashcollector-oviner4-ibm-gqgn5-worker-1-d69d6-f724f   1/1     Running     0             52m
rook-ceph-crashcollector-oviner4-ibm-gqgn5-worker-2-fcn4g-rfp9z   1/1     Running     0             52m
rook-ceph-crashcollector-oviner4-ibm-gqgn5-worker-3-sg8mc-w6psg   1/1     Running     0             51m
rook-ceph-exporter-oviner4-ibm-gqgn5-worker-1-d69d6-66bd99kmkw4   1/1     Running     0             52m
rook-ceph-exporter-oviner4-ibm-gqgn5-worker-2-fcn4g-754bffbcjd9   1/1     Running     0             52m
rook-ceph-exporter-oviner4-ibm-gqgn5-worker-3-sg8mc-64f94bhvqgp   1/1     Running     0             51m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-78686bb7sl4mw   2/2     Running     0             51m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-797875bff952q   2/2     Running     0             51m
rook-ceph-mgr-a-65785b66cd-xlv2s                                  3/3     Running     0             52m
rook-ceph-mgr-b-d98ccb4bc-2j8m2                                   3/3     Running     0             52m
rook-ceph-mon-a-7fc5b7775d-96ckx                                  2/2     Running     0             53m
rook-ceph-mon-b-6dd694669b-zh2k5                                  2/2     Running     0             52m
rook-ceph-mon-c-8575f9999b-stz89                                  2/2     Running     0             52m
rook-ceph-operator-796f877956-t7nck                               1/1     Running     0             51m
rook-ceph-osd-0-764bc5c6f-mdjvk                                   2/2     Running     0             50m
rook-ceph-osd-1-76b984db9b-t4m97                                  2/2     Running     0             50m
rook-ceph-osd-2-5684958cc5-mzcqc                                  2/2     Running     0             49m
rook-ceph-osd-prepare-ocs-deviceset-0-data-0l26wm-lvqcm           0/1     Completed   0             52m
rook-ceph-osd-prepare-ocs-deviceset-1-data-02fnch-gb6dn           0/1     Completed   0             52m
rook-ceph-osd-prepare-ocs-deviceset-2-data-0sjrxv-gjpdb           0/1     Completed   0             52m
rook-ceph-tools-6549cd6784-4v9vt                                  1/1     Running     0             51m
storageclient-737342087af10580-status-reporter-29106059-sbwhd     0/1     Completed   0             28s
ux-backend-server-64dd9b84d7-9kgwg                                2/2     Running     0             55m


Contributor

@travisn travisn left a comment

With this change, we really should enable the restore of any of the ODF CRs, whether from Rook, OCS operator, CSI operator, Noobaa, etc.

One approach would be to just require args[0] to include the fully qualified CRD type such as storageclusters.ocs.openshift.io, then we just extract the group name from that full name. We could return an error if they passed some group name that we don't recognize as belonging to ODF.

pkgrestore.RestoreCrd(cmd.Context(), root.ClientSets, root.OperatorNamespace, root.StorageClusterNamespace, args)
var groupName string
var versionResource string
if args[0] == "storageclusters" {
Contributor

Is there a check for length of args before this? If not, this could crash.

Contributor Author

The command already includes argument validation through Cobra with the line:

Args: cobra.RangeArgs(1, 2),

This ensures that if the number of arguments is outside the 1–2 range, Cobra will return an error and prevent further execution. As shown in the output, passing three arguments results in:

$ ./bin/odf restore deleted  a b c
Error: accepts between 1 and 2 arg(s), received 3
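
For context, a minimal sketch of how cobra.RangeArgs guards the positional arguments before Run executes (command and variable names here are illustrative, not the PR's exact code):

package cmd

import "github.com/spf13/cobra"

// Illustrative only: Cobra rejects anything outside the 1-2 argument range before
// Run is called, so indexing args[0] inside Run cannot panic.
var restoreDeletedCmd = &cobra.Command{
    Use:  "deleted <crd> [cr-name]",
    Args: cobra.RangeArgs(1, 2),
    Run: func(cmd *cobra.Command, args []string) {
        _ = args[0] // safe: at least one argument is guaranteed by RangeArgs
    },
}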

@OdedViner
Contributor Author

With this change, we really should enable the restore of any of the ODF CRs, whether from Rook, OCS operator, CSI operator, Noobaa, etc.

One approach would be to just require args[0] to include the fully qualified CRD type such as storageclusters.ocs.openshift.io, then we just extract the group name from that full name. We could return an error if they passed some group name that we don't recognize as belonging to ODF.

Hi @travisn ,

I’ve implemented the change as suggested. The restore command now requires that the first argument be a fully qualified CRD type (for example, storageclusters.ocs.openshift.io). The code parses this input to extract the resource name, group, and API version, and it validates the group against a set of supported groups (Rook, OCS, CSI, NooBaa, etc.). If the provided group isn’t recognized as part of ODF, the command returns an error with a helpful message listing the supported groups.

This update should enable us to restore any of the ODF CRs regardless of the operator to which they belong. Please take a look at the changes and let me know if you have any further suggestions or feedback.

Thanks,
Oded

Comment on lines 14 to 22
var groupVersions = map[string]string{
"ocs.openshift.io": "v1",
"ceph.rook.io": "v1",
"storage.k8s.io": "v1",
"odf.openshift.io": "v1alpha1",
"noobaa.io": "v1alpha1",
"csiaddons.openshift.io": "v1alpha1",
Contributor

Suggested change
var groupVersions = map[string]string{
"ocs.openshift.io": "v1",
"ceph.rook.io": "v1",
"storage.k8s.io": "v1",
"odf.openshift.io": "v1alpha1",
"noobaa.io": "v1alpha1",
"csiaddons.openshift.io": "v1alpha1",

could you also check csi-operator api groups name

Contributor Author

I've added "csi.ceph.io": "v1". Is that what you meant for this CRD?

Contributor

It is v1alpha1, please confirm and update.

Contributor Author

done


openshift-ci bot commented Feb 10, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: OdedViner
Once this PR has been reviewed and has the lgtm label, please ask for approval from subhamkrai. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

// Parse the fully qualified CRD (e.g. "cephclusters.ceph.rook.io").
resourceName, groupName, version, err := parseFullyQualifiedCRD(args[0])
if err != nil {
fmt.Printf("Error parsing CRD type: %v\n", err)
Contributor

Instead of printing the error, see the helper for logging.Fatal()
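
A minimal sketch of the suggested change, assuming the kubectl-rook-ceph logging package exposes a Fatal helper that accepts an error (the exact signature is an assumption here, not verified from this PR):

resourceName, groupName, version, err := parseFullyQualifiedCRD(args[0])
if err != nil {
    // Assumption: logging.Fatal logs the error and exits with a non-zero status.
    logging.Fatal(fmt.Errorf("failed to parse CRD type %q: %v", args[0], err))
}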

Contributor Author

done

"ocs.openshift.io": "v1",
"ceph.rook.io": "v1",
"storage.k8s.io": "v1",
"csi.ceph.io": "v1",
Contributor

In 4.19 we are pushing for v1beta1 for the csi CRDs, let's assume that for now

Suggested change
"csi.ceph.io": "v1",
"csi.ceph.io": "v1beta1",

Contributor Author

done

var groupVersions = map[string]string{
"ocs.openshift.io": "v1",
"ceph.rook.io": "v1",
"storage.k8s.io": "v1",
Contributor

Is this for csidrivers and storageclasses? Seems like we shouldn't need to restore this type. ODF might create resources of that type, but I don't anticipate they would have finalizers that we would need to worry about.

Contributor Author

removed "storage.k8s.io" CRD

@OdedViner OdedViner force-pushed the restore_ocs_cr branch 3 times, most recently from 684d651 to 1a49d6d on February 18, 2025 12:23
@OdedViner
Contributor Author

This PR depends on the kubectl-rook-ceph PR rook/kubectl-rook-ceph#354 that was merged today.
@subhamkrai @travisn , is there an option to run a bot to update the dependencies?

@subhamkrai
Contributor

This PR depends on the kubectl-rook-ceph PR rook/kubectl-rook-ceph#354 that was merged today. @subhamkrai @travisn , is there an option to run a bot to update the dependencies?

That would require a kubectl-rook-ceph plugin release, which I'm not sure we are planning right now, but you can use kubectl-rook-ceph pinned to the commit that has your changes.

@OdedViner
Contributor Author

This PR depends on the kubectl-rook-ceph PR rook/kubectl-rook-ceph#354 that was merged today. @subhamkrai @travisn , is there an option to run a bot to update the dependencies?

That would require a kubectl-rook-ceph plugin release, which I'm not sure we are planning right now, but you can use kubectl-rook-ceph pinned to the commit that has your changes.

done

Contributor

@subhamkrai subhamkrai left a comment

Changes look good @OdedViner. Have you verified that the changes are working as expected?

@OdedViner
Contributor Author

OdedViner commented Feb 24, 2025

When attempting to restore the storagecluster CR, the process encountered an error while trying to remove finalizers from the storageclusters CR:

1. Get storagecluster:
$ oc get storagecluster -n openshift-storage 
NAME                 AGE   PHASE   EXTERNAL   CREATED AT             VERSION
ocs-storagecluster   21h   Ready              2025-02-23T18:25:32Z   4.18.0

2. Delete storagecluster:
$ oc delete storagecluster ocs-storagecluster -n openshift-storage 
storagecluster.ocs.openshift.io "ocs-storagecluster" deleted

3. Verify the storagecluster is in Deleting state:
$ oc get storagecluster -n openshift-storage 
NAME                 AGE   PHASE      EXTERNAL   CREATED AT             VERSION
ocs-storagecluster   21h   Deleting              2025-02-23T18:25:32Z   4.18.0

4. Run the restore CLI:
$ oc get crd | grep storageclusters
storageclusters.ocs.openshift.io                                  2025-02-23T18:19:26Z

$ ./bin/odf restore deleted storageclusters.ocs.openshift.io
Info: Detecting which resources to restore for crd "storageclusters"
Info: Restoring CR ocs-storagecluster
Warning: The resource ocs-storagecluster was found deleted. Do you want to restore it? yes | no

yes
Info: Proceeding with restoring deleting CR
Info: Scaling down the operator
Info: Deleting validating webhook rook-ceph-webhook if present
Info: Removing ownerreferences from resources with matching uid ad51048e-cad3-47ce-88a7-c599cd14869a
Info: Removing owner references for secret onboarding-private-key
Info: Removed ownerReference for Secret: onboarding-private-key

Info: Removing owner references for secret onboarding-ticket-key
Info: Removed ownerReference for Secret: onboarding-ticket-key

Info: Removing owner references for configmaps rook-config-override
Info: Removed ownerReference for configmap: rook-config-override

Info: Removing owner references for service ocs-metrics-exporter
Info: Removed ownerReference for service: ocs-metrics-exporter

Info: Removing owner references for deployment ocs-metrics-exporter
Info: Removed ownerReference for deployment: ocs-metrics-exporter

Info: Removing owner references for deployment rook-ceph-tools
Info: Removed ownerReference for deployment: rook-ceph-tools

Info: Removing finalizers from storageclusters/ocs-storagecluster
Error: Failed to update resource "ocs-storagecluster" for crd. the server could not find the requested resource

I think I need to scale down the OCS operator instead of the Rook-Ceph operator.
https://github.com/rook/kubectl-rook-ceph/blob/a2321208dcaaee747dde004b9caf68ab57ce9b2b/pkg/restore/crd.go#L79
@subhamkrai @travisn, do we have a procedure for restoring the StorageCluster CR?
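
For reference, the manual equivalent of scaling down the OCS operator (deployment name and namespace taken from the outputs above) would be something like:

$ oc scale deployment ocs-operator -n openshift-storage --replicas=0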

@travisn
Contributor

travisn commented Feb 24, 2025

@subhamkrai @travisn, do we have a procedure for restoring the StorageCluster CR?

I don't believe the process has been documented before, so it's important to test it out and confirm it works. Also, something doesn't look right in the execution: it shouldn't be removing the owner references on so many resources. I thought that would only be needed for the cephcluster CR, not for the storageCluster CR.

@travisn
Contributor

travisn commented Mar 28, 2025

@OdedViner How's the testing looking for this? It would be good to have this work completed soon.

@OdedViner OdedViner force-pushed the restore_ocs_cr branch 2 times, most recently from 7b394b3 to 7ae8755 on April 15, 2025 08:57
@OdedViner OdedViner force-pushed the restore_ocs_cr branch 2 times, most recently from c0c5284 to 21f039e on April 17, 2025 15:32
Contributor

@subhamkrai subhamkrai left a comment

Please squash your commits. Also, did you test the latest changes with an ODF cluster?

@OdedViner OdedViner force-pushed the restore_ocs_cr branch 2 times, most recently from daf3edd to a978ebc on April 29, 2025 13:43
Contributor

@subhamkrai subhamkrai left a comment

Apart from these nits, the changes look good to me.

}

// keys returns the keys of a string map. It is used to print out supported group names.
func keys(m map[string]string) []string {
Contributor

Suggested change
func keys(m map[string]string) []string {
func groupNameKeys(m map[string]string) []string {

Let's have the name reflect what the function does.

Contributor Author

done

resourceName = parts[0]
groupName = parts[1]

ver, ok := groupVersions[groupName]
Contributor

Suggested change
ver, ok := groupVersions[groupName]
version, ok := groupVersions[groupName]

Contributor Author

done

newArgs[1] = args[1]
}
var customResources []pkgrestore.CustomResource
if contains(newArgs, "storageclusters") {
Contributor

Suggested change
if contains(newArgs, "storageclusters") {
if contains(newArgs, "storageclusters") {

I think we can use slices.Contains from the standard library here (https://go.dev/play/p/sj50jslDBUY) instead of a custom helper, to make the code cleaner.
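
For illustration, a minimal sketch of that suggestion using slices.Contains from the Go 1.21+ standard library (variable names follow the PR's snippet below):

import "slices"

// slices.Contains replaces a hand-rolled contains() helper.
if slices.Contains(newArgs, "storageclusters") {
    // build the storagecluster-specific customResources list here
}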

Contributor Author

done

@subhamkrai
Contributor

The changes look good to me. Thanks

@travisn, ready to merge?
If I approve, the bot will merge, so I have not added the lgtm label yet.

var groupVersions = map[string]string{
"ocs.openshift.io": "v1",
"ceph.rook.io": "v1",
"csi.ceph.io": "v1beta1",
Contributor

In 4.19 we decided to keep this as v1alpha1, then in 4.20 it will change to v1.

Suggested change
"csi.ceph.io": "v1beta1",
"csi.ceph.io": "v1alpha1",

Contributor Author

done

go.sum Outdated
@@ -743,8 +743,8 @@ github.com/rogpeppe/go-internal v1.6.1/go.mod h1:xXDCJY+GAPziupqXw64V24skbSoqbTE
github.com/rogpeppe/go-internal v1.8.1/go.mod h1:JeRgkft04UBgHMgCIwADu4Pn6Mtm5d4nPKWu0nJ5d+o=
github.com/rogpeppe/go-internal v1.12.0 h1:exVL4IDcn6na9z1rAb56Vxr+CgyK3nn3O+epU5NdKM8=
github.com/rogpeppe/go-internal v1.12.0/go.mod h1:E+RYuTGaKKdloAfM02xzb0FW3Paa99yedzYV+kq4uf4=
github.com/rook/kubectl-rook-ceph v0.9.3 h1:+7THA8a+S2ArJrs9jpY1eJscAjPBKjlLXSmCVPU3eoY=
github.com/rook/kubectl-rook-ceph v0.9.3/go.mod h1:dOQ+Yccc41DxZqe9jpvAUHsYTquYP/SKClrPmG70SLM=
github.com/rook/kubectl-rook-ceph v0.9.4-0.20250428051344-dbfe77cc57a1 h1:znvPe0apxkTdkdiVINk0DfUbMGt6Vv+I9mgKfsr3odY=
Contributor

Discussed with @subhamkrai today; he plans on releasing v0.9.4 tomorrow, in case you prefer to update this reference after that release.


Contributor Author

done

This PR adds support for restoring deleted Custom Resources
for OCS operator via the ODF CLI. It includes logic for handling
storageclusters CRs by dynamically setting groupName
and versionResource, ensuring compatibility and reliability
during the restore process.

Signed-off-by: Oded Viner <[email protected]>
Contributor

@travisn travisn left a comment

LGTM, it seems we just need to resolve the questions in https://issues.redhat.com/browse/DFBUGS-793 before merging.

Comment on lines +15 to +52
// groupVersions defines the supported CRD groups and their corresponding API versions.
var groupVersions = map[string]string{
    "ocs.openshift.io":       "v1",
    "ceph.rook.io":           "v1",
    "csi.ceph.io":            "v1alpha1",
    "odf.openshift.io":       "v1alpha1",
    "noobaa.io":              "v1alpha1",
    "csiaddons.openshift.io": "v1alpha1",
}

// groupNameKeys returns the keys of a string map. It is used to print out supported group names.
func groupNameKeys(m map[string]string) []string {
    out := make([]string, 0, len(m))
    for k := range m {
        out = append(out, k)
    }
    return out
}

// parseFullyQualifiedCRD takes a fully qualified CRD type of the form "resource.group"
// (for example, "cephclusters.ceph.rook.io") and returns the resource name, group name, and
// the API version associated with that group. It returns an error if the format is invalid
// or the group is not recognized.
func parseFullyQualifiedCRD(fqcrd string) (resourceName, groupName, version string, err error) {
    parts := strings.SplitN(fqcrd, ".", 2)
    if len(parts) != 2 {
        return "", "", "", fmt.Errorf("invalid CRD format %q; expected format <resource>.<group>", fqcrd)
    }
    resourceName = parts[0]
    groupName = parts[1]

    version, ok := groupVersions[groupName]
    if !ok {
        return "", "", "", fmt.Errorf("unsupported group %q; supported groups are: %v", groupName, groupNameKeys(groupVersions))
    }
    return resourceName, groupName, version, nil
}
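
For readability, a quick usage sketch of the parser above; the resulting values follow directly from the SplitN call and the groupVersions map:

resource, group, version, err := parseFullyQualifiedCRD("cephclusters.ceph.rook.io")
if err != nil {
    // the CLI treats this as a fatal error
    return err
}
// resource == "cephclusters", group == "ceph.rook.io", version == "v1"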

leelavg left a comment

Any reason for not using the established GVK/GVR schema types from https://github.com/kubernetes/apimachinery/blob/master/pkg/runtime/schema/group_version.go or https://github.com/kubernetes/apimachinery/blob/master/pkg/apis/meta/v1/group_version.go if we want a higher level of abstraction?

Is it that you don't want to import Go structs from other repos? As of now, IIUC, we already made an effort to move APIs into a submodule.

Contributor Author

@leelavg The input format is ./bin/odf restore deleted storageclusters.ocs.openshift.io, where only the resource name and group are provided (storageclusters.ocs.openshift.io); the version is not specified. If you have relevant code that would improve the current implementation, please share it.
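
As an illustration of leelavg's suggestion, the three parsed strings could be bundled into the apimachinery GroupVersionResource type; this is a hypothetical refactor sketch, not code from this PR:

import "k8s.io/apimachinery/pkg/runtime/schema"

// Hypothetical helper: express the parsed CRD identity as a GVR, which the
// dynamic client and discovery APIs already understand.
func toGVR(resource, group, version string) schema.GroupVersionResource {
    return schema.GroupVersionResource{Group: group, Version: version, Resource: resource}
}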

if slices.Contains(newArgs, "storageclusters") {
    customResources = []pkgrestore.CustomResource{
        // ceph.rook.io/v1
        {Group: "ceph.rook.io", Version: "v1", Resource: "cephblockpoolradosnamespaces"},
leelavg left a comment

How can we create a link back to the ocs-operator to keep this list updated when a new resource is owned by the storagecluster?

Did you check that the k8s primitives (secrets, configmaps, etc.) owned by the storagecluster are OK to be deleted?

Apart from ensuring pods are running, what other checks are performed after restoration of the storagecluster CR?

IIRC, the storagecluster UID is used in DR-related activities as well; maybe that also gets affected, or we need a statement about not performing this on DR-enabled setups?

Contributor Author

@leelavg I have not tested it on a DR setup or in Provider mode.

@leelavg

leelavg commented May 7, 2025

Just dropping this here, as I hadn't looked into the plugins we have been developing until now (pardon): usually https://github.com/kubernetes/cli-runtime is preferred as a base when developing kubectl plugins, as it abstracts api-resources, discovery if required, generic printers, etc.

@OdedViner
Contributor Author

Just dropping this here, as I hadn't looked into the plugins we have been developing until now (pardon): usually https://github.com/kubernetes/cli-runtime is preferred as a base when developing kubectl plugins, as it abstracts api-resources, discovery if required, generic printers, etc.

@leelavg This is a downstream project, so I believe it would be more relevant for the upstream project: https://github.com/rook/kubectl-rook-ceph.

@subhamkrai
Contributor

We also need to scale down odf-operator when restoring storageCluster.
