"create PVC from snapshot" should not create fileset on Scale if "restore snapshot size" exceeds "GPFS filesystem Available size". #284

Open
@kulkarnicr

Description

Describe the bug
The GPFS file system was running out of space (about 400M available). I tried to create a PVC from a snapshot with a restore size of about 500M (512Mi), and the PVC remained in the Pending state.
Note that on the Scale side the fileset was created and linked. However, the mmapplypolicy command kept failing and logging RC=9 entries in /var/log/messages.

When I deleted the pending PVC, it was removed on the CSI side, but the fileset was not removed on the Scale side.
So the CSI-side cleanup completes, but a leftover fileset remains on Scale. It has to be cleaned up manually with mmunlinkfileset and mmdelfileset.
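
For reference, the manual cleanup described above can be sketched as a small helper that only builds the two command lines (it does not run them). The helper name `cleanup_commands` is hypothetical, and the optional `-f` flag is my assumption about what may be needed when the leftover fileset still holds partially copied files.

```python
from typing import List


def cleanup_commands(filesystem: str, fileset: str, force: bool = False) -> List[List[str]]:
    """Build the Scale CLI commands to unlink and then delete a leftover fileset.

    This is a hypothetical helper for illustration; it returns argument lists
    and leaves execution (for example via subprocess.run) to the caller.
    """
    unlink = ["mmunlinkfileset", filesystem, fileset]
    delete = ["mmdelfileset", filesystem, fileset]
    if force:
        # Assumption: -f may be required if the fileset is busy or not empty.
        unlink.append("-f")
        delete.append("-f")
    # Order matters: the fileset must be unlinked before it can be deleted.
    return [unlink, delete]


if __name__ == "__main__":
    for cmd in cleanup_commands("smallfs", "pvc-1dfa40b6-398b-4090-b3b8-d601223c9909"):
        print(" ".join(cmd))
```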

Is this expected behavior?

Ideally, creating a PVC from a snapshot should fail immediately (without creating a fileset on Scale) if the snapshot restore size exceeds the space remaining on the GPFS file system.
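
To make the size comparison concrete, here is a minimal sketch of such a check in Python. It is not the driver's code (the driver is written in Go). The function names `quantity_to_bytes` and `fits` are made up for this example, and the parser handles only a small subset of Kubernetes quantity suffixes.

```python
import re

# Subset of Kubernetes quantity suffixes; anything else is rejected.
_UNITS = {
    "": 1,
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,
    "k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4,
}


def quantity_to_bytes(quantity: str) -> int:
    """Convert a quantity such as '0.5Gi' or '512Mi' to a byte count."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([A-Za-z]*)", quantity.strip())
    if match is None or match.group(2) not in _UNITS:
        raise ValueError(f"unsupported quantity: {quantity!r}")
    return int(float(match.group(1)) * _UNITS[match.group(2)])


def fits(requested: str, available_bytes: int) -> bool:
    """Return True if the requested size does not exceed the available space."""
    return quantity_to_bytes(requested) <= available_bytes


if __name__ == "__main__":
    # Values from this report: a 0.5Gi request against 432M free as shown by df -h.
    available = 432 * 1024**2
    print(quantity_to_bytes("0.5Gi"), available, fits("0.5Gi", available))
```

With the numbers from this report, 0.5Gi is 536870912 bytes (the same value that appears in the mmsetquota call in the log), which is more than the 432M that df -h reports as available, so a check like this would reject the request before any fileset is created.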

To Reproduce
Steps to reproduce the behavior:

  1. About 400M remaining in the file system
[root@snivels4 2020_08_06-02:59:01 pvc-69bfe36b-d7a0-49db-be79-323942a180ac]$ df -h | grep smallfs
smallfs                5.0G  4.6G  432M  92% /ibm/smallfs
[root@snivels4 2020_08_06-02:59:03 pvc-69bfe36b-d7a0-49db-be79-323942a180ac]$
  2. Try creating a 0.5Gi volume from the snapshot
[root@snivels1 2020_08_06-02:59:23 20200728]$ knvs vs1-smallfs
NAME          READYTOUSE   SOURCEPVC      SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS   SNAPSHOTCONTENT                                    CREATIONTIME   AGE
vs1-smallfs   true         pvc1-smallfs                           512Mi         vsclass1        snapcontent-63827056-cbea-407f-8ec8-7436bd43e1ed   7h8m           8m30s
[root@snivels1 2020_08_06-02:59:24 20200728]$

[root@snivels1 2020_08_06-02:59:54 20200728]$ cat pvc2-from-restore-vs1.yaml

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc2-from-restore-vs1
spec:
  storageClassName: sc-indep-smallfs
  dataSource:
    name: vs1-smallfs
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 0.5Gi
[root@snivels1 2020_08_06-02:59:58 20200728]$

[root@snivels1 2020_08_06-03:00:00 20200728]$ knpvc pvc2-from-restore-vs1
Error from server (NotFound): persistentvolumeclaims "pvc2-from-restore-vs1" not found
[root@snivels1 2020_08_06-03:00:11 20200728]$

[root@snivels1 2020_08_06-03:00:12 20200728]$ kn apply -f pvc2-from-restore-vs1.yaml
persistentvolumeclaim/pvc2-from-restore-vs1 created
[root@snivels1 2020_08_06-03:00:21 20200728]$

[root@snivels1 2020_08_06-03:00:23 20200728]$ knpvc pvc2-from-restore-vs1
NAME                    STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS       AGE
pvc2-from-restore-vs1   Pending                                      sc-indep-smallfs   5s
[root@snivels1 2020_08_06-03:00:26 20200728]$
  3. Events for the create-PVC-from-snapshot request
Events:
  Type     Reason                Age                   From                                                                                                 Message
  ----     ------                ----                  ----                                                                                                 -------
  Normal   ExternalProvisioning  4m5s (x42 over 14m)   persistentvolume-controller                                                                          waiting for a volume to be created, either by external provisioner "spectrumscale.csi.ibm.com" or manually created by system administrator
  Normal   Provisioning          3m1s (x10 over 14m)   spectrumscale.csi.ibm.com_ibm-spectrum-scale-csi-provisioner-0_9778c2a8-e858-44f2-98bc-404e1037ea41  External provisioner is provisioning volume for claim "ibm-spectrum-scale-csi-driver/pvc2-from-restore-vs1"
  Warning  ProvisioningFailed    2m50s (x10 over 13m)  spectrumscale.csi.ibm.com_ibm-spectrum-scale-csi-provisioner-0_9778c2a8-e858-44f2-98bc-404e1037ea41  failed to provision volume with StorageClass "sc-indep-smallfs": rpc error: code = Internal desc = failed to create volume from snapshot snapshot-63827056-cbea-407f-8ec8-7436bd43e1ed: [[EFSSG0632C Command execution aborted.]]
[root@snivels1 2020_08_06-03:14:29 20200728]$
  4. Scale /var/log/messages entries for mmapplypolicy
Aug  6 03:00:26 snivels4 mmfs[1602]: REST-CLI root csiadmin [EXIT, CHANGE] 'mmcrfileset smallfs pvc-1dfa40b6-398b-4090-b3b8-d601223c9909 -t Fileset created by IBM Container Storage Interface driver --inode-space new --inode-limit 1024:1024 --allow-permission-change chmodAndSetAcl' RC=0
Aug  6 03:00:26 snivels4 mmfs[1841]: REST-CLI root csiadmin [EXIT, CHANGE] 'mmlinkfileset smallfs pvc-1dfa40b6-398b-4090-b3b8-d601223c9909 -J /ibm/smallfs/pvc-1dfa40b6-398b-4090-b3b8-d601223c9909' RC=0
Aug  6 03:00:28 snivels4 mmfs[2084]: REST-CLI root csiadmin [EXIT, CHANGE] 'mmsetquota smallfs:pvc-1dfa40b6-398b-4090-b3b8-d601223c9909 --block 536870912:536870912' RC=0
...
Aug  6 03:00:37 snivels4 mmfs[3960]: REST-CLI root csiadmin [ENTRY, CHANGE] 'mmapplypolicy /ibm/smallfs/pvc-69bfe36b-d7a0-49db-be79-323942a180ac/.snapshots/snapshot-63827056-cbea-407f-8ec8-7436bd43e1ed/pvc-69bfe36b-d7a0-49db-be79-323942a180ac-data -P /ibm/smallfs/pvc-69bfe36b-d7a0-49db-be79-323942a180ac/__tscpPolicy__1596708036 -N mount -B 100 -m 24'
Aug  6 03:00:38 snivels4 systemd: Started Session c95419 of user root.
Aug  6 03:00:40 snivels4 mmfs[4273]: REST-CLI root csiadmin [EXIT, CHANGE] 'mmapplypolicy /ibm/smallfs/pvc-69bfe36b-d7a0-49db-be79-323942a180ac/.snapshots/snapshot-63827056-cbea-407f-8ec8-7436bd43e1ed/pvc-69bfe36b-d7a0-49db-be79-323942a180ac-data -P /ibm/smallfs/pvc-69bfe36b-d7a0-49db-be79-323942a180ac/__tscpPolicy__1596708036 -N mount -B 100 -m 24' RC=9
  5. Delete the PVC (removed on the CSI side, but the fileset remains on Scale)
[root@snivels4 2020_08_06-03:22:34 smallfs]$ ls -ltrhai
total 267K
      107 dr-xr-xr-x 2 root root 8.0K Dec 31  1969 .snapshots
...
   524291 drwxrwx--x 3 root root 4.0K Aug  6 03:00 pvc-1dfa40b6-398b-4090-b3b8-d601223c9909
...
[root@snivels4 2020_08_06-03:22:36 smallfs]$


[root@snivels1 2020_08_06-03:22:43 20200728]$ knpvc pvc2-from-restore-vs1
NAME                    STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS       AGE
pvc2-from-restore-vs1   Pending                                      sc-indep-smallfs   22m
[root@snivels1 2020_08_06-03:22:45 20200728]$
[root@snivels1 2020_08_06-03:22:47 20200728]$ kndpvc pvc2-from-restore-vs1
persistentvolumeclaim "pvc2-from-restore-vs1" deleted
[root@snivels1 2020_08_06-03:22:58 20200728]$
[root@snivels1 2020_08_06-03:23:00 20200728]$ knpvc pvc2-from-restore-vs1
Error from server (NotFound): persistentvolumeclaims "pvc2-from-restore-vs1" not found
[root@snivels1 2020_08_06-03:23:01 20200728]$

[root@snivels4 2020_08_06-03:23:39 smallfs]$ ls -ltrhai
total 267K
      107 dr-xr-xr-x 2 root root 8.0K Dec 31  1969 .snapshots
...
   524291 drwxrwx--x 3 root root 4.0K Aug  6 03:00 pvc-1dfa40b6-398b-4090-b3b8-d601223c9909
...
[root@snivels4 2020_08_06-03:23:41 smallfs]$

Expected behavior
Ideally, creating a PVC from a snapshot should fail immediately (without creating a fileset on Scale) if the snapshot restore size exceeds the space remaining on the GPFS file system.
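
If an up-front check is not possible, another option would be to undo the steps that already succeeded once a later step fails, so that no fileset is left behind. Below is a generic sketch of that rollback pattern in Python. It is only an illustration of the idea, not how the driver is implemented, and `run_with_rollback` plus the step names are hypothetical.

```python
from typing import Callable, List, Tuple

# A step is (name, action, undo). All three are placeholders for real operations.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_with_rollback(steps: List[Step]) -> None:
    """Run steps in order; if one fails, undo the completed ones in reverse order."""
    completed: List[Tuple[str, Callable[[], None]]] = []
    for name, action, undo in steps:
        try:
            action()
        except Exception:
            # Undo only the steps that finished, newest first, then re-raise.
            for _done_name, done_undo in reversed(completed):
                done_undo()
            raise
        completed.append((name, undo))


if __name__ == "__main__":
    log: List[str] = []

    def copy_snapshot() -> None:
        # Stands in for the data copy that failed with RC=9 in this report.
        raise RuntimeError("copy failed: not enough space")

    steps: List[Step] = [
        ("create fileset", lambda: log.append("create"), lambda: log.append("delete")),
        ("link fileset", lambda: log.append("link"), lambda: log.append("unlink")),
        ("copy snapshot", copy_snapshot, lambda: None),
    ]
    try:
        run_with_rollback(steps)
    except RuntimeError as err:
        print("provisioning failed:", err)
    print(log)
```

In the sequence from the log above, the create, link and quota steps returned RC=0 and only the copy failed, so a rollback like this would unlink and delete the fileset before reporting the error.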

Environment
Please run the following and paste your output here:

# Development
operator-sdk version 
go version

# Deployment
kubectl version
rpm -qa | grep gpfs

[root@snivels1 2020_08_06-04:16:08 20200728]$ kubectl version
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.6", GitCommit:"dff82dc0de47299ab66c83c626e08b245ab19037", GitTreeState:"clean", BuildDate:"2020-07-15T16:58:53Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.6", GitCommit:"dff82dc0de47299ab66c83c626e08b245ab19037", GitTreeState:"clean", BuildDate:"2020-07-15T16:51:04Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
[root@snivels1 2020_08_06-04:16:09 20200728]$

[root@snivels4 2020_08_06-04:16:25 smallfs]$ rpm -qa | grep gpfs
gpfs.msg.en_US-5.0.5-2.noarch
gpfs.java-5.0.5-2.x86_64
gpfs.license.adv-5.0.5-2.x86_64
gpfs.gss.pmcollector-5.0.5-2.el7.x86_64
gpfs.gpl-5.0.5-2.noarch
gpfs.docs-5.0.5-2.noarch
gpfs.compression-5.0.5-2.x86_64
gpfs.callhome-ecc-client-5.0.5-2.noarch
gpfs.crypto-5.0.5-2.x86_64
gpfs.gss.pmsensors-5.0.5-2.el7.x86_64
gpfs.gui-5.0.5-2.noarch
gpfs.librdkafka-5.0.5-2.x86_64
gpfs.base-5.0.5-2.x86_64
gpfs.gskit-8.0.55-12.x86_64
gpfs.adv-5.0.5-2.x86_64
gpfs.kafka-5.0.5-2.x86_64
[root@snivels4 2020_08_06-04:16:30 smallfs]$

Additional context
Note: My snapshot had two files (file1 of 8 bytes, file2 of 470M). file1 was copied to the PVC fileset with its data, while file2 was copied without data (size 0). This is probably because of the space shortage.
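
Since the data actually held in the snapshot (8 bytes plus 470M here) can differ from the requested restore size, a check could also compare the snapshot's real content size with the free space. A rough sketch in Python, assuming the snapshot directory is readable from where the check runs; `tree_size_bytes` and `has_room_for` are hypothetical names, and the sum uses apparent file sizes, which ignores block rounding and metadata overhead.

```python
import os
import shutil


def tree_size_bytes(root: str) -> int:
    """Sum the apparent sizes of all regular files below root (symlinks skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.lstat(path).st_size
    return total


def has_room_for(source_dir: str, target_mount: str) -> bool:
    """Return True if the data under source_dir fits in the free space of target_mount."""
    return tree_size_bytes(source_dir) <= shutil.disk_usage(target_mount).free
```

On a real system, `source_dir` would be the snapshot data directory and `target_mount` the file system mount point (for example /ibm/smallfs in this report).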

images used

 quay.io/ibm-spectrum-scale-dev/ibm-spectrum-scale-csi-operator
 quay.io/k8scsi/csi-node-driver-registrar
 quay.io/ibm-spectrum-scale-dev/ibm-spectrum-scale-csi-driver
 quay.io/k8scsi/csi-snapshotter
 quay.io/k8scsi/snapshot-controller

    Labels

    Component: Snapshot
    Customer Impact: Localized low impact (2). Temporary / limited perf impact, unnecessary failovers, issues occur while in degraded state
    Customer Probability: Low (1). Issue only occurs during failure condition - disk, server, network, test assert, ...
    Severity: 3. Indicates the issue is on the priority list for next milestone.
    Type: Bug. Indicates issue is an undesired behavior, usually caused by code error.
