Description
Describe the bug: We use localpv with ext4 hard quotas. They work quite fine, but from time to time we hit a problem where the quota is exceeded even though the folder contains less than the configured quota (10GiB). Today I was able to track the problem down to 2 PVCs that obviously had the same project quota ID set:
```
/nvme/disk# ls
lost+found  pvc-2fabebc9-8143-4b60-beef-563180845e64  pvc-6d3a015a-c547-4292-9ed6-95b35a7aea41

/nvme/disk/pvc-6d3a015a-c547-4292-9ed6-95b35a7aea41# du -h --max-depth=1
4.2G    ./workspace
33M     ./remoting
8.0K    ./caches
4.3G    .

/nvme/disk# du -h --max-depth=1
6.1G    ./pvc-2fabebc9-8143-4b60-beef-563180845e64
16K     ./lost+found
4.3G    ./pvc-6d3a015a-c547-4292-9ed6-95b35a7aea41
11G     .

/nvme/disk# repquota -avugP
*** Report for project quotas on device /dev/md0
Block grace time: 7days; Inode grace time: 7days
                        Block limits                 File limits
Project         used     soft     hard  grace    used  soft  hard  grace
------------------------------------------------------------------------
#0        --        20        0        0             2     0     0
#1        --         0 10737419 10737419             0     0     0
#2        --         0 10737419 10737419             0     0     0
#3        --         0 10737419 10737419             0     0     0
#4        --  10737416 10737419 10737419          6122     0     0
#5        --         0 10737419 10737419             0     0     0
#6        --         0 10737419 10737419             0     0     0
```
I think the problem occurs because of a race condition when determining the project ID:
https://github.com/openebs/dynamic-localpv-provisioner/blob/e797585cb1e2c3578b914102bfe0e8768b04d950/cmd/provisioner-localpv/app/helper_hostpath.go#L294+L295
I see two possible workarounds: either make sure that only one create-quota pod can run at a time on a single node, or assign a random project number instead of incrementing the existing ones (see the sketch below).
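To illustrate the random-ID workaround, here is a minimal Go sketch. It is not the provisioner's actual code; `isProjectIDInUse`, the ID range, and the `repquota` parsing are assumptions made purely for illustration:

```go
package main

import (
	"fmt"
	"math/rand"
	"os/exec"
	"strings"
)

// isProjectIDInUse is a hypothetical helper: it runs `repquota -P <mountpoint>`
// and reports whether the given project ID already appears in the report.
// The real provisioner may track project IDs differently (see helper_hostpath.go).
func isProjectIDInUse(mountPoint string, id uint32) (bool, error) {
	out, err := exec.Command("repquota", "-P", mountPoint).Output()
	if err != nil {
		return false, err
	}
	needle := fmt.Sprintf("#%d ", id)
	for _, line := range strings.Split(string(out), "\n") {
		if strings.HasPrefix(strings.TrimSpace(line), needle) {
			return true, nil
		}
	}
	return false, nil
}

// pickRandomProjectID draws random candidates instead of incrementing the
// highest existing ID, which makes it far less likely that two concurrent
// create-quota pods settle on the same number.
func pickRandomProjectID(mountPoint string, maxAttempts int) (uint32, error) {
	for i := 0; i < maxAttempts; i++ {
		// Project ID 0 is used for files without an assigned project, so start at 1.
		candidate := uint32(rand.Int31n(1<<20-1) + 1)
		inUse, err := isProjectIDInUse(mountPoint, candidate)
		if err != nil {
			return 0, err
		}
		if !inUse {
			return candidate, nil
		}
	}
	return 0, fmt.Errorf("no free project ID found after %d attempts", maxAttempts)
}

func main() {
	id, err := pickRandomProjectID("/nvme/disk", 10)
	if err != nil {
		panic(err)
	}
	fmt.Println("chosen project ID:", id)
}
```

Note that a random pick still leaves a small check-then-set window, so serializing quota creation per node (the first workaround) would be the more robust fix; randomizing mainly reduces the chance of a collision.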
Expected behaviour: each PVC gets its own project quota matching the size it is configured with.
Steps to reproduce the bug:
Unfortunately, the bug is really hard to reproduce, as it only happens now and then. During tests I scaled a deployment with a PVC up and down very quickly to check creation and cleanup and saw no problem. Maybe it can be reproduced with more than one deployment scaled up in parallel.
The output of the following commands will help us better understand what's going on:
```
kubectl get pods -n <openebs_namespace> --show-labels
nvme-provisioner-localpv-provisioner-68f8494cf7-84hdv   1/1   Running   80 (12h ago)   32d   app=localpv-provisioner,chart=localpv-provisioner-3.3.0,component=localpv-provisioner,heritage=Helm,name=openebs-localpv-provisioner,openebs.io/component-name=openebs-localpv-provisioner,openebs.io/version=3.3.0,pod-template-hash=68f8494cf7,release=nvme-provisioner
```
Anything else we need to know?:
The provisioner pod has lots of restarts. We don't know why, and there is no error in the pod log, but it does not seem to be related.
Environment details:
- OpenEBS version (use `kubectl get po -n openebs --show-labels`): 3.3.0
- Kubernetes version (use `kubectl version`): 1.23.15
- Cloud provider or hardware configuration: AWS
- OS (e.g. `cat /etc/os-release`): Amazon Linux 2
- Kernel (e.g. `uname -a`): 5.4.228-131.415.amzn2.x86_64