Description:
We are using the local-path provisioner to dynamically provision ReadWriteOnce volumes in a Kubernetes cluster. The StorageClass is configured with volumeBindingMode: WaitForFirstConsumer, which should delay volume binding until a pod is scheduled to a node.
However, we observe that the PersistentVolume (PV) is created and bound to a node before the pod is scheduled. This early binding results in scheduling issues when the node where the PV was bound does not have sufficient resources for the pod, even though other nodes do.
Since we are not using any node affinity, labels, or node selectors, the pod should be schedulable on any node with sufficient resources. Because of the early PV binding, however, the pod is restricted to the node where the PV was prematurely created, and it remains in a Pending state.
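For context, the pinning comes from the nodeAffinity that the provisioner writes into the PV it creates. The excerpt below is illustrative only (worker-1 is an example node name), but it shows the shape of the constraint that ends up on the PV:

spec:
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - worker-1

Once this is in place, the scheduler can only consider that one node for the pod, regardless of free capacity elsewhere.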
Steps to Reproduce:
Create a StorageClass:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: rancher.io/local-path
volumeBindingMode: WaitForFirstConsumer
Create a PVC:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: local-storage
Create a pod using this PVC, with no nodeSelector or affinity:
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  containers:
    - name: test-container
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - name: test-vol
          mountPath: /data
  volumes:
    - name: test-vol
      persistentVolumeClaim:
        claimName: test-pvc
Ensure that one node has low available CPU/memory and another has sufficient resources.
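To observe the timing, the following commands can be used while creating the pod (a rough sketch; <pv-name> stands for whatever name the provisioner generates):

# Watch the PVC; with WaitForFirstConsumer it should stay Pending until the pod is scheduled
kubectl get pvc test-pvc -w

# Once a PV appears, its Node Affinity section shows which node it was pinned to
kubectl get pv
kubectl describe pv <pv-name>

# Compare allocatable CPU/memory across nodes
kubectl describe nodes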
Expected Behavior:
The PVC should remain unbound until the pod is scheduled on a node with sufficient resources. The volume should then be dynamically provisioned on that node.
Actual Behavior:
The PVC gets bound early to a PV on a node with insufficient resources. Kubernetes then tries to schedule the pod on the same node (due to volume affinity), but scheduling fails due to lack of CPU/memory. The pod remains in Pending state even though other nodes have enough resources.
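When this happens, the state looks roughly as follows (exact event wording depends on the Kubernetes version):

# The PVC is already Bound even though the pod never started
kubectl get pvc test-pvc

# The pod's scheduling events report insufficient CPU/memory on the node the volume is
# pinned to, and a volume node affinity conflict on the remaining nodes
kubectl describe pod test-pod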
Impact:
This behavior leads to scheduling failures and resource wastage. Pods can't be scheduled even when the cluster has sufficient resources overall.
Workaround:
There is no clean workaround unless we switch to another provisioner or manually bind volumes (sketched below), which defeats the purpose of dynamic provisioning.
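For completeness, the "manually bind volumes" option mentioned above roughly amounts to statically pre-creating a PV pinned to a node that is known to have capacity. A minimal sketch follows; the path, node name, and storage class name are illustrative, and the class would have to be a separate one backed by kubernetes.io/no-provisioner:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: manual-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: manual-local
  local:
    # The directory must already exist on the chosen node
    path: /opt/local-path-provisioner/manual-pv
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                # Example node name: the node known to have enough CPU/memory
                - node-with-capacity

This works, but it hard-codes node placement by hand, which is exactly what dynamic provisioning with WaitForFirstConsumer is meant to avoid.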
Request:
Please confirm whether this is a known bug or intended behavior. If unintended, can this be fixed so that the WaitForFirstConsumer binding logic is honored correctly?
Thanks,
Prabhakaran M