Mountpoint pods are sometimes unschedulable on Auto Mode #657

@simonfogliato

Description

/kind bug

What happened?

Using the aws-mountpoint-s3-csi-driver as an EKS managed add-on on an EKS Auto Mode cluster, the per-volume mountpoint pods stay stuck in Pending and never reach Running. As a result, S3-backed PVCs never finish mounting and application pods are stuck in ContainerCreating with FailedMount.

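For context, the S3 volumes are statically provisioned. A minimal sketch of the setup, with placeholder bucket and object names (the real generated PV and bucket names appear in the logs below), looks roughly like this:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: s3-example-pv            # placeholder; real PVs are generated per workspace
spec:
  capacity:
    storage: 1200Gi              # ignored by the driver, required by Kubernetes
  accessModes:
    - ReadWriteMany
  csi:
    driver: s3.csi.aws.com
    volumeHandle: s3-example-volume
    volumeAttributes:
      bucketName: example-bucket # placeholder bucket name
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: s3-example-claim
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""           # bind directly to the static PV
  volumeName: s3-example-pv
  resources:
    requests:
      storage: 1200Gi
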
The workload pod events show:

MountVolume.SetUp failed for volume "ws47d38456-d132-11f0-9f88-333856a768da-d2029-pv" : rpc error: code = Internal desc = Could not mount "workspace-precisely-actively-faithful-mayfly" at "/var/lib/kubelet/pods/7e37581b-ab99-4e90-9992-4e5f6c720626/volumes/kubernetes.io~csi/ws47d38456-d132-11f0-9f88-333856a768da-d2029-pv/mount": Failed to wait for Mountpoint Pod "mp-p2khv" to be ready: mppod/watcher: mountpoint pod not found. Seems like Mountpoint Pod is not in 'Running' status. You can see it's status and any potential failures by running: `kubectl describe pods -n mount-s3 mp-p2khv`

Describing the stuck Mountpoint Pod:

kubectl describe pods -n mount-s3 mp-p2khv
Name:                 mp-p2khv
Namespace:            mount-s3
Priority:             1000000000
Priority Class Name:  mount-s3-critical
Service Account:      default
Node:                 <none>
Labels:               s3.csi.aws.com/mounted-by-csi-driver-version=2.2.0
                      s3.csi.aws.com/mountpoint-version=1.21.0
Annotations:          s3.csi.aws.com/volume-id: ws47d38456-d132-11f0-9f88-333856a768da-d2029-s3
                      s3.csi.aws.com/volume-name: ws47d38456-d132-11f0-9f88-333856a768da-d2029-pv
Status:               Pending
IP:
IPs:                  <none>
Containers:
  mountpoint:
    Image:           602401143452.dkr.ecr.ca-central-1.amazonaws.com/eks/aws-s3-csi-driver:v2.2.0
    Port:            <none>
    Host Port:       <none>
    SeccompProfile:  RuntimeDefault
    Command:
      /bin/aws-s3-csi-mounter
    Environment:  <none>
    Mounts:
      /comm from comm (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-vcss8 (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  comm:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  10Mi
  kube-api-access-vcss8:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    Optional:                false
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 op=Exists
Events:
  Type     Reason            Age                    From                   Message
  ----     ------            ----                   ----                   -------
  Warning  FailedScheduling  5m37s                  default-scheduler      0/10 nodes are available: 1 Too many pods, 9 node(s) didn't satisfy plugin(s) [NodeAffinity]. no new claims to deallocate, preemption: not eligible due to preemptionPolicy=Never.
  Warning  FailedScheduling  5m21s (x2 over 5m21s)  default-scheduler      0/10 nodes are available: 1 Too many pods, 9 node(s) didn't satisfy plugin(s) [NodeAffinity]. no new claims to deallocate, preemption: not eligible due to preemptionPolicy=Never.
  Warning  FailedScheduling  5m20s (x2 over 5m21s)  default-scheduler      0/11 nodes are available: 1 Too many pods, 10 node(s) didn't satisfy plugin(s) [NodeAffinity]. no new claims to deallocate, preemption: not eligible due to preemptionPolicy=Never.
  Warning  FailedScheduling  4m56s (x2 over 5m2s)   default-scheduler      0/10 nodes are available: 1 Too many pods, 9 node(s) didn't satisfy plugin(s) [NodeAffinity]. no new claims to deallocate, preemption: not eligible due to preemptionPolicy=Never.
  Warning  FailedScheduling  3m51s                  default-scheduler      0/10 nodes are available: 1 Too many pods, 9 node(s) didn't satisfy plugin(s) [NodeAffinity]. no new claims to deallocate, preemption: not eligible due to preemptionPolicy=Never.
  Warning  FailedScheduling  3m51s (x2 over 3m51s)  default-scheduler      0/10 nodes are available: 1 Too many pods, 9 node(s) didn't satisfy plugin(s) [NodeAffinity]. no new claims to deallocate, preemption: not eligible due to preemptionPolicy=Never.
  Warning  FailedScheduling  3m22s (x2 over 3m23s)  default-scheduler      0/9 nodes are available: 1 Too many pods, 8 node(s) didn't satisfy plugin(s) [NodeAffinity]. no new claims to deallocate, preemption: not eligible due to preemptionPolicy=Never.
  Warning  FailedScheduling  3m1s (x2 over 3m2s)    default-scheduler      0/10 nodes are available: 1 Too many pods, 9 node(s) didn't satisfy plugin(s) [NodeAffinity]. no new claims to deallocate, preemption: not eligible due to preemptionPolicy=Never.
  Warning  FailedScheduling  2m20s (x2 over 2m21s)  default-scheduler      0/11 nodes are available: 1 Too many pods, 10 node(s) didn't satisfy plugin(s) [NodeAffinity]. no new claims to deallocate, preemption: not eligible due to preemptionPolicy=Never.
  Warning  FailedScheduling  78s (x2 over 102s)     default-scheduler      0/10 nodes are available: 1 Too many pods, 9 node(s) didn't satisfy plugin(s) [NodeAffinity]. no new claims to deallocate, preemption: not eligible due to preemptionPolicy=Never.
  Warning  FailedScheduling  77s (x2 over 77s)      default-scheduler      0/9 nodes are available: 1 Too many pods, 8 node(s) didn't satisfy plugin(s) [NodeAffinity]. no new claims to deallocate, preemption: not eligible due to preemptionPolicy=Never.
  Warning  FailedScheduling  37s (x2 over 5m37s)    eks-auto-mode/compute  Failed to schedule pod, node selector term with matchFields is not supported
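
The last event looks like the root cause: the Auto Mode scheduler (eks-auto-mode/compute) rejects the pod because its node affinity uses matchFields. kubectl describe doesn't print the affinity, but based on that event the Mountpoint Pod is presumably pinned to the workload's node with a term shaped roughly like this (node name is illustrative):

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchFields:            # this is what eks-auto-mode/compute refuses to handle
            - key: metadata.name
              operator: In
              values:
                - i-0123456789abcdef0   # placeholder: the node the workload pod landed on

Reading the events together: the default scheduler can only consider the single node named in that term, and rejects it there with "Too many pods", while the Auto Mode scheduler, which could otherwise provision capacity, rejects the matchFields term outright, so the Mountpoint Pod stays Pending indefinitely.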

Workaround: deleting the stuck workload pod re-triggers the mount, and the replacement pod comes up fine:

kubectl get pods | grep -v Running
NAME                                                              READY   STATUS              RESTARTS   AGE
ws47d38456-d132-11f0-9f88-333856a768da-d2029-0                    0/1     ContainerCreating   0          23m

kubectl delete pod ws47d38456-d132-11f0-9f88-333856a768da-d2029-0
pod "ws47d38456-d132-11f0-9f88-333856a768da-d2029-0" deleted from default namespace

kubectl get pods | grep -v Running
NAME                                                              READY   STATUS              RESTARTS   AGE
ws47d38456-d132-11f0-9f88-333856a768da-d2029-0                    0/1     ContainerCreating   0          2s

kubectl get pods | grep -v Running
NAME                                                              READY   STATUS    RESTARTS   AGE
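
Until this is fixed in the driver or in Auto Mode, a rough way to sweep up affected workload pods is the following (untested sketch; adjust the namespace and add a label selector so unrelated Pending pods are not deleted):

# delete Pending pods in the default namespace so their S3 volumes are mounted again on the replacement pod
kubectl get pods -n default --field-selector=status.phase=Pending -o name \
  | xargs -r kubectl delete -n default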
