Virtual Nodes - Pod stuck in Pending state when referencing status.podIP from downward API #2427

Open
@jonstelly

Description

What happened:
When deploying a pod to an AKS Virtual Node, the pod gets stuck in the Pending state if a container has:

env:
  - name: SERVING_POD_IP
    valueFrom:
      fieldRef:
        fieldPath: status.podIP

I see the following error in the ACI Linux connector pod:

time="2021-07-05T22:34:00Z" level=warning msg="requeuing \"default/acitest-wv7lc\" due to failed sync"
  error="failed to sync pod \"default/acitest-wv7lc\" in the provider: 
  unsupported fieldPath: status.podIP" key=default/acitest-wv7lc method=handleQueueItem node=virtual-node-aci-linux
  operatingSystem=Linux provider=azure watchedNamespace= workerId=1

What you expected to happen:
For the pod to start up. Below is a minimal repro YAML for a job that shows the behavior. If you comment out the SERVING_POD_IP variable, the pod starts as expected. The real use case, however, is running Knative on Virtual Nodes, and Knative requires that variable to be set.
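For workloads where you control the manifest yourself (unlike the Knative case, where the platform injects the fieldRef), one possible stopgap is to resolve the pod IP inside the container at startup instead of via the downward API. This is only a sketch, not a confirmed fix: it assumes the image ships a `hostname` utility that resolves the container's own IP, and the `acitest-workaround` name is hypothetical.

```yaml
# Hypothetical workaround sketch: derive the pod IP at runtime with
# `hostname -i` instead of fieldRef: status.podIP, which the ACI
# provider rejects. Assumes the image provides the hostname utility.
apiVersion: batch/v1
kind: Job
metadata:
  name: acitest-workaround
spec:
  backoffLimit: 1
  template:
    spec:
      containers:
        - name: ubuntu
          image: ubuntu:latest
          # Export the IP in-shell before running the real command
          command:
            - "/bin/sh"
            - "-c"
            - "export SERVING_POD_IP=$(hostname -i) && uname -a"
      restartPolicy: Never
```

This obviously does not help when a third-party controller like Knative writes the `status.podIP` fieldRef into the pod spec itself, which is why provider-side support is the real ask here.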

How to reproduce it (as minimally and precisely as possible):

Save the manifest below as acitest.yaml, then run: kubectl apply -f ./acitest.yaml

apiVersion: batch/v1
kind: Job
metadata:
  name: acitest
spec:
  backoffLimit: 1
  template:
    spec:
      containers:
        - name: ubuntu
          image: ubuntu:latest
          command: ["uname", "-a"]
          resources:
            limits:
              cpu: 1
              memory: 32Mi
            requests:
              cpu: 1
              memory: 32Mi
          env:
            - name: SERVING_POD
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.name
            - name: SERVING_POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
      restartPolicy: Never
      nodeSelector:
        beta.kubernetes.io/os: linux
        kubernetes.io/role: agent
        type: virtual-kubelet   
      tolerations:
      - key: virtual-kubelet.io/provider
        operator: Exists
      - key: node.kubernetes.io/not-ready
        operator: Exists
        effect: NoExecute
        tolerationSeconds: 300
      - key: node.kubernetes.io/unreachable
        operator: Exists
        effect: NoExecute
        tolerationSeconds: 300
      - key: node.kubernetes.io/memory-pressure
        operator: Exists
        effect: NoSchedule    

Anything else we need to know?:
Knative in front of Virtual Nodes seems like it would be a very popular/common model. A couple of other users have reported the same issue, and a previous issue in this repository covering it was auto-closed: #1139

Environment:

  • Kubernetes version (use kubectl version): 1.18.4 - 1.20.7
  • Size of cluster (how many worker nodes are in the cluster?): 3 node, Virtual Nodes enabled
  • General description of workloads in the cluster: Mixed dependencies + dotnet core components
