[BUG] IPs are not released when a node is gracefully shut down #550

@sprat

Describe the bug

When a node is gracefully shut down, its pods' IPs are not released immediately; they are released only after the node has rebooted.

Expected behavior

IPs should be released immediately so that the new pods spawned to replace the killed pods can acquire IP addresses.

To Reproduce

  1. I have enabled the graceful node shutdown feature (as described in https://kubernetes.io/docs/concepts/cluster-administration/node-shutdown) with the following kubelet parameters:
     • shutdownGracePeriod: 60s
     • shutdownGracePeriodCriticalPods: 20s
  2. Then I trigger a node shutdown by launching the reboot command on the node.
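
For reference, the two parameters from step 1 would typically sit in the kubelet configuration file roughly as follows. This is a minimal sketch: the apiVersion/kind header is assumed from the standard KubeletConfiguration format, and all other kubelet settings are omitted.

```yaml
# Sketch of the relevant kubelet configuration fragment.
# The header lines are assumed; only the two shutdown fields come from this report.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Total time the node delays shutdown to let pods terminate
shutdownGracePeriod: 60s
# Portion of the total grace period reserved for critical pods
shutdownGracePeriodCriticalPods: 20s
```

With these values, regular pods get the first 40s to terminate and critical pods the remaining 20s.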

With the graceful shutdown feature, the pods on the node are killed (they end up in the Completed state) but are not deleted by Kubernetes. I guess that's why the IP addresses are not released. I've deliberately limited the number of IP addresses in the pool to demonstrate the problem: the new pods can't acquire IP addresses.

Environment:

  • Whereabouts version : v0.8.0
  • Kubernetes version (use kubectl version): v1.30.2
  • Network-attachment-definition (together with the test Deployment that uses it):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-sriov
  labels:
    app: test-sriov
spec:
  replicas: 8
  selector:
    matchLabels:
      app: test-sriov
  template:
    metadata:
      labels:
        app: test-sriov
      annotations:
        k8s.v1.cni.cncf.io/networks: sriov-net-1
    spec:
      containers:
      - name: main
        image: nginx:latest
        resources:
          requests:
            intel.com/sriov_net_1: "1"
          limits:
            intel.com/sriov_net_1: "1"
---
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: sriov-net-1
  annotations:
    k8s.v1.cni.cncf.io/resourceName: "intel.com/sriov_net_1"
spec:
  config: '{
    "cniVersion": "0.3.1",
    "name": "sriov_net_1",
    "type": "sriov",
    "spoofchk": "off",
    "trust": "on",
    "ipam": {
      "type": "whereabouts",
      "range": "172.29.144.0/24",
      "range_start": "172.29.144.200",
      "range_end": "172.29.144.208"
    }
  }'
  • Whereabouts configuration (on the host):
{
  "datastore": "kubernetes",
  "kubernetes": {
    "kubeconfig": "/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig"
  },
  "reconciler_cron_expression": "30 4 * * *"
}
  • OS (e.g. from /etc/os-release): AlmaLinux 9.4
  • Kernel (e.g. uname -a): 5.14.0-427.13.1.el9_4.x86_64
  • Others: N/A

Additional info / context

Sometimes other CNI pods (e.g. calico, multus) fail to restart immediately after the reboot, causing even more problems.
