Open
Description
Describe the bug
IPs are not released immediately when a node is gracefully shut down but only after the node has rebooted.
Expected behavior
IPs should be released immediately so that the new pods spawned to replace the killed pods can acquire IP addresses.
To Reproduce
- I have enabled the graceful node shutdown feature (as described in https://kubernetes.io/docs/concepts/cluster-administration/node-shutdown) with the following kubelet parameters:
  shutdownGracePeriod: 60s
  shutdownGracePeriodCriticalPods: 20s
- Then I trigger a node shutdown by running the reboot command on the node.
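For reference, going by the Kubernetes graceful node shutdown documentation, `shutdownGracePeriod` is the total delay the node applies to the shutdown and `shutdownGracePeriodCriticalPods` is the part of that delay reserved for critical pods. A minimal sketch of the resulting split for the values above (the function name is illustrative and not part of any Kubernetes API):

```python
def split_shutdown_window(total_s: int, critical_s: int) -> tuple[int, int]:
    """Return (seconds available to regular pods, seconds reserved for critical pods)."""
    if critical_s > total_s:
        raise ValueError("critical-pod period cannot exceed the total grace period")
    # Regular pods are terminated first, critical pods in the remaining time.
    return total_s - critical_s, critical_s


# Values from this report: shutdownGracePeriod=60s, shutdownGracePeriodCriticalPods=20s
regular, critical = split_shutdown_window(60, 20)
print(f"regular pods: {regular}s, critical pods: {critical}s")
```

With these settings, regular pods such as the test-sriov replicas get the first 40 seconds to terminate.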
With the graceful shutdown feature, the pods on the node are killed (they end up in the Completed state) but are not deleted by Kubernetes; I guess that's why the IP addresses are not released. I've deliberately limited the number of IP addresses in the pool to demonstrate the problem: the new pods can't acquire IP addresses.
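To make the pool exhaustion concrete, here is a rough sketch of the arithmetic using the range from the NetworkAttachmentDefinition in this report. It assumes that `range_start` and `range_end` are both inclusive and that every terminated pod keeps its address until it is deleted; the helper names and the figure of 3 pods on the rebooted node are hypothetical.

```python
import ipaddress


def pool_size(range_start: str, range_end: str) -> int:
    """Count addresses from range_start to range_end (both assumed inclusive)."""
    start = int(ipaddress.IPv4Address(range_start))
    end = int(ipaddress.IPv4Address(range_end))
    if end < start:
        raise ValueError("range_end must not be lower than range_start")
    return end - start + 1


def replacements_without_ip(pool: int, replicas: int, pods_on_node: int) -> int:
    """Replacement pods left without an address while the terminated pods
    on the shut-down node still hold theirs."""
    free = pool - replicas  # spare addresses while every original pod holds one
    return max(0, pods_on_node - free)


pool = pool_size("172.29.144.200", "172.29.144.208")
print("addresses in pool:", pool)
# Hypothetical: 3 of the 8 replicas were running on the node that was rebooted.
print("replacement pods stuck:", replacements_without_ip(pool, replicas=8, pods_on_node=3))
```

Under these assumptions the pool has only one spare address beyond the 8 replicas, so as soon as more than one pod is terminated on the node, some replacements cannot be assigned an IP.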
Environment:
- Whereabouts version : v0.8.0
- Kubernetes version (use kubectl version): v1.30.2
- Network-attachment-definition:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-sriov
  labels:
    app: test-sriov
spec:
  replicas: 8
  selector:
    matchLabels:
      app: test-sriov
  template:
    metadata:
      labels:
        app: test-sriov
      annotations:
        k8s.v1.cni.cncf.io/networks: sriov-net-1
    spec:
      containers:
      - name: main
        image: nginx:latest
        resources:
          requests:
            intel.com/sriov_net_1: "1"
          limits:
            intel.com/sriov_net_1: "1"
---
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: sriov-net-1
  annotations:
    k8s.v1.cni.cncf.io/resourceName: "intel.com/sriov_net_1"
spec:
  config: '{
      "cniVersion": "0.3.1",
      "name": "sriov_net_1",
      "type": "sriov",
      "spoofchk": "off",
      "trust": "on",
      "ipam": {
        "type": "whereabouts",
        "range": "172.29.144.0/24",
        "range_start": "172.29.144.200",
        "range_end": "172.29.144.208"
      }
    }'
- Whereabouts configuration (on the host):
{
  "datastore": "kubernetes",
  "kubernetes": {
    "kubeconfig": "/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig"
  },
  "reconciler_cron_expression": "30 4 * * *"
}
- OS (e.g. from /etc/os-release): AlmaLinux 9.4
- Kernel (e.g. uname -a): 5.14.0-427.13.1.el9_4.x86_64
- Others: N/A
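A note on the `reconciler_cron_expression` value `30 4 * * *` in the Whereabouts configuration above: read as standard five-field cron syntax, it fires once a day at 04:30. Whether that reconciler run would reclaim the addresses held by the Completed pods is not established in this report; the sketch below only shows how long anything tied to that schedule might have to wait. The function name and the sample timestamp are hypothetical.

```python
from datetime import datetime, timedelta


def next_daily_run(now: datetime, hour: int = 4, minute: int = 30) -> datetime:
    """Next firing time strictly after `now` for a daily 'minute hour * * *' schedule."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has already passed, so the next run is tomorrow.
        candidate += timedelta(days=1)
    return candidate


# Hypothetical time of the node shutdown.
shutdown_time = datetime(2024, 7, 1, 9, 0)
print("wait until next scheduled run:", next_daily_run(shutdown_time) - shutdown_time)
```

In the worst case the wait is just under 24 hours, which is far longer than the time replacement pods need an address.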
Additional info / context
Sometimes other CNI pods (e.g. calico, multus) fail to restart immediately after the reboot, causing even more problems.