Kube-OVN Version
v1.13.2
Kubernetes Version
v1.32.2
Operation-system/Kernel Version
6.11.0-17-generic
Description
Installing kube-ovn environments using helm charts causes issues with subnet/pod deletion and pod IP allocation.
Installing and then uninstalling a kube-ovn environment using helm charts results in orphaned subnet resources (might be related to #4898). Subnets can only be deleted manually by removing the corresponding finalizers.
Additionally, dynamic IP allocation for pods deployed in a namespace bound to a custom subnet causes my PostStartHook to fail, because the hook modifies the routing table. The pod acquires an IP address from the ovn-default subnet instead of the subnet bound to the namespace it was deployed in, so the route table modification fails. Interestingly, if I assign the pod a static IP address within its namespace's IP pool, it works correctly. Furthermore, if I first deploy the NetworkAttachmentDefinitions and subnets manually using kubectl (instead of via a combined helm chart), everything works correctly (static as well as dynamic IP allocation). If I then manually remove all pods first and remove the subnets afterwards, no orphaned subnets remain.
As far as I know, helm collects all resource kinds defined in a chart and applies them in a predefined, static order. This leads me to suspect that both issues described here are symptoms of the same underlying problem: the custom resources might not be created in the expected order when using helm.
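If that is the case, one workaround I have not yet verified would be to mark the Subnet (and NetworkAttachmentDefinition) as helm pre-install hooks so they exist before the Deployment is applied. This is only a sketch addressing the ordering question; the hook weight and delete policy below are illustrative:

# Untested sketch: same subnet as above, annotated as a helm pre-install hook.
# The same annotations would go on the NetworkAttachmentDefinition.
apiVersion: kubeovn.io/v1
kind: Subnet
metadata:
  name: r1pool
  annotations:
    "helm.sh/hook": pre-install
    "helm.sh/hook-weight": "-5"            # lower weights are applied first
    "helm.sh/hook-delete-policy": before-hook-creation
spec:
  protocol: IPv4
  provider: r1pool.backend.ovn
  cidrBlock: 10.1.0.0/16
  excludeIps:
  - 10.1.0.0..10.1.0.10
  namespaces:
  - r1pool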
Steps To Reproduce
subnet.yaml
apiVersion: v1
kind: Namespace
metadata:
name: backend
---
apiVersion: v1
kind: Namespace
metadata:
name: r1pool
---
apiVersion: kubeovn.io/v1
kind: Subnet
metadata:
name: r1pool
spec:
protocol: IPv4
provider: r1pool.backend.ovn
cidrBlock: 10.1.0.0/16
# gateway: 10.1.0.1
excludeIps:
- 10.1.0.0..10.1.0.10
namespaces:
- r1pool
---
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
name: r1pool
namespace: backend
spec:
config: '{
"cniVersion": "0.3.0",
"type": "kube-ovn",
"server_socket": "/run/openvswitch/kube-ovn-daemon.sock",
"provider": "r1pool.backend.ovn"
}'
---
host.yaml
# r1pool
apiVersion: apps/v1
kind: Deployment
metadata:
name: r1pool
namespace: r1pool
spec:
replicas: 1
selector:
matchLabels:
app: ippool
template:
metadata:
labels:
app: ippool
# If explicit IPs are assigned, everything works as expected using helm
# annotations:
# ovn.kubernetes.io/ip_pool: 10.1.1.100
spec:
containers:
- name: r1pool
image: docker.io/library/nginx:alpine
imagePullPolicy: IfNotPresent
securityContext:
privileged: true # Required so the PostStartHook can modify the routing table
capabilities:
add: ["NET_RAW", "NET_ADMIN"]
lifecycle:
postStart:
exec:
command: ["/bin/sh", "-c", "ip route del default && ip route add default via 10.1.1.254 dev eth0"]
---
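The rough sequence I use to reproduce is the following (chart and release names are placeholders; both manifests above are placed under templates/ of a minimal local chart):

# Install the chart containing subnet.yaml and host.yaml as templates.
helm install test-env ./test-env-chart

# The pod comes up with an IP from ovn-default instead of 10.1.0.0/16,
# and the PostStartHook fails.
kubectl -n r1pool get pod -o wide

# Uninstalling the chart leaves the subnet behind with its finalizers still set.
helm uninstall test-env
kubectl get subnet r1pool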
Current Behavior
Given the two manifests above, everything works fine if they are deployed manually using kubectl. If host.yaml is removed first and subnet.yaml afterwards, everything works as expected.
If both are combined into a helm chart and the chart is installed, the pod receives an IP from the ovn-default subnet instead of the subnet bound to its namespace r1pool. This causes the PostStartHook to fail. If an explicit IP is specified, the correct IP is acquired and the PostStartHook succeeds.
Uninstalling the helm chart causes the subnet deletion to fail; the subnets have to be removed manually.
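The manual cleanup that works for me is roughly the following:

# Clear the finalizers on the stuck subnet, then delete it.
kubectl patch subnet r1pool --type=merge -p '{"metadata":{"finalizers":[]}}'
kubectl delete subnet r1pool --ignore-not-found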
Expected Behavior
Pods acquire the correct IP from the subnet bound to their namespace without the need to specify an explicit IP address, and subnets are removed cleanly when the helm chart is uninstalled.