Commit d3070d1

oilbeater and claude authored
fix(controller): clean orphaned attachment IPs on KubeVirt NAD hotplug (#6519)
* fix(controller): clean orphaned attachment IPs on KubeVirt NAD hotplug

  When KubeVirt VEP #140 (LiveUpdateNADRef) changes a VM's secondary network
  NAD reference, the old attachment IP CR and OVN LSP were never released
  because keepIPCR=true (VMI still alive). This caused IP leaks and orphaned
  OVN logical switch ports.

  Add getVMOrphanedAttachmentPorts() to detect attachment networks on a VM
  pod that are no longer in the VM's spec.template.spec.networks. In the
  keepIPCR=true branch of handleDeletePod, selectively delete orphaned
  attachment LSPs and release their IPs, while preserving the shared primary
  network LSP and other unchanged attachment IPs.

  Only vm.Spec.Template.Spec.Networks is used as the authoritative source
  (not template annotations) because VEP #140 modifies spec.networks and
  template annotations may not be synced.

* fix(controller): also clean stale attachment IPs on new VM pod creation

  Address review feedback: the orphaned attachment cleanup in handleDeletePod
  only works when the NAD change happens before pod deletion. In the
  stop→patch NAD→start workflow, the old pod deletion is processed before
  the NAD patch, so stale IPs are missed.

  Add cleanStaleVMAttachmentIPs() in reconcileAllocateSubnets to detect and
  remove attachment LSPs/IPs that belong to the VM but are not part of the
  current pod's networks. This ensures stale resources are cleaned regardless
  of operation ordering.

* fix(controller): address review feedback on stale IP cleanup robustness

  1. cleanStaleVMAttachmentIPs now calls getPodKubeovnNets internally (after
     pod re-fetch) instead of relying on needAllocatePodNets, which only
     contained nets needing allocation and would incorrectly flag
     already-allocated nets as stale.
  2. In the keepIPCR=true branch, iterate vmOrphanedPorts directly and get
     the subnet name from the IP CR spec instead of depending on podNets,
     which could be nil if getPodKubeovnNets fails.
  3. Use ipClient.WaitToBeReady for the new attachment IP assertion to avoid
     timing-dependent failures.
  4. Add an ExpectNotEmpty guard before the primary IP comparison.

* fix(controller): skip IP cleanup when stale LSP deletion fails

  If DeleteLogicalSwitchPort fails in cleanStaleVMAttachmentIPs, skip the
  corresponding IP CR deletion and IPAM release to avoid IP conflicts where
  an address is reused while its OVN port still exists.

* fix(controller): use map for orphaned ports lookup and add Patch retry

  - Change vmOrphanedPorts from []string to map[string]bool for O(1) lookup
    instead of O(n) slices.Contains in the port cleanup loop.
  - Add PollUntilContextTimeout retry to VMClient.Patch following the same
    pattern as other framework clients (handleWaitingAPIError).

* ci: install multus before kubevirt e2e tests

  The new kubevirt attachment network e2e tests require the Multus CRD
  (NetworkAttachmentDefinition) to be available. Add kind-install-multus
  before kind-install-kubevirt in the CI workflows.

* fix(controller): compare OVN ports against VM spec for orphan detection

  Refactor getVMOrphanedAttachmentPorts to compare OVN's actual existing
  ports against the VM spec's desired ports, instead of reading stale pod
  annotations via getPodAttachmentNet. The pod being deleted may not have
  accurate NAD info, while the OVN DB is the source of truth for what ports
  actually exist. The function now:

  - Takes the existing OVN ports list (already fetched in handleDeletePod)
  - Builds expected ports from vm.Spec.Template.Spec.Networks (same pattern
    as gc.go getVMLsps)
  - Returns ports in OVN but not in the expected set as orphaned

* refactor(controller): simplify orphaned port cleanup code

  - Merge two loops in the handleDeletePod keepIPCR branch into one: LSP
    deletion and IP cleanup now happen in the same iteration over ports.
  - Combine two iterations over VM spec.networks in
    getVMOrphanedAttachmentPorts into a single pass.
  - Add an early exit in cleanStaleVMAttachmentIPs when OVN has no ports,
    avoiding an expensive getPodKubeovnNets call.
  - Reorder cleanStaleVMAttachmentIPs to query OVN first (cheap) before pod
    network parsing (expensive).
  - Use ipCR.Spec.Subnet consistently instead of port.ExternalIDs["ls"] for
    the subnet name in cleanStaleVMAttachmentIPs.

---------

Signed-off-by: Mengxin Liu <liumengxinfly@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 0e0fa7d commit d3070d1
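The orphan detection described in the commit message hinges on deriving the expected LSP name for each entry in the VM's spec.template.spec.networks. A minimal standalone sketch of that derivation, assuming the naming convention visible in the diff (a Multus NetworkName "nadNamespace/nadName" yields provider "nadName.nadNamespace.ovn", and the port name is "<vm>.<namespace>.<provider>"); the names `ovnProvider` and `expectedPortName` are illustrative stand-ins for util.OvnProvider and ovs.PodNameToPortName:

```go
package main

import (
	"fmt"
	"strings"
)

// ovnProvider stands in for util.OvnProvider in kube-ovn (assumption).
const ovnProvider = "ovn"

// expectedPortName sketches how a Multus NetworkName maps to an LSP name.
// A bare "nadName" (no slash) defaults to the pod's own namespace, matching
// the split-and-default logic in getVMOrphanedAttachmentPorts.
func expectedPortName(vmName, namespace, multusNetworkName string) string {
	items := strings.Split(multusNetworkName, "/")
	if len(items) != 2 {
		// No namespace prefix: assume the pod's namespace.
		items = []string{namespace, items[0]}
	}
	provider := fmt.Sprintf("%s.%s.%s", items[1], items[0], ovnProvider)
	return fmt.Sprintf("%s.%s.%s", vmName, namespace, provider)
}

func main() {
	fmt.Println(expectedPortName("vm-a", "default", "kube-system/attachnet"))
	fmt.Println(expectedPortName("vm-a", "default", "attachnet"))
}
```

Any port found in OVN whose name is not produced by this mapping for some network in the current VM spec is treated as orphaned.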

File tree

5 files changed: +378 -5 lines

.github/workflows/build-x86-image.yaml

Lines changed: 1 addition & 0 deletions

@@ -1154,6 +1154,7 @@ jobs:
           E2E_NETWORK_MODE: ${{ matrix.mode }}
         run: |
           make kube-ovn-conformance-e2e
+          make kind-install-multus
           make kind-install-kubevirt
           make kube-ovn-kubevirt-e2e

.github/workflows/scheduled-e2e.yaml

Lines changed: 1 addition & 0 deletions

@@ -754,6 +754,7 @@ jobs:
           version=$(grep -E '^VERSION="v([0-9]+\.){2}[0-9]+"$' dist/images/install.sh | head -n1 | awk -F= '{print $2}' | tr -d '"')
           docker pull kubeovn/kube-ovn:$version
           VERSION=$version make kind-install
+          VERSION=$version make kind-install-multus
           VERSION=$version make kind-install-kubevirt

       - name: Run E2E

pkg/controller/pod.go

Lines changed: 172 additions & 5 deletions

@@ -728,6 +728,14 @@ func (c *Controller) reconcileAllocateSubnets(pod *v1.Pod, needAllocatePodNets [
 		return nil, err
 	}
 
+	// Clean stale attachment IPs/LSPs from previous NAD references when a new
+	// VM pod starts. This handles the stop→patch NAD→start workflow where the
+	// old pod deletion was processed before the NAD change.
+	// Called after pod re-fetch so getPodKubeovnNets sees current annotations.
+	if isVMPod && c.config.EnableKeepVMIP {
+		c.cleanStaleVMAttachmentIPs(pod, podName)
+	}
+
 	// Check if pod is a vpc nat gateway. Annotation set will have subnet provider name as prefix
 	isVpcNatGw, vpcGwName := c.checkIsPodVpcNatGw(pod)
 	if isVpcNatGw {

@@ -1049,6 +1057,7 @@ func (c *Controller) handleDeletePod(key string) (err error) {
 
 	var keepIPCR, isOwnerRefToDel, isOwnerRefDeleted bool
 	var ipcrToDelete []string
+	var vmOrphanedPorts map[string]bool
 	isStsPod, stsName, stsUID := isStatefulSetPod(pod)
 	if isStsPod {
 		if !pod.DeletionTimestamp.IsZero() {

@@ -1105,19 +1114,54 @@ func (c *Controller) handleDeletePod(key string) (err error) {
 				keepIPCR = false
 			}
 		}
+		// Detect orphaned attachment ports from NAD hotplug (KubeVirt VEP #140).
+		// Compare OVN's actual ports against VM spec's desired ports.
+		// These are cleaned selectively in the keepIPCR=true branch to avoid
+		// deleting shared primary network LSPs during live migration.
+		if keepIPCR {
+			vmOrphanedPorts = c.getVMOrphanedAttachmentPorts(pod.Namespace, vmName, ports)
+		}
 	}
 
 	podNets, err := c.getPodKubeovnNets(pod)
 	if err != nil {
 		klog.Errorf("failed to get kube-ovn nets of pod %s: %v", podKey, err)
 	}
 	if keepIPCR {
-		// always remove lsp from port groups
 		for _, port := range ports {
-			klog.Infof("remove lsp %s from all port groups", port.Name)
-			if err = c.OVNNbClient.RemovePortFromPortGroups(port.Name); err != nil {
-				klog.Errorf("failed to remove lsp %s from all port groups: %v", port.Name, err)
-				return err
+			if vmOrphanedPorts[port.Name] {
+				// Orphaned attachment LSP from NAD hotplug: delete and release its IP.
+				klog.Infof("delete orphaned vm attachment lsp %s", port.Name)
+				if err := c.OVNNbClient.DeleteLogicalSwitchPort(port.Name); err != nil {
+					klog.Errorf("failed to delete orphaned lsp %s: %v", port.Name, err)
+					return err
+				}
+				ipCR, err := c.ipsLister.Get(port.Name)
+				if err != nil {
+					if !k8serrors.IsNotFound(err) {
+						klog.Errorf("failed to get ip %s: %v", port.Name, err)
+					}
+					continue
+				}
+				if ipCR.Labels[util.IPReservedLabel] != "true" {
+					klog.Infof("delete orphaned vm attachment ip CR %s", ipCR.Name)
+					if err := c.config.KubeOvnClient.KubeovnV1().IPs().Delete(context.Background(), ipCR.Name, metav1.DeleteOptions{}); err != nil {
+						if !k8serrors.IsNotFound(err) {
+							klog.Errorf("failed to delete ip %s: %v", ipCR.Name, err)
+							return err
+						}
+					}
+					if subnetName := ipCR.Spec.Subnet; subnetName != "" {
+						c.ipam.ReleaseAddressByNic(podKey, port.Name, subnetName)
+						c.updateSubnetStatusQueue.Add(subnetName)
+					}
+				}
+			} else {
+				klog.Infof("remove lsp %s from all port groups", port.Name)
+				if err = c.OVNNbClient.RemovePortFromPortGroups(port.Name); err != nil {
+					klog.Errorf("failed to remove lsp %s from all port groups: %v", port.Name, err)
+					return err
+				}
 			}
 		}
 	} else {

@@ -2407,6 +2451,129 @@ func shouldCleanPodNet(c *Controller, pod *v1.Pod, ownerRefName, ownerRefSubnet,
 	return false
 }
 
+// getVMOrphanedAttachmentPorts finds OVN ports that belong to the VM but are
+// no longer desired by the VM's current spec (e.g., KubeVirt NAD hotplug).
+// It compares OVN's actual ports (existingPorts) against the VM spec's desired
+// ports, rather than relying on pod annotations which may be stale or incomplete.
+func (c *Controller) getVMOrphanedAttachmentPorts(namespace, vmName string, existingPorts []ovnnb.LogicalSwitchPort) map[string]bool {
+	vm, err := c.config.KubevirtClient.VirtualMachine(namespace).Get(context.Background(), vmName, metav1.GetOptions{})
+	if err != nil {
+		klog.Errorf("failed to get vm %s/%s for orphaned port detection: %v", namespace, vmName, err)
+		return nil
+	}
+
+	if vm.Spec.Template == nil {
+		return nil
+	}
+
+	// Build expected port names from VM spec in a single pass (pattern from gc.go:1108-1145)
+	expectedPorts := make(map[string]bool)
+	defaultMultus := false
+	hasMultusNetwork := false
+	for _, network := range vm.Spec.Template.Spec.Networks {
+		if network.Multus == nil {
+			continue
+		}
+		if network.Multus.Default {
+			defaultMultus = true
+		}
+		if network.Multus.NetworkName != "" {
+			hasMultusNetwork = true
+			items := strings.Split(network.Multus.NetworkName, "/")
+			if len(items) != 2 {
+				items = []string{namespace, items[0]}
+			}
+			provider := fmt.Sprintf("%s.%s.%s", items[1], items[0], util.OvnProvider)
+			expectedPorts[ovs.PodNameToPortName(vmName, namespace, provider)] = true
+		}
+	}
+	if !defaultMultus {
+		expectedPorts[ovs.PodNameToPortName(vmName, namespace, util.OvnProvider)] = true
+	}
+
+	// If VM spec has no multus networks, skip detection to avoid
+	// false positives for VMs that only use template annotations.
+	if !hasMultusNetwork {
+		return nil
+	}
+
+	// Find orphaned: ports in OVN but not in expected
+	orphanedPorts := make(map[string]bool)
+	for _, port := range existingPorts {
+		if !expectedPorts[port.Name] {
+			klog.Infof("OVN port %s for vm %s/%s is not in VM spec, marking as orphaned",
+				port.Name, namespace, vmName)
+			orphanedPorts[port.Name] = true
+		}
+	}
+
+	if len(orphanedPorts) == 0 {
+		return nil
+	}
+	return orphanedPorts
+}
+
+// cleanStaleVMAttachmentIPs removes IP CRs and OVN LSPs that belong to
+// this VM but are not part of the current pod's networks. This handles
+// the stop→patch NAD→start workflow where the old pod deletion was
+// processed before the NAD change, leaving stale attachment resources.
+func (c *Controller) cleanStaleVMAttachmentIPs(pod *v1.Pod, podName string) {
+	podKey := fmt.Sprintf("%s/%s", pod.Namespace, podName)
+
+	// List existing LSPs first (cheap OVN query) to bail out early if none exist
+	ports, err := c.OVNNbClient.ListNormalLogicalSwitchPorts(true, map[string]string{"pod": podKey})
+	if err != nil {
+		klog.Errorf("failed to list lsps of vm %s for stale cleanup: %v", podKey, err)
+		return
+	}
+	if len(ports) == 0 {
+		return
+	}
+
+	// Build current port names from the pod's full network list
+	podNets, err := c.getPodKubeovnNets(pod)
+	if err != nil {
+		klog.Errorf("failed to get kube-ovn nets of pod %s for stale cleanup: %v", podKey, err)
+		return
+	}
+	currentPorts := make(map[string]bool, len(podNets)+1)
+	for _, podNet := range podNets {
+		currentPorts[ovs.PodNameToPortName(podName, pod.Namespace, podNet.ProviderName)] = true
+	}
+	currentPorts[ovs.PodNameToPortName(podName, pod.Namespace, util.OvnProvider)] = true
+
+	for _, port := range ports {
+		if currentPorts[port.Name] {
+			continue
+		}
+		klog.Infof("cleaning stale vm attachment lsp %s (not in current pod networks)", port.Name)
+		if err := c.OVNNbClient.DeleteLogicalSwitchPort(port.Name); err != nil {
+			klog.Errorf("failed to delete stale lsp %s, skipping IP cleanup to avoid inconsistency: %v", port.Name, err)
+			continue
+		}
+
+		ipCR, err := c.ipsLister.Get(port.Name)
+		if err != nil {
+			if !k8serrors.IsNotFound(err) {
+				klog.Errorf("failed to get ip %s: %v", port.Name, err)
+			}
+			continue
+		}
+		if ipCR.Labels[util.IPReservedLabel] != "true" {
+			klog.Infof("deleting stale vm attachment ip CR %s", ipCR.Name)
+			if err := c.config.KubeOvnClient.KubeovnV1().IPs().Delete(context.Background(), ipCR.Name, metav1.DeleteOptions{}); err != nil {
+				if !k8serrors.IsNotFound(err) {
+					klog.Errorf("failed to delete ip %s: %v", ipCR.Name, err)
+				}
+			}
+			if subnetName := ipCR.Spec.Subnet; subnetName != "" {
+				c.ipam.ReleaseAddressByNic(podKey, port.Name, subnetName)
+				c.updateSubnetStatusQueue.Add(subnetName)
+			}
+		}
+	}
+}
+
 func isVMPod(pod *v1.Pod) (bool, string) {
 	for _, owner := range pod.OwnerReferences {
 		// The name of vmi is consistent with vm's name.
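The orphan detection in getVMOrphanedAttachmentPorts above reduces to a set difference between OVN's actual ports and the spec-derived expected set. A minimal standalone sketch of that final step, with hypothetical port names and plain strings in place of the OVN types:

```go
package main

import "fmt"

// orphanedPorts mirrors the final loop of getVMOrphanedAttachmentPorts:
// anything present in OVN but absent from the expected set is orphaned.
// A nil result means "nothing to clean", matching the patched function.
func orphanedPorts(existing []string, expected map[string]bool) map[string]bool {
	orphaned := make(map[string]bool)
	for _, name := range existing {
		if !expected[name] {
			orphaned[name] = true
		}
	}
	if len(orphaned) == 0 {
		return nil
	}
	return orphaned
}

func main() {
	// Hypothetical example: the VM was hotplugged from old-nad to new-nad,
	// so OVN still holds the old attachment LSP.
	existing := []string{"vm-a.default", "vm-a.default.old-nad.default.ovn"}
	expected := map[string]bool{
		"vm-a.default":                     true,
		"vm-a.default.new-nad.default.ovn": true,
	}
	fmt.Println(orphanedPorts(existing, expected))
}
```

Returning a map rather than a slice is what lets the keepIPCR branch of handleDeletePod test `vmOrphanedPorts[port.Name]` in O(1) inside its single cleanup loop.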

test/e2e/framework/virtual-machine.go

Lines changed: 42 additions & 0 deletions

@@ -9,6 +9,8 @@ import (
 	apierrors "k8s.io/apimachinery/pkg/api/errors"
 	"k8s.io/apimachinery/pkg/api/resource"
 	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
+	"k8s.io/apimachinery/pkg/types"
+	"k8s.io/apimachinery/pkg/util/wait"
 	k8sframework "k8s.io/kubernetes/test/e2e/framework"
 	v1 "kubevirt.io/api/core/v1"
 	"kubevirt.io/client-go/kubecli"

@@ -176,6 +178,46 @@ func (c *VMClient) WaitToDisappear(name string, _, timeout time.Duration) error
 	return nil
 }
 
+// Patch patches the vm with the given patch data, retrying on transient API errors.
+func (c *VMClient) Patch(name string, patchType types.PatchType, data []byte) *v1.VirtualMachine {
+	ginkgo.GinkgoHelper()
+	var patchedVM *v1.VirtualMachine
+	err := wait.PollUntilContextTimeout(context.Background(), poll, timeout, true, func(ctx context.Context) (bool, error) {
+		vm, err := c.VirtualMachineInterface.Patch(ctx, name, patchType, data, metav1.PatchOptions{})
+		if err != nil {
+			return handleWaitingAPIError(err, false, "patch vm %q", name)
+		}
+		patchedVM = vm
+		return true, nil
+	})
+	ExpectNoError(err, "failed to patch vm %s", name)
+	return patchedVM.DeepCopy()
+}
+
+// MakeVMWithMultusNetwork creates a VM with an additional multus secondary network using bridge binding.
+func MakeVMWithMultusNetwork(name, image, size string, runStrategy *v1.VirtualMachineRunStrategy, multusNetworkName string) *v1.VirtualMachine {
+	vm := MakeVM(name, image, size, runStrategy)
+	vm.Spec.Template.Spec.Domain.Devices.Interfaces = append(
+		vm.Spec.Template.Spec.Domain.Devices.Interfaces,
+		v1.Interface{
+			Name:                   "multus-net",
+			InterfaceBindingMethod: v1.InterfaceBindingMethod{Bridge: &v1.InterfaceBridge{}},
+		},
+	)
+	vm.Spec.Template.Spec.Networks = append(
+		vm.Spec.Template.Spec.Networks,
+		v1.Network{
+			Name: "multus-net",
+			NetworkSource: v1.NetworkSource{
+				Multus: &v1.MultusNetwork{
+					NetworkName: multusNetworkName,
+				},
+			},
+		},
+	)
+	return vm
+}
+
 func MakeVM(name, image, size string, runStrategy *v1.VirtualMachineRunStrategy) *v1.VirtualMachine {
 	vm := &v1.VirtualMachine{
 		ObjectMeta: metav1.ObjectMeta{
