Skip to content

Add pod collision detection#612

Open
RamLavi wants to merge 12 commits intok8snetworkplumbingwg:mainfrom
RamLavi:add_pod_collision_detection
Open

Add pod collision detection#612
RamLavi wants to merge 12 commits intok8snetworkplumbingwg:mainfrom
RamLavi:add_pod_collision_detection

Conversation

@RamLavi
Copy link
Copy Markdown
Member

@RamLavi RamLavi commented Mar 29, 2026

This PR introduces a pod MAC collision detection controller, that will replace the current pod MAC collision rejection webhook behavior.
It introduces a controller for pod collisions, and also handles cross-pod-vmi coliision cases.
In order to keep consistency between the cross-collision objects, a secondary watch is added, along with e2e that cover all above scenarios.

What this PR does / why we need it:
last commit belongs to #611

Special notes for your reviewer:

Release note:

Remove pod create/update rejection behavior when colliding MAC
Add pod MAC collision detection controller

@kubevirt-bot
Copy link
Copy Markdown
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign phoracek for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@RamLavi
Copy link
Copy Markdown
Member Author

RamLavi commented Mar 29, 2026

/hold

still running local tests to make sure we don't introduce any flakes

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request renames the vmicollision package to maccollision and introduces a new PodReconciler to detect MAC address collisions for Pods, including cross-type collisions between Pods and Virtual Machine Instances (VMIs). The PoolManager logic was updated to allow duplicate MAC allocations, delegating collision detection and reporting to the controller layer. Feedback includes removing an unused logger parameter and refactoring duplicated logic for finding running pods into a shared helper function to improve maintainability.

@RamLavi RamLavi force-pushed the add_pod_collision_detection branch from ed5d8ee to b2f79e3 Compare March 30, 2026 10:51
@RamLavi
Copy link
Copy Markdown
Member Author

RamLavi commented Mar 30, 2026

Change: Address gemini review and lint issues,

RamLavi added 10 commits March 30, 2026 14:07
Prepare the collision detection package for supporting Pod MAC collision
detection alongside the existing VMI support.

- Rename pkg/controller/vmicollision/ to pkg/controller/maccollision/
- Rename VMICollisionReconciler to VMIReconciler
- Update controller name, log names, and imports

This commit is a pure refactor with no logic changes.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ram Lavi <ralavi@redhat.com>
… check

Add the building blocks needed for Pod collision detection:
- IndexPodByMAC using vendored multus ParsePodNetworkAnnotation, with
  deduplication and normalization; returns nil when network-status
  annotation is absent (multus hasn't processed the Pod yet)
- StripPodForCollisionDetection cache transform keeping only metadata
  (Name, Namespace, UID, Labels, Annotations) and Status.Phase
- IsKubevirtOwned to identify virt-launcher Pods via the kubevirt.io
  label, matching kubevirt's own identification pattern

Register StripPodForCollisionDetection as a cache transform for Pods
in the manager. The existing Pod controller (pkg/controller/pod) reads
only metadata and annotations from the cache, which are preserved by
the transform.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ram Lavi <ralavi@redhat.com>
Create new controller for detecting MAC collisions among running Pods:
- PodReconciler with Client and PoolManager
- SetupPodControllerWithManager() registers Pod MAC address indexer
- Basic Reconcile() skeleton with NotFound, virt-launcher, and
  namespace checks
- Placeholder for collision detection logic (to be implemented)

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ram Lavi <ralavi@redhat.com>
Fill in PodReconciler with MAC collision detection: query field
index for colliding Pods, filter by managed namespaces and
non-kubevirt ownership, update collision map, and emit events
matching the VMI controller pattern.

Currently handles Pod-Pod collisions only; cross-type Pod-VMI
collision detection will be added in subsequent commits.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ram Lavi <ralavi@redhat.com>
Add podCollisionRelevantChanges() predicate filtering Pod events
by phase change, network-attachment annotation change, or
network-status annotation appearance. Build the controller with
source.Kind watch using this predicate, and register it via
add_podcollision.go init function.

Add PodReconciler unit tests covering early-return paths,
collision detection, event emission, and filtering of
virt-launcher / unmanaged pods.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ram Lavi <ralavi@redhat.com>
Extend PodReconciler to also query the VMI MAC index when kubevirt
is enabled, detecting collisions between Pods and running VMIs.
Events are emitted on colliding Pods; VMI-side events are handled
by the VMI controller to avoid duplication.

Add IsKubevirtEnabled to PoolManagerInterface and cross-type unit
tests covering Pod-VMI detection, filtering, and kubevirt-disabled
path.

Extract shared VMI candidate helpers (listVMIsByMAC,
listRunningVMIsByMAC, listRunningVMIsByMACWithExcludeUID,
filterManagedVMIs) and use controller-specific filterVMIsWithMAC
wrappers.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ram Lavi <ralavi@redhat.com>
Extend VMIReconciler to also query the Pod MAC index, detecting
collisions between VMIs and running Pods. Filter out virt-launcher
pods and unmanaged namespaces. Events emitted on colliding VMIs;
Pod-side events are handled by the Pod controller.

Use ObjectReference.Key() for type-prefixed collision messages
(vmi/ns/name, pod/ns/name). Add cross-type unit tests and update
existing VMI-VMI test expectations. Update E2E test helper to match
new message format.

Extract shared Pod candidate helpers (listRunningPodsWithMAC,
excludePodUID, filterCollisionCandidatePods) and use controller-specific
call paths.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ram Lavi <ralavi@redhat.com>
This commit adds cross-type secondary watch, using the
TypedEnqueueRequestsFromMapFunc feature [0] on controller-runtime.

Without these secondary watches, each controller only reconciles when
its own resource type changes. This means that when a Pod is deleted,
the VMI controller retains stale collision data for the VMIs that
previously collided with that Pod, until the VMI happens to be
reconciled for some other reason (and vice-versa for VMI deletions
affecting the Pod controller).

With this change, each controller now adds a secondary watch on the
other resource type using handler.EnqueueRequestsFromMapFunc. When a Pod
changes, the VMI controller's handler extracts the Pod's MACs and
enqueues VMIs sharing those MACs for re-evaluation. The Pod controller
does the same in reverse for VMI changes, conditioned on Kubevirt being
enabled.

[0]
https://pkg.go.dev/sigs.k8s.io/controller-runtime/pkg/handler#TypedEnqueueRequestsFromMapFunc

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ram Lavi <ralavi@redhat.com>
The pod webhook previously rejected pod creation when a requested MAC
was already allocated to another pod in the pool map. With the MAC
collision detection controller now in place, this blocking is no longer
needed — collisions are surfaced via Kubernetes events and metrics.

Simplify allocatePodRequestedMac to match the VM webhook's behavior
(allocateRequestedVirtualMachineInterfaceMac): just register the MAC
via createOrUpdateEntry without any collision check. Remove the now
unused macAlreadyBelongsToPodAndNetwork function.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ram Lavi <ralavi@redhat.com>
The VMI collision and migration test AfterEach blocks wait for VMI
objects to be deleted but not for the virt-launcher pods they spawned
to terminate. This leaves pods in test namespaces that can cause
subsequent tests to fail with "There should be no Pods" assertions.

Add pod termination waits in both the VMI collision detection AfterEach
and the VMI migration test AfterEach.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ram Lavi <ralavi@redhat.com>
@RamLavi RamLavi force-pushed the add_pod_collision_detection branch from b2f79e3 to af00f3d Compare March 30, 2026 11:07
@RamLavi
Copy link
Copy Markdown
Member Author

RamLavi commented Mar 30, 2026

Change: Refactor commit message

@RamLavi
Copy link
Copy Markdown
Member Author

RamLavi commented Mar 30, 2026

/hold
until #611 is merged

@RamLavi
Copy link
Copy Markdown
Member Author

RamLavi commented Mar 31, 2026

/cherry-pick release-0.51

@kubevirt-bot
Copy link
Copy Markdown
Collaborator

@RamLavi: once the present PR merges, I will cherry-pick it on top of release-0.51 in a new PR and assign it to you.

Details

In response to this:

/cherry-pick release-0.51

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

RamLavi added a commit to RamLavi/monitoring that referenced this pull request Mar 31, 2026
KubeMacPool now detects MAC collisions on Pods with Multus secondary
interfaces (in addition to VMIs) [0]. Add a diagnosis step showing how
to find colliding Pods by querying the network-status annotation.

Also refactor step 3 to use MAC env var.

[0] k8snetworkplumbingwg/kubemacpool#612

Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Ram Lavi <ralavi@redhat.com>
Add E2E tests for Pod-Pod and cross-type Pod-VMI MAC collision
detection, mirroring the existing VMI collision test structure.

Pod-Pod tests cover: inter-Pod collisions (with MAC format
variations), out-of-range MACs, 3+ Pods, partial collisions on
multi-interface Pods, multiple colliding MACs, and collision
clearing on Pod deletion.

Cross-type tests cover: detecting collisions between a Pod and a
VMI sharing the same MAC, and verifying collision clears when
either the Pod or the VMI is deleted.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ram Lavi <ralavi@redhat.com>
Made-with: Cursor
The certificate recovery tests (certificates_test.go) verify that a
deleted secret/caBundle gets regenerated, but do not wait for the
webhook server to actually reload and serve the new certificate.
Due to kubelet secret volume propagation delay (~60s), subsequent
tests can hit "x509: certificate signed by unknown authority" if
they call the webhook before the new cert is in place.

Add a probe in checkCertLibraryRecovery that creates a VM via the
webhook and retries until it succeeds, ensuring the new certificate
is fully operational before the test completes.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ram Lavi <ralavi@redhat.com>
@RamLavi RamLavi force-pushed the add_pod_collision_detection branch from af00f3d to c55432d Compare April 5, 2026 11:21
@RamLavi
Copy link
Copy Markdown
Member Author

RamLavi commented Apr 5, 2026

Change: use dedicated opt in function to avoid opt in/out race flakes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants