Skip to content

Extend apinet-nic-plugin to watch and cleanup NICs#715

Merged
lukasfrank merged 5 commits intomainfrom
feat/watch-apinet-nics
Apr 28, 2026
Merged

Extend apinet-nic-plugin to watch and cleanup NICs#715
lukasfrank merged 5 commits intomainfrom
feat/watch-apinet-nics

Conversation

@lukasfrank
Copy link
Copy Markdown
Member

@lukasfrank lukasfrank commented Apr 27, 2026

Proposed Changes

  • Extend apinet-nic-plugin to watch and cleanup NICs
    • Stale NiCs will be deleted in case of rolling the hypervisor node
    • If a NIC is manually deleted, it will be recreated by controller

Fixes #714 #590

Summary by CodeRabbit

  • New Features

    • Added event-driven handling for network interface changes that triggers automatic machine reconciliation.
    • Added event handler registration for plugins and a not-ready signal for deferred NIC processing.
  • Refactor

    • Replaced polling with informer/watcher-driven NIC monitoring.
    • Simplified attachment/deletion logic to avoid tearing down desired NICs and to better handle no-op reconciliations.

Signed-off-by: Lukas Frank <lukas.frank@sap.com>
@lukasfrank lukasfrank force-pushed the feat/watch-apinet-nics branch from c0947f6 to b043eff Compare April 27, 2026 12:58
@friegger friegger added the integration-tests to run integration tests label Apr 27, 2026
@lukasfrank lukasfrank marked this pull request as ready for review April 27, 2026 14:15
@lukasfrank lukasfrank requested a review from a team as a code owner April 27, 2026 14:15
@lukasfrank lukasfrank self-assigned this Apr 27, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 27, 2026

Warning

Rate limit exceeded

@lukasfrank has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 6 minutes and 37 seconds before requesting another review.

To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 972a8aed-2b37-4359-9367-d5a6d37e4ddf

📥 Commits

Reviewing files that changed from the base of the PR and between 619dfd6 and 62f55cb.

📒 Files selected for processing (4)
  • internal/plugins/networkinterface/apinet/apinet.go
  • internal/plugins/networkinterface/isolated/isolated.go
  • internal/plugins/networkinterface/plugins.go
  • internal/plugins/networkinterface/providernetwork/providernetwork.go
📝 Walkthrough

Walkthrough

Event-driven NIC lifecycle: plugin Init now accepts a context and supports event handler registration. APINet switched from polling to informer-driven watcher that notifies handlers on NIC status changes. Machine controller subscribes to NIC events and requeues machines; Apply/Delete semantics adjusted to surface not-ready state.

Changes

Cohort / File(s) Summary
CLI / App Init
cmd/libvirt-provider/app/app.go, internal/networkinterfaceplugin/apinet.go
Pass lifecycle ctx into plugin Init; removed polling flags and controller-runtime client construction in APINet wrapper.
Plugin API & Contracts
internal/plugins/networkinterface/plugins.go
Added ErrNotReady, EventHandler interface and EventHandlerFuncs; updated Plugin interface: Init(ctx, host), AddEventHandler, RemoveEventHandler.
APINet Plugin
internal/plugins/networkinterface/apinet/apinet.go
Replaced polling with informer watcher; NewPlugin now takes REST config; Init accepts ctx; added event handler registration; Apply returns ErrNotReady when host device absent; Delete simplified.
Other Plugin Implementations
internal/plugins/networkinterface/isolated/isolated.go, internal/plugins/networkinterface/providernetwork/providernetwork.go
Updated Init(ctx, host) signature; added no-op AddEventHandler/RemoveEventHandler.
Controller Integration & Tests
internal/controllers/machine_controller.go, internal/controllers/machine_controller_nics.go, internal/controllers/controllers_suite_test.go
MachineReconciler subscribes to NIC events and requeues machines on NIC changes; attach/detach logic treats nil-mounted NIC as no-op and skips teardown for desired NICs; tests updated to pass plugin context.

Sequence Diagram(s)

sequenceDiagram
    participant MC as Machine Controller
    participant Plugin as APINet Plugin
    participant Watcher as Informer Watcher
    participant API as API Server
    participant Handler as Event Handler

    MC->>Plugin: Init(ctx, host)
    Plugin->>Watcher: start(ctx)
    Watcher->>API: List NICs for node (initial cache)
    API-->>Watcher: return NIC list

    API->>Watcher: NIC update/delete event
    Watcher->>Plugin: notify on Status.State change or delete (machine-id present)
    Plugin->>Handler: call HandleNICEvent(machineID)
    Handler->>MC: HandleNICEventFunc(machineID)
    MC->>MC: queue.Add(machineID) (requeue machine)

    MC->>Plugin: Apply(host, nicSpec)
    alt host device absent
        Plugin-->>MC: ErrNotReady
        MC->>MC: defer attach (no mount)
    else host device present
        Plugin-->>MC: return mounted NIC
    end

    MC->>Plugin: Delete(nic)
    Plugin->>API: delete NIC (tolerate NotFound)
    Plugin->>Watcher: watcher will emit delete -> Handler -> MC requeue
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Suggested labels

enhancement, area/compute

Suggested reviewers

  • friegger
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly describes the main change: extending apinet-nic-plugin to watch and cleanup NICs, which aligns with the primary objectives of the PR.
Description check ✅ Passed The description covers the main changes, mentions fixing stale NIC deletion and recreation of manually deleted NICs, and references the linked issues.
Linked Issues check ✅ Passed The PR addresses all objectives from issue #714: handling external NIC deletion through watchers, cleaning up orphaned NICs during store loss scenarios, and maintaining consistent state updates.
Out of Scope Changes check ✅ Passed All changes are directly related to adding NIC watching and cleanup functionality; updates to Plugin interface and implementations across all plugins are necessary to support the new event-driven architecture.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/watch-apinet-nics

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/plugins/networkinterface/apinet/apinet.go`:
- Around line 384-401: The handleNICDelete function currently returns early if
obj is not an *apinetv1alpha1.NetworkInterface, which skips handling tombstones;
update handleNICDelete to detect and unwrap toolscache.DeletedFinalStateUnknown
(the tombstone wrapper), extract the inner Obj, cast that to
*apinetv1alpha1.NetworkInterface and proceed exactly as for a normal delete
(checking nic.Spec.NodeRef.Name vs p.nodeName, reading labelMachineID, logging
and calling p.notifyEventHandlers(machineID)); keep existing early returns for
missing labels or wrong node but ensure tombstone-unwrapped NICs are handled the
same as direct NIC delete events.
- Around line 77-87: The watcher is started in Init() via go p.runWatcher(ctx)
which returns immediately and can allow informer cache sync to trigger
enqueueNICMachines() before MachineReconciler.Start() has registered handlers
(AddEventHandler), and handleNICDelete() currently asserts deleted objects to
*NetworkInterface without handling cache.DeletedFinalStateUnknown tombstones.
Fix by making Init() block until the watcher is fully started and its informer
caches have synced (or return an error on sync failure) before returning so
MachineReconciler.Start() can register handlers first; alternatively register
event handlers prior to starting the informer in runWatcher. Also harden
handleNICDelete() to accept cache.DeletedFinalStateUnknown by type-switching the
object, extracting the contained object and then asserting to *NetworkInterface
(or handling missing/invalid types) so deletion tombstones do not cause panics
and deletion recovery runs.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9bb520dd-2cc2-4acd-8c07-d641c4f8657e

📥 Commits

Reviewing files that changed from the base of the PR and between 45e55ce and b043eff.

📒 Files selected for processing (9)
  • cmd/libvirt-provider/app/app.go
  • internal/controllers/controllers_suite_test.go
  • internal/controllers/machine_controller.go
  • internal/controllers/machine_controller_nics.go
  • internal/networkinterfaceplugin/apinet.go
  • internal/plugins/networkinterface/apinet/apinet.go
  • internal/plugins/networkinterface/isolated/isolated.go
  • internal/plugins/networkinterface/plugins.go
  • internal/plugins/networkinterface/providernetwork/providernetwork.go

Comment thread internal/plugins/networkinterface/apinet/apinet.go
Comment thread internal/plugins/networkinterface/apinet/apinet.go
Signed-off-by: Lukas Frank <lukas.frank@sap.com>
@lukasfrank lukasfrank marked this pull request as draft April 27, 2026 15:06
@lukasfrank lukasfrank marked this pull request as draft April 27, 2026 15:06
Signed-off-by: Lukas Frank <lukas.frank@sap.com>
@lukasfrank
Copy link
Copy Markdown
Member Author

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 28, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
internal/plugins/networkinterface/apinet/apinet.go (1)

79-89: ⚠️ Potential issue | 🟠 Major

Make watcher startup synchronous.

Init() still returns immediately after go p.runWatcher(ctx). That means watcher creation/cache-sync failures are only logged, and the bootstrap enqueueNICMachines() pass can still run before any handler is registered, dropping the initial requeues this recovery path depends on. Please make Init() wait for watcher readiness/failure, or guarantee handlers are registered before the watcher starts.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/plugins/networkinterface/apinet/apinet.go` around lines 79 - 89, The
Init method returns immediately after launching runWatcher in a goroutine,
causing handler registration and cache sync races; change Init (in function Init
of type Plugin) to wait for watcher startup success or failure before returning
by either (a) registering handlers (the handlers that enqueueNICMachines) on
p.apinetClient prior to starting the watcher and then calling runWatcher
synchronously so Init blocks until readiness/failure, or (b) modify runWatcher
to signal readiness/failure via a channel or error return and have Init start it
and wait on that signal (or error) before returning; ensure you reference
p.runWatcher, p.apinetClient and the handler registration that enqueues NIC
machines so the initial enqueueNICMachines pass cannot run before the watcher is
ready.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/plugins/networkinterface/apinet/apinet.go`:
- Around line 98-104: RemoveEventHandler must not compare
providernetworkinterface.EventHandler interface values directly (comparing h ==
handler can panic for concrete types like EventHandlerFuncs that contain func
fields). Change the registration API so AddEventHandler (or wherever handlers
are registered) returns a removal token/unsubscribe function or a numeric ID,
store handlers in p.eventHandlers keyed by that token/ID (or wrap entries in a
struct with id and handler), and implement RemoveEventHandler to accept that
token/ID (or call the returned unsubscribe) and remove the matching entry from
p.eventHandlers instead of doing interface equality; update references to
RemoveEventHandler and the event registration path accordingly (look for
Plugin.RemoveEventHandler, p.eventHandlers and
providernetworkinterface.EventHandler/ EventHandlerFuncs).

---

Duplicate comments:
In `@internal/plugins/networkinterface/apinet/apinet.go`:
- Around line 79-89: The Init method returns immediately after launching
runWatcher in a goroutine, causing handler registration and cache sync races;
change Init (in function Init of type Plugin) to wait for watcher startup
success or failure before returning by either (a) registering handlers (the
handlers that enqueueNICMachines) on p.apinetClient prior to starting the
watcher and then calling runWatcher synchronously so Init blocks until
readiness/failure, or (b) modify runWatcher to signal readiness/failure via a
channel or error return and have Init start it and wait on that signal (or
error) before returning; ensure you reference p.runWatcher, p.apinetClient and
the handler registration that enqueues NIC machines so the initial
enqueueNICMachines pass cannot run before the watcher is ready.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b0fc0083-ab36-4d66-855e-28cad99bd125

📥 Commits

Reviewing files that changed from the base of the PR and between b043eff and 619dfd6.

📒 Files selected for processing (2)
  • internal/controllers/machine_controller_nics.go
  • internal/plugins/networkinterface/apinet/apinet.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • internal/controllers/machine_controller_nics.go

Comment thread internal/plugins/networkinterface/apinet/apinet.go Outdated
Signed-off-by: Lukas Frank <lukas.frank@sap.com>
Signed-off-by: Lukas Frank <lukas.frank@sap.com>
@lukasfrank lukasfrank marked this pull request as ready for review April 28, 2026 08:19
@lukasfrank lukasfrank merged commit 1d142e5 into main Apr 28, 2026
9 checks passed
@github-project-automation github-project-automation Bot moved this to Done in Roadmap Apr 28, 2026
@lukasfrank lukasfrank deleted the feat/watch-apinet-nics branch April 28, 2026 14:12
@friegger friegger added enhancement New feature or request labels Apr 29, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/compute enhancement New feature or request integration-tests to run integration tests size/L

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Apinet NetworkInterface plugin does not handle external NIC deletion or store loss

3 participants