Fix pod IP deletion leak and namespace filtering issues by aanchal22 · Pull Request #2116 · microsoft/retina

aanchal22 · 2026-03-16T14:33:27Z

Fix: Pod IP Deletion Leak and Namespace Filtering (#2085)

Pod IPs were leaking in the eBPF filtermap because metadata flags (pod/namespace) were re-evaluated at DELETE time instead of using values recorded at ADD time. This caused mismatches during IP reuse,
namespace filter changes, and annotation changes.

Additionally, namespace exclude filtering was non-functional:

- appendExcludeList() was empty (not implemented)
- updateNamespaceLists() used sequential if instead of if/else if
- nsOfInterest() had incorrect default behavior (returned false instead of true)
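
To make the two logic fixes above concrete, here is a minimal sketch rather than the code in this PR. The function names updateNamespaceLists and nsOfInterest come from the description; the namespaceFilter type, its fields, the boolean parameters, and the rule that exclude wins over include are assumptions made for illustration.

```go
package main

// namespaceFilter is a hypothetical holder for the include and exclude lists.
type namespaceFilter struct {
	include map[string]struct{}
	exclude map[string]struct{}
}

func newNamespaceFilter() *namespaceFilter {
	return &namespaceFilter{
		include: make(map[string]struct{}),
		exclude: make(map[string]struct{}),
	}
}

// updateNamespaceLists places ns in at most one list. The if/else if chain
// guarantees a single branch runs; with two sequential ifs, both could run
// and leave the namespace in both lists.
func (f *namespaceFilter) updateNamespaceLists(ns string, included, excluded bool) {
	if excluded {
		f.exclude[ns] = struct{}{}
		delete(f.include, ns)
	} else if included {
		f.include[ns] = struct{}{}
		delete(f.exclude, ns)
	} else {
		delete(f.include, ns)
		delete(f.exclude, ns)
	}
}

// nsOfInterest reports whether traffic for ns should be tracked.
// Assumed semantics: exclude wins, then a non-empty include list acts as an
// allow-list, and with no filtering configured everything is of interest.
func (f *namespaceFilter) nsOfInterest(ns string) bool {
	if _, ok := f.exclude[ns]; ok {
		return false
	}
	if len(f.include) > 0 {
		_, ok := f.include[ns]
		return ok
	}
	return true // default when no filter applies
}
```

With an empty filter, nsOfInterest("default") returns true; after updateNamespaceLists("kube-system", false, true), only kube-system is rejected.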

Changes:

- Add metadataTrackingInfo struct to track which metadata was used during ADD
- Use tracked metadata during DELETE operations regardless of current state
- Implement appendExcludeList() with proper initial setup via GetAllNamespaces()
- Fix updateNamespaceLists() if/else logic and nsOfInterest() default
- Add DELETE event protection and warning logs for deleteIP failures
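
The tracking idea in the first two changes can be sketched roughly as follows. Only the metadataTrackingInfo name is taken from this PR; its fields, the filterStore type, and the single refcount per IP are assumptions for illustration, and the sketch is not safe for concurrent use.

```go
package main

// metadataTrackingInfo records which metadata was used when an IP was added.
// The struct name comes from the PR description; the fields are assumptions.
type metadataTrackingInfo struct {
	usedPod       bool // added because the pod carried the annotation
	usedNamespace bool // added because the namespace was of interest
}

// filterStore is a hypothetical stand-in for the filter manager and filtermap.
type filterStore struct {
	refs     map[string]int                  // ip -> reference count
	tracking map[string]metadataTrackingInfo // ip -> metadata recorded at ADD
}

func newFilterStore() *filterStore {
	return &filterStore{
		refs:     make(map[string]int),
		tracking: make(map[string]metadataTrackingInfo),
	}
}

// addIP takes one reference per metadata kind and records which kinds were used.
func (s *filterStore) addIP(ip string, podAnnotated, nsOfInterest bool) {
	info := s.tracking[ip]
	if podAnnotated && !info.usedPod {
		s.refs[ip]++
		info.usedPod = true
	}
	if nsOfInterest && !info.usedNamespace {
		s.refs[ip]++
		info.usedNamespace = true
	}
	if info != (metadataTrackingInfo{}) {
		s.tracking[ip] = info
	}
}

// deleteIP releases exactly the references taken at ADD time. It takes no
// flags on purpose: the current pod/namespace state is never consulted.
func (s *filterStore) deleteIP(ip string) bool {
	info, ok := s.tracking[ip]
	if !ok {
		return false // never added, or already deleted
	}
	if info.usedPod {
		s.refs[ip]--
	}
	if info.usedNamespace {
		s.refs[ip]--
	}
	if s.refs[ip] <= 0 {
		delete(s.refs, ip)
	}
	delete(s.tracking, ip)
	return true
}

// has reports whether ip still holds any reference.
func (s *filterStore) has(ip string) bool {
	_, ok := s.refs[ip]
	return ok
}
```

Because deleteIP reads only the recorded metadata, an IP added under namespace metadata is fully released even if the namespace filter or the pod annotation changed in the meantime.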

Related Issue

Fixes #2085

Checklist

I have read the contributing documentation.
I signed and signed-off the commits (git commit -S -s ...).
I have correctly attributed the author(s) of the code.
I have tested the changes locally.
I have followed the project's style guidelines.
[] I have updated the documentation, if necessary.
I have added tests, if applicable.

Testing Completed

Built and deployed multi-arch images (amd64 + arm64) successfully
go build passes

Additional Notes

The metadata tracking overhead is ~24 bytes per tracked IP
No breaking changes — default behavior is preserved
Windows stubs updated to match new function signatures

Fixes a critical issue causing metrics collection failures

Pod IPs were leaking in the eBPF filtermap due to metadata mismatch between ADD and DELETE operations. Metadata flags (pod/namespace) were re-evaluated at DELETE time instead of using values from ADD time, causing mismatches in:

- IP reuse (tracked → untracked namespace)
- Namespace filter changes after pod add
- Annotation changes between add and delete

**Solution:** Track which metadata was used during ADD and use the same metadata during DELETE, regardless of state changes.

Namespace exclude filtering was broken, causing no metrics collection or eBPF map exhaustion

Problems:

- appendExcludeList() was empty (not implemented)
- updateNamespaceLists() used sequential ifs instead of if/else
- nsOfInterest() had incorrect default behavior
- No protection against spurious DELETE events

**Solution:** Implement namespace filtering.

- Add metadataTrackingInfo struct to track metadata per IP
- Record pod/namespace metadata after successful AddIPs
- Use tracked metadata (not current flags) during DeleteIPs
- Implement appendExcludeList() with proper initial setup
- Fix updateNamespaceLists() if/else logic
- Fix nsOfInterest() default to return true when no filtering
- Add DELETE event protection (check cache before deleting)
- Add GetAllNamespaces() to cache interface
- Add warning logs for deleteIP failures

- Eliminates memory leak (refcount reaches zero)
- Fixes namespace exclude filtering
- Handles IP reuse correctly
- No breaking changes
- Minimal overhead (~24 bytes per tracked IP)

Signed off by: Aanchal Khandelwal (akhandelwal@adobe.com)

alexcastilio · 2026-03-16T16:49:48Z

This is already being addressed by #2114 and #2118

aanchal22 · 2026-03-16T17:13:13Z

> This is already being addressed by #2114 and #2118

A few gaps I noticed from my investigation that the two PRs don't cover:

Spurious DELETE event protection
When a pod DELETE event fires, neither PR verifies the pod is actually gone from the cache before processing. Due to the cache timing issue (cache updated before event published), spurious DELETE events during
startup or rapid pod churn could remove valid IPs from the filtermap. Our branch added a cache check:

if endpoint := m.daemonCache.GetPodByIP(ip.String()); endpoint != nil {
    // Pod still exists in cache: ignore spurious DELETE
    return
}
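
For readers who want to try the check in isolation, here is a self-contained sketch of the same guard. GetPodByIP is the call used in the snippet above; the endpoint type, the podCache interface, the map-backed cache, and the shouldProcessDelete helper are assumptions made for illustration, and the IP is passed as a string to keep the example dependency-free.

```go
package main

// endpoint is a hypothetical stand-in for the cached pod object.
type endpoint struct {
	name string
}

// podCache is an assumed, minimal view of the daemon cache.
type podCache interface {
	GetPodByIP(ip string) *endpoint
}

// mapCache is a toy in-memory cache keyed by pod IP.
type mapCache map[string]*endpoint

// GetPodByIP returns the cached endpoint for ip, or nil if none exists.
func (m mapCache) GetPodByIP(ip string) *endpoint {
	return m[ip]
}

// shouldProcessDelete reports whether a DELETE event for ip should be acted
// on. If the cache still holds a pod with that IP, the event is treated as
// spurious and skipped so a valid IP is not removed from the filtermap.
func shouldProcessDelete(cache podCache, ip string) bool {
	return cache.GetPodByIP(ip) == nil
}
```

A handler would call shouldProcessDelete before removing the IP and return early when it reports false.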

Forced Annotated = true on IP reuse (in handlePodEvent)
When a pod IP is reused by an untracked pod, the current code forces podCacheEntry.Annotated = true before adding to the delete cache. This causes the delete to use pod-annotation metadata even if the original IP was added with namespace metadata, potentially leaving a stale entry. PR #2114's brute-force "delete with both" approach may mask this, but the forced flag is still incorrect.
Filtermanager observability
No warning logs are emitted when deleteIP fails in the filtermanager cache (requestor not found, IP not found). This makes it harder to diagnose leak issues in production. Adding warnings to
pkg/managers/filtermanager/cache.go for these failure paths would improve debuggability.
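
As a rough illustration of what such warnings could look like, the sketch below logs on both failure paths. It is not the code in pkg/managers/filtermanager/cache.go: the filterCache type, its layout, the sentinel errors, and the use of the standard log package instead of the project's logger are all assumptions.

```go
package main

import (
	"errors"
	"log"
)

// Hypothetical sentinel errors for the two failure paths named above.
var (
	errRequestorNotFound = errors.New("requestor not found")
	errIPNotFound        = errors.New("ip not found")
)

// filterCache is an assumed layout: requestor -> ip -> reference count.
type filterCache struct {
	data map[string]map[string]int
}

func newFilterCache() *filterCache {
	return &filterCache{data: make(map[string]map[string]int)}
}

// addIP takes one reference on ip for the given requestor.
func (c *filterCache) addIP(requestor, ip string) {
	if c.data[requestor] == nil {
		c.data[requestor] = make(map[string]int)
	}
	c.data[requestor][ip]++
}

// deleteIP drops one reference and warns when there is nothing to drop,
// so that a mismatched ADD/DELETE pair becomes visible in the logs.
func (c *filterCache) deleteIP(requestor, ip string) error {
	ips, ok := c.data[requestor]
	if !ok {
		log.Printf("WARN deleteIP: requestor %q not found (ip=%s)", requestor, ip)
		return errRequestorNotFound
	}
	n, ok := ips[ip]
	if !ok {
		log.Printf("WARN deleteIP: ip %s not found for requestor %q", ip, requestor)
		return errIPNotFound
	}
	if n <= 1 {
		delete(ips, ip) // last reference released
	} else {
		ips[ip] = n - 1
	}
	return nil
}
```

Returning distinct errors as well as logging lets callers and tests tell the two failure paths apart.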
eBPF filter map size configurability
The retina_filter eBPF map max_entries is hardcoded at 255. For clusters with many tracked pods, this can cause "no space left on device" errors. I have a separate PR, #2117, that makes this configurable via Helm values or an environment variable.
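
To illustrate the environment-variable half of that idea, here is a minimal sketch of reading an override with a fallback to the current value of 255. The variable name RETINA_FILTER_MAP_MAX_ENTRIES and both function names are hypothetical and are not taken from #2117; how the value is applied to the map is outside this sketch.

```go
package main

import (
	"os"
	"strconv"
)

// defaultFilterMapMaxEntries mirrors the hardcoded value mentioned above.
const defaultFilterMapMaxEntries uint32 = 255

// filterMapMaxEntriesEnv is a hypothetical variable name for this sketch.
const filterMapMaxEntriesEnv = "RETINA_FILTER_MAP_MAX_ENTRIES"

// parseMaxEntries converts a raw setting into a map size. Unset, malformed,
// zero, or out-of-range values fall back to the default.
func parseMaxEntries(raw string, set bool) uint32 {
	if !set {
		return defaultFilterMapMaxEntries
	}
	n, err := strconv.ParseUint(raw, 10, 32)
	if err != nil || n == 0 {
		return defaultFilterMapMaxEntries
	}
	return uint32(n)
}

// filterMapMaxEntries reads the override from the environment.
func filterMapMaxEntries() uint32 {
	raw, set := os.LookupEnv(filterMapMaxEntriesEnv)
	return parseMaxEntries(raw, set)
}
```

Keeping the parsing separate from the environment lookup makes the fallback rules easy to test without touching process state.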

aanchal22 requested a review from a team as a code owner March 16, 2026 14:33

aanchal22 requested review from jimassa and mainred March 16, 2026 14:33

aanchal22 mentioned this pull request Mar 16, 2026 in: fix: pod IP deletion leak, namespace filtering, and configurable filter map size #2112 (Closed)

aanchal22 wants to merge 1 commit into microsoft:main from aanchal22:2085/fix-pod-ip-leak-namespace-filtering
