Skip to content

fix: pod IP deletion leak, namespace filtering, and configurable filter map size#2112

Closed
aanchal22 wants to merge 2 commits intomicrosoft:mainfrom
aanchal22:2085/exclude-ns-feature
Closed

fix: pod IP deletion leak, namespace filtering, and configurable filter map size#2112
aanchal22 wants to merge 2 commits intomicrosoft:mainfrom
aanchal22:2085/exclude-ns-feature

Conversation

@aanchal22
Copy link
Contributor

Description

This PR fixes critical bugs in namespace exclude filtering and pod IP tracking, and adds configurability for the eBPF filter map size.

Fix: Pod IP Deletion Leak and Namespace Filtering (#2085)

Pod IPs were leaking in the eBPF filtermap because metadata flags (pod/namespace) were re-evaluated at DELETE time instead of using values recorded at ADD time. This caused mismatches during IP reuse,
namespace filter changes, and annotation changes.

Additionally, namespace exclude filtering was non-functional:

  • appendExcludeList() was empty (not implemented)
  • updateNamespaceLists() used sequential if instead of if/else if
  • nsOfInterest() had incorrect default behavior (returned false instead of true)

Changes:

  • Add metadataTrackingInfo struct to track which metadata was used during ADD
  • Use tracked metadata during DELETE operations regardless of current state
  • Implement appendExcludeList() with proper initial setup via GetAllNamespaces()
  • Fix updateNamespaceLists() if/else logic and nsOfInterest() default
  • Add DELETE event protection and warning logs for deleteIP failures

Feature: Configurable eBPF Filter Map Size

The retina_filter eBPF map max_entries was hardcoded. This makes it configurable at runtime via Helm values or environment variable by overriding MapSpec.MaxEntries before loading the map into the
kernel.

Configuration options:

  • Helm values.yaml: filterMapMaxEntries: 15000
  • Environment variable: RETINA_FILTERMAXMAPENTRIES=15000
  • Default: 255

Changes:

  • Add FilterMapMaxEntries to Config struct with default of 255
  • Modify filter.Init() to accept and apply maxEntries parameter
  • Thread config value through bpf.Setup(), filtermanager.Init(), and callers
  • Add Helm chart values and ConfigMap template support

Related Issue

Fixes #2085

Checklist

  • I have read the contributing documentation.
  • I signed and signed-off the commits (git commit -S -s ...).
  • I have correctly attributed the author(s) of the code.
  • I have tested the changes locally.
  • I have followed the project's style guidelines.
  • [] I have updated the documentation, if necessary.
  • I have added tests, if applicable.

Testing Completed

  • Verified filter map initializes with configured maxEntries value via init container logs:
    level=info caller=filter/filter_map_linux.go:63 msg="Filter map max entries configured" maxEntries=15000
    
  • Verified configurable via Helm values.yaml and ConfigMap
  • Verified default of 255 applies when not configured
  • Built and deployed multi-arch images (amd64 + arm64) successfully
  • go build passes for both controller/main.go and init/retina/main_linux.go

Additional Notes

  • The metadata tracking overhead is ~24 bytes per tracked IP
  • No breaking changes — default behavior is preserved
  • Windows stubs updated to match new function signatures

Fixes a critical issue causing metrics collection failures

Pod IPs were leaking in the eBPF filtermap due to metadata mismatch between
ADD and DELETE operations. Metadata flags (pod/namespace) were re-evaluated
at DELETE time instead of using values from ADD time, causing mismatches in:
- IP reuse (tracked → untracked namespace)
- Namespace filter changes after pod add
- Annotation changes between add and delete

**Solution:** Track which metadata was used during ADD and use the same
metadata during DELETE, regardless of state changes.

Namespace exclude filtering was broken, causing no metrics collection or eBPF map exhaustion
Problems:
- appendExcludeList() was empty (not implemented)
- updateNamespaceLists() used sequential ifs instead of if/else
- nsOfInterest() had incorrect default behavior
- No protection against spurious DELETE events

**Solution:** Implement namespace filtering.

- Add metadataTrackingInfo struct to track metadata per IP
- Record pod/namespace metadata after successful AddIPs
- Use tracked metadata (not current flags) during DeleteIPs
- Implement appendExcludeList() with proper initial setup
- Fix updateNamespaceLists() if/else logic
- Fix nsOfInterest() default to return true when no filtering
- Add DELETE event protection (check cache before deleting)
- Add GetAllNamespaces() to cache interface
- Add warning logs for deleteIP failures
- Eliminates memory leak (refcount reaches zero)
- Fixes namespace exclude filtering
- Handles IP reuse correctly
- No breaking changes
- Minimal overhead (~24 bytes per tracked IP)

Signed off by: Aanchal Khandelwal (akhandelwal@adobe.com)
  The retina_filter eBPF map had its max_entries hardcoded in the C source.
  This makes it configurable at runtime by overriding MapSpec.MaxEntries
  before loading the map into the kernel.

  Configurable via:
  - Helm values.yaml: filterMapMaxEntries: 15000
  - Environment variable: RETINA_FILTERMAXMAPENTRIES=15000
  - Default: 255

  Files changed:
  - pkg/config/config.go: Added FilterMapMaxEntries field
  - pkg/plugin/filter/filter_map_linux.go: Init() accepts maxEntries param
  - pkg/plugin/filter/filter_map_windows.go: Updated signature to match
  - pkg/bpf/setup_linux.go: Thread filterMapMaxEntries to filter.Init()
  - init/retina/main_linux.go: Pass config value to bpf.Setup()
  - pkg/managers/filtermanager/manager_linux.go: Thread filterMapMaxEntries
  - pkg/managers/filtermanager/manager_windows.go: Updated signature to match
  - pkg/watchers/apiserver/apiserver.go: Updated filtermanager.Init() call
  - cmd/standard/daemon.go: Pass config value to filtermanager.Init()
  - deploy/standard/.../retina/values.yaml: Added default
  - deploy/standard/.../retina/templates/configmap.yaml: Added template line
  - adobe/ethos-core-retina/chart/values.yaml: Adobe override (15000)
  - Test files: Updated call signatures

Signed off by: Aanchal Khandelwal (akhandelwal@adobe.com)
@aanchal22 aanchal22 requested a review from a team as a code owner March 13, 2026 21:07
@aanchal22
Copy link
Contributor Author

@microsoft-github-policy-service agree company="Adobe"

Copy link
Member

@nddq nddq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aanchal22 This PR is trying address too many things at once. We should break this PR down into at least 2 smaller ones according to the description.

@aanchal22
Copy link
Contributor Author

aanchal22 commented Mar 16, 2026

@aanchal22 This PR is trying address too many things at once. We should break this PR down into at least 2 smaller ones according to the description.

Going to close this PR as I have created 2 separate PRs as suggested - #2117 & #2116

@aanchal22 aanchal22 closed this Mar 16, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Pod IP Deletion Leak in eBPF Filter and Namespace Filtering Issues in MetricConfiguration CRD

2 participants