[processor/k8s_attributes] Fix memory leak in k8s_attributes map #48987
Open
giuliano-sider wants to merge 1 commit into
…to missing IP or other dynamic attributes from Pod deletion event object.
6145482 to
c6b17ac
Contributor
Author
I can reproduce the issue (#48986) reliably when scaling my Deployment of 100K Running Pods down to zero. This time the map holds 180K entries. I suppose that if I keep scaling up and down, the leak will cause this number to keep climbing? The unit test that I added (TestPodDeleteIPMissingFromDeleteEvent) fails without the fix in this PR.
[processor/k8s_attributes] Fix memory leak in k8s_attributes map due to missing IP or other dynamic attributes from Pod deletion event object.
Component(s)
[processor/k8s_attributes]
Description
In environments with high pod churn or rapid scale-down, the k8sattributesprocessor can leak pod IP-based cache entries (connection: <IP> and resource_attribute: k8s.pod.ip) in the internal c.Pods map. This leads to unbounded memory growth and a stable baseline of leaked keys even after the actual pods have been scaled down to zero.
Root Cause
When a pod is deleted, the WatchClient's forgetPod method is called to purge the cached entries:
1. forgetPod builds podToRemove from the incoming delete event payload pod.
2. It calls getIdentifiersFromAssoc(podToRemove) to determine which keys to add to the delete queue.
3. If the IP has already been cleared by the time the DELETE event is dispatched (or if the status payload is incomplete), pod.Status.PodIP will be empty ("").
4. getIdentifiersFromAssoc then only generates the UID-based identifier (resource_attribute: k8s.pod.uid). The IP-based identifiers (connection: <IP> and resource_attribute: k8s.pod.ip) are skipped.
5. The IP-based entries are therefore never removed from the c.Pods map.
Proposed Solution
Instead of relying on the incoming delete event status to determine the keys to delete, forgetPod should look up the cached pod from c.Pods using the pod's UID (which is always present in the event). The cached pod object is guaranteed to contain the IP address and all attributes as they were stored during the pod's lifecycle.
Link to tracking issue
Fixes #48986
Testing
Added a unit test case (TestPodDeleteIPMissingFromDeleteEvent) that covers a deletion event whose Pod object is missing the IP fields.
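For readers who want the scenario in miniature, here is a rough, self-contained sketch of the leak and of the proposed lookup. It is a toy model, not the processor's code: the pod and cache types, the key strings, and the identifiers, forgetFromEvent, forgetFromCache and leftover helpers are all hypothetical names made up for illustration. It also deletes keys immediately, whereas the description above says keys go to a delete queue.

```go
package main

import "fmt"

// pod is a simplified, hypothetical stand-in for the cached pod record.
type pod struct {
	UID string
	IP  string
}

// cache is a toy model of the c.Pods map: several keys point at one pod.
type cache map[string]*pod

// uidKey builds the (hypothetical) UID-based key for a pod.
func uidKey(uid string) string { return "resource_attribute: k8s.pod.uid=" + uid }

// identifiers mimics the behavior described for getIdentifiersFromAssoc:
// the UID key is always produced, the IP keys only when the IP is known.
func identifiers(p *pod) []string {
	keys := []string{uidKey(p.UID)}
	if p.IP != "" {
		keys = append(keys, "connection: "+p.IP, "resource_attribute: k8s.pod.ip="+p.IP)
	}
	return keys
}

// add stores the pod under every identifier it currently has.
func (c cache) add(p *pod) {
	for _, k := range identifiers(p) {
		c[k] = p
	}
}

// forgetFromEvent derives the keys from the delete event itself.
// If the event carries no IP, the IP-based keys are left behind.
func (c cache) forgetFromEvent(ev *pod) {
	for _, k := range identifiers(ev) {
		delete(c, k)
	}
}

// forgetFromCache first looks up the cached pod by UID and derives the
// keys from that record, falling back to the event if nothing is cached.
func (c cache) forgetFromCache(ev *pod) {
	target := ev
	if cached, ok := c[uidKey(ev.UID)]; ok {
		target = cached
	}
	for _, k := range identifiers(target) {
		delete(c, k)
	}
}

// leftover adds one pod with an IP, replays a delete event without an IP,
// and returns how many keys remain in the cache.
func leftover(forget func(cache, *pod)) int {
	c := cache{}
	c.add(&pod{UID: "uid-1", IP: "10.0.0.1"})
	forget(c, &pod{UID: "uid-1"}) // delete event with an empty IP
	return len(c)
}

func main() {
	fmt.Println("keys left, event-based:", leftover(cache.forgetFromEvent))
	fmt.Println("keys left, cache-based:", leftover(cache.forgetFromCache))
}
```

In this model the event-based variant leaves the two IP keys behind, while the cache-based variant removes all three, which is the behavior the new unit test is meant to pin down.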
Documentation
Added a changelog entry for a bug fix.
Authorship