Bug Description
We are encountering a service failure due to a memory leak.
This graph shows the gradual increase in heap usage for our service. The service eventually runs out of memory (OOM) and Kubernetes restarts it, which is why a new line starts after each one ends.
Upon investigation, we found that AWS SDK v2 registers connection managers in a HashMap within IdleConnectionReaper and is expected to deregister them later, but the heap dump shows a large amount of retained space that is not being reclaimed by the GC.
Above is a screenshot of the heap dump visualised in VisualVM. It shows 15k+ PoolingHttpClientConnectionManager objects holding 512MB+ of retained space that should have been garbage collected.
My suspicion is that the deregister method is not being called once the API call to AWS completes.
We are using AWS SDK v2 to talk to S3 and EMR Serverless API.
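To illustrate the retention pattern described above, here is a minimal JDK-only sketch. It is not the SDK's code: `SketchReaper`, `SketchConnectionManager`, and `ReaperLeakSketch` are hypothetical stand-ins, and the sketch assumes that deregistration only happens when the owning object is closed. It shows that entries held by a static map stay strongly reachable until they are explicitly removed, no matter whether the caller still holds a reference.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a reaper-style registry; NOT the SDK's actual class.
final class SketchReaper {
    // Strong references: every registered manager stays reachable until removed.
    private static final Map<Object, Long> MANAGERS = new ConcurrentHashMap<>();

    static void register(Object manager, long maxIdleMillis) {
        MANAGERS.put(manager, maxIdleMillis);
    }

    static void deregister(Object manager) {
        MANAGERS.remove(manager);
    }

    static int size() {
        return MANAGERS.size();
    }
}

// Hypothetical connection manager: registers itself on creation and
// deregisters only when close() is called (an assumption of this sketch).
final class SketchConnectionManager implements AutoCloseable {
    SketchConnectionManager() {
        SketchReaper.register(this, 60_000L);
    }

    @Override
    public void close() {
        SketchReaper.deregister(this);
    }
}

final class ReaperLeakSketch {
    // Creates `count` managers and never closes them; returns registry growth.
    static int growthWithoutClose(int count) {
        int before = SketchReaper.size();
        for (int i = 0; i < count; i++) {
            // The local reference is dropped, but the registry still holds the object.
            new SketchConnectionManager();
        }
        return SketchReaper.size() - before;
    }

    // Creates `count` managers and closes each one; returns registry growth.
    static int growthWithClose(int count) {
        int before = SketchReaper.size();
        for (int i = 0; i < count; i++) {
            try (SketchConnectionManager manager = new SketchConnectionManager()) {
                // Simulated work with the manager would go here.
            }
        }
        return SketchReaper.size() - before;
    }

    public static void main(String[] args) {
        System.out.println("growth without close: " + growthWithoutClose(1000));
        System.out.println("growth with close: " + growthWithClose(1000));
    }
}
```

In this model the registry grows by one entry per unclosed manager, which would be consistent with the thousands of retained PoolingHttpClientConnectionManager instances in the heap dump. Whether the real code path behaves this way is exactly what this report is asking about.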
Regression Issue
- Select this option if this issue appears to be a regression.
Expected Behavior
The deregisterConnectionManager method should be called every time the connection manager has completed its task. This would allow the garbage collector to free the associated memory and prevent the leak that ultimately leads to service failures.
Current Behavior
Currently, after a connectionManager is registered, the deregisterConnectionManager method is never called, so the garbage collector cannot release the unused memory. This results in a gradual memory buildup that eventually ends in an out-of-memory failure. (Reference: IdleConnectionReaper.java, line 36)
Reproduction Steps
- Create a function that calls registerConnectionManager.
- After the connection manager has completed its tasks, verify whether the deregisterConnectionManager method is invoked.
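One way to make the verification step concrete is to count registrations against deregistrations over a workload. The sketch below uses only the JDK. `RegistrationCounter`, `CountedClient`, and `ReproductionCheck` are hypothetical names, not SDK APIs, and the sketch assumes that deregistration is tied to `close()`. In a real reproduction, the same counters would need to be driven from the actual register and deregister call sites, for example via a debugger breakpoint or logging.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical instrumentation; not part of the AWS SDK.
final class RegistrationCounter {
    private final AtomicInteger registered = new AtomicInteger();
    private final AtomicInteger deregistered = new AtomicInteger();

    void onRegister() {
        registered.incrementAndGet();
    }

    void onDeregister() {
        deregistered.incrementAndGet();
    }

    // Number of registrations that have no matching deregistration yet.
    int outstanding() {
        return registered.get() - deregistered.get();
    }
}

// Hypothetical client: registers on creation, deregisters on close().
final class CountedClient implements AutoCloseable {
    private final RegistrationCounter counter;

    CountedClient(RegistrationCounter counter) {
        this.counter = counter;
        counter.onRegister();
    }

    @Override
    public void close() {
        counter.onDeregister();
    }
}

final class ReproductionCheck {
    // Simulates `calls` API calls and returns the number of registrations
    // still outstanding once the workload has finished.
    static int runWorkload(int calls, boolean closeClients) {
        RegistrationCounter counter = new RegistrationCounter();
        for (int i = 0; i < calls; i++) {
            CountedClient client = new CountedClient(counter);
            // A simulated API call would happen here.
            if (closeClients) {
                client.close();
            }
        }
        return counter.outstanding();
    }

    public static void main(String[] args) {
        System.out.println("outstanding, clients not closed: " + runWorkload(50, false));
        System.out.println("outstanding, clients closed: " + runWorkload(50, true));
    }
}
```

A non-zero outstanding count after the workload completes would indicate that some registrations were never matched by a deregistration.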
Possible Solution
No response
Additional Information/Context
No response
AWS SDK for Java Version
awsV2SdkVers = '2.20.38'
JDK Version
ENV JAVA_VERSION="21.0.6+7-1~20.04.1"
Operating System and Version
Ubuntu 20.04