Description
Search before asking
- I searched in the issues and found nothing similar.
Read release policy
- I understand that unsupported versions don't get bug fixes. I will attempt to reproduce the issue on a supported version of Pulsar client and Pulsar broker.
Version
master branch code analysis
Minimal reproduce step
Currently, org.apache.pulsar.broker.service.ServerCnx#completedSendOperation might not get called in some error cases.
The impact is that message publishing could stop for all connections that use a particular IO thread.
The broker maxMessagePublishBufferSizeInMB limit is split into a maxPendingBytesPerThread limit:
The pending bytes count is incremented when a message is sent:
It is decremented in the ServerCnx#completedSendOperation method:
pulsar/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java
Lines 3376 to 3377 in 3fce309
If the call to decrement is missing, the pending bytes count leaks. The leaked bytes accumulate and will eventually cause message publishing to stop for all connections using that IO thread.
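To make the accounting concrete, here is a minimal sketch of how a skipped decrement accumulates. It is a simplified model, not the broker code: the `Tracker` class, the `simulate` helper, the 1024-byte limit, and the resume-at-half-the-limit rule are all assumptions made for illustration.

```java
public class PendingBytesLeakSketch {

    /** Hypothetical stand-in for a per-IO-thread pending bytes tracker. */
    public static final class Tracker {
        private final long maxPendingBytes;
        private long pendingBytes;
        private boolean limitExceeded;

        public Tracker(long maxPendingBytes) {
            this.maxPendingBytes = maxPendingBytes;
        }

        /** Called when a publish starts: reserve the message size. */
        public void startSend(long size) {
            pendingBytes += size;
            if (pendingBytes > maxPendingBytes) {
                limitExceeded = true; // publishing on this thread would be paused
            }
        }

        /** Called when a publish completes: release the message size. */
        public void completeSend(long size) {
            pendingBytes -= size;
            // Assumed resume rule: clear the flag once usage drops to half the limit.
            if (limitExceeded && pendingBytes <= maxPendingBytes / 2) {
                limitExceeded = false;
            }
        }

        public long pendingBytes() {
            return pendingBytes;
        }

        public boolean limitExceeded() {
            return limitExceeded;
        }
    }

    /**
     * Runs `sends` publishes of `size` bytes each. Every `failEvery`-th publish
     * fails; on failure the release is skipped unless `completeOnError` is true.
     */
    public static Tracker simulate(int sends, long size, int failEvery, boolean completeOnError) {
        Tracker tracker = new Tracker(1024);
        for (int i = 1; i <= sends; i++) {
            tracker.startSend(size);
            boolean failed = failEvery > 0 && i % failEvery == 0;
            if (!failed || completeOnError) {
                tracker.completeSend(size);
            }
        }
        return tracker;
    }

    public static void main(String[] args) {
        Tracker leaky = simulate(100, 100, 5, false);
        Tracker fixed = simulate(100, 100, 5, true);
        System.out.println("leaky: pendingBytes=" + leaky.pendingBytes()
                + " limitExceeded=" + leaky.limitExceeded());
        System.out.println("fixed: pendingBytes=" + fixed.pendingBytes()
                + " limitExceeded=" + fixed.limitExceeded());
    }
}
```

In this model, 20 of the 100 publishes fail, so the leaky run ends with 2000 pending bytes and `limitExceeded` stuck at true even though nothing is in flight. The run that releases on every path ends at 0.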
The leak happens here:
There should be a call to MessagePublishContext#completed for all exception cases. ServerCnx#completedSendOperation gets called for the exception path in MessagePublishContext#completed here:
The other exception cases contain the required call to callback.completed, which will call ServerCnx#completedSendOperation:
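The general shape of the requirement can be sketched as follows: every exit from the publish path, including early failures, invokes the completion callback exactly once. The `PublishCallback` interface and the `publish` method here are hypothetical stand-ins for illustration and are not the broker's actual signatures.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CompleteOnEveryPathSketch {

    /** Hypothetical callback; a non-null error marks a failed publish. */
    public interface PublishCallback {
        void completed(Exception error, long ledgerId, long entryId);
    }

    /**
     * Hypothetical publish path. The early failure is reported through the
     * callback instead of returning silently, so whoever holds the
     * pending-bytes reservation always gets a chance to release it.
     */
    public static void publish(boolean failEarly, PublishCallback callback) {
        try {
            if (failEarly) {
                throw new IllegalStateException("simulated failure before the write is handed off");
            }
        } catch (RuntimeException e) {
            callback.completed(e, -1L, -1L); // error path still completes
            return;
        }
        callback.completed(null, 1L, 1L); // success path
    }

    /** Counts how many times the callback fires for one publish attempt. */
    public static int completionsFor(boolean failEarly) {
        AtomicInteger completions = new AtomicInteger();
        publish(failEarly, (error, ledgerId, entryId) -> completions.incrementAndGet());
        return completions.get();
    }

    public static void main(String[] args) {
        System.out.println("success path completions: " + completionsFor(false));
        System.out.println("error path completions: " + completionsFor(true));
    }
}
```

Both paths complete exactly once. An exception branch that returns without calling the callback is the pattern that would leak the reservation.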
What did you expect to see?
There shouldn't be a leak in maxPendingBytesPerThread permits which eventually leads to message publishing stopping for all connections using a particular IO thread.
What did you see instead?
Based on the analysis of the code, there's a leak.
Anything else?
This might be related to issue #23920.
A heap dump could be used to check if the issue applies. This can be done by searching for org.apache.pulsar.broker.service.ServerCnx$PendingBytesPerThreadTracker instances in the heap dump and checking the pendingBytes and limitExceeded field values.
Are you willing to submit a PR?
- I'm willing to submit a PR!