Open
Description
Expected behavior
newPartitionProducer() should retry grabCnx() when it fails, similar to the grabCnx() retry logic in reconnectToBroker().
The Java producer already has this retry logic.
Actual behavior
During producer creation, grabCnx() in producer_partition.go first performs a topic lookup. If the lookup succeeds but a network issue occurs before the command to create the producer is sent, grabCnx() returns an error without retrying.
We saw frequent failures during initial producer creation.
Steps to reproduce
It's tricky to reproduce, but we observe the problem more frequently during the initialization stage of pods on Azure. After we implemented the grabCnx() retry in newPartitionProducer(), the problem went away. (Will do a PR)
System configuration
Pulsar version: 2.10