producing message with error not the leader for partition #3050
-
I'm facing an issue with my application, which uses the Sarama library to send messages to a Kafka cluster. Locally, the application runs smoothly and messages are successfully delivered to the cluster. When running in the cluster, I encounter the following error:
my client config:
sending message:
these are the logs from Sarama:
I'm unsure whether this issue originates from my application code or from a network issue in the cluster (communication between Kafka and the application). The error is a bit ambiguous, because the client did find the leader, and it found it correctly (broker 0, partition 0). Could you please provide some suggestions on how to troubleshoot this error and pinpoint the root cause?
Replies: 3 comments
-
When a Kafka broker restarts, it resigns as the current leader for the partitions for which it is the preferred leader. Kafka automatically detects this and triggers a leader election to choose a new leader from the remaining in-sync replicas. When the original broker comes back, Kafka will fairly quickly hand leadership back to it in order to keep the preferred leaders. This can show up as a transient error at the producer if it sends a message during this switchover: it will be told it has the wrong broker, and it will refresh metadata and retry. With retries enabled, as you have in your config, this shouldn't even be seen by your application.

In your example snippet you show creating a producer and then using it. You aren't creating a new producer every time you want to send a message, are you? It is better to reuse a producer across requests, as it will maintain a connection to Kafka and stay aware of cluster state, as in the sketch below.
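A minimal sketch of that pattern with Sarama (broker address, topic name, and retry settings here are illustrative, not taken from the original post): build one SyncProducer at startup with retries enabled, then reuse it for every send.

```go
package main

import (
	"log"
	"time"

	"github.com/IBM/sarama"
)

// buildProducer creates a single SyncProducer that should be reused for the
// lifetime of the application rather than recreated per message.
func buildProducer(brokers []string) (sarama.SyncProducer, error) {
	cfg := sarama.NewConfig()
	cfg.Producer.RequiredAcks = sarama.WaitForAll
	cfg.Producer.Return.Successes = true // required by SyncProducer
	cfg.Producer.Retry.Max = 5           // rides out transient leader changes
	cfg.Producer.Retry.Backoff = 250 * time.Millisecond
	return sarama.NewSyncProducer(brokers, cfg)
}

func main() {
	producer, err := buildProducer([]string{"kafka-0.example:9092"})
	if err != nil {
		log.Fatalf("failed to create producer: %v", err)
	}
	defer producer.Close()

	// Reuse the same producer for every send; it keeps broker connections
	// and partition-leader metadata warm and refreshes them on errors.
	for i := 0; i < 3; i++ {
		partition, offset, err := producer.SendMessage(&sarama.ProducerMessage{
			Topic: "example-topic",
			Value: sarama.StringEncoder("hello"),
		})
		if err != nil {
			log.Printf("send failed: %v", err)
			continue
		}
		log.Printf("delivered to partition %d at offset %d", partition, offset)
	}
}
```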
-
In the logs I just showed one of the retries; of course it makes 5 retries and does the same thing each time. I'm also sure there was no leader change or any other change in the cluster.
-
We finally tracked down the problem. It was a DNS configuration issue in our private Kubernetes cluster. The DNS resolver wasn't handling broker redirects for partition leadership properly, causing the "Your metadata is out of date" errors. So, if you see such an error, double-check your network setup or your client's network-related configuration.
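For anyone troubleshooting a similar symptom, one way to see which broker addresses the client actually resolves and dials is to enable Sarama's debug logger and query the partition leader directly. This is only a sketch; the broker address and topic name are placeholders.

```go
package main

import (
	"log"
	"os"

	"github.com/IBM/sarama"
)

func main() {
	// Route Sarama's internal debug output to stdout; the connection lines
	// show which broker hostnames the client resolves and dials after each
	// metadata refresh, making DNS or advertised-listener mismatches visible.
	sarama.Logger = log.New(os.Stdout, "[sarama] ", log.LstdFlags)

	client, err := sarama.NewClient([]string{"kafka-0.example:9092"}, sarama.NewConfig())
	if err != nil {
		log.Fatalf("failed to create client: %v", err)
	}
	defer client.Close()

	// Ask the cluster who leads partition 0 of the topic; the answer is the
	// advertised address that every producer must be able to resolve.
	leader, err := client.Leader("example-topic", 0)
	if err != nil {
		log.Fatalf("failed to look up leader: %v", err)
	}
	log.Printf("leader for example-topic/0 is broker %d at %s", leader.ID(), leader.Addr())
}
```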