Skip to content

Conversation

@bachmanity1
Copy link
Contributor

image

I've noticed that when working with clusters having several controller and broker nodes, retrieving cluster metadata may occasionally return stale data. This issue is evident when, for example, an end-to-end test topic is created, and the topic metadata is retrieved immediately afterward. In this case, the metadata request may return error code 3, which corresponds to the UNKNOWN_TOPIC_OR_PARTITION error. This error likely occurs when a client connects to a broker that is not yet aware of the new topic because it hasn't consumed the latest data from the __cluster_metadata topic.

In summary, when the end-to-end code attempts to populate the value of s.partitionCount, it might encounter the UNKNOWN_TOPIC_OR_PARTITION error, even if the topic actually exists. This results in a partition count of zero, preventing the producer from sending any data.

if err != nil {
return fmt.Errorf("could not get topic metadata after validation: %w", err)
}
partitions := len(topicMetadata.Topics[0].Partitions)
Copy link
Contributor Author

@bachmanity1 bachmanity1 Jul 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

even if the err above is nil the topicMetadata.Topics[0].ErrorCode is not guaranteed to be zero.

@bachmanity1
Copy link
Contributor Author

Hi @weeco. Can you please have a look?

@weeco
Copy link
Contributor

weeco commented Aug 8, 2024

@bachmanity1 I'll try to look at your PR soon, but I'm currently very busy with other things that have higher priority. Even though this change is just a few lines, I'll have to dig deeper to see if this could cause other sort of issues before approving & merging the PR.

@bachmanity1
Copy link
Contributor Author

bachmanity1 commented Sep 3, 2024

Hi @weeco,

I wanted to let you know that I've updated this PR. I've realized that my previous approach didn't account for scenarios where the broker count is updated.

I also wanted to discuss the partitionsPerBroker config. I believe it should be clarified that this configuration is only applied when a topic is created and does not take effect when the broker count changes. If the original intent was for this setting to apply even when the broker count changes, we’d need to adjust the topic reconciliation logic accordingly. However, I think that handling cases where the broker count decreases could be quite tricky.

By the way, it seems there’s a bug preventing new partitions from being created when new brokers are added, as reported in #257. Could you take a look at that PR too? Thanks!

@bachmanity1
Copy link
Contributor Author

Hi @weeco, is there any chance to get this reviewed anytime soon?

@weeco
Copy link
Contributor

weeco commented Nov 3, 2024

Thanks for your PR @bachmanity1 and sorry for the long waiting time. I added some minor things but generally LGTM

@weeco weeco self-requested a review November 3, 2024 15:37
@weeco weeco merged commit 7bed85e into redpanda-data:master Nov 3, 2024
2 checks passed
@bachmanity1 bachmanity1 deleted the fix-e2e-producer branch November 3, 2024 16:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants