KAFKA-10721: Rewrite topology to allow for overlapping unequal topic subscriptions #20990
+335
−13
Background
This PR addresses the limitation where StreamsBuilder throws a TopologyException when multiple source nodes subscribe to overlapping but not identical sets of topics.
For example, the following code previously failed:
Error: Two source nodes are subscribed to overlapping but not equal input topics
Changes
To support this, I have implemented a mechanism to flatten multi-topic source nodes into individual single-topic source nodes before the topology optimization phase.
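To make the idea concrete, here is a minimal, self-contained sketch of the flatten-then-merge approach using plain Java collections. It is not the code in this PR: SourceNode, flatten, and mergeByTopic are hypothetical stand-ins, and merging by unioning the children is an assumption made for illustration only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class FlattenSketch {
    // Hypothetical stand-in for a source node; NOT a Kafka Streams class.
    static final class SourceNode {
        final List<String> topics;
        final int buildPriority;
        final List<String> children; // names of downstream processors

        SourceNode(List<String> topics, int buildPriority, List<String> children) {
            this.topics = topics;
            this.buildPriority = buildPriority;
            this.children = children;
        }
    }

    // Split every multi-topic source into one single-topic source per topic,
    // carrying buildPriority and children over to each split node.
    static List<SourceNode> flatten(List<SourceNode> sources) {
        List<SourceNode> flattened = new ArrayList<>();
        for (SourceNode source : sources) {
            for (String topic : source.topics) {
                flattened.add(new SourceNode(List.of(topic), source.buildPriority, source.children));
            }
        }
        return flattened;
    }

    // Assumption: single-topic sources reading the same topic are merged by
    // unioning their children, so overlapping subscriptions no longer conflict.
    static Map<String, Set<String>> mergeByTopic(List<SourceNode> flattened) {
        Map<String, Set<String>> merged = new LinkedHashMap<>();
        for (SourceNode node : flattened) {
            merged.computeIfAbsent(node.topics.get(0), t -> new LinkedHashSet<>())
                  .addAll(node.children);
        }
        return merged;
    }

    public static void main(String[] args) {
        // Two sources with overlapping but unequal subscriptions: {A, B} and {B, C}.
        List<SourceNode> sources = List.of(
            new SourceNode(List.of("A", "B"), 0, List.of("proc-1")),
            new SourceNode(List.of("B", "C"), 1, List.of("proc-2")));
        System.out.println(mergeByTopic(flatten(sources)));
        // prints {A=[proc-1], B=[proc-1, proc-2], C=[proc-2]}
    }
}
```

In this model, topic B ends up with a single source that feeds both downstream processors, which is the situation the old equality check rejected.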
InternalStreamsBuilder#flattenSourceNodesAndRearrange(...)
- Splits StreamSourceNodes subscribed to multiple topics.
- Preserves buildPriority and child nodes (downstream processors) for each split node to ensure consistent topology construction.
InternalStreamsBuilder#mergeDuplicateSourceNodes(...)
- Removes the TopologyException for overlapping but unequal topics.
Note
While working on this, I found a bug (or at least non-deterministic behavior).
More details are available in KAFKA-19923 (https://issues.apache.org/jira/browse/KAFKA-19923).
That issue has existed independently of this PR and has been present for quite some time.
In addition, once the topology actually starts running, it results in a ClassCastException, which immediately terminates the Kafka Streams application. Because of this fail-fast behavior, the bug is unlikely to affect any real-world, correctly configured Kafka Streams deployments.
While this PR does relax some of the previous constraints, I believe it remains highly unlikely for users to subscribe to the same topic with different ConsumedInternal configurations. Therefore, I do not expect this change to introduce any practical risk.
That said, it may still be helpful to document that all source nodes reading from the same topic should use semantically identical ConsumedInternal configurations.
Here, “identical” means that they should not differ in their TimestampExtractor or their key and value Serdes.
I hope this clarification is helpful.
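As a rough illustration of the recommendation above, the sketch below flags topics whose source configurations disagree. It is a plain-Java model, not Kafka Streams code: SourceConfig and conflictingTopics are hypothetical names, and comparing Serdes and the TimestampExtractor by class name is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

final class ConsumedConsistencySketch {
    // Hypothetical stand-in for the per-source configuration; NOT ConsumedInternal itself.
    static final class SourceConfig {
        final String topic;
        final String keySerde;           // class name, may be null (use default)
        final String valueSerde;         // class name, may be null (use default)
        final String timestampExtractor; // class name, may be null (use default)

        SourceConfig(String topic, String keySerde, String valueSerde, String timestampExtractor) {
            this.topic = topic;
            this.keySerde = keySerde;
            this.valueSerde = valueSerde;
            this.timestampExtractor = timestampExtractor;
        }

        // "Identical" here means: same key Serde, value Serde, and TimestampExtractor.
        boolean sameSemantics(SourceConfig other) {
            return Objects.equals(keySerde, other.keySerde)
                && Objects.equals(valueSerde, other.valueSerde)
                && Objects.equals(timestampExtractor, other.timestampExtractor);
        }
    }

    // Returns each topic that is read with at least two differing configurations.
    static List<String> conflictingTopics(List<SourceConfig> configs) {
        Map<String, SourceConfig> firstSeen = new LinkedHashMap<>();
        List<String> conflicts = new ArrayList<>();
        for (SourceConfig config : configs) {
            SourceConfig previous = firstSeen.putIfAbsent(config.topic, config);
            if (previous != null && !previous.sameSemantics(config) && !conflicts.contains(config.topic)) {
                conflicts.add(config.topic);
            }
        }
        return conflicts;
    }

    public static void main(String[] args) {
        List<SourceConfig> configs = List.of(
            new SourceConfig("A", "StringSerde", "StringSerde", null),
            new SourceConfig("B", "StringSerde", "StringSerde", null),
            new SourceConfig("B", "StringSerde", "LongSerde", null)); // differs in value Serde
        System.out.println(conflictingTopics(configs));
        // prints [B]
    }
}
```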
Result