-
I would like to understand the offset management of MM2 a bit better. Why are there more commits in the topic that maps the offsets between both clusters? Our mirrormaker2-cluster-offsets topic saves the offset for each topic-partition approximately once an hour. Wouldn't this cause a lot of re-consuming if our MM2 dies? Is that setting configurable?
-
I'm not sure if Apache Kafka has any good docs explaining it. @mimaison might know if there is something like that.
-
Let me start by explaining the role of these 2 topics and the data they each contain.

Let's start with mirrormaker2-cluster-offsets. This is the offsets topic used by the Kafka Connect runtime that runs the MirrorMaker connectors. The Kafka Connect runtime uses it to periodically store offsets from source connectors, so that if a source connector is stopped (or crashes) it can resume from its last saved position in the source system. The topic is created in the cluster that the Kafka Connect runtime is connected to, typically the target cluster. In MirrorMaker, only MirrorSourceConnector uses that mechanism to restore its position when it restarts. If you enable exactly-once semantics, this connector is able to restart exactly where it left off and not duplicate or skip records.

The other topic, mm2-offset-syncs.{target-cluster}.internal, is specific to MirrorMaker and is used to translate offsets between the source and target clusters. By default this topic is created in the source cluster, but you can opt to put it in the target cluster using offset-syncs.topic.location. Data is written into this topic by MirrorSourceConnector, and MirrorCheckpointConnector reads it to translate consumer group offsets from the source cluster. The mapping between a source offset and a target offset is called an offset-sync.

The frequency at which MirrorSourceConnector writes offset-syncs has changed several times over the past few releases (and is still being worked on), so the exact behavior you see may vary depending on the version you are currently running. By default, a new offset-sync is emitted at least every offset.lag.max records (default 100), or whenever the gap between the source and target offsets changes. In some cases, such as many small transactions in the source topics or topics with high record rates, this can result in a lot of offset-syncs. This topic uses the compact cleanup policy, so in most cases its size should stay bounded and not grow too large.

If the offset-syncs topic grows too large, you can increase offset.lag.max (note, however, that this may reduce the accuracy of the offset translation) or make compaction more aggressive using the min.cleanable.dirty.ratio topic configuration.
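To make the trade-off around offset.lag.max more concrete, here is a small Python sketch of the rule described above. It is a simplified model, not MirrorMaker's actual code: the function names `emit_offset_syncs` and `translate` are hypothetical, and the translation rule (use the latest sync at or before the committed offset, and step at most one record past it) is an assumption about conservative translation that may not match the version you run.

```python
from bisect import bisect_right


def emit_offset_syncs(mirrored, offset_lag_max=100):
    """Return the (upstream, downstream) pairs that would be written as offset-syncs.

    `mirrored` is an ordered list of (upstream_offset, downstream_offset) pairs,
    one per mirrored record of a single partition. Simplified model only.
    """
    syncs = []
    for upstream, downstream in mirrored:
        if not syncs:
            # The first record always produces a sync.
            syncs.append((upstream, downstream))
            continue
        last_up, last_down = syncs[-1]
        # The gap between source and target offsets changed since the last sync.
        gap_changed = (upstream - downstream) != (last_up - last_down)
        # At least offset_lag_max records have passed since the last sync.
        lag_exceeded = (upstream - last_up) >= offset_lag_max
        if gap_changed or lag_exceeded:
            syncs.append((upstream, downstream))
    return syncs


def translate(syncs, committed_upstream):
    """Translate a committed source offset using the latest sync at or before it.

    Assumed conservative rule: never move more than one record past the sync,
    so the translated offset can lag behind the true position.
    """
    upstreams = [up for up, _ in syncs]
    i = bisect_right(upstreams, committed_upstream) - 1
    if i < 0:
        return None  # no sync old enough to translate this offset
    up, down = syncs[i]
    return down if committed_upstream == up else down + 1


# 250 records mirrored one-to-one, so source and target offsets are identical.
mirrored = [(i, i) for i in range(250)]
coarse = emit_offset_syncs(mirrored, offset_lag_max=100)
fine = emit_offset_syncs(mirrored, offset_lag_max=10)

print(coarse)                  # [(0, 0), (100, 100), (200, 200)]
print(len(fine))               # 25
print(translate(coarse, 150))  # 101
print(translate(fine, 150))    # 150
```

In this model, a consumer group committed at source offset 150 is translated to 101 with the coarse setting, so it would re-read roughly 49 records after failing over, while the finer setting translates it exactly at the cost of more offset-sync records.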