Description
Add support for using ArangoDB as a source system. Read from the ArangoDB write-ahead log (WAL) and create Kafka messages based on the record changes.
This has some complications. ArangoDB's WAL API is only supported for single-server instances.
When you use ArangoDB in cluster mode, you end up with multiple DB Servers that each maintain their own write-ahead log. If you tail every log, you can end up with duplicates, depending on how replication was set up. Additionally, some work is needed to ensure that the records from the individual logs are written into Kafka in the order in which they were written into ArangoDB. While doing this work, we may also be able to de-duplicate records based on timestamps. We'll have to see when we get there.
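To make the ordering and de-duplication idea concrete, here is a rough sketch, not a proposed implementation. It assumes each DB Server's log is already ordered by a timestamp that is comparable across servers, and that a replicated write shows up in every log with the same document key and revision. The `WalEntry` type and its fields are hypothetical and are not taken from ArangoDB's WAL format.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class WalEntry(NamedTuple):
    # All fields are hypothetical stand-ins, not ArangoDB's actual WAL schema.
    timestamp: int  # assumed to be comparable across DB Servers
    doc_key: str    # key of the changed document
    rev: str        # revision of the change; assumed identical on every replica


def merge_logs(*logs: Iterable[WalEntry]) -> Iterator[WalEntry]:
    """Merge per-server logs into one timestamp-ordered stream without duplicates.

    Each input must already be sorted by timestamp. Entries with the same
    (doc_key, rev) pair are treated as copies of the same write.
    """
    seen = set()  # grows without bound here; a real tailer would need to expire entries
    for entry in heapq.merge(*logs, key=lambda e: e.timestamp):
        ident = (entry.doc_key, entry.rev)
        if ident in seen:
            continue  # replicated copy of a write we already emitted
        seen.add(ident)
        yield entry


# Two servers whose logs both contain the same replicated write to users/1.
server_a = [WalEntry(1, "users/1", "r1"), WalEntry(3, "users/2", "r3")]
server_b = [WalEntry(1, "users/1", "r1"), WalEntry(2, "users/3", "r2")]
merged = list(merge_logs(server_a, server_b))
```

Whether ArangoDB exposes a timestamp or tick that is actually comparable across DB Servers is exactly the open question above, so the merge key is the part of this sketch most likely to change.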
We probably won't be able to provide an exactly-once delivery guarantee; consumers will have to expect duplicate messages. This should be acceptable: exactly-once only holds as long as nothing ever goes wrong with the producer, and when something does go wrong, consumers can expect to receive messages at least once.
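With at-least-once delivery, consumers usually make their handling idempotent. The following is a minimal sketch of that idea on the consumer side. It assumes every message carries some stable identifier; the `message_id` callback and the `process_once` helper are hypothetical names and are not part of any Kafka client API.

```python
from typing import Callable, Hashable, Iterable, TypeVar

M = TypeVar("M")


def process_once(
    messages: Iterable[M],
    message_id: Callable[[M], Hashable],
    handle: Callable[[M], None],
) -> int:
    """Call handle() once per distinct message id and return how many were handled.

    message_id is a hypothetical callback that extracts a stable identifier,
    for example a (document key, revision) pair.
    """
    seen = set()  # in-memory only; a real consumer would persist or expire this
    handled = 0
    for message in messages:
        mid = message_id(message)
        if mid in seen:
            continue  # redelivered duplicate, skip it
        handle(message)
        seen.add(mid)  # recorded only after handle() succeeded
        handled += 1
    return handled


# The second message is a redelivery of the first.
received = [
    {"id": "users/1:r1", "op": "insert"},
    {"id": "users/1:r1", "op": "insert"},
    {"id": "users/2:r3", "op": "update"},
]
applied = []
count = process_once(received, lambda m: m["id"], applied.append)
```

Because the identifier is recorded only after `handle` returns, a crash in the middle of handling leads to a retry rather than a lost message, which matches the at-least-once expectation.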
One important thing to note is that this feature is not intended to be used for datacenter-to-datacenter replication. ArangoDB has its own solution for that as part of its enterprise offering (which incidentally uses Kafka). Our goal here is not to make enterprise features free; it's to hook up ArangoDB to Kafka.