I'm at the stage where I think the sink connector is absolutely the right choice for me, but the one downside I've encountered is the initial sync. Because the data isn't dumped to Kafka first and picked up at leisure, if the initial snapshot fails for any reason, essentially the whole process starts again.
I have ~500 MySQL databases with ~75 tables in each to sync, and I am breaking them down into batches of 50 with a connector for each, so getting them all through the initial sync is proving quite challenging.
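To make the batching concrete, here is a minimal sketch of how I'm splitting things up. The database names are placeholders I made up for illustration, and each batch would map to one connector.

```python
from typing import Iterator, List


def batches(items: List[str], size: int = 50) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Placeholder names; the real list would come from the registry.
databases = [f"db_{i:03d}" for i in range(500)]

# One group of databases per connector.
groups = list(batches(databases))
```

With 500 databases and a batch size of 50 this gives 10 connectors, so a failed snapshot in one batch only forces that batch to restart.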
The clickhouse_loader script looks like a winner here. However, I also see that ClickHouse supports direct import from S3, so I was wondering if anyone had any insights as to whether one option is better than the other?
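For reference, the S3 route I have in mind would look roughly like the sketch below, which just builds an `INSERT ... SELECT` statement around ClickHouse's `s3` table function. The table name, bucket URL and format are assumptions for illustration, credentials are left out entirely, and the inputs are assumed to be trusted since nothing is escaped.

```python
def s3_insert_sql(table: str, url: str, fmt: str = "CSVWithNames") -> str:
    """Build an INSERT that reads dump files from S3 via the s3() table function.

    Only the (url, format) form of s3() is used here; access keys, if needed,
    would have to be supplied some other way.
    """
    return f"INSERT INTO {table} SELECT * FROM s3('{url}', '{fmt}')"


# Hypothetical table and bucket path, one statement per dumped table.
sql = s3_insert_sql(
    "db_000.orders",
    "https://example-bucket.s3.amazonaws.com/dumps/db_000/orders/*.csv.gz",
)
```

The resulting string could then be run through clickhouse-client or whichever driver the script ends up using.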
Ideally I'll be scripting the whole process so that the initial syncs run from a registry of databases stored in ClickHouse, and CDC is switched on automatically once each sync has completed.
The scripts here look like they are probably tested reasonably well. That said, from a quick look I don't see an obvious way to capture the binlog position in a single transaction from the MySQL dumper script.
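To show what I mean by capturing the position, here is a sketch based on plain `mysqldump` rather than the dumper script in this repo. As far as I know, `--single-transaction` combined with `--source-data=2` (`--master-data=2` on older versions) writes the binlog coordinates as a commented `CHANGE ...` line near the top of the dump. The helper names below are my own, not anything from the repo.

```python
import re
from typing import List, Optional, Tuple

# Matches both the older "MASTER_LOG_FILE / MASTER_LOG_POS" header and the
# newer "SOURCE_LOG_FILE / SOURCE_LOG_POS" form.
_COORDS = re.compile(
    r"(?:MASTER|SOURCE)_LOG_FILE='([^']+)',\s*(?:MASTER|SOURCE)_LOG_POS=(\d+)"
)


def dump_command(database: str, coords_flag: str = "--source-data=2") -> List[str]:
    """Argument list for a consistent dump that records binlog coordinates."""
    return ["mysqldump", "--single-transaction", coords_flag, database]


def binlog_coords(dump_header: str) -> Optional[Tuple[str, int]]:
    """Extract (binlog file, position) from the start of a dump, if present."""
    match = _COORDS.search(dump_header)
    if match is None:
        return None
    return match.group(1), int(match.group(2))
```

The coordinates returned by `binlog_coords` are what I'd store in the registry so the connector can start CDC from exactly where the dump left off.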
In theory, dumping the databases to S3 should be reasonably straightforward, so in the absence of suggestions I'm probably at the point of tossing a coin.