Conversation

@GeoffMontee (Contributor) commented:

This was entirely written by Claude. We might need to fix some stuff. It's more of an intellectual exercise than a production-ready feature.

@tarzanek (Contributor) commented:

Build fails, fix it first.

On this diff context:

// ScyllaDBConnection Implementation
// ============================================================================

ScyllaDBConnection::ScyllaDBConnection(const ScyllaDBConfig& config)
@tarzanek (Contributor) commented:
Eh?

Scylla is supported in Spark; we don't need a C++ connection to it. Not to mention that the C++ driver for Scylla is obsolete and will be replaced by its cpp-over-rust variant.

So this code is completely useless and suboptimal.

On this diff context:

 * Licensed under Apache License 2.0
 */

#include "mariadb_scylla_migrator.h"
@tarzanek (Contributor) commented:

I find it hard to believe MariaDB doesn't have a Spark DataFrame connector.

It seems it does: https://mariadb.com/ja/resources/blog/hands-on-mariadb-columnstore-spark-connector/
Maybe that would be more useful than a native C++ call?

@GeoffMontee (Contributor, Author) replied:

Hi @tarzanek,

MariaDB ColumnStore is different from native MariaDB. That blog post is also from 2018, which is ancient history in terms of MariaDB ColumnStore maturity. That was back when ColumnStore was just rebranded InfiniDB.

I chose MariaDB Connector/C for this because it is the only MariaDB connector that has an API for the binlog.

Perhaps the binlog streaming/applying functionality should be separate from the Spark functionality. That would allow us to use something like MariaDB Connector/J for the Spark side, but still use MariaDB Connector/C for the binlog streaming/applying. The binlog work probably has to occur on one node at a time anyway; it's not like it can be divided up and handed to multiple workers, because commit ordering is very important.
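
For illustration, here is a minimal sketch of that binlog API (the MARIADB_RPL interface in mariadb_rpl.h, Connector/C 3.1+). The host, credentials, server id, and start position are placeholders, and error handling is pared to the bone:

#include <mysql.h>
#include <mariadb_rpl.h>
#include <cstdio>

int main() {
    MYSQL *mysql = mysql_init(nullptr);
    if (!mysql_real_connect(mysql, "mariadb-host", "repl_user", "repl_pass",
                            nullptr, 3306, nullptr, 0)) {
        fprintf(stderr, "connect failed: %s\n", mysql_error(mysql));
        return 1;
    }

    // Commonly needed so the server sends events with the expected checksum.
    mysql_query(mysql, "SET @master_binlog_checksum = @@global.binlog_checksum");

    MARIADB_RPL *rpl = mariadb_rpl_init(mysql);
    mariadb_rpl_optionsv(rpl, MARIADB_RPL_START, (unsigned long) 4);       // start of first binlog
    mariadb_rpl_optionsv(rpl, MARIADB_RPL_SERVER_ID, (unsigned int) 12345);

    if (mariadb_rpl_open(rpl)) {
        fprintf(stderr, "rpl_open failed: %s\n", mysql_error(mysql));
        return 1;
    }

    // Events arrive in commit order, which is why this loop has to live on a
    // single node rather than being spread across Spark workers.
    MARIADB_RPL_EVENT *event = nullptr;
    while ((event = mariadb_rpl_fetch(rpl, event))) {
        printf("event type=%d next_pos=%u\n",
               (int) event->event_type, event->next_event_pos);
        // ... decode row events and apply them to ScyllaDB here ...
    }

    mariadb_free_rpl_event(event);
    mariadb_rpl_close(rpl);
    mysql_close(mysql);
    return 0;
}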

I'd love to meet with you sometime and discuss the best way to implement all of this. Let me know if you're down.

Thanks!

@tarzanek (Contributor) left a review:

Compiling a native library for Spark seems like overkill.
(Executors can be heterogeneous, so the one thing that will be the same is the JDK version; ideally we would build against that rather than relying on the underlying OS and its libs.)
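
For illustration, if native code ends up shipped at all, one reading of "build against the JDK" is to make jni.h the library's only external interface, so the same shared object works on any executor with a matching JVM. The Java class and method names below are hypothetical:

#include <jni.h>
#include <cstdio>

// Hypothetical native method for a com.example.migrator.BinlogBridge class;
// the only ABI this object file depends on is the JDK's JNI headers.
extern "C" JNIEXPORT void JNICALL
Java_com_example_migrator_BinlogBridge_logHost(JNIEnv *env, jclass, jstring host) {
    const char *chost = env->GetStringUTFChars(host, nullptr);
    std::printf("native side sees host: %s\n", chost);
    env->ReleaseStringUTFChars(host, chost);
}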

@GeoffMontee marked this pull request as draft on December 16, 2025 at 09:24.