add --metadata_column support for Kafka CDC connector #6353
base: master
Signed-off-by: Max Falk <[email protected]>
private static final long serialVersionUID = 1L;

@Override
public String read(JsonNode source) {
You could push this into CdcMetadataConverter, so that both read() methods default to UnsupportedOperationException.
In fact, you could dedupe most of this by adding a 'fieldName' parameter & pulling all the methods out into a superclass.
done
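For illustration, the refactoring suggested in this thread might look roughly like the sketch below. It is not the code from this PR: SketchRecord, SketchMetadataConverter, FieldMetadataConverter, TopicConverter and OffsetConverter are hypothetical stand-ins, and a plain String replaces JsonNode so the example compiles without Jackson. It only shows the two ideas from the review: both read() variants default to UnsupportedOperationException, and a shared superclass takes a fieldName parameter.

```java
import java.io.Serializable;
import java.util.Map;

// Hypothetical stand-in for a source record that carries metadata as a generic map.
final class SketchRecord {
    private final Map<String, Object> metadata;

    SketchRecord(Map<String, Object> metadata) {
        this.metadata = metadata;
    }

    Object metadata(String key) {
        return metadata.get(key);
    }
}

// Hypothetical stand-in for the converter interface. Both read() variants
// throw by default, so an implementation overrides only what it supports.
interface SketchMetadataConverter extends Serializable {
    default String read(String jsonSource) {
        throw new UnsupportedOperationException(
                getClass().getSimpleName() + " cannot read from a JSON source");
    }

    default String read(SketchRecord record) {
        throw new UnsupportedOperationException(
                getClass().getSimpleName() + " cannot read from a record");
    }

    String columnName();
}

// Shared superclass: the only thing that differs per converter is the field name.
abstract class FieldMetadataConverter implements SketchMetadataConverter {
    private static final long serialVersionUID = 1L;

    private final String fieldName;

    protected FieldMetadataConverter(String fieldName) {
        this.fieldName = fieldName;
    }

    @Override
    public String read(SketchRecord record) {
        Object value = record.metadata(fieldName);
        return value == null ? null : value.toString();
    }

    @Override
    public String columnName() {
        return fieldName;
    }
}

// Concrete converters shrink to a constructor call (field names are assumptions).
final class TopicConverter extends FieldMetadataConverter {
    private static final long serialVersionUID = 1L;

    TopicConverter() {
        super("topic");
    }
}

final class OffsetConverter extends FieldMetadataConverter {
    private static final long serialVersionUID = 1L;

    OffsetConverter() {
        super("offset");
    }
}
```

With this shape, calling the JSON variant on a converter that only supports records fails loudly instead of silently returning null.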
Purpose
Add --metadata_column support to the Paimon Kafka CDC connector, similar to the existing options added for MySQL and Postgres: #2077
Supported metadata columns are those on org.apache.kafka.clients.consumer.ConsumerRecord, i.e.: NoTimestampType, CreateTime or LogAppendTime
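As a rough illustration of how record-level fields could be surfaced as metadata, here is a minimal sketch. It is not the implementation in this PR: SketchConsumerRecord is a hypothetical stand-in that mirrors a few accessors of Kafka's ConsumerRecord (topic, partition, offset, timestamp, timestamp type) so the example compiles without kafka-clients, and the column names and the KafkaMetadataSketch.extract helper are assumptions made for the example.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for org.apache.kafka.clients.consumer.ConsumerRecord.
final class SketchConsumerRecord {
    final String topic;
    final int partition;
    final long offset;
    final long timestamp;
    final String timestampType; // e.g. "CreateTime" or "LogAppendTime"

    SketchConsumerRecord(
            String topic, int partition, long offset, long timestamp, String timestampType) {
        this.topic = topic;
        this.partition = partition;
        this.offset = offset;
        this.timestamp = timestamp;
        this.timestampType = timestampType;
    }
}

final class KafkaMetadataSketch {
    private KafkaMetadataSketch() {}

    // Builds a generic metadata map holding only the requested columns.
    // When no columns are requested the map is empty, so nothing changes downstream.
    static Map<String, Object> extract(
            SketchConsumerRecord record, List<String> requestedColumns) {
        if (requestedColumns == null || requestedColumns.isEmpty()) {
            return Collections.emptyMap();
        }

        // Column names below are assumptions, not necessarily the names used by the PR.
        Map<String, Object> available = new LinkedHashMap<>();
        available.put("topic", record.topic);
        available.put("partition", record.partition);
        available.put("offset", record.offset);
        available.put("timestamp", record.timestamp);
        available.put("timestamp_type", record.timestampType);

        Map<String, Object> selected = new LinkedHashMap<>();
        for (String column : requestedColumns) {
            if (!available.containsKey(column)) {
                throw new IllegalArgumentException("Unsupported metadata column: " + column);
            }
            selected.put(column, available.get(column));
        }
        return selected;
    }
}
```

Keeping the result a plain Map means the same carrier could, in principle, hold metadata from other sources without changing its type.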
The feature is backwards compatible. It's only active when --metadata_column is supplied or SynchronizationActionBase.withMetadataColumns is used.
For now, I've only implemented this for the KafkaDebeziumAvroDeserializationSchema and KafkaDebeziumJsonDeserializationSchema.
Tests
KafkaMetadataConverter.java
Will also add more integration tests for Kafka Table and Database sync actions for various input formats.
API and Format
No changes to public APIs or storage format.
The changes here are confined to the flink cdc package, but I did have to update CdcSourceRecord since it previously didn't provide a way to surface arbitrary metadata for a record.
The metadata attribute on CdcSourceRecord is intentionally a generic Map so that it can potentially be used to add metadata support for other connectors like Pulsar or Mongo that are not yet implemented.
Documentation
Added the new --metadata_column parameter to the Kafka CDC docs.
Dev notes
For running integration tests on macOS with Rancher Desktop, I had to properly expose the Docker socket to Testcontainers, e.g. system-wide via sudo ln -sf "$HOME/.rd/docker.sock" /var/run/docker.sock.
Todo/WIP