
[FLINK-38889][pipeline][kafka] Support serializing complex types (MAP, ARRAY, ROW) to JSON (Debezium / Canal) #4221

Merged: lvyanquan merged 5 commits into apache:master from linguoxuan:FLINK-38889-kafka on Jan 28, 2026

Conversation

@linguoxuan (Contributor) commented Jan 12, 2026

This closes FLINK-38889.

Purpose

This PR fixes an issue where the YAML pipeline Kafka sink connector could not serialize complex types (MAP, ARRAY, ROW) to JSON (Debezium / Canal formats), while the Kafka SQL connector handles them without problems.

Root Cause

The issue was in the TableSchemaInfo class, which converts CDC's RecordData format to Flink's RowData format before JSON serialization. Its createFieldGetter() method lacked conversion logic for complex types.

Changes

  1. Added conversion methods for complex types (ARRAY, MAP, and ROW) in TableSchemaInfo.java
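Supporting ARRAY, MAP, and ROW comes down to recursive conversion: the converter for a container type is assembled from converters for its element, key/value, or field types, and nulls are passed through at each level. The following is a minimal, self-contained sketch of that idea, not the code in this PR. It assumes simplified hypothetical stand-ins (`Type`, `SrcArray`, `DstRow`, `converterFor`, and so on) in place of the real Flink CDC `RecordData` and Flink `RowData` classes, whose signatures are not shown here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch: builds a converter once per (simplified) logical type and
 * recursively converts "source" containers into "target" containers.
 * None of these types are the real Flink CDC or Flink SQL classes.
 */
public class NestedConverterSketch {

    // Simplified logical types (hypothetical stand-ins for real data type classes).
    interface Type {}
    record Atomic() implements Type {}
    record ArrayType(Type element) implements Type {}
    record MapType(Type key, Type value) implements Type {}
    record RowType(List<Type> fields) implements Type {}

    // Source-side containers (hypothetical stand-ins for CDC-side array/map/record data).
    record SrcArray(Object[] elements) {}
    record SrcMap(Object[] keys, Object[] values) {}
    record SrcRow(Object[] fields) {}

    // Target-side containers (hypothetical stand-ins for Flink-side generic array/map/row data).
    record DstArray(List<Object> elements) {}
    record DstMap(Map<Object, Object> entries) {}
    record DstRow(List<Object> fields) {}

    /** Returns a null-safe converter for the type; nested types get their own converters. */
    static Function<Object, Object> converterFor(Type type) {
        Function<Object, Object> nonNull = buildNonNull(type);
        return value -> value == null ? null : nonNull.apply(value);
    }

    private static Function<Object, Object> buildNonNull(Type type) {
        if (type instanceof ArrayType a) {
            Function<Object, Object> elem = converterFor(a.element());
            return v -> {
                List<Object> out = new ArrayList<>();
                for (Object e : ((SrcArray) v).elements()) {
                    out.add(elem.apply(e));
                }
                return new DstArray(out);
            };
        }
        if (type instanceof MapType m) {
            Function<Object, Object> key = converterFor(m.key());
            Function<Object, Object> val = converterFor(m.value());
            return v -> {
                SrcMap src = (SrcMap) v;
                Map<Object, Object> out = new LinkedHashMap<>();
                for (int i = 0; i < src.keys().length; i++) {
                    out.put(key.apply(src.keys()[i]), val.apply(src.values()[i]));
                }
                return new DstMap(out);
            };
        }
        if (type instanceof RowType r) {
            List<Function<Object, Object>> getters = new ArrayList<>();
            for (Type t : r.fields()) {
                getters.add(converterFor(t));
            }
            return v -> {
                Object[] src = ((SrcRow) v).fields();
                List<Object> out = new ArrayList<>();
                for (int i = 0; i < getters.size(); i++) {
                    out.add(getters.get(i).apply(src[i]));
                }
                return new DstRow(out);
            };
        }
        // Atomic values (numbers, strings, ...) pass through unchanged in this sketch.
        return Function.identity();
    }

    public static void main(String[] args) {
        // ARRAY<ROW<INT, MAP<STRING, ARRAY<INT>>>>, including null elements.
        Type type = new ArrayType(new RowType(List.of(
                new Atomic(),
                new MapType(new Atomic(), new ArrayType(new Atomic())))));
        Object source = new SrcArray(new Object[] {
                new SrcRow(new Object[] {
                        1,
                        new SrcMap(new Object[] {"k"},
                                new Object[] {new SrcArray(new Object[] {10, null})})}),
                null});
        System.out.println(converterFor(type).apply(source));
    }
}
```

Building the converter once per type, in the spirit of a field getter, keeps type dispatch out of the per-record path in this sketch; only the returned lambdas run for each value.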

Testing

  1. TableSchemaInfoTest.java
  2. DebeziumJsonSerializationSchemaTest.java
  3. CanalJsonSerializationSchemaTest.java
  4. KafkaDataSinkITCase.java

yuxiqian self-assigned and then unassigned this pull request on Jan 12, 2026
yuxiqian self-requested a review on Jan 12, 2026 15:04
@yuxiqian (Member) left a comment

Thanks Skyler for the quick fix! Just left some trivial comments.

@linguoxuan (Contributor, Author) commented

Thanks @yuxiqian for the review! I have made the changes as suggested. Since the suggestions focused on code formatting, I force-pushed the code to make it clearer. PTAL.

@yuxiqian (Member) commented Jan 13, 2026

Thanks for the quick response! Just pushed another commit to simplify the IT case and the docs style.

Would @lvyanquan like to take another look?

Copilot AI left a comment

Pull request overview

This PR adds support for serializing complex types (MAP, ARRAY, ROW) to JSON format in the Kafka sink connector for both Debezium and Canal JSON formats. Previously, only the Kafka SQL connector supported these complex types, while the YAML-configured Kafka sink connector would fail when encountering them.

Changes:

  • Refactored type conversion logic from TableSchemaInfo into a new RecordDataConverter utility class
  • Added conversion support for ARRAY, MAP, and ROW types with recursive handling for nested structures
  • Added comprehensive test coverage including unit tests and integration tests for various complex type scenarios
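After conversion, Flink's JSON format maps ARRAY to a JSON array, MAP to a JSON object, and ROW to a JSON object keyed by field name. The hand-written serializer below is a hypothetical illustration of that mapping only, not the connector's implementation; `NestedJsonSketch`, `Row`, and `toJson` are invented names, and map keys are simply stringified as an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of the JSON shape for nested values:
 * ARRAY -> JSON array, MAP -> JSON object, ROW -> JSON object keyed by field name.
 * Non-finite doubles (NaN, Infinity) are not handled.
 */
public class NestedJsonSketch {

    /** Hypothetical ROW value: field names paired with field values. */
    record Row(List<String> names, List<Object> values) {}

    static String toJson(Object v) {
        StringBuilder sb = new StringBuilder();
        write(v, sb);
        return sb.toString();
    }

    private static void write(Object v, StringBuilder sb) {
        if (v == null) {
            sb.append("null");                              // null at any nesting level
        } else if (v instanceof Number || v instanceof Boolean) {
            sb.append(v);
        } else if (v instanceof List<?> list) {             // ARRAY -> JSON array
            sb.append('[');
            for (int i = 0; i < list.size(); i++) {
                if (i > 0) sb.append(',');
                write(list.get(i), sb);
            }
            sb.append(']');
        } else if (v instanceof Map<?, ?> map) {            // MAP -> JSON object, keys as strings
            sb.append('{');
            boolean first = true;
            for (Map.Entry<?, ?> e : map.entrySet()) {
                if (!first) sb.append(',');
                first = false;
                writeString(String.valueOf(e.getKey()), sb);
                sb.append(':');
                write(e.getValue(), sb);
            }
            sb.append('}');
        } else if (v instanceof Row row) {                  // ROW -> JSON object keyed by field name
            sb.append('{');
            for (int i = 0; i < row.names().size(); i++) {
                if (i > 0) sb.append(',');
                writeString(row.names().get(i), sb);
                sb.append(':');
                write(row.values().get(i), sb);
            }
            sb.append('}');
        } else {
            writeString(v.toString(), sb);                  // strings and other scalars
        }
    }

    private static void writeString(String s, StringBuilder sb) {
        sb.append('"');
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                default -> {
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
                }
            }
        }
        sb.append('"');
    }

    public static void main(String[] args) {
        Map<String, Object> tags = new LinkedHashMap<>();
        tags.put("k", Arrays.asList(1, 2));
        Row row = new Row(List.of("id", "tags", "items"), Arrays.asList(7, tags, List.of()));
        System.out.println(toJson(row)); // prints {"id":7,"tags":{"k":[1,2]},"items":[]}
    }
}
```

Empty collections serialize to `[]` or `{}` and null values to `null`, which corresponds to the null/empty-collection cases the tests are described as covering.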

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

Summary per file:

  • RecordDataConverter.java: new utility class that converts CDC RecordData to Flink SQL RowData, including support for complex types (ARRAY, MAP, ROW) with recursive nesting
  • TableSchemaInfo.java: refactored to delegate field getter creation to RecordDataConverter, removing duplicate conversion logic
  • TableSchemaInfoTest.java: added a test for nested ROW types within ARRAY to verify complex type conversion
  • DebeziumJsonSerializationSchemaTest.java: added a test to verify Debezium JSON serialization of complex types
  • CanalJsonSerializationSchemaTest.java: added a test to verify Canal JSON serialization of complex types
  • KafkaDataSinkITCase.java: added integration tests covering basic complex types, nested arrays, maps with array values, null/empty collections, and deeply nested structures

@lvyanquan (Contributor) left a comment

+1.

It would be even better if you could update the documentation to reflect this change.

lvyanquan requested a review from yuxiqian on Jan 27, 2026 11:24
@yuxiqian (Member) left a comment

LGTM. Would @linguoxuan like to append docs in this PR, or open an individual PR for it?

lvyanquan merged commit 0cfe1d3 into apache:master on Jan 28, 2026
27 of 29 checks passed
@lvyanquan (Contributor) commented

Merged. Feel free to open a follow-up PR to update the documentation.

@linguoxuan (Contributor, Author) commented

> LGTM. Would @linguoxuan like to append docs in this PR, or open an individual PR for it?

I will do it.

linguoxuan deleted the FLINK-38889-kafka branch on Feb 1, 2026 04:23
Mrart pushed a commit to Mrart/flink-cdc that referenced this pull request Mar 26, 2026
… ARRAY, ROW) to JSON (Debezium / Canal) (apache#4221)

Co-authored-by: guoxuanlin <guoxuanlin@tencent.com>
Co-authored-by: yuxiqian <34335406+yuxiqian@users.noreply.github.com>