[FLINK-38844][pipeline-connector][postgres]Add metadata column support by tchivs · Pull Request #4202 · apache/flink-cdc

tchivs · 2025-12-29T09:09:55Z

What is the purpose of the pull request

This PR adds metadata column support for the PostgreSQL Pipeline Connector, enabling users to access metadata information such as operation timestamp, database name, schema name, and table name in their data pipelines.

Brief change log

- Add 4 metadata column implementations:
  - OpTsMetadataColumn: Operation timestamp metadata
  - DatabaseNameMetadataColumn: Database name metadata
  - SchemaNameMetadataColumn: Schema name metadata
  - TableNameMetadataColumn: Table name metadata
- Update PostgresDataSource to support metadata columns via supportedMetadataColumns() method
- Add comprehensive E2E test testAllMetadataColumns() in PostgresFullTypesITCase
- Update documentation for both English and Chinese versions
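Each of the four columns in the change log maps one piece of per-record source metadata to a named value. The sketch below illustrates that idea under stated assumptions: `MetadataColumnSketch`, its nested `MetadataColumn` interface, the string-keyed metadata map, and treating `op_ts` as epoch milliseconds are all hypothetical simplifications. They are not the connector's actual `SupportedMetadataColumn` implementations.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch only: MetadataColumnSketch, MetadataColumn and the
 * metadata keys below are illustrative, not the Flink CDC API.
 */
public class MetadataColumnSketch {

    /** Simplified stand-in for a source metadata column definition. */
    interface MetadataColumn {
        String name();

        /** Reads this column's value from one record's source metadata. */
        Object read(Map<String, String> metadata);
    }

    /** Returns the value stored under key, failing loudly if it is absent. */
    static String require(Map<String, String> metadata, String key) {
        String value = metadata.get(key);
        if (value == null) {
            throw new IllegalArgumentException(key + " is missing from metadata: " + metadata);
        }
        return value;
    }

    /** op_ts: operation timestamp, assumed here to be epoch milliseconds. */
    static final class OpTs implements MetadataColumn {
        public String name() { return "op_ts"; }

        public Object read(Map<String, String> metadata) {
            return Long.parseLong(require(metadata, name()));
        }
    }

    /** database_name, schema_name, table_name: echo the stored string. */
    static final class NameColumn implements MetadataColumn {
        private final String name;

        NameColumn(String name) { this.name = name; }

        public String name() { return name; }

        public Object read(Map<String, String> metadata) {
            return require(metadata, name);
        }
    }

    /** The four columns, in the order they are evaluated. */
    static final List<MetadataColumn> COLUMNS = List.of(
            new OpTs(),
            new NameColumn("database_name"),
            new NameColumn("schema_name"),
            new NameColumn("table_name"));

    /** Evaluates every column against one record's metadata map. */
    static Map<String, Object> readAll(Map<String, String> metadata) {
        Map<String, Object> values = new LinkedHashMap<>();
        for (MetadataColumn column : COLUMNS) {
            values.put(column.name(), column.read(metadata));
        }
        return values;
    }

    public static void main(String[] args) {
        Map<String, String> metadata = Map.of(
                "op_ts", "1735463395000",
                "database_name", "inventory",
                "schema_name", "public",
                "table_name", "products");
        System.out.println(readAll(metadata));
    }
}
```

Sharing one `NameColumn` class for the three name columns reflects that they only echo a string, while `op_ts` needs parsing; the real connector classes may be structured differently.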

Verifying this change

This change added tests and can be verified as follows:

- Added testAllMetadataColumns() E2E test in PostgresFullTypesITCase
- Test verifies metadata columns in both snapshot and incremental phases
- All tests pass successfully

Does this pull request potentially affect one of the following parts:

- Dependencies (does it add or upgrade a dependency): no
- The public API, i.e., is any changed class annotated with @Public(Evolving): no
- The serializers: no
- The runtime per-record code paths: no
- Anything that affects deployment or recovery: no

Documentation

Does this pull request introduce a new feature: yes
If yes, how is the feature documented: docs

This commit adds metadata column support for the PostgreSQL Pipeline Connector, enabling users to access metadata information in their data pipelines.

Changes:
- Add OpTsMetadataColumn for operation timestamp
- Add DatabaseNameMetadataColumn for database name
- Add SchemaNameMetadataColumn for schema name
- Add TableNameMetadataColumn for table name
- Update PostgresDataSource to support metadata columns
- Add comprehensive E2E test testAllMetadataColumns()
- Update documentation (English and Chinese)

yuxiqian

Thanks for @tchivs' contribution.

I wonder if we need individual metadata columns for database, schema, and table, since they're always available in Transform expressions (only after FLINK-38840 got closed).

tchivs · 2025-12-29T11:23:27Z

Thanks for the review @yuxiqian! You raise an important point about the overlap with Transform metadata fields.

You're right that namespace_name, schema_name, and table_name are already available in Transform expressions. Let me clarify the design rationale:

op_ts is essential and non-redundant:

- There's no equivalent in the Transform metadata fields (MetadataColumns.java only defines namespace_name, schema_name, table_name, and data_event_type).
- op_ts can only be obtained via metadata.list from the source connector.
- This is consistent with the MySQL connector's implementation.

For database_name, schema_name, and table_name, there is overlap:

I see two perspectives here:

Argument for keeping them:

- Sink persistence: Users can pass these values to downstream sinks without writing transform rules.
- Consistency: The MySQL connector exposes op_ts via metadata.list, so having all metadata follow the same pattern is intuitive.
- Simplicity: Direct configuration is simpler than transform expressions for basic use cases.

Argument for removing them:

- Redundancy: Transform already provides namespace_name, schema_name, table_name
- Maintenance: Less code to maintain if we rely on Transform metadata

My suggestion:

- Keep op_ts (essential, no alternative)
- For database_name/schema_name/table_name, I'm open to either approach:
  - Option A: Keep them for consistency and ease of use
  - Option B: Remove them and document the Transform approach in the docs

What's your preference? I'm happy to adjust the PR based on the team's direction.

yuxiqian · 2025-12-29T11:36:09Z

I think it's OK to polish the documentation in this PR and leave the metadata definitions as they are.

tchivs · 2025-12-29T14:50:37Z

Thanks @yuxiqian for the feedback! I've polished the documentation to clarify the relationship between metadata columns and Transform expressions.

Changes made:

- Added a note section explaining that database_name, schema_name, and table_name are also available via Transform expressions (__namespace_name__, __schema_name__, __table_name__)
- Clarified that op_ts is only available via metadata.list
- Explained the trade-offs: using metadata.list allows passing values directly to downstream sinks without transform rules (simpler for basic use cases)
- Updated the table descriptions to mention the Transform expression alternatives

The metadata definitions remain unchanged as you suggested.
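To make the trade-off above concrete, here is a hypothetical sketch of how a `metadata.list` value might be checked against the columns a source advertises. It assumes, for illustration only, that `metadata.list` is a comma-separated list of column names; `MetadataListSketch` and `resolve` are invented names, and the real validation logic in Flink CDC may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch: resolving a comma-separated metadata.list value
 * against advertised column names. Not the Flink CDC implementation.
 */
public class MetadataListSketch {

    /** Column names assumed to be advertised by the PostgreSQL source. */
    static final Set<String> SUPPORTED = new LinkedHashSet<>(
            Arrays.asList("op_ts", "database_name", "schema_name", "table_name"));

    /** Splits the option value, trims entries, and rejects unknown names. */
    static List<String> resolve(String metadataList) {
        List<String> selected = new ArrayList<>();
        if (metadataList == null || metadataList.isBlank()) {
            return selected; // no metadata columns requested
        }
        for (String raw : metadataList.split(",")) {
            String name = raw.trim();
            if (name.isEmpty()) {
                continue; // tolerate stray commas such as "op_ts,"
            }
            if (!SUPPORTED.contains(name)) {
                throw new IllegalArgumentException(
                        "Unsupported metadata column '" + name + "', expected one of " + SUPPORTED);
            }
            selected.add(name);
        }
        return selected;
    }

    public static void main(String[] args) {
        // op_ts has no Transform equivalent; table_name overlaps with __table_name__.
        System.out.println(resolve("op_ts, table_name"));
    }
}
```

In this sketch, `op_ts` is the entry with no Transform counterpart, whereas `table_name` could instead be read through `__table_name__` in a Transform expression.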

Copilot

Pull request overview

Adds metadata column support to the PostgreSQL Pipeline Connector so pipelines can expose source metadata (op timestamp, database/schema/table name) to transforms and downstream sinks.

Changes:

- Introduces SupportedMetadataColumn implementations for op_ts, database_name, schema_name, and table_name.
- Extends PostgresDataSource with supportedMetadataColumns() to advertise these columns to the pipeline runtime.
- Adds/updates tests and documentation (EN + ZH) covering metadata configuration and behavior.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-postgres/src/test/java/org/apache/flink/cdc/connectors/postgres/source/PostgresFullTypesITCase.java | Adds E2E coverage verifying metadata presence/values in snapshot and incremental phases. |
| flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-postgres/src/test/java/org/apache/flink/cdc/connectors/postgres/factory/PostgresDataSourceFactoryTest.java | Adds a unit test asserting the set of supported metadata columns exposed by the data source. |
| flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-postgres/src/main/java/org/apache/flink/cdc/connectors/postgres/source/PostgresDataSource.java | Advertises supported metadata columns to the pipeline runtime. |
| flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-postgres/src/main/java/org/apache/flink/cdc/connectors/postgres/source/OpTsMetadataColumn.java | Implements `op_ts` metadata column reader/type definition. |
| flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-postgres/src/main/java/org/apache/flink/cdc/connectors/postgres/source/DatabaseNameMetadataColumn.java | Implements `database_name` metadata column reader/type definition. |
| flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-postgres/src/main/java/org/apache/flink/cdc/connectors/postgres/source/SchemaNameMetadataColumn.java | Implements `schema_name` metadata column reader/type definition. |
| flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-postgres/src/main/java/org/apache/flink/cdc/connectors/postgres/source/TableNameMetadataColumn.java | Implements `table_name` metadata column reader/type definition. |
| docs/content/docs/connectors/pipeline-connectors/postgres.md | Documents `metadata.list` updates and adds a “Supported Metadata Columns” section. |
| docs/content.zh/docs/connectors/pipeline-connectors/postgres.md | Same as EN docs, localized. |

lvyanquan

+1 for adding metadata columns for database_name, schema_name, and table_name.

Using a transform adds these metadata values as additional data columns. When writing downstream to message queues such as Kafka in debezium-json/canal-json format, we want to distinguish actual data from metadata. Metadata columns provide a simple way to support this (although downstream Kafka consumers then need to understand the meaning of the metadata keys coming from each upstream system, such as MySQL, PostgreSQL, or MongoDB, which can be resolved through an enumeration).
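The concern above is that metadata added by a transform becomes ordinary data columns, so a downstream consumer has to know which keys are metadata. A minimal sketch of that enumeration approach follows; `EnvelopeSketch`, `Envelope`, and the fixed key set are hypothetical and do not reflect the actual structure of the debezium-json or canal-json formats.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch: splitting a flat row into row data and source
 * metadata using an enumerated set of metadata keys. Not a Flink CDC
 * or Kafka format API.
 */
public class EnvelopeSketch {

    /** Enumerated metadata keys the downstream side is assumed to know. */
    static final Set<String> KNOWN_METADATA_KEYS =
            Set.of("op_ts", "database_name", "schema_name", "table_name");

    /** A message body that keeps row data and metadata in separate maps. */
    record Envelope(Map<String, Object> data, Map<String, Object> metadata) {}

    /** Routes each field to metadata if its key is enumerated, else to data. */
    static Envelope split(Map<String, Object> flatRow, Set<String> metadataKeys) {
        Map<String, Object> data = new LinkedHashMap<>();
        Map<String, Object> metadata = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : flatRow.entrySet()) {
            if (metadataKeys.contains(field.getKey())) {
                metadata.put(field.getKey(), field.getValue());
            } else {
                data.put(field.getKey(), field.getValue());
            }
        }
        return new Envelope(data, metadata);
    }

    public static void main(String[] args) {
        // A row where transform-style metadata was mixed into the data columns.
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("id", 101);
        row.put("name", "scooter");
        row.put("op_ts", 1735463395000L);
        row.put("table_name", "products");
        System.out.println(split(row, KNOWN_METADATA_KEYS));
    }
}
```

A real data column that happens to share a name with a metadata key would be misclassified here, which is one argument for keeping metadata separate at the source rather than recovering it downstream.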

tchivs force-pushed the pg_opts branch from c11cd00 to 1b774df December 29, 2025 09:13

tchivs changed the title ~~[FLINK-XXXXX][pipeline-connector/postgres] Add metadata column support for PostgreSQL Pipeline Connector~~ [FLINK-38844][pipeline-connector/postgres] Add metadata column support for PostgreSQL Pipeline Connector Dec 29, 2025

tchivs force-pushed the pg_opts branch from 1b774df to 13faad7 December 29, 2025 09:21

tchivs changed the title ~~[FLINK-38844][pipeline-connector/postgres] Add metadata column support for PostgreSQL Pipeline Connector~~ [FLINK-38844][pipeline-connector-postgres] Add metadata column support for PostgreSQL Pipeline Connector Dec 29, 2025

tchivs changed the title ~~[FLINK-38844][pipeline-connector-postgres] Add metadata column support for PostgreSQL Pipeline Connector~~ [FLINK-38844][pipeline-connector][postgres]Add metadata column support for PostgreSQL Pipeline Connector Dec 29, 2025

tchivs force-pushed the pg_opts branch from 13faad7 to 22a5551 December 29, 2025 09:23

tchivs changed the title ~~[FLINK-38844][pipeline-connector][postgres]Add metadata column support for PostgreSQL Pipeline Connector~~ [FLINK-38844][pipeline-connector][postgres]Add metadata column support Dec 29, 2025

github-actions bot added the docs and postgres-pipeline-connector labels Dec 29, 2025

yuxiqian reviewed Dec 29, 2025

[FLINK-38844][docs] Polish metadata columns documentation to clarify relationship with Transform expressions

3b3ba08

lvyanquan requested a review from Copilot March 3, 2026 03:52

Copilot started reviewing on behalf of lvyanquan March 3, 2026 03:52

Copilot AI reviewed Mar 3, 2026

Comment thread ...est/java/org/apache/flink/cdc/connectors/postgres/factory/PostgresDataSourceFactoryTest.java

lvyanquan approved these changes Mar 11, 2026

github-actions bot added the approved and reviewed labels Mar 11, 2026

lvyanquan merged commit 3580bf4 into apache:master Mar 12, 2026
8 checks passed

Mrart pushed a commit to Mrart/flink-cdc that referenced this pull request Mar 26, 2026

[FLINK-38844][pipeline-connector][postgres]Add metadata column support (apache#4202)

29a7be6

ThorneANN pushed a commit to ThorneANN/flink-cdc that referenced this pull request Mar 31, 2026

[FLINK-38844][pipeline-connector][postgres]Add metadata column support (apache#4202)

3b736e3

lvyanquan merged 2 commits into apache:master from tchivs:pg_opts

4 participants
