[FLINK-38844][pipeline-connector][postgres]Add metadata column support#4202

Merged
lvyanquan merged 2 commits into apache:master from tchivs:pg_opts on Mar 12, 2026

@tchivs
Contributor

@tchivs tchivs commented Dec 29, 2025

What is the purpose of the pull request

This PR adds metadata column support for the PostgreSQL Pipeline Connector, enabling users to access metadata information such as operation timestamp, database name, schema name, and table name in their data pipelines.

Brief change log

  • Add 4 metadata column implementations:
    • OpTsMetadataColumn: Operation timestamp metadata
    • DatabaseNameMetadataColumn: Database name metadata
    • SchemaNameMetadataColumn: Schema name metadata
    • TableNameMetadataColumn: Table name metadata
  • Update PostgresDataSource to support metadata columns via supportedMetadataColumns() method
  • Add comprehensive E2E test testAllMetadataColumns() in PostgresFullTypesITCase
  • Update documentation for both English and Chinese versions
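As a rough illustration of what a metadata column implementation does, the sketch below reads a value out of a per-event metadata map. The `MetadataColumn` interface, the class names, and the assumption that `op_ts` is stored as an epoch-millisecond string are all hypothetical stand-ins chosen to keep the example self-contained; they are not the connector's actual classes.

```java
import java.util.Map;

// Hypothetical, simplified stand-in for a metadata column contract.
// None of these types are the real Flink CDC classes.
class MetadataColumnSketch {

    interface MetadataColumn {
        String name();

        Class<?> javaClass();

        Object read(Map<String, String> metadata);
    }

    /** Reads the operation timestamp, assumed here to be epoch millis stored as a string. */
    static final class OpTs implements MetadataColumn {
        public String name() {
            return "op_ts";
        }

        public Class<?> javaClass() {
            return Long.class;
        }

        public Object read(Map<String, String> metadata) {
            String value = metadata.get("op_ts");
            if (value == null) {
                throw new IllegalArgumentException("op_ts is missing from event metadata");
            }
            return Long.parseLong(value);
        }
    }

    /** Reads the table name as a plain string. */
    static final class TableName implements MetadataColumn {
        public String name() {
            return "table_name";
        }

        public Class<?> javaClass() {
            return String.class;
        }

        public Object read(Map<String, String> metadata) {
            return metadata.get("table_name");
        }
    }

    public static void main(String[] args) {
        // Sample metadata map; the values are made up for illustration.
        Map<String, String> meta = Map.of("op_ts", "1735430400000", "table_name", "orders");
        System.out.println(new OpTs().read(meta));
        System.out.println(new TableName().read(meta));
    }
}
```

The database and schema name columns would follow the same shape as `TableName`, differing only in the key they read.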

Verifying this change

This change added tests and can be verified as follows:

  • Added testAllMetadataColumns() E2E test in PostgresFullTypesITCase
  • Test verifies metadata columns in both snapshot and incremental phases
  • All tests pass successfully

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths: no
  • Anything that affects deployment or recovery: no
  • Does this pull request introduce a new feature: yes

Documentation

  • Does this pull request introduce a new feature: yes
  • If yes, how is the feature documented: docs

@tchivs tchivs changed the title from "[FLINK-XXXXX][pipeline-connector/postgres] Add metadata column support for PostgreSQL Pipeline Connector" to "[FLINK-38844][pipeline-connector/postgres] Add metadata column support for PostgreSQL Pipeline Connector" on Dec 29, 2025
@tchivs tchivs changed the title from "[FLINK-38844][pipeline-connector/postgres] Add metadata column support for PostgreSQL Pipeline Connector" to "[FLINK-38844][pipeline-connector-postgres] Add metadata column support for PostgreSQL Pipeline Connector" on Dec 29, 2025
@tchivs tchivs changed the title from "[FLINK-38844][pipeline-connector-postgres] Add metadata column support for PostgreSQL Pipeline Connector" to "[FLINK-38844][pipeline-connector][postgres]Add metadata column support for PostgreSQL Pipeline Connector" on Dec 29, 2025
This commit adds metadata column support for the PostgreSQL Pipeline Connector,
enabling users to access metadata information in their data pipelines.

Changes:
- Add OpTsMetadataColumn for operation timestamp
- Add DatabaseNameMetadataColumn for database name
- Add SchemaNameMetadataColumn for schema name
- Add TableNameMetadataColumn for table name
- Update PostgresDataSource to support metadata columns
- Add comprehensive E2E test testAllMetadataColumns()
- Update documentation (English and Chinese)
@tchivs tchivs changed the title from "[FLINK-38844][pipeline-connector][postgres]Add metadata column support for PostgreSQL Pipeline Connector" to "[FLINK-38844][pipeline-connector][postgres]Add metadata column support" on Dec 29, 2025
@github-actions github-actions Bot added the docs and postgres-pipeline-connector labels on Dec 29, 2025
Member

@yuxiqian yuxiqian left a comment

Thanks for @tchivs' contribution.

I wonder if we need individual metadata columns for database, schema, and table, since they're always available in Transform expressions (only after FLINK-38840 got closed).

@tchivs
Contributor Author

tchivs commented Dec 29, 2025

> Thanks for @tchivs' contribution.
>
> I wonder if we need individual metadata columns for database, schema, and table, since they're always available in Transform expressions (only after FLINK-38840 got closed).

Thanks for the review @yuxiqian! You raise an important point about the overlap with Transform metadata fields.

You're right that namespace_name, schema_name, and table_name are already available in Transform expressions. Let me clarify the design rationale:

  1. op_ts is essential and non-redundant:
    • There's no equivalent in Transform metadata fields (MetadataColumns.java only defines namespace_name, schema_name, table_name, data_event_type)
    • op_ts can only be obtained via metadata.list from the source connector
    • This is consistent with MySQL connector's implementation
  2. For database_name, schema_name, and table_name, there is overlap:

I see two perspectives here:

Argument for keeping them:

  • Sink persistence: Users can pass these to downstream sinks without writing transform rules
  • Consistency: MySQL connector has op_ts via metadata.list, so having all metadata follow the same pattern is intuitive
  • Simplicity: Direct configuration is simpler than transform expressions for basic use cases

Argument for removing them:

  • Redundancy: Transform already provides namespace_name, schema_name, table_name
  • Maintenance: Less code to maintain if we rely on Transform metadata

My suggestion:

  • Keep op_ts (essential, no alternative)
  • For database_name/schema_name/table_name, I'm open to either approach:
    • Option A: Keep them for consistency and ease of use
    • Option B: Remove them and document the Transform approach in the docs

What's your preference? I'm happy to adjust the PR based on the team's direction.

@yuxiqian
Member

I think it's OK to polish the documentation in this PR, leaving the metadata definitions as they are.

@tchivs
Contributor Author

tchivs commented Dec 29, 2025

> I think it's OK to polish the documentation in this PR, leaving the metadata definitions as they are.

Thanks @yuxiqian for the feedback! I've polished the documentation to clarify the relationship between metadata columns and Transform expressions.

Changes made:

  • Added a note section explaining that database_name, schema_name, and table_name are also available via Transform expressions (__namespace_name__, __schema_name__, __table_name__)
  • Clarified that op_ts is only available via metadata.list
  • Explained the trade-offs: using metadata.list allows passing values directly to downstream sinks without transform rules (simpler for basic use cases)
  • Updated the table descriptions to mention the Transform expression alternatives

The metadata definitions remain unchanged as you suggested.
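To make the `metadata.list` side of that trade-off concrete, here is a minimal sketch of how a comma-separated `metadata.list` value could be checked against the four column names added in this PR. The `resolve` helper and the class name are hypothetical and only illustrate the idea; the connector's real option handling may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of the connector: validates a metadata.list value.
class MetadataListSketch {

    // Column names taken from this PR.
    static final Set<String> SUPPORTED =
            Set.of("op_ts", "database_name", "schema_name", "table_name");

    /** Splits a comma-separated list and rejects names that are not supported. */
    static List<String> resolve(String metadataList) {
        List<String> result = new ArrayList<>();
        if (metadataList == null || metadataList.trim().isEmpty()) {
            return result;
        }
        for (String raw : metadataList.split(",")) {
            String name = raw.trim();
            if (name.isEmpty()) {
                continue; // tolerate stray commas
            }
            if (!SUPPORTED.contains(name)) {
                throw new IllegalArgumentException("Unsupported metadata column: " + name);
            }
            result.add(name);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(resolve("op_ts, table_name"));
    }
}
```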

Contributor

Copilot AI left a comment

Pull request overview

Adds metadata column support to the PostgreSQL Pipeline Connector so pipelines can expose source metadata (op timestamp, database/schema/table name) to transforms and downstream sinks.

Changes:

  • Introduces SupportedMetadataColumn implementations for op_ts, database_name, schema_name, and table_name.
  • Extends PostgresDataSource with supportedMetadataColumns() to advertise these columns to the pipeline runtime.
  • Adds/updates tests and documentation (EN + ZH) covering metadata configuration and behavior.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-postgres/src/test/java/org/apache/flink/cdc/connectors/postgres/source/PostgresFullTypesITCase.java | Adds E2E coverage verifying metadata presence/values in snapshot and incremental phases. |
| flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-postgres/src/test/java/org/apache/flink/cdc/connectors/postgres/factory/PostgresDataSourceFactoryTest.java | Adds a unit test asserting the set of supported metadata columns exposed by the data source. |
| flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-postgres/src/main/java/org/apache/flink/cdc/connectors/postgres/source/PostgresDataSource.java | Advertises supported metadata columns to the pipeline runtime. |
| flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-postgres/src/main/java/org/apache/flink/cdc/connectors/postgres/source/OpTsMetadataColumn.java | Implements op_ts metadata column reader/type definition. |
| flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-postgres/src/main/java/org/apache/flink/cdc/connectors/postgres/source/DatabaseNameMetadataColumn.java | Implements database_name metadata column reader/type definition. |
| flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-postgres/src/main/java/org/apache/flink/cdc/connectors/postgres/source/SchemaNameMetadataColumn.java | Implements schema_name metadata column reader/type definition. |
| flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-postgres/src/main/java/org/apache/flink/cdc/connectors/postgres/source/TableNameMetadataColumn.java | Implements table_name metadata column reader/type definition. |
| docs/content/docs/connectors/pipeline-connectors/postgres.md | Documents metadata.list updates and adds a “Supported Metadata Columns” section. |
| docs/content.zh/docs/connectors/pipeline-connectors/postgres.md | Same as EN docs, localized. |

Contributor

@lvyanquan lvyanquan left a comment

+1 for adding metadata for database_name, schema_name, and table_name.

Using transform adds these metadata values as additional data columns. When writing downstream to message queues like Kafka in debezium-json/canal-json format, we want to distinguish between actual data and metadata. Metadata columns provide a simple way to support this (though downstream Kafka consumers then need to understand the metadata keys of each upstream system, such as MySQL/PostgreSQL/MongoDB, which can be resolved through enumeration).
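The distinction described here can be sketched as follows. The `Event` class and both helper methods are hypothetical and exist only to show where the table name ends up in each approach; they do not reflect the actual event classes or any sink's serialization format.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model: row data plus a separate metadata map.
class MetadataSeparationSketch {

    static final class Event {
        final Map<String, Object> data;
        final Map<String, String> meta;

        Event(Map<String, Object> data, Map<String, String> meta) {
            this.data = data;
            this.meta = meta;
        }
    }

    /** Transform-style projection: the table name becomes an ordinary data column. */
    static Event viaTransform(Map<String, Object> row, String tableName) {
        Map<String, Object> data = new LinkedHashMap<>(row);
        data.put("table_name", tableName);
        return new Event(data, Map.of());
    }

    /** Metadata-column style: the table name stays outside the row data. */
    static Event viaMetadataList(Map<String, Object> row, String tableName) {
        return new Event(new LinkedHashMap<>(row), Map.of("table_name", tableName));
    }

    public static void main(String[] args) {
        Map<String, Object> row = Map.of("id", 1);
        System.out.println(viaTransform(row, "orders").data.containsKey("table_name"));
        System.out.println(viaMetadataList(row, "orders").data.containsKey("table_name"));
    }
}
```

With the second shape, a sink that writes an envelope format can place `meta` entries outside the row payload instead of mixing them with user columns.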

@lvyanquan lvyanquan merged commit 3580bf4 into apache:master Mar 12, 2026
8 checks passed
Mrart pushed a commit to Mrart/flink-cdc that referenced this pull request Mar 26, 2026
ThorneANN pushed a commit to ThorneANN/flink-cdc that referenced this pull request Mar 31, 2026
Labels

approved, docs, postgres-pipeline-connector, reviewed

4 participants