Flink CDC PostgreSQL connector: schema-loading optimization for partitioned tables #4079

tchivs · 2025-08-06T02:46:43Z


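Problem description

When using the Flink CDC PostgreSQL connector to capture tables partitioned by month, we run into the following performance problems:

Current pain points

Slow initialization: the connector fetches the schema of every child table, which takes a long time. In addition, when the CreateTable schema event is generated, each individual table triggers a full load of all schemas.
Risk of checkpoint timeouts: because schema loading takes so long, database connections may time out, which easily triggers a checkpoint timeout.
Excessive resource consumption:
- The ListSerializer cache inside EventSerializer becomes very large
- A separate SchemaChangeEvent has to be published for every child table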

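Proposed optimization

Core idea

Enable a schema-loading optimization for partitioned tables that replaces the current load-everything strategy:

Specific optimizations

Skip per-partition schema loading
- Stop fetching the schema of each partition child table one by one
- Load only the schema of the parent table, or of any one representative child table
- Rely on the fact that all partitions of a table share the same schema
Improve performance significantly
- Fewer network round trips for schema loading
- Shorter total initialization time
- Checkpoint timeouts are effectively avoided
Reduce memory usage
- Smaller ListSerializer cache inside EventSerializer
- Lower memory pressure and better overall stability
Simplify event handling
- No need to publish a separate SchemaChangeEvent for every child table
- Less complexity and overhead in event processing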
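
As a rough illustration of the idea above (fetch one representative schema per partitioned table instead of one per partition), here is a minimal Python sketch. The helper names group_by_parent and load_schemas, the fetch_schema callback, and the orders_YYYYMM naming pattern are assumptions made for illustration; they are not taken from the connector or from the branch.

```python
import re
from collections import defaultdict


def group_by_parent(child_tables, pattern=r"(.+)_\d{6}"):
    """Group partition names such as orders_202501 under their assumed parent name."""
    groups = defaultdict(list)
    for table in child_tables:
        match = re.fullmatch(pattern, table)
        # Tables that do not look like partitions are treated as their own parent.
        groups[match.group(1) if match else table].append(table)
    return dict(groups)


def load_schemas(child_tables, fetch_schema):
    """Call fetch_schema once per parent, using the first partition as representative."""
    schemas = {}
    for parent, children in group_by_parent(child_tables).items():
        schemas[parent] = fetch_schema(children[0])
    return schemas
```

With twelve monthly partitions of one table plus one regular table, this makes two fetch_schema calls instead of thirteen. It is only valid under the assumption stated in the list above, namely that every partition has the same schema as its parent.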

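Expected benefits

Performance: significantly shorter initialization time for partitioned tables
Stability: checkpoint timeouts are avoided
Resources: lower memory and network consumption
Maintenance: less complexity in handling schema change events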

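How technically feasible is this approach?

This PR also raised a related issue but was never followed up: #2571

I have already fixed this in my branch by adding two new configuration options:
https://github.com/tchivs/flink-cdc/tree/release-3.5-pg

Submitting it as a PR would require some further rework, because the change is fairly large.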

liuxuzxx · 2025-10-13T02:35:52Z


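This is a good suggestion. I currently have a job that loads 9 tables partitioned by day, roughly 3,294 tables in total; adding 100-odd miscellaneous tables brings it to about 3,400+. In production, every job start processes about 70 tables per minute, so handling the CreateTable schema events takes roughly 48-55 minutes each time. It is a real headache.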
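
As a quick sanity check on those figures, the reported table count and throughput can be plugged into a few lines of Python. The numbers are the approximate ones quoted above, not new measurements.

```python
tables = 3400          # approximate total table count quoted above
rate_per_minute = 70   # approximate CreateTable handling rate quoted above

minutes = tables / rate_per_minute
print(f"{minutes:.1f} minutes")  # prints "48.6 minutes"
```

That lands at the lower end of the 48-55 minute range, so the startup time is consistent with schema handling being the bottleneck.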

1 reply

tchivs Oct 13, 2025
Author

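Yes. Moreover, if you cancel the job right after starting it, the TaskManager crashes outright, because the task is stuck waiting for the schema fetch until the cancellation times out.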

app//org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:771)
app//org.apache.flink.runtime.taskmanager.Task$$Lambda$765/0x0000000840662440.run(Unknown Source)
app//org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:970)
app//org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:939)
app//org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:763)
app//org.apache.flink.runtime.taskmanager.Task.run(Task.java:575)
[email protected]/java.lang.Thread.run(Unknown Source)
2025-10-11 06:47:00,805 ERROR org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Task did not exit gracefully within 180 + seconds.
org.apache.flink.util.FlinkRuntimeException: Task did not exit gracefully within 180 + seconds.
at org.apache.flink.runtime.taskmanager.Task$TaskCancelerWatchDog.run(Task.java:1833) [flink-dist-1.20.2.jar:1.20.2]
at java.base/java.lang.Thread.run(Unknown Source) [?:?]
2025-10-11 06:47:00,807 ERROR org.apache.flink.runtime.taskexecutor.TaskManagerRunner      [] - Fatal error occurred while executing the TaskManager. Shutting it down...
org.apache.flink.util.FlinkRuntimeException: Task did not exit gracefully within 180 + seconds.
at org.apache.flink.runtime.taskmanager.Task$TaskCancelerWatchDog.run(Task.java:1833) [flink-dist-1.20.2.jar:1.20.2]
at java.base/java.lang.Thread.run(Unknown Source) [?:?]
2025-10-11 06:47:00,808 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Stopping TaskExecutor pekko.tcp://[email protected]:42217/user/rpc/taskmanager_0.
2025-10-11 06:47:00,808 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Close ResourceManager connection d9f6039a9a3eef2f8759516151e0e2ee.
2025-10-11 06:47:00,809 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Close JobManager connection for job 13062490432cc9868d36f0404f1ba7f4.
2025-10-11 06:47:00,810 INFO  org.apache.flink.runtime.taskmanager.Task                    [] - Attempting to fail task externally Source: Flink CDC Event Source: postgres -> SchemaOperator -> PrePartition (1/1)#0 (38c5223dcd8e157d6b9e47bc522382c1_cbc357ccb763df2852fee8c4fc7d55f2_0_0).

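The root of this error is that Flink's TaskCancelerWatchDog waited more than 180 seconds after the cancellation without seeing the task exit, so it force-killed the TaskManager. The usual cause is that an operator does not exit cooperatively during cancellation: it is blocked in external I/O or a long-running call (for example a JDBC query or a replication-stream read) and cannot finish within the default timeout.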
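
To make "cooperative exit" concrete, here is a small language-neutral sketch in Python (not Flink code). A long schema-loading loop checks a cancel flag between bounded calls, so it can stop shortly after cancellation instead of running to the end. The names load_schemas, slow_fetch and cancel are hypothetical, and the 10 ms sleep merely stands in for one bounded database round trip.

```python
import threading
import time


def load_schemas(table_names, cancel, fetch_one):
    """Load schemas one table at a time, checking the cancel flag between calls."""
    loaded = []
    for name in table_names:
        if cancel.is_set():
            break  # cooperative exit: stop as soon as cancellation is observed
        loaded.append(fetch_one(name))
    return loaded


def slow_fetch(name):
    time.sleep(0.01)  # stands in for one bounded database round trip
    return name


cancel = threading.Event()
result = []
tables = [f"t{i}" for i in range(3400)]
worker = threading.Thread(
    target=lambda: result.extend(load_schemas(tables, cancel, slow_fetch))
)
worker.start()
time.sleep(0.1)         # let the worker make some progress
cancel.set()            # request cancellation
worker.join(timeout=5)  # the worker exits after at most one more fetch
```

The point of the sketch is the shape of the loop: each blocking call is short, and the flag is checked between calls. A single unbounded call that never checks the flag is what lets a watchdog timeout expire.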

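I have attempted a fix in my own branch, based on release-3.5, but have not submitted a PR yet because the change is fairly large.

Feel free to pull it and take a look.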

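Option: scan.include-partitioned-tables.enabled
Required: optional
Default: false
Type: Boolean
Description: Enables partition routing for PostgreSQL partitioned tables.
When enabled:
(1) Events from child partition tables are routed to their parent table.
(2) On PostgreSQL 11+: use it together with publish_via_partition_root=true on the PUBLICATION for better performance.
(3) On PostgreSQL 10: this option enables partition routing by loading the schema from the child partitions (which carry the primary key).
(4) Use partition.tables to specify which parent tables take part in partition routing.
Note on the table list: make sure your table-matching patterns capture the tables you intend (parent tables for PostgreSQL 11+, child tables for PostgreSQL 10).

Option: partition.tables
Required: optional
Default: (none)
Type: String
Description: Partitioned-table patterns used to route child-partition events to their parent tables. Regular expressions are supported.
The dot (.) separates namespace (database), schema, and table name; inside a regular expression, a literal dot must be escaped with a backslash.
Example: aia_test\.public\.orders_\d{6}
Note: this option must be used together with scan.include-partitioned-tables.enabled.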

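Partition routing syntax

When scan.include-partitioned-tables.enabled is true, partition routing can be specified in any of the forms below. In the child-table pattern on the right-hand side, only the table-name part is matched as a regular expression; the namespace and schema are matched literally. If a pattern includes a namespace (catalog), it is ignored during matching (it may be written, but is not enforced).

Colon form (explicit parent:child)
- Three parts: namespace.schema.parent:namespace.schema.child_regex
- Two parts: schema.parent:schema.child_regex
- Table name only: parent:child_regex (when routing, the parent table inherits the child table's schema)
No-colon form (child regex only)
- Three parts: namespace.schema.child_regex
- Two parts: schema.child_regex
- Table name only: child_regex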

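Examples of escaping dots in the regular expression:

aia_test\.public\.orders_\d{6}   # matches the same namespace, schema=public, table name orders_YYYYMM
public\.orders_\d{6}              # matches schema=public, table name orders_YYYYMM
orders_\d{6}                      # matches any schema, table name orders_YYYYMM (the parent inherits the child's schema when routing)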
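
The matching rules described above can be approximated with the following Python sketch. It is not the code from the branch. It assumes that the table-name regex itself contains no dots, that both . and \. act as separators, and that the namespace is ignored; parse_child_pattern and table_matches are hypothetical helper names.

```python
import re


def parse_child_pattern(entry):
    """Return (schema or None, table_regex) for a 'child' or 'parent:child' entry."""
    _, sep, child = entry.partition(":")
    if not sep:
        child = entry  # no-colon form: the whole entry is the child pattern
    # Split on '.' or '\.'; assumes the table regex itself contains no dots.
    parts = re.split(r"\\?\.", child)
    schema = parts[-2] if len(parts) >= 2 else None
    return schema, parts[-1]


def table_matches(entry, schema, table):
    """Schema is compared literally; only the table name is matched as a regex."""
    want_schema, table_regex = parse_child_pattern(entry)
    if want_schema is not None and want_schema != schema:
        return False
    return re.fullmatch(table_regex, table) is not None


print(table_matches(r"public\.orders_\d{6}", "public", "orders_202501"))
print(table_matches(r"orders_\d{6}", "sales", "orders_202501"))
print(table_matches(r"public\.orders_\d{6}", "sales", "orders_202501"))
```

The sketch uses a full match on the table name, so orders_\d{6} does not accidentally capture orders_extend_202501.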

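Selector merge rules (the capture list that finally takes effect):

Final include list = "set of child-table regexes" + "remaining parent tables not covered by a child-table regex".
Parent tables specified explicitly on the left of the colon, or derived from a child regex (for example orders_\d{6} yields orders, with the trailing underscore _ stripped), are not appended to the include list again.
If a child regex is table-name only (such as orders_\d{6}), it is treated as covering the same-named parent table in every schema, and those parent tables are excluded.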

Examples

Without colon:

tables: aia_test.public.orders,aia_test.public.orders_extend,aia_test.public.vouchers,aia_test.public.static_table
partition.tables: aia_test.public.orders_\d{6},aia_test.public.orders_extend_\d{6},aia_test.public.vouchers_\d{6}

Resulting list used for matching:

aia_test.public.orders_\d{6},aia_test.public.orders_extend_\d{6},aia_test.public.vouchers_\d{6},aia_test.public.static_table

With colon:

tables: aia_test.public.orders,aia_test.public.orders_extend,aia_test.public.vouchers,aia_test.public.static_table
partition.tables: aia_test.public.orders:aia_test.public.orders_\d{6},aia_test.public.orders_extend:aia_test.public.orders_extend_\d{6},aia_test.public.vouchers:aia_test.public.vouchers_\d{6}

Resulting list used for matching:

aia_test.public.orders_\d{6},aia_test.public.orders_extend_\d{6},aia_test.public.vouchers_\d{6},aia_test.public.static_table
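
For readers who want to experiment with the merge rule, the following Python sketch reproduces the two examples above. It is an approximation, not the branch's implementation. The function names are hypothetical, and the parent name is derived by taking the literal prefix of the child regex and stripping a trailing underscore, which is one reading of the rule stated above.

```python
import re


def split_entry(entry):
    """Return (parent or None, child pattern, child schema or None, table regex)."""
    parent, sep, child = entry.partition(":")
    if not sep:
        parent, child = None, entry
    parts = re.split(r"\\?\.", child)  # '.' or '\.' as separator (assumption)
    schema = parts[-2] if len(parts) >= 2 else None
    return parent, child, schema, parts[-1]


def derive_parent(table_regex):
    """orders_\\d{6} -> orders: literal prefix with the trailing underscore stripped."""
    return re.match(r"[A-Za-z0-9_]*", table_regex).group(0).rstrip("_")


def merge_selectors(tables, partition_tables):
    children, covered = [], set()
    for entry in partition_tables:
        parent, child, schema, table_regex = split_entry(entry)
        children.append(child)
        name = parent.split(".")[-1] if parent else derive_parent(table_regex)
        covered.add((schema, name))  # schema None means "any schema"
    remaining = []
    for table in tables:
        parts = table.split(".")
        schema = parts[-2] if len(parts) >= 2 else None
        if (schema, parts[-1]) in covered or (None, parts[-1]) in covered:
            continue  # parent already represented by a child regex
        remaining.append(table)
    return children + remaining


tables = ["aia_test.public.orders", "aia_test.public.orders_extend",
          "aia_test.public.vouchers", "aia_test.public.static_table"]
no_colon = [r"aia_test.public.orders_\d{6}", r"aia_test.public.orders_extend_\d{6}",
            r"aia_test.public.vouchers_\d{6}"]
print(",".join(merge_selectors(tables, no_colon)))
```

For both the no-colon and the colon input shown above, the sketch yields the three child regexes followed by aia_test.public.static_table.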

liuxuzxx · 2025-10-13T02:52:36Z


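So for now I have no better option than to set the checkpoint timeout to 60 minutes and split the job. The 9 daily-partitioned tables that used to be in one job are now split across two Flink CDC 3 jobs: one with 6 daily-partitioned tables plus the regular tables, the other with the remaining 3 daily-partitioned tables. Both use a 60-minute checkpoint timeout, and so far both are syncing normally for the most part.

From what I have observed, the TaskManager restarts caused by checkpoint failures are almost always due to the first checkpoint failing. Subsequent checkpoints take only about 5-10 s, and the shortest ones 2-3 s, which is very fast.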

0 replies

liuxuzxx · 2025-10-13T02:55:19Z


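I have now run into another problem:

I am syncing MySQL data with Flink CDC 3, deployed on Kubernetes through a YAML configuration. I configured the podTemplate of both the JobManager and the TaskManager to use the same time zone as the host Node, and when I open a shell in the container and run date, it does show UTC+8 time. However, the timestamps in the Flink CDC 3 logs, and the value returned by the NOW() function used under transform: in the pipeline definition, are still in UTC+0. I have tried many settings and none of them helped. Really strange.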

2 replies

tchivs Oct 13, 2025
Author

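I have already fixed this in my branch by adding two new configuration options:
https://github.com/tchivs/flink-cdc/tree/release-3.5-pg
In the original version, job initialization spends a large amount of time loading schemas, and when the CreateTable schema event is generated, each individual table triggers a full load of all schemas. My version fixes both problems. With the new version, initializing 1,000 tables takes only a few tens of seconds, a large performance improvement. It also routes partition tables to their parent table automatically and does not create CreateTable schema events for the child tables.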

tchivs Oct 13, 2025
Author

pipeline:
  local-time-zone: Asia/Shanghai

Does this not work either?
