[FLINK-36165][source-connector/mysql] Support capturing snapshot data with conditions by Mrart · Pull Request #4104 · apache/flink-cdc

Mrart · 2025-08-21T12:58:12Z

issue link: https://issues.apache.org/jira/browse/FLINK-36165

… with conditions
[FLINK-36165][source-connector/mysql] add docs
[FLINK-36165][source-connector/mysql] Implement snapshot filter for MySQL table source
[FLINK-36165][source-connector/mysql] Escape dot
[FLINK-36165 ] fixed supported escape like 'city != 'China:beijing''

fixed checkstyle
fixed test
fixed MySqlTableSourceFactoryTest test error.

SML0127 · 2025-09-07T10:42:01Z

+    <tr>
+      <td>scan.snapshot.filters</td>
+      <td>optional</td>
+      <td style="word-wrap: break-word;">(none)</td>
+      <td>String</td>
+      <td>When reading a table snapshot, the rows of captured tables will be filtered using the specified filter expression (AKA a SQL WHERE clause). <br>
+          By default, no filter is applied, meaning the entire table will be synchronized. <br>
+          A colon (:) separates the table name from the filter expression, while a semicolon (;) separates multiple filters,
+          e.g. `db1.user_table_[0-9]+:id > 100;db[1-2].[app|web]_order_\\.*:id < 0;`.
+      </td>
+    </tr>
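The option format above (a colon between a table pattern and its filter, a semicolon between entries) can be parsed along these lines. This is a minimal sketch, not the PR's parser: `SnapshotFilterParser` and `parse` are hypothetical names. It splits entries on `;` only outside single-quoted literals and splits each entry on its first `:` only, so a filter such as `city != 'China:beijing'` stays intact, which appears to be the case the `'China:beijing'` commit addresses.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser for values such as
// "db1.user_table_[0-9]+:id > 100;db1.orders:city != 'China:beijing';"
class SnapshotFilterParser {

    static Map<String, String> parse(String option) {
        // LinkedHashMap preserves the configured order of entries.
        Map<String, String> filters = new LinkedHashMap<>();
        StringBuilder entry = new StringBuilder();
        boolean inQuote = false;
        for (int i = 0; i < option.length(); i++) {
            char c = option.charAt(i);
            if (c == '\'') {
                // Toggle on every quote; SQL-style '' escapes net out, backslash escapes are not handled.
                inQuote = !inQuote;
            }
            if (c == ';' && !inQuote) {
                addEntry(filters, entry.toString());
                entry.setLength(0);
            } else {
                entry.append(c);
            }
        }
        addEntry(filters, entry.toString()); // last entry, if not terminated by ';'
        return filters;
    }

    private static void addEntry(Map<String, String> filters, String raw) {
        String trimmed = raw.trim();
        if (trimmed.isEmpty()) {
            return; // tolerate a trailing ';'
        }
        // Split on the FIRST colon only: the filter may contain ':' inside literals.
        int colon = trimmed.indexOf(':');
        if (colon <= 0 || colon == trimmed.length() - 1) {
            throw new IllegalArgumentException("Expected <table>:<filter>, got: " + trimmed);
        }
        filters.put(trimmed.substring(0, colon).trim(), trimmed.substring(colon + 1).trim());
    }
}
```

Keeping insertion order matters if a table matches more than one pattern and the first match wins.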


How about stating explicitly that the filter conditions are combined using AND, and that the filter has no effect on the binlog phase?
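One reading of "combined using AND" is that each snapshot chunk is still read with its range predicate on the chunk key, and the table's filter is appended to that predicate with AND, while the binlog phase is unaffected. A hypothetical sketch (`ChunkQuerySketch.buildChunkQuery` is not a Flink CDC API; a single-column chunk key with `?` placeholders for its bounds is assumed):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical builder for a snapshot chunk query; not the Flink CDC implementation.
class ChunkQuerySketch {

    static String buildChunkQuery(
            String table, String chunkKey, boolean hasLower, boolean hasUpper, String filter) {
        List<String> conditions = new ArrayList<>();
        if (hasLower) {
            conditions.add(chunkKey + " >= ?"); // inclusive lower bound of the chunk
        }
        if (hasUpper) {
            conditions.add(chunkKey + " < ?"); // exclusive upper bound of the chunk
        }
        if (filter != null && !filter.trim().isEmpty()) {
            // Parenthesize so an OR inside the user filter cannot escape the chunk bounds.
            conditions.add("(" + filter.trim() + ")");
        }
        String sql = "SELECT * FROM " + table;
        return conditions.isEmpty() ? sql : sql + " WHERE " + String.join(" AND ", conditions);
    }
}
```

The parentheses are the important detail: without them, `id >= ? AND id < ? AND a = 1 OR b = 2` parses as `(id >= ? AND id < ? AND a = 1) OR b = 2`, so every chunk would read the rows with `b = 2`.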

SML0127 · 2025-09-07T10:48:00Z

+        Map<Selectors, String> snapshotFilters = toSelector(filters);
+
+        String filter = null;
+        for (Selectors selector : snapshotFilters.keySet()) {
+            if (selector.isMatch(
+                    org.apache.flink.cdc.common.event.TableId.tableId(
+                            tableId.catalog(), tableId.table()))) {
+                filter = snapshotFilters.get(selector);
+                break;
+            }
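In the loop above, `break` on the first match makes the result depend on iteration order, which is only deterministic if the map returned by `toSelector` preserves insertion order (for example a `LinkedHashMap`); this excerpt does not show which map it is. A minimal, hypothetical sketch of an order-preserving, first-match-wins lookup, using plain `java.util.regex` patterns for the database and table parts instead of Flink CDC's `Selectors` (`FilterLookup` and its methods are illustrative names):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical first-match-wins lookup standing in for the Selectors loop.
class FilterLookup {

    private static final class Rule {
        final Pattern db;
        final Pattern table;
        final String filter;

        Rule(String dbRegex, String tableRegex, String filter) {
            this.db = Pattern.compile(dbRegex);
            this.table = Pattern.compile(tableRegex);
            this.filter = filter;
        }
    }

    // A List (like a LinkedHashMap) keeps configuration order; a HashMap would not.
    private final List<Rule> rules = new ArrayList<>();

    FilterLookup add(String dbRegex, String tableRegex, String filter) {
        rules.add(new Rule(dbRegex, tableRegex, filter));
        return this;
    }

    /** Returns the filter of the first rule matching db and table, or null if none matches. */
    String find(String db, String table) {
        for (Rule rule : rules) {
            // matches() requires the whole name to match, not just a prefix.
            if (rule.db.matcher(db).matches() && rule.table.matcher(table).matches()) {
                return rule.filter;
            }
        }
        return null;
    }
}
```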


Can we fail fast on unknown or nonexistent columns in filters?
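One way to fail fast, sketched here under assumptions: at startup, run each configured filter once against its table in a query that returns no rows, and let MySQL reject unknown columns or malformed expressions with an error. `SnapshotFilterValidator` is hypothetical, and the sketch assumes MySQL resolves column names while preparing the statement, so an "Unknown column" error is raised even though `LIMIT 0` returns nothing.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical startup check for snapshot filters; not part of the PR.
class SnapshotFilterValidator {

    // LIMIT 0 returns no rows, but the WHERE clause is still name-resolved.
    static String validationQuery(String quotedTable, String filter) {
        return "SELECT 1 FROM " + quotedTable + " WHERE (" + filter + ") LIMIT 0";
    }

    static void validate(Connection connection, String quotedTable, String filter) {
        String sql = validationQuery(quotedTable, filter);
        try (Statement statement = connection.createStatement();
                ResultSet ignored = statement.executeQuery(sql)) {
            // Reaching this point means the filter parsed and every column resolved.
        } catch (SQLException e) {
            throw new IllegalArgumentException(
                    "Invalid snapshot filter for " + quotedTable + ": " + filter, e);
        }
    }
}
```

Delegating validation to the server avoids parsing SQL in the connector and also catches syntax errors, at the cost of one extra query per configured table. The filter text is concatenated verbatim, which is acceptable only because it comes from trusted job configuration.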

SML0127 · 2025-09-07T10:55:16Z

+    public static Long queryRowCnt(
+            JdbcConnection jdbc, TableId tableId, String columnName, @Nullable String filter)
+            throws SQLException {
+
+        if (filter == null) {
+            return queryApproximateRowCnt(jdbc, tableId);
+        }
+
+        final String cntQuery =
+                String.format(
+                        "SELECT COUNT(%s) FROM %s WHERE %s",
+                        quote(columnName), quote(tableId), filter);
+        return jdbc.queryAndMap(
+                cntQuery,
+                rs -> {
+                    if (!rs.next()) {
+                        // this should never happen
+                        throw new SQLException(
+                                String.format(
+                                        "No result returned after running query [%s]", cntQuery));
+                    }
+                    return rs.getLong(1);
+                });
+    }
+


Because the filter is applied during split planning, the distribution factor may fall outside the configured bounds (0.05 to 1,000), which would affect snapshot performance. I'd appreciate your thoughts on this.
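To make the concern concrete, assume the factor is computed as (max - min + 1) / rowCount over the chunk key and compared against the 0.05 to 1,000 bounds. If `queryRowCnt` returns the filtered count while min and max still come from the whole table, a selective filter inflates the factor. `DistributionFactorSketch` is a hypothetical helper that illustrates the arithmetic, not the connector's code.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical helper; assumes factor = (max - min + 1) / rowCount.
class DistributionFactorSketch {
    static final double LOWER_BOUND = 0.05;   // bound cited in the review
    static final double UPPER_BOUND = 1000.0; // bound cited in the review

    static double factor(long min, long max, long rowCount) {
        if (rowCount <= 0) {
            // A filter matching no rows would otherwise cause a division by zero.
            throw new IllegalArgumentException("rowCount must be positive");
        }
        BigDecimal span = BigDecimal.valueOf(max).subtract(BigDecimal.valueOf(min)).add(BigDecimal.ONE);
        return span.divide(BigDecimal.valueOf(rowCount), 4, RoundingMode.CEILING).doubleValue();
    }

    static boolean isEvenlyDistributed(double factor) {
        return factor >= LOWER_BOUND && factor <= UPPER_BOUND;
    }
}
```

For ids 1 to 1,000,000 and a filter `id > 999900` that keeps 100 rows, unfiltered bounds give 1,000,000 / 100 = 10,000, above the upper bound, whereas bounds taken under the same filter give 100 / 100 = 1.0.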

SML0127

Thanks for the PR. I've left a few minor comments.

yuxiqian · 2025-09-08T01:18:48Z

It seems the author of #3776 is actively working on their PR, and there's some duplicated work. It would be nice if we could discuss and implement this nice feature in one place.

github-actions bot added the docs (Improvements or additions to documentation), mysql-cdc-connector, and mysql-pipeline-connector labels on Aug 21, 2025

xiayuxiao and others added 2 commits August 22, 2025 08:49

resolve conflict

075a7a5

Mrart force-pushed the FLINK-36165 branch 3 times, most recently from d0738af to c3005b6, on August 23, 2025 02:50

fixed test

fd17223

fixed checkstyle
fixed test
fixed MySqlTableSourceFactoryTest test error.

Mrart force-pushed the FLINK-36165 branch from daabe95 to fd17223 on August 23, 2025 04:23

SML0127 reviewed Sep 7, 2025


Mrart closed this Sep 19, 2025


Mrart wants to merge 3 commits into apache:master from Mrart:FLINK-36165


3 participants
