
[FLINK-38244][hotfix] Fix the full snapshot phase, the field case is adjusted based on isTableIdCaseInsensitive #4284

Merged
lvyanquan merged 7 commits into apache:master from sd4324530:hotfix-flink-38244
Mar 11, 2026

Conversation

@sd4324530
Contributor

During the full snapshot phase, the field case is adjusted based on isTableIdCaseInsensitive.

@sd4324530
Contributor Author

@leonardBang @loserwang1024 Please take a look when you have time.

@ThorneANN
Contributor

looks good but no test

@sd4324530
Contributor Author

> looks good but no test

@ThorneANN I've tested it in a real environment and I'm already using it. I haven't added tests because the test module has not been working properly in my local environment.

@ThorneANN
Contributor

ThorneANN commented Feb 27, 2026

> > looks good but no test
>
> @ThorneANN I've tested it in a real environment and I'm already using it. I haven't added tests because the test module has not been working properly in my local environment.

You can test the class without your local environment: just write some test code and use Maven and Docker to run the test application.

@sd4324530
Contributor Author

sd4324530 commented Feb 27, 2026

> > > looks good but no test
> >
> > @ThorneANN I've tested it in a real environment and I'm already using it. I haven't added tests because the test module has not been working properly in my local environment.
>
> You can test the class without your local environment: just write some test code and use Maven and Docker to run the test application.

After #4275, I found that my local Docker environment is now ready for testing. I will add test cases as soon as possible.

@lvyanquan
Contributor

MySQL field name case sensitivity should be independent of table name case sensitivity and should not be affected by the lower_case_table_names configuration parameter.

@sd4324530
Contributor Author

sd4324530 commented Mar 4, 2026

> MySQL field name case sensitivity should be independent of table name case sensitivity and should not be affected by the lower_case_table_names configuration parameter.

@lvyanquan You're right. This issue was discussed in the DingTalk group before. The lower_case_table_names configuration option controls the case of table names, not field names. However, the earlier PR #4095 modified the incremental phase, which causes inconsistencies between the full and incremental phases when field names are uppercase. After discussing this with @loserwang1024, the conclusion was to first handle the full phase with the same rules, and then consider a unified solution later, such as converting all field names to lowercase. What do you think?

@loserwang1024
Contributor

> the conclusion was to first handle the full phase with the same rules, and then consider a unified solution later, such as converting all fields to lowercase.

@lvyanquan, yes. We first make the full-phase behavior match #4095, and then we can move forward in another PR.

Contributor

Copilot AI left a comment


Pull request overview

Fixes MySQL pipeline connector snapshot/schema handling so that column (and PK) name casing is normalized consistently during full snapshot/schema emission when isTableIdCaseInsensitive is enabled.

Changes:

  • Add an uppercase_products fixture table (uppercase column names) to the MySQL inventory test DDL.
  • Expand IT coverage to run the existing alter-statement parsing test against both products and uppercase_products.
  • Normalize emitted schema column/PK names to lower-case in MySqlPipelineRecordEmitter when table IDs are case-insensitive, and plumb the flag from MySqlDataSource.
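The normalization in the last bullet can be sketched in isolation. This is a minimal illustration, not the actual MySqlPipelineRecordEmitter code: the class `ColumnCaseSketch` and the method `normalizeNames` are hypothetical names, and plain strings stand in for the connector's column and primary-key objects. Only the flag name and the `toLowerCase(Locale.ROOT)` call are taken from the reviewed diff.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical stand-in for the emitter's name handling; not the connector's real class. */
public class ColumnCaseSketch {

    /**
     * Lower-cases every name when the flag is set and returns the names unchanged otherwise.
     * The same rule would be applied to both column names and primary-key names.
     */
    static List<String> normalizeNames(List<String> names, boolean isTableIdCaseInsensitive) {
        List<String> result = new ArrayList<>(names.size());
        for (String name : names) {
            // Locale.ROOT keeps the conversion independent of the JVM's default locale.
            result.add(isTableIdCaseInsensitive ? name.toLowerCase(Locale.ROOT) : name);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("ID", "Name", "WEIGHT");
        System.out.println(normalizeNames(columns, true));  // prints [id, name, weight]
        System.out.println(normalizeNames(columns, false)); // prints [ID, Name, WEIGHT]
    }
}
```

Using `Locale.ROOT` matters for names containing characters such as `I`, which lower-case differently under some default locales (for example Turkish).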

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
| File | Description |
| --- | --- |
| .../src/test/resources/ddl/inventory.sql | Adds uppercase_products table + data to exercise uppercase column-name scenarios. |
| .../MySqlTableIdCaseInsensitveITCase.java | Parameterizes the existing IT to validate schema-change parsing for both normal and uppercase-column tables. |
| .../MySqlPipelineITCase.java | Excludes uppercase_products from an exclusion test to keep expected outputs stable. |
| .../MySqlDataSourceFactoryTest.java | Updates expected discovered tables list to include uppercase_products. |
| .../MySqlPipelineRecordEmitter.java | Lowercases column names and primary key column names when isTableIdCaseInsensitive is true. |
| .../MySqlDataSource.java | Computes isTableIdCaseInsensitive once and passes it to both deserializer and record emitter. |


(default,"hammer","14oz carpenter's hammer",0.875),
(default,"hammer","16oz carpenter's hammer",1.0),
(default,"rocks","box of assorted rocks",5.3),
(default,"jacket","water resistent black wind breaker",0.1),

Copilot AI Mar 5, 2026


Typo in inserted test data: "water resistent" is misspelled and should be "water resistant" (even if this is only test DDL, keeping strings correct avoids propagating typos across fixtures).

Comment on lines +269 to 273
    String colName =
            this.isTableIdCaseInsensitive
                    ? column.name().toLowerCase(Locale.ROOT)
                    : column.name();
    DataType dataType =

Copilot AI Mar 5, 2026


The new case-normalization logic for column names/primary keys is gated by isTableIdCaseInsensitive, but there doesn't appear to be test coverage for the bounded snapshot (StartupOptions.snapshot()) path where schemas are fetched via SHOW CREATE TABLE/DESC. Consider adding an IT that runs with StartupOptions.snapshot() against a table with uppercase column names (and PK) to exercise this end-to-end.

@sd4324530 sd4324530 force-pushed the hotfix-flink-38244 branch from 6b34990 to a5f8680 Compare March 5, 2026 14:52
…justed based on `isTableIdCaseInsensitive`.

Signed-off-by: peiyu <125331682@qq.com>
Signed-off-by: Pei Yu <125331682@qq.com>
Signed-off-by: Pei Yu <125331682@qq.com>
Signed-off-by: Pei Yu <125331682@qq.com>
Signed-off-by: Pei Yu <125331682@qq.com>
Signed-off-by: Pei Yu <125331682@qq.com>
Signed-off-by: Pei Yu <125331682@qq.com>
@sd4324530 sd4324530 force-pushed the hotfix-flink-38244 branch from 82b4ea6 to 89986fa Compare March 9, 2026 13:02
@sd4324530
Contributor Author

The CI failure should be unrelated to this PR.

Contributor

@ruanhang1993 ruanhang1993 left a comment


LGTM.

@lvyanquan lvyanquan merged commit 363095e into apache:master Mar 11, 2026
27 of 28 checks passed
@sd4324530 sd4324530 deleted the hotfix-flink-38244 branch March 12, 2026 00:21
Mrart pushed a commit to Mrart/flink-cdc that referenced this pull request Mar 26, 2026
ThorneANN pushed a commit to ThorneANN/flink-cdc that referenced this pull request Mar 31, 2026
@patrickw1025
Contributor

patrickw1025 commented May 6, 2026

@sd4324530

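Hello, this PR introduced the new bug that was mentioned in the comments.

The scenario I ran into is:

When Flink CDC 3.6.0 runs a MySQL → downstream pipeline job, the failure occurs if all of the following hold:

  • The source MySQL has lower_case_table_names != 0 (that is, 1 or 2, the default on macOS / Windows; some cloud RDS instances are also configured this way)
  • The source table DDL contains column names that are not all lowercase (uppercase or camel case both count)
  • The YAML pipeline has a transform block whose projection / primary-keys reference these columns in their original DDL case

The job crashes during the INITIAL snapshot phase with 0 rows written. The error stack reads Column 'XXX' not found in any table, and the failing CreateTableEvent has a schema with no columns (columns={}) while primaryKeys is populated. The job enters a crash loop, and automatic restarts do not recover it.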

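Bug 1: the source silently lowercases column names (a regression introduced in 3.6.0, corresponding to issue FLINK-38244)

MySqlPipelineRecordEmitter.buildSchemaFromTable decides whether to call toLowerCase on column names based on a flag named isTableIdCaseInsensitive. This flag comes from Debezium's MySqlConnection.isTableIdCaseSensitive() and is true when lower_case_table_names != 0.

This conflates two distinct MySQL semantics:

  • lower_case_table_names only governs how table identifiers (catalog / table) are stored and compared
  • It does not govern column identifiers: MySQL column names have always been "case-insensitive when compared, stored exactly as written in the DDL", entirely independent of this variable

As a result, every column name and PK name in the CreateTableEvent emitted by the source is silently converted to lowercase. The same logic is also scattered across 6 places in CustomAlterTableParserListener (affecting schema evolution in the ALTER phase).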

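Bug 2: PreTransform filters columns case-sensitively

The implementation of TransformParser.generateReferencedColumns checks the source schema's columns against the set of column names (a HashSet) parsed from the YAML projection, using contains. HashSet.contains is case-sensitive:

  • The set holds the uppercase names exactly as written in the YAML: {TICKET, LOGIN, …}
  • The columns being checked are the lowercase names changed by Bug 1: ticket, login, …
  • The number of matches is 0 → an empty list is returned

The subsequent tableSchema.copy(emptyList) strips out all columns while leaving the PKs untouched, which is where columns={}, primaryKeys=… in the log comes from. When this empty schema is fed to Calcite in PostTransform, the first referenced column fails with Column not found.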
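The case mismatch in Bug 2 can be reproduced with plain collections. The sketch below is an illustration under stated assumptions, not the TransformParser implementation: `ReferencedColumnsSketch` and `filterColumns` are hypothetical names, and the case-insensitive `TreeSet` is shown only as one possible way to make the lookup tolerant of case, not as the project's chosen fix.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical reproduction of the lookup mismatch; not the real TransformParser. */
public class ReferencedColumnsSketch {

    /** Keeps only the schema columns that the given set reports as referenced. */
    static List<String> filterColumns(List<String> schemaColumns, Set<String> referenced) {
        List<String> kept = new ArrayList<>();
        for (String column : schemaColumns) {
            if (referenced.contains(column)) {
                kept.add(column);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Column names as emitted by the source after lower-casing (Bug 1).
        List<String> schemaColumns = Arrays.asList("ticket", "login");

        // Names as written in the YAML projection, stored in a case-sensitive HashSet.
        Set<String> exact = new HashSet<>(Arrays.asList("TICKET", "LOGIN"));

        // Same names in a set that compares while ignoring case.
        Set<String> ignoreCase = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        ignoreCase.addAll(exact);

        System.out.println(filterColumns(schemaColumns, exact));      // prints []
        System.out.println(filterColumns(schemaColumns, ignoreCase)); // prints [ticket, login]
    }
}
```

With the case-sensitive set, no column survives the filter, which matches the empty columns={} schema described above; the primary keys are unaffected because they are not passed through this filter.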
