[SEATUNNEL-10685] prevent timestamp_ntz from being saved as timestamp_ltz #10724
doyong365 wants to merge 2 commits into apache:dev from
Conversation
Hi, @doyong365! Thank you for your hard work. The changes regarding Oracle, MySQL, PostgreSQL, and SQL Server all look good. BTW, it seems we need to take a closer look at all connectors that use the timestamp_ntz type, like Snowflake.
Please let me know if I am mistaken.
Also, the PR title is truncated. It would be good to fix it to match the commit message.
DanielLeens
left a comment
Thank you for the patch. I pulled this PR locally and re-checked the latest code path.
The JDBC to Iceberg timestamp mapping itself looks correct to me: the updated JDBC converters now distinguish LTZ vs NTZ, and IcebergTypeMapper writes TIMESTAMP to withoutZone() and TIMESTAMP_TZ to withZone().
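As background, the semantic gap the converters now respect can be shown with plain java.time types (illustration only, not SeaTunnel code): the same wall-clock digits denote different instants once an offset is attached.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Illustration only (plain java.time, not SeaTunnel classes): why NTZ and LTZ
// values cannot share one internal type. The same wall-clock reading denotes
// different instants once an offset is attached.
public class NtzVsLtz {
    // Gap, measured as instants, between the same local reading at two offsets.
    static Duration gapBetweenOffsets(LocalDateTime wallClock, ZoneOffset a, ZoneOffset b) {
        return Duration.between(wallClock.atOffset(a), wallClock.atOffset(b));
    }

    public static void main(String[] args) {
        LocalDateTime wallClock = LocalDateTime.of(2024, 6, 1, 12, 0);
        // Same digits at +09:00 and at UTC are nine hours apart as instants.
        System.out.println(gapBetweenOffsets(wallClock, ZoneOffset.ofHours(9), ZoneOffset.UTC)); // PT9H
    }
}
```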
I still see one blocking issue before merge: this is a breaking change not only for the JDBC source side, but also for the Iceberg sink and downstream schema behavior. However, the current incompatible-change docs only describe the JDBC connector. Since the real runtime path changes the Iceberg table schema that gets written, I think the breaking-change note needs to explicitly mention the Iceberg-side impact and migration expectation as well.
Once that compatibility note is expanded in docs/en|zh/introduction/concepts/incompatible-changes.md, this PR will be much closer. Also, the current Build check shows ACTION_REQUIRED, so please make sure the workflow is actually runnable from the PR branch.
Nice work on the core fix, especially for a first contribution here. This is very close.
Please enable CI following the instructions.
Force-pushed from 2f51ab9 to 74b5b1f
DanielLeens
left a comment
I pulled the current HEAD locally again.
The blocker from my previous review is resolved now: the incompatible-change note explicitly names both the JDBC source side and the Iceberg sink/schema impact, so the real user-facing behavior change is documented where I expected it.
I do not see a new blocking code/doc issue in the current revision. The only remaining step is letting the in-progress Build finish.
Force-pushed from 9b103f9 to 818ae30
DanielLeens
left a comment
I pulled the latest HEAD locally and re-checked the JDBC -> SeaTunnel -> Iceberg timestamp mapping path.
The new TIMESTAMP / TIMESTAMP_TZ split is correct for the general JDBC converters, but the Snowflake branch is still incomplete: connector-jdbc/.../SnowflakeTypeConverter.java:172-178 and .../catalog/snowflake/SnowflakeDataTypeConvertor.java:145-149 still map all Snowflake timestamp variants, including TIMESTAMP_LTZ / TIMESTAMP_TZ, to LOCAL_DATE_TIME_TYPE.
That means timezone-aware Snowflake timestamps still lose their zone information on the normal catalog/read path, so the fix is not end-to-end yet. Please bring the Snowflake converters into the same LOCAL_DATE_TIME vs OFFSET_DATE_TIME split before merge.
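To make the loss concrete, here is a minimal java.time sketch (hypothetical names, not the actual Snowflake converter code) of what mapping an LTZ value to LOCAL_DATE_TIME_TYPE does:

```java
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

// Hypothetical sketch, not SnowflakeTypeConverter itself: storing a
// timezone-aware value as LocalDateTime silently discards the offset.
public class LtzLossDemo {
    static LocalDateTime lossyRead(OffsetDateTime fromDriver) {
        // Mirrors the LOCAL_DATE_TIME_TYPE mapping: the wall-clock digits
        // survive, but the offset (and therefore the instant) is gone.
        return fromDriver.toLocalDateTime();
    }

    public static void main(String[] args) {
        OffsetDateTime ltz = OffsetDateTime.of(2024, 1, 1, 9, 0, 0, 0, ZoneOffset.ofHours(8));
        System.out.println(ltz);            // 2024-01-01T09:00+08:00
        System.out.println(lossyRead(ltz)); // 2024-01-01T09:00
    }
}
```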
Force-pushed from 818ae30 to 90ac5ec
@och5351 Thank you for your review. I have updated the branch; the latest commit includes the end-to-end fix for Snowflake. PTAL!
@dybyte I have checked the instructions and enabled CI :)
DanielLeens
left a comment
Thanks for the update. I pulled the latest branch locally and rechecked the Snowflake conversion path. TIMESTAMP_NTZ now stays on the LocalDateTime path while TIMESTAMP_LTZ/TZ go through OffsetDateTime, and the reverse mapping is aligned in both converters. That resolves the functional mismatch I called out earlier. I don't see a new code blocker from this diff. The remaining red item is the Build workflow, so please rerun or inspect that check before merge.
Hi @doyong365! Thanks for your hard work on this! I'd like to share some detailed observations with the hope that we can grow together:
It might be better to align the implementation with the MySQL-style approach, like below:

```java
case MYSQL_DATETIME:
    builder.dataType(LocalTimeType.LOCAL_DATE_TIME_TYPE);
    builder.scale(typeDefine.getScale());
    break;
case MYSQL_TIMESTAMP:
    builder.dataType(LocalTimeType.OFFSET_DATE_TIME_TYPE);
    builder.scale(typeDefine.getScale());
    break;
```
- Kingbase: the implementation looks a bit complex.
- PostgreSQL: this part seems to follow the MySQL approach.
- Oracle: I couldn't find any timestamp NTZ-related handling for Oracle.
- SQL Server: …

Please let me know if you have any questions or if I've missed something!
Force-pushed from 90ac5ec to 203f980
@och5351 Thank you so much for the detailed observations! Your insights were incredibly helpful.
DanielLeens
left a comment
I pulled the latest head locally again and re-checked the updated JDBC -> SeaTunnel -> Iceberg timestamp path.
Current runtime chain:
JDBC ResultSet
-> dialect TypeConverter decides TIMESTAMP vs TIMESTAMP_TZ
-> AbstractJdbcRowConverter reads:
TIMESTAMP -> LocalDateTime
TIMESTAMP_TZ -> OffsetDateTime
-> downstream Iceberg mapping:
withoutZone() -> TIMESTAMP
withZone() -> TIMESTAMP_TZ
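The read-side split in that chain can be sketched with plain java.time types. The names below (SeaTunnelTsType, readTimestamp) are illustrative only, not the real AbstractJdbcRowConverter API:

```java
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

// Hedged sketch of the read-side dispatch described in the chain above.
// Not the actual SeaTunnel classes: names here are hypothetical.
public class ReadDispatchSketch {
    enum SeaTunnelTsType { TIMESTAMP, TIMESTAMP_TZ }

    static Object readTimestamp(SeaTunnelTsType type, OffsetDateTime driverValue) {
        switch (type) {
            case TIMESTAMP:
                // NTZ: keep only the wall-clock reading.
                return driverValue.toLocalDateTime();
            case TIMESTAMP_TZ:
                // LTZ: keep the offset-carrying value intact.
                return driverValue;
            default:
                throw new IllegalStateException("unreachable");
        }
    }

    public static void main(String[] args) {
        OffsetDateTime v = OffsetDateTime.of(2024, 1, 1, 9, 0, 0, 0, ZoneOffset.ofHours(8));
        System.out.println(readTimestamp(SeaTunnelTsType.TIMESTAMP, v).getClass().getSimpleName());    // LocalDateTime
        System.out.println(readTimestamp(SeaTunnelTsType.TIMESTAMP_TZ, v).getClass().getSimpleName()); // OffsetDateTime
    }
}
```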
The latest commit after my previous review only extends that same LTZ/NTZ split to a few additional dialects (DM, DuckDB, Kingbase, OceanBase, Redshift, Xugu). I re-checked the current code path and I do not see a new blocking issue in this revision.
The incompatible-change note is already updated in both docs/en and docs/zh, which is what I wanted for this kind of user-visible behavior correction.
From a code-review perspective this looks mergeable to me. The only remaining step is letting the current Build check finish.
@davidzollo
Force-pushed from 3e64e9d to 34d4517
Force-pushed from 34d4517 to ab827c1
Hi @yzeng1618, thank you for the detailed review. PTAL!
Hi @doyong365, thanks for taking on this timestamp semantics fix. I reviewed the latest head locally on …

What This PR Fixes

Core Flow Reviewed

Findings

I did not find a blocking code issue in the latest implementation. The important compatibility note is real and already documented: some downstream sinks that previously saw …

Compatibility

Performance And Side Effects

Tests And Docs

The PR adds broad type-mapping and serialization coverage, including JDBC dialect tests, …

Merge Conclusion

Conclusion: merge after fixes

Overall, this is a meaningful semantic correction with the right compatibility documentation. Once CI is green, I think it can move forward.
ab827c1 to
1d81998
Compare
What This PR Fixes

Local Review Basis

I rechecked the latest PR head …

Current Runtime Chain

Findings

Conclusion: can merge

Blocking items:

Suggested non-blocking note:

Thank you for the quick follow-up. From a source-level review perspective, the latest hotfix still looks ready.
Force-pushed from 1d81998 to 0c753e2
@davidzollo @yzeng1618

Root cause: …

After this PR, it is correctly mapped to …

This is the same structural limitation documented in:

All three are already annotated with …

Fix applied: Added …

The correct fix for Spark engine TIMESTAMP_TZ support is a separate concern (would require changing the encoding in …). I can address this in a follow-up PR, along with upgrading the supported Spark version beyond 3.3.
Hi @doyong365, thanks for the clear follow-up explanation. I rechecked the latest head locally.

What changed after Daniel's previous review

Runtime chain I rechecked

Findings

Merge conclusion

Conclusion: merge after fixes

Blocking item:

Non-blocking note:

From a source-level review perspective, the latest follow-up looks reasonable and consistent with the already-reviewed …
Hi @doyong365, I rechecked the current head locally again after the latest review activity. My conclusion on the source-level logic is unchanged:

Runtime chain I am still using for this judgment

Remaining issue:

Conclusion: merge after fixes

Blocking item:

From a source-level review perspective, I do not currently see a reopened code blocker.
Force-pushed from 0c753e2 to 2c4df4c
Hi @doyong365, thanks for continuing to work on this timestamp issue. I pulled the current head locally again and rechecked the full diff against …

What this PR is trying to solve

Runtime / scope chain I checked

Merge conclusion

Conclusion: do not merge yet

I'm happy to re-review once the branch is cleaned up. The original timestamp problem is worth fixing, but the current mixed diff is the blocker from Daniel's side.
…_ltz Co-authored-by: Chanhae Oh <dhcksgo5319@gmail.com> Co-authored-by: yzeng1618 <yzeng1618@gmail.com>
…e Timestamps Co-authored-by: Chanhae Oh <dhcksgo5319@gmail.com>
Force-pushed from 2c4df4c to 1768be3
DanielLeens
left a comment
Thanks for the latest follow-up. I pulled the current head locally again and rechecked it against upstream/dev.
The main blocker from Daniel's previous review is still unresolved: this PR is still not reviewable as one isolated timestamp fix.
Current scope I verified locally:
git diff --stat upstream/dev...HEAD
-> 74 files changed
-> spans cdc-base, jdbc, doris, hudi, iceberg, paimon, starrocks,
multiple JDBC/Iceberg/StarRocks E2E modules, and incompatible-change docs
latest commit
-> OracleTypeConverter hotfix
-> does not reduce the branch back to a focused timestamp-only surface
So even though the original timestamp problem is worth fixing, the current branch still carries too much unrelated surface area for me to give a safe merge decision on "the timestamp fix" itself.
Blocking item:
- Please rebuild this PR on top of the latest dev and keep only the timestamp-related production changes plus directly related tests/docs.
The current Build is also still queued, but the mixed review scope is already enough to block merge from Daniel's side.
@davidzollo CC @yzeng1618 @och5351
DanielLeens
left a comment
CI is green now, which is great progress. I rechecked the current head locally against upstream/dev instead of carrying forward my earlier scope-only concern, and I want to narrow the blocker down to one concrete runtime issue.
What this PR is solving
- User pain: timezone-aware columns could still collapse into plain TIMESTAMP semantics somewhere between JDBC / CDC sources and downstream sinks.
- Fix approach: map timezone-aware database columns to SeaTunnel TIMESTAMP_TZ, carry them as OffsetDateTime, and update downstream sink / format boundaries accordingly.
- One-line value: this is a cross-boundary timestamp-semantics fix, not just a single JDBC dialect tweak.
Runtime chain I rechecked locally
JDBC / CDC source read
-> dialect TypeConverter.convert(...)
-> timezone-aware DB type -> TIMESTAMP_TZ
-> AbstractJdbcRowConverter.toInternal()
-> JdbcFieldTypeUtils.getOffsetDateTime(...)
-> Debezium / TiDB CDC converters
-> OffsetDateTime is produced for TIMESTAMP_TZ
-> SeaTunnelRow carries OffsetDateTime
Downstream boundaries
-> IcebergTypeMapper / DefaultDeserializer
-> withZone() <-> TIMESTAMP_TZ
-> Hudi / Paimon / JSON / Text / StarRocks
-> TIMESTAMP_TZ handling branches were added
Xugu sink write path
-> JdbcSink writer
-> XuguJdbcRowConverter.setValueToStatementByDataType()
-> OffsetDateTime.toLocalDateTime()
-> PreparedStatement.setTimestamp(...)
-> timezone offset is silently dropped
Docs contract
-> docs/en|zh incompatible-changes
-> says timezone semantics are now preserved
-> calls out StarRocks / Hudi / CDC specifics
-> does not call out the lossy Xugu sink fallback
Current findings
- After rechecking the current real diff, I do not want to block on branch size alone. The remaining 74-file surface is still broadly tied to the same timestamp-semantics topic.
- The merge blocker is more specific: seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/xugu/XuguJdbcRowConverter.java:47-54 still converts OffsetDateTime to LocalDateTime and writes a plain Timestamp.
- That means the normal Xugu sink path still silently loses the offset for TIMESTAMP_TZ values.
- At the same time, docs/en/introduction/concepts/incompatible-changes.md:10-18 and the matching Chinese section describe this PR as preserving timezone semantics and already list connector-specific caveats for StarRocks and Hudi, but not Xugu.
Why this still blocks merge
- This is not an unreachable fallback branch. It is the normal JDBC sink write path for Xugu.
- So a value like 2026-01-01T09:00:00+08:00 can still be written with the offset already stripped.
- That is a silent data-semantics problem, which is more dangerous than an explicit unsupported-type failure.
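A minimal sketch of this failure mode (hypothetical code, not the actual XuguJdbcRowConverter): dropping the offset directly makes the stored wall clock ambiguous, while normalizing to UTC first would at least keep the instant.

```java
import java.sql.Timestamp;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

// Hedged sketch of the lossy fallback described above, not the real converter.
public class XuguLossSketch {
    // What the review describes: the offset is discarded as-is.
    static Timestamp lossy(OffsetDateTime v) {
        return Timestamp.valueOf(v.toLocalDateTime());
    }

    // One possible mitigation: normalize to UTC first so the instant survives
    // (the offset itself is still lost, which is why fail-fast may be safer).
    static Timestamp utcNormalized(OffsetDateTime v) {
        return Timestamp.valueOf(v.withOffsetSameInstant(ZoneOffset.UTC).toLocalDateTime());
    }

    public static void main(String[] args) {
        OffsetDateTime v = OffsetDateTime.parse("2026-01-01T09:00:00+08:00");
        System.out.println(lossy(v).toLocalDateTime());         // 2026-01-01T09:00 (instant now ambiguous)
        System.out.println(utcNormalized(v).toLocalDateTime()); // 2026-01-01T01:00 (UTC instant preserved)
    }
}
```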
Recommended fix direction
- Option A (preferred): fail fast for Xugu TIMESTAMP_TZ sink writes until the driver can preserve the value correctly, and document that limitation in both docs/en and docs/zh.
- Option B: if the community deliberately keeps the lossy fallback, then please document Xugu exactly the same way you already documented the StarRocks/Hudi connector-specific behavior, and soften the current wording that says timezone semantics are accurately preserved.
Conclusion
Conclusion: merge after fixes
Blocking item:
- Xugu sink still silently drops the TIMESTAMP_TZ offset on the normal write path, and the current incompatible-change docs do not disclose that connector-specific degradation.
Non-blocking note:
- I am not carrying forward a generic “too many files” blocker in this round. CI is green, and after rechecking the current diff locally, the blocker I still stand behind is the Xugu semantics/documentation mismatch above.
Thank you again for pushing this through. The main timestamp direction is valuable and much closer now; this one Xugu path is the remaining reason I would still keep it in “fixes needed” state.








Purpose of this pull request
Fixes #10685
The JDBC connector could not distinguish between TIMESTAMP (NTZ, No Time Zone) and TIMESTAMP_TZ (LTZ, Local Time Zone) types when reading from databases. Previously, both timezone-naive and timezone-aware timestamp columns were mapped to SeaTunnel's internal TIMESTAMP type, causing timezone information to be lost when writing to timezone-aware sinks such as Iceberg. This PR explicitly separates them by fixing the TypeConverter and JdbcRowConverter for MySQL, PostgreSQL, Oracle, and SQL Server:

| Database type | Before | After |
|---|---|---|
| MySQL `DATETIME` | `TIMESTAMP` | `TIMESTAMP` (NTZ) ✅ |
| MySQL `TIMESTAMP` | `TIMESTAMP` | `TIMESTAMP_TZ` (LTZ) ✅ |
| PostgreSQL `timestamp` | `TIMESTAMP` | `TIMESTAMP` (NTZ) ✅ |
| PostgreSQL `timestamptz` | `TIMESTAMP` | `TIMESTAMP_TZ` (LTZ) ✅ |
| Oracle `TIMESTAMP` | `TIMESTAMP` | `TIMESTAMP` (NTZ) ✅ |
| Oracle `TIMESTAMP WITH LOCAL TIME ZONE` | `TIMESTAMP` | `TIMESTAMP_TZ` (LTZ) ✅ |
| SQL Server `datetime` | `TIMESTAMP` | `TIMESTAMP` (NTZ) ✅ |
| SQL Server `datetimeoffset` | `TIMESTAMP` | `TIMESTAMP_TZ` (LTZ) ✅ |

Also fixes IcebergTypeMapper and DefaultDeserializer in connector-iceberg to correctly map TIMESTAMP_TZ → Iceberg timestamptz (withZone).

Does this PR introduce any user-facing change?
Yes.
Previous behavior:
Both DATETIME (NTZ) and TIMESTAMP (LTZ) columns from MySQL were read as SeaTunnel's TIMESTAMP type. When written to Iceberg, both were stored as timestamp (without timezone), losing timezone semantics.

New behavior:

- DATETIME → SeaTunnel TIMESTAMP → Iceberg timestamp (without timezone)
- TIMESTAMP → SeaTunnel TIMESTAMP_TZ → Iceberg timestamptz (with timezone)

(Note: These breaking changes have been documented in incompatible-changes.md.)

Affected engines: Zeta, Flink (Spark is excluded — see note below)
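The MySQL side of the new behavior can be condensed into a toy mapping rule. This is a string-only sketch with a hypothetical method name, not the real MySqlTypeConverter:

```java
// Illustrative only: the MySQL NTZ/LTZ split described above, written as a
// toy mapping rule (method name and return strings are hypothetical).
public class MysqlTsMappingSketch {
    static String toSeaTunnelType(String mysqlType) {
        switch (mysqlType.toUpperCase()) {
            case "DATETIME":  return "TIMESTAMP";     // NTZ: no zone attached
            case "TIMESTAMP": return "TIMESTAMP_TZ";  // LTZ: zone-aware
            default:          return "UNSUPPORTED";
        }
    }

    public static void main(String[] args) {
        System.out.println(toSeaTunnelType("datetime"));  // TIMESTAMP
        System.out.println(toSeaTunnelType("timestamp")); // TIMESTAMP_TZ
    }
}
```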
How was this patch tested?
Existing unit tests were updated to reflect the new NTZ/LTZ type separation:
MySqlTypeConverterTest— verifiesDATETIME→TIMESTAMP,TIMESTAMP→TIMESTAMP_TZOracleTypeConverterTest— verifies NTZ/LTZ split for Oracle timestamp typesSqlServerTypeConverterTest— verifies NTZ/LTZ split for SQL Server timestamp typesJdbcFieldTypeUtilsTest— verifiesgetLocalDateTime()andgetOffsetDateTime()helpersOracleCreateTableSqlBuilderTest— updated DDL expectation to match new NTZ mappingIcebergTypeMapperTest— verifiesTIMESTAMP→withoutZone(),TIMESTAMP_TZ→withZone()Run with:
```shell
./mvnw test -Dskip.spotless=true \
  -pl seatunnel-connectors-v2/connector-jdbc,seatunnel-connectors-v2/connector-iceberg \
  --no-transfer-progress
```
New E2E tests were added to verify the fix end-to-end using real database containers. The Assert sink's field_type rule is used to strictly validate the SeaTunnel internal type. If the type mapping is wrong, the Assert sink throws a field type mismatch error and fails the build.
Pre-requisite — build connectors first:
```shell
./mvnw clean install -DskipTests -Dskip.spotless=true \
  -pl seatunnel-connectors-v2/connector-jdbc,seatunnel-connectors-v2/connector-iceberg \
  -am --no-transfer-progress
```
MySQL (NTZ/LTZ mapping):
```shell
RUN_ALL_CONTAINER=false RUN_ZETA_CONTAINER=true \
./mvnw -pl seatunnel-e2e/seatunnel-connector-v2-e2e/connector-jdbc-e2e/connector-jdbc-e2e-part-1 \
  -DskipUT -DskipIT=false -Dit.test="JdbcMysqlTimestampIT" \
  -Dskip.spotless=true verify --no-transfer-progress
```
Expected BUILD SUCCESS — Assert sink verifies:
PostgreSQL (NTZ/LTZ mapping):
```shell
RUN_ALL_CONTAINER=false RUN_ZETA_CONTAINER=true \
./mvnw -pl seatunnel-e2e/seatunnel-connector-v2-e2e/connector-jdbc-e2e/connector-jdbc-e2e-part-3 \
  -DskipUT -DskipIT=false -Dit.test="JdbcPostgresTimestampIT" \
  -Dskip.spotless=true verify --no-transfer-progress
```
Expected BUILD SUCCESS — Assert sink verifies:
MySQL → Iceberg (end-to-end type persistence):
```shell
RUN_ALL_CONTAINER=false RUN_ZETA_CONTAINER=true \
./mvnw -pl seatunnel-e2e/seatunnel-connector-v2-e2e/connector-iceberg-e2e \
  -DskipUT -DskipIT=false -Dit.test="JdbcToIcebergTimestampIT#testMysqlDatetimeToIcebergNtz" \
  -Dskip.spotless=true verify --no-transfer-progress
```
Expected BUILD SUCCESS — Iceberg metadata JSON confirms:
Check list
New License Guide (https://github.com/apache/seatunnel/blob/dev/docs/en/contribution/new-license.md)