[Feature][Connector-V2][Oracle-CDC] Add SCN specific startup#11171
[Feature][Connector-V2][Oracle-CDC] Add SCN specific startup#11171goutamadwant wants to merge 4 commits into
Conversation
DanielLeens
left a comment
There was a problem hiding this comment.
Thanks for working on this! I reviewed the full current diff, traced the Oracle CDC startup path into the shared CDC startup contract, and I don't think the current implementation is merge-ready yet.
What this PR solves
- User pain: Oracle CDC users may want to start from a specific SCN instead of only
initial,latest, or timestamp-based startup. - Fix approach: add a new Oracle-specific
startup.specific-offset.scnoption and translate it into the startup config used byOracleIncrementalSource. - One-line summary: the feature direction is useful, but the current head does not actually fit the shared CDC startup contract that the Oracle source still depends on.
Runtime chain I checked
Oracle CDC source construction
-> OracleIncrementalSource.buildStartupConfig(...) [OracleIncrementalSource.java:118-135]
-> reads `startup.specific-offset.scn`
-> builds a RedoLogOffset-shaped map
-> returns new StartupConfig(startupMode, specificOffset)
Shared CDC startup contract
-> StartupConfig [connector-cdc-base/.../StartupConfig.java:30-55]
-> stores only {specificOffsetFile, specificOffsetPos, timestamp}
-> SPECIFIC path calls offsetFactory.specific(specificOffsetFile, specificOffsetPos)
Findings
Issue 1: the new Oracle-specific SCN startup path does not match the shared StartupConfig contract
- Location:
seatunnel-connectors-v2/connector-cdc/connector-cdc-oracle/src/main/java/org/apache/seatunnel/connectors/seatunnel/cdc/oracle/source/OracleIncrementalSource.java:131-135,seatunnel-connectors-v2/connector-cdc/connector-cdc-base/src/main/java/org/apache/seatunnel/connectors/cdc/base/config/StartupConfig.java:30-55 - Why this is a problem:
The Oracle path now tries to hand aMap<String, String>-style redo-log offset intoStartupConfig, but the shared base class still modelsSPECIFICstartup only asfile + pos. Its runtime path still delegates tooffsetFactory.specific(specificOffsetFile, specificOffsetPos). - Risk:
On the current head, the new option is not actually wired through a coherent startup contract. At best this is incomplete; at worst it leaves the source in a non-compilable or non-runnable state depending on how the constructor mismatch is resolved. - Better fix:
Either extend the base startup/offset contract explicitly so Oracle can carry a structured specific offset end to end, or keep the SCN-specific handling inside the Oracle-specific path instead of forcing it through the genericfile + posstartup model. - Severity: High
- Already raised by others: No
Issue 2: the new option surface implies a working Oracle-specific offset shape, but the shared runtime still only understands generic specific offsets
- Location:
seatunnel-connectors-v2/connector-cdc/connector-cdc-oracle/src/main/java/org/apache/seatunnel/connectors/seatunnel/cdc/oracle/source/OracleIncrementalSourceOptions.java:62-66,seatunnel-connectors-v2/connector-cdc/connector-cdc-oracle/src/main/java/org/apache/seatunnel/connectors/seatunnel/cdc/oracle/source/offset/RedoLogOffset.java:36-55 - Why this is a problem:
The PR addsstartup.specific-offset.scnand builds aRedoLogOffset-shaped map, but there is still no matching shared startup path that consumes{scn, commit_scn, lcr_position}as the normalSPECIFICstartup input. - Risk:
Users get a new documented config surface without a proven end-to-end runtime contract behind it. - Better fix:
Make the Oracle startup contract explicit end to end, then add a regression test that proves a specific-SCN startup reaches the runtime offset factory correctly. - Severity: High
- Already raised by others: No
Compatibility
- API/config names: additive only.
- Behavior: not safe to claim as compatible yet, because the new startup mode is not fully connected to the shared CDC startup runtime.
- Migration: not applicable until the implementation is complete.
Performance and side effects
- No meaningful CPU / memory concern from the option itself.
- The real side effect is correctness: a startup option that looks supported but is not fully wired can fail at source startup or during offset construction.
Tests and docs
- Any tests and docs added around SCN startup are not enough until the runtime startup contract itself is made coherent.
- I would strongly recommend a focused regression test that proves
startup.mode = specificplus SCN reaches the real Oracle offset startup path.
CI
- The latest
Buildis red, but the runtime-contract gap above is already a source-level blocker regardless of CI.
Merge conclusion
Conclusion: can merge after fixes
- Blocking items
- Issue 1: complete the Oracle-specific startup contract instead of routing a structured SCN offset through a shared
StartupConfigmodel that still only understandsfile + pos. - Issue 2: prove the new SCN startup option through a real end-to-end startup regression test.
- Suggested follow-up
- Once the contract is settled, keep the docs explicit about exactly which Oracle startup modes are supported and how they map to redo-log offsets.
Happy to take another pass once the Oracle-specific offset shape is wired through a consistent startup path.
|
The Oracle-specific SCN startup path looks much clearer than documenting file/position offsets for Oracle. One compatibility point to call out: this PR also narrows |
|
Thanks, this is a fair compatibility point to raise. I agree that if this PR continues narrowing That said, on the current head my main blocker is still the earlier runtime-contract gap: the SCN-specific structured offset is not wired through the shared |
|
Hi @DanielLeens @davidzollo I updated the PR based on the review. Oracle SCN startup now keeps the SCN as a structured specific offset through I also added a focused check that Oracle CDC only exposes Verified with: ./mvnw -q -pl seatunnel-connectors-v2/connector-cdc/connector-cdc-base,seatunnel-connectors-v2/connector-cdc/connector-cdc-oracle -Dtest=OracleIncrementalSourceFactoryTest -DfailIfNoTests=false -DskipITs test
./mvnw -q -pl seatunnel-connectors-v2/connector-cdc/connector-cdc-base,seatunnel-connectors-v2/connector-cdc/connector-cdc-oracle -DskipTests -DskipITs package |
DanielLeens
left a comment
There was a problem hiding this comment.
Thanks for the update, and thanks for following up with the focused startup-contract changes. I re-reviewed the latest head locally from the full Oracle CDC startup path again instead of just looking at the incremental patch, and the main Daniel blocker from the previous round is now closed on the current revision.
What this PR fixes:
- User pain: Oracle CDC users need a real way to start from a specific Oracle SCN, but the previous path was still shaped around generic file/position-specific startup semantics.
- Fix approach: keep Oracle-specific SCN startup as a structured offset all the way through
StartupConfig.specificOffset(Map)intoRedoLogOffsetFactory.specific(Map), and tighten validation sospecificmode cannot silently fall back to file/position inputs. - In one sentence: this PR turns Oracle-specific startup from a documentation-level idea into a real startup-offset path that reaches the runtime offset factory correctly.
Runtime call chain I checked:
Oracle CDC source construction
-> IncrementalSource.<init>() [IncrementalSource.java:118-129]
-> getStartupConfig(readonlyConfig)
-> OracleIncrementalSource.getOracleStartupConfig(...) [OracleIncrementalSource.java:92-136]
-> if startup.mode != specific:
-> keep the existing StartupConfig(startupMode, file, pos, timestamp) path
-> if startup.mode == specific:
-> reject startup.specific-offset.file / pos
-> require startup.specific-offset.scn > 0
-> build {scn, commit_scn, lcr_position} map
-> StartupConfig.specificOffset(map)
Runtime offset materialization
-> StartupConfig.getStartupOffset(offsetFactory) [StartupConfig.java:68-86]
-> SPECIFIC with structured offset
-> offsetFactory.specific(Map)
-> RedoLogOffsetFactory.specific(Map) [RedoLogOffsetFactory.java:64-67]
-> new RedoLogOffset(offset)
Key findings from the current head:
- The earlier runtime-contract gap is fixed. Oracle-specific startup is no longer pretending to fit the generic
file + pospath; it now uses the structuredspecific(Map)branch end to end. - The factory validation now matches the runtime contract:
startup.mode = specificrequiresstartup.specific-offset.scn- the SCN must be
> 0 - file/position-specific offsets are explicitly rejected for Oracle-specific startup
- The new test coverage is pointed at the real contract:
OracleIncrementalSourceFactoryTest.java:62-134covers missing SCN, invalid SCN, wrong mode, and file/position rejection.OracleIncrementalSourceFactoryTest.java:85-108proves the structured Oracle offset survivesIncrementalSplit -> IncrementalSplitState -> toSourceSplit()without losing the SCN payload.
- Related to @hiSandog's earlier compatibility point about
stop.mode: I recheckedorigin/dev, andOracleIncrementalSourceOptions.STOP_MODEwas already a single-choiceneverthere. So on the latest head, the doc/test update is aligning with the existing runtime boundary rather than introducing a new stop-mode restriction. - I do not see API, config-default, protocol, or backward-compatibility regressions in the current implementation.
Test stability:
- Stable.
- The added tests are pure config / object-level unit tests with no sleeps, no containers, no network, and no time-sensitive async behavior.
CI note:
- The current
Buildfailure does not look like an Oracle CDC code regression. - I checked the failing fork run behind the check, and the failure is in the website job during
node tools/generate-team-pr-contributors.jswith:GitHub token not found. Set GITHUB_TOKEN/GH_TOKEN or run gh auth login first. - So from Daniel's side, this looks like a workflow/token issue on the fork build path rather than a blocker reopened by the Oracle source changes themselves.
Merge conclusion:
- Conclusion: mergeable from Daniel's side.
- Blocking code issues: none in the current revision.
- Non-blocking suggestions: none.
- Remaining GitHub item: the failing
Buildshould still be rerun/fixed on the CI side before the PR actually lands, but I do not see evidence that the Oracle CDC code in this PR is the cause.
Overall, the Oracle-specific startup contract now looks coherent end to end, and the latest head closes the main blocker from my previous review.
|
Thanks for the update. I rechecked the current head against the latest
Because the PR is still behind the latest base, any CI result and merge gate signal here is mixed with upstream drift. The lowest-cost next step is to sync with the latest |
|
Thanks for the update. I rechecked the current head against the latest
The current failing CI signal does not look actionable on top of this stale head yet, so the lowest-cost next step is to sync with the latest |
|
@DanielLeens Synced this branch with latest dev and pushed the rebased branch. Local checks:
I checked that parser failure against clean origin/dev as well and it fails there too, so that one does not look introduced by this PR. |
|
Thanks for the rebase update. From Daniel's side, the current source revision is still the same Oracle CDC head I already approved ( What still needs to be cleared before merge is the current GitHub |
|
Thanks for the earlier rebase update. I rechecked the current head against the latest
From Daniel's side, I still do not see a reopened Oracle-CDC source blocker on the current branch. But GitHub is now reporting merge conflicts against the latest |
Signed-off-by: goutamadwant <workwithgoutam@gmail.com> # Conflicts: # docs/en/connectors/source/Oracle-CDC.md # docs/zh/connectors/source/Oracle-CDC.md
|
Synced this branch with latest dev and resolved the Oracle CDC docs conflict. I kept the latest dev wording for the common Oracle CDC options and kept this PR's SCN specific startup option docs. The startup docs still match the runtime contract where Oracle specific startup uses startup.specific-offset.scn. Local Oracle CDC focused test and package check pass with Java 11. |
Purpose of this pull request
Fixes #11041.
This PR adds Oracle CDC support for starting from a user-provided SCN with:
The configured SCN is mapped into the existing Oracle redo-log offset model, so the incremental split uses the same offset shape as the current latest and timestamp startup paths. Invalid combinations fail early, including missing SCN, non-positive SCN, or file/position offsets with Oracle specific startup.
The Oracle CDC docs are also updated to describe SCN based specific startup instead of binlog file/position options.
Does this PR introduce any user-facing change?
Yes. Oracle CDC users can now configure SCN based specific startup with startup.specific-offset.scn.
The previous Oracle CDC docs mentioned binlog file/position specific startup, but the Oracle connector does not use binlog file/position offsets. This PR documents and validates the Oracle-specific SCN config instead.
If the configured SCN is no longer available to the selected Oracle log mining backend, startup should fail from the normal Oracle/Debezium recovery path instead of silently falling back to another startup mode.
How was this patch tested?
Added unit coverage for:
Verified locally with:
JAVA_HOME=$(/usr/libexec/java_home -v 11) PATH="$JAVA_HOME/bin:$PATH" ./mvnw -pl seatunnel-connectors-v2/connector-cdc/connector-cdc-oracle -Dtest=OracleIncrementalSourceFactoryTest test
JAVA_HOME=$(/usr/libexec/java_home -v 11) PATH="$JAVA_HOME/bin:$PATH" ./mvnw -pl seatunnel-connectors-v2/connector-cdc/connector-cdc-oracle -DfailIfNoTests=false package
Check list
New License Guide
incompatible-changes.mdto describe the incompatibility caused by this PR.