[fix] support staged doris insert overwrite#660
Open
liujiwen-up wants to merge 1 commit into
Open
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Proposed changes
Issue Number: close #xxx
Problem Summary:
This PR changes Doris Flink Connector
INSERT OVERWRITEsink behavior to avoid truncating the target table before data is successfully written.Previously,
INSERT OVERWRITEexecutedTRUNCATE TABLEbefore writing. If the Flink job failed after truncate but before the write completed, the target table could be left empty.The new implementation uses a staging-table based flow:
CREATE TABLE staging LIKE target.ALTER TABLE target REPLACE WITH TABLE staging PROPERTIES('swap'='false').This implementation is currently limited to bounded
INSERT OVERWRITEwithSTREAM_LOADand 2PC enabled. It rejects unsafe configurations such as streaming overwrite, non-Stream Load write modes,sink.ignore.commit-error=true, missingsink.label-prefix, missingjdbc-url, and pre-existing staging tables.Additional guards were added to require Doris table id metadata from
information_schema.metadata_name_ids, so the connector can detect target-table changes before finalization and identify already-finalized overwrite attempts.Checklist(Required)
Further comments
This change intentionally chooses a conservative first version:
Alternatives considered include continuing to use
TRUNCATE TABLE, reusing existing staging tables during recovery, or supporting more write modes immediately. These were not chosen because they either preserve the original data-loss risk or make failure/recovery semantics harder to prove safe.Validation performed:
mvn -Pflink1 -pl flink-doris-connector-base -Dtest=TestDorisOverwriteManager test mvn -Pflink1 -pl flink-doris-connector-flink1 -am -DskipTests compile mvn -Pflink2 -pl flink-doris-connector-flink2 -am -DskipTests compile