[Feature][Connector-JDBC] Fix Oracle BLOB data format preservation issue #9270
base: dev
…o support large blob fields.
Pull Request Overview
This pull request fixes the Oracle BLOB data preservation issue by adding a configuration parameter (handle_blob_as_string) to control whether BLOB data is converted to a string or remains as bytes. The changes span multiple modules, including source conversion methods, dialect factories, config builders, and catalog creation for both JDBC and CDC connectors.
Reviewed Changes
Copilot reviewed 16 out of 17 changed files in this pull request and generated 1 comment.
File | Description |
---|---|
seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/utils/JdbcFieldTypeUtils.java | Added special handling for BLOB conversion in getString(). |
seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/utils/JdbcCatalogUtils.java | Configured catalog options to include handle_blob_as_string. |
seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/source/JdbcSourceFactory.java | Propagated the new configuration parameter to the dialect loader. |
seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/oracle/* | Updated OracleTypeMapper, OracleTypeConverter, OracleDialect, and OracleCatalog to support handle_blob_as_string. |
seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/config/* | Updated configuration options and builders to include handle_blob_as_string. |
seatunnel-connectors-v2/connector-cdc/connector-cdc-oracle/src/main/java/org/apache/seatunnel/connectors/seatunnel/cdc/oracle/utils/OracleTypeUtils.java | Adjusted conversion call for CDC connector with the new parameter. |
Files not reviewed (1)
- .github/actions/get-workflow-origin: Language not supported
@@ -69,6 +69,6 @@ public static org.apache.seatunnel.api.table.catalog.Column convertToSeaTunnelCo
             builder.scale(column.length());
         }

-        return new OracleTypeConverter(false).convert(builder.build());
+        return new OracleTypeConverter(false, false).convert(builder.build());
The CDC OracleTypeUtils conversion hardcodes handleBlobAsString to false. Consider exposing this configuration so that the CDC connector's handling of BLOB data can be aligned with the JDBC connector if needed.
-        return new OracleTypeConverter(false, false).convert(builder.build());
+        return new OracleTypeConverter(handleBlobAsString, false).convert(builder.build());
.github/actions/get-workflow-origin
Please do not update the version of action.
OK, I will restore the previous version.
if (handleBlobAsString) {
    builder.dataType(BasicType.STRING_TYPE);
    builder.columnLength(BYTES_4GB - 1);
    log.info("Converted BLOB to STRING_TYPE with length: {}", BYTES_4GB - 1);
} else {
    builder.dataType(PrimitiveByteArrayType.INSTANCE);
    builder.columnLength(BYTES_4GB - 1);
    log.info(
            "Converted BLOB to PrimitiveByteArrayType with length: {}",
            BYTES_4GB - 1);
}
Why not use CLOB to store strings? BLOB was originally intended for storing bytes.
Thank you for your question. The design choice to handle BLOB data as strings is based on real customer use cases. We've encountered numerous situations where customers store XML files or other textual content in Oracle BLOB fields rather than CLOB, particularly in legacy systems or specific application scenarios.
When attempting to process this BLOB data directly in SQL queries using type conversions (like TO_CLOB), users often hit byte size limitations, especially with large XML files. These limitations can result in data truncation or conversion failures.
This is why we implemented the handleBlobAsString option, providing users with flexibility during the ETL process:
- When the data is actually textual content (like XML), it can be converted to STRING_TYPE
- When the data is genuinely binary, it maintains its original binary form
This approach circumvents Oracle's internal type conversion limitations, offering more reliable processing capabilities for large textual data while maintaining backward compatibility.
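The two branches described above can be sketched in plain JDBC terms. This is only an illustration, not the connector's actual code: the class and method names are hypothetical, UTF-8 is an assumed charset (the thread does not say which encoding the BLOB text uses), and `SerialBlob` stands in for an Oracle BLOB so the example runs without a database.

```java
import java.nio.charset.StandardCharsets;
import java.sql.Blob;
import java.sql.SQLException;
import javax.sql.rowset.serial.SerialBlob;

// Hypothetical helper for illustration; not part of SeaTunnel.
class BlobReadSketch {

    // Returns a String when handleBlobAsString is true, otherwise the raw byte[].
    static Object readBlob(Blob blob, boolean handleBlobAsString) throws SQLException {
        if (blob == null) {
            return null;
        }
        long length = blob.length();
        // Blob positions are 1-based. This sketch assumes the value fits in one array;
        // a production reader would stream very large values instead.
        byte[] bytes = length == 0 ? new byte[0] : blob.getBytes(1, (int) length);
        // Assumption: textual BLOB content is UTF-8 encoded.
        return handleBlobAsString ? new String(bytes, StandardCharsets.UTF_8) : bytes;
    }

    public static void main(String[] args) throws SQLException {
        byte[] xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><a/>"
                .getBytes(StandardCharsets.UTF_8);
        Blob blob = new SerialBlob(xml);
        System.out.println(readBlob(blob, true));                    // text form
        System.out.println(((byte[]) readBlob(blob, false)).length); // binary form
    }
}
```

Decoding on the client side, rather than with TO_CLOB inside the query, is what lets the text branch avoid the SQL-level size limits mentioned above.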
…n when building ClickHouse Nodes (apache#9277)
Purpose of this pull request
#9268
This PR fixes an issue with the JDBC connector where Oracle BLOB data is not properly preserved during synchronization. Currently, when transferring BLOB fields (containing text, XML, HTML, etc.) from Oracle to target systems like Doris, the data is converted to Base64-encoded strings, making it unusable in its original format.
The implementation enhances the OracleTypeConverter to properly handle BLOB data based on the handle_blob_as_string configuration parameter:
- With handle_blob_as_string=true, BLOB data is treated as STRING type, preserving the original content format
- With handle_blob_as_string=false (default), BLOB data is treated as BYTES type, as before
Does this PR introduce any user-facing change?
Yes, this PR introduces a user-facing change in how Oracle BLOB data is handled during synchronization. Users can now configure the JDBC connector to preserve the original format of BLOB data by setting handle_blob_as_string=true in their connector configuration.
Before this change, Oracle BLOB data containing structured content like XML or HTML would be converted to Base64-encoded strings in the target system, making it difficult to use. With this change, users can choose to preserve the original content format.
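As a rough sketch of the opt-in behavior, the snippet below resolves the flag from a plain option map and falls back to false when the key is absent, matching the default stated in this PR. The class, method, and `Map<String, String>` representation are hypothetical; the connector's real option handling lives in its own config builders.

```java
import java.util.Map;

// Hypothetical illustration of the default; not SeaTunnel's config API.
class BlobOptionSketch {

    static final String HANDLE_BLOB_AS_STRING = "handle_blob_as_string";

    // Absent key means false, so existing jobs keep the BYTES behavior.
    static boolean handleBlobAsString(Map<String, String> options) {
        return Boolean.parseBoolean(options.getOrDefault(HANDLE_BLOB_AS_STRING, "false"));
    }

    public static void main(String[] args) {
        System.out.println(handleBlobAsString(Map.of()));
        System.out.println(handleBlobAsString(Map.of(HANDLE_BLOB_AS_STRING, "true")));
    }
}
```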
How was this patch tested?
handle_blob_as_string=true
The change was verified locally. With handle_blob_as_string=false or unset, the behavior is as follows. The Oracle source table (TEST_BLOB_TABLE) holds BLOB data with different content types:
Row 1: Simple text "Hello, World!"
Row 2: XML content
Row 3: HTML content
However, after synchronization to the Doris target table, all BLOB data is converted to Base64-encoded strings:
Row 1: "SGVsbG8sIFdvcmxkIQ=="
Row 2: "PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiPz4..."
Row 3: "PCFET0NUWVBFIGh0bWw+PGh0bWwgc3R5bGU9Im92..."
When the parameter is set to true, the BLOB fields in the Doris target table maintain data consistent with the source data.
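The Row 1 result can be reproduced with the JDK alone. The sketch below assumes the BLOB holds UTF-8 text and that the bytes path ends up Base64-encoded when written to a string column, as reported above; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical illustration of the two outcomes for the same BLOB bytes.
class BlobEncodingSketch {

    // What the target sees when the bytes are serialized as Base64 text.
    static String asBase64(byte[] blobBytes) {
        return Base64.getEncoder().encodeToString(blobBytes);
    }

    // What the target sees when the bytes are decoded as text (UTF-8 assumed).
    static String asText(byte[] blobBytes) {
        return new String(blobBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] row1 = "Hello, World!".getBytes(StandardCharsets.UTF_8);
        System.out.println(asBase64(row1)); // SGVsbG8sIFdvcmxkIQ==
        System.out.println(asText(row1));   // Hello, World!
    }
}
```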