[Feature][connector-jdbc] Fix Oracle BLOB data corruption in JDBC connector #9268

Open
@yzeng1618

Search before asking

  • I have searched the existing feature requests and found no similar feature requirement.

Description

The JDBC connector currently does not preserve the original content of BLOB fields read from Oracle databases. The following example demonstrates the problem:

In the Oracle source table (TEST_BLOB_TABLE), we have BLOB data with different content types:

  • Row 1: Simple text "Hello, World!"

  • Row 2: XML content

  • Row 3: HTML content

However, after synchronization to the Doris target table, all BLOB data is converted to Base64-encoded strings:

  • Row 1: "SGVsbG8sIFdvcmxkIQ=="

  • Row 2: "PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiPz4..."

  • Row 3: "PCFET0NUWVBFIGh0bWw+PGh0bWwgc3R5bGU9Im92..."

This transformation makes the data unusable in its original form. Users cannot work directly with the text, XML, or HTML content as they can in the source database. Instead, they must perform an additional Base64 decoding step to retrieve the original content.
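The effect can be reproduced outside the connector with a minimal sketch. It assumes the BLOB holds UTF-8 text, as in Row 1, and uses only the JDK's `java.util.Base64`. The class `BlobBase64Demo` and its method names are hypothetical and are not part of the SeaTunnel code base.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper for illustration only; not SeaTunnel code.
final class BlobBase64Demo {

    // Mirrors what ends up in the target table today:
    // the Base64 text form of the raw BLOB bytes.
    static String toBase64(byte[] blobBytes) {
        return Base64.getEncoder().encodeToString(blobBytes);
    }

    // The extra step users currently have to run themselves.
    // Assumes the original BLOB content was UTF-8 encoded text.
    static String fromBase64AsUtf8(String base64) {
        byte[] raw = Base64.getDecoder().decode(base64);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] blob = "Hello, World!".getBytes(StandardCharsets.UTF_8);
        String stored = toBase64(blob);
        System.out.println(stored);                   // SGVsbG8sIFdvcmxkIQ==
        System.out.println(fromBase64AsUtf8(stored)); // Hello, World!
    }
}
```

The round trip shows that no bytes are lost. The problem is that the target column holds an encoded representation instead of the readable content.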

Usage Scenario

This feature is essential for users who need to accurately transfer Oracle BLOB data to target systems while preserving the original content format. Specific scenarios include:

  1. Data Migration Projects: When migrating databases containing BLOB fields with text, XML, HTML, or other structured content from Oracle to systems like Doris, users need the original content to remain usable.
  2. Document Management Systems: Organizations storing documents (HTML, XML, JSON) in Oracle BLOB fields need to maintain the document structure during data synchronization.
  3. Application Integration: When applications rely on specific data formats stored in BLOB fields, the integrity of these formats must be preserved during data transfer.
  4. Data Analysis: Analysts working with structured data stored in BLOB fields need the original format for proper analysis rather than encoded strings.

Without this feature, users must implement additional post-processing steps to decode and reconstruct the original data, significantly complicating their data pipelines.

Related issues

no

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct
