[Feature][connector-doris] adds case insensitivity feature to the Doris connector #9273
base: dev
Pull Request Overview
This PR adds a case insensitivity feature to the Doris connector by introducing the "case_sensitive" configuration option and propagating its effect across components.
- Updated DorisStreamLoad to conditionally lowercase table names.
- Refactored SeaTunnelRowSerializer and its factory to process field names based on case sensitivity.
- Adjusted DorisTypeConverters, DorisTableConfig, and configuration files to support the new option.
Reviewed Changes
Copilot reviewed 10 out of 11 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| DorisStreamLoad.java | Updates table name assignment based on the case_sensitive flag. |
| DorisSinkWriter.java | Refactors serializer creation to improve abstraction via a factory. |
| SeaTunnelRowSerializerFactory.java | Passes the case_sensitive flag to the serializer. |
| SeaTunnelRowSerializer.java | Processes field names according to the case_sensitive flag. |
| DorisTypeConverterV2.java / DorisTypeConverterV1.java | Delegates column name handling based on case sensitivity. |
| AbstractDorisTypeConverter.java | Introduces a new method for building physical columns that applies case conversion if needed. |
| DorisTableConfig.java | Adjusts database and table names based on the case_sensitive flag. |
| DorisSinkOptions.java | Adds the CASE_SENSITIVE configuration option. |
| DorisSinkConfig.java | Reads and sets the case_sensitive configuration from user input. |
Files not reviewed (1)
- .github/actions/get-workflow-origin: Language not supported
Comments suppressed due to low confidence (1)
seatunnel-connectors-v2/connector-doris/src/main/java/org/apache/seatunnel/connectors/doris/config/DorisTableConfig.java:87
- [nitpick] Consider refactoring the lowercasing logic into a dedicated utility method to avoid duplication and improve maintainability.
if (!caseSensitive) {
    dorisTableConfig.setDatabase(dorisTableConfig.getDatabase().toLowerCase());
    dorisTableConfig.setTable(dorisTableConfig.getTable().toLowerCase());
}
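The refactoring suggested in this nitpick could look roughly like the sketch below. `IdentifierCase` and `normalize` are hypothetical names chosen for illustration and are not part of the SeaTunnel code base; the use of `Locale.ROOT` is also an assumption beyond what the PR does, added so the result does not depend on the JVM default locale.

```java
import java.util.Locale;

/** Hypothetical helper for illustration only; not part of the SeaTunnel code base. */
final class IdentifierCase {
    private IdentifierCase() {}

    /**
     * Returns the identifier unchanged when caseSensitive is true,
     * otherwise its lowercase form. A null identifier is passed through.
     */
    static String normalize(String identifier, boolean caseSensitive) {
        if (identifier == null || caseSensitive) {
            return identifier;
        }
        // Locale.ROOT keeps the result independent of the JVM default locale.
        return identifier.toLowerCase(Locale.ROOT);
    }
}
```

With such a helper, the two setter calls above would each reduce to a single call such as `IdentifierCase.normalize(dorisTableConfig.getTable(), caseSensitive)`, and the same method could be reused wherever table or field names are lowercased.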
List<Object> fieldNames = new ArrayList<>(Arrays.asList(seaTunnelRowType.getFieldNames()));
this.caseSensitive = caseSensitive;

String[] fieldNames = seaTunnelRowType.getFieldNames();
[nitpick] Add a brief comment to explain why we are converting field names based on the case_sensitive flag to aid future maintainers.
String[] fieldNames = seaTunnelRowType.getFieldNames();
// Normalize field names based on the caseSensitive flag.
// If caseSensitive is false, convert field names to lowercase to ensure consistent handling
// in case-insensitive environments.
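As a minimal sketch of the normalization this comment describes, the class below lowercases an array of field names when the flag is false. `FieldNameNormalizer` is a hypothetical name, and the collision check is an assumption added for illustration (two upstream fields that differ only in case would collapse to the same name); the PR itself is not known to perform such a check.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch; not the serializer code from the PR. */
final class FieldNameNormalizer {
    private FieldNameNormalizer() {}

    /** Returns a copy of fieldNames, lowercased when caseSensitive is false. */
    static String[] normalize(String[] fieldNames, boolean caseSensitive) {
        if (caseSensitive) {
            // Case-sensitive mode: keep the upstream names exactly as they are.
            return Arrays.copyOf(fieldNames, fieldNames.length);
        }
        String[] result = new String[fieldNames.length];
        Set<String> seen = new HashSet<>();
        for (int i = 0; i < fieldNames.length; i++) {
            String lowered = fieldNames[i].toLowerCase(Locale.ROOT);
            // Assumed safeguard: reject names that become identical after lowercasing.
            if (!seen.add(lowered)) {
                throw new IllegalArgumentException(
                        "Field names collide after lowercasing: " + lowered);
            }
            result[i] = lowered;
        }
        return result;
    }
}
```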
Thanks @yzeng1618. Please add a test case and enable CI on your fork repository. https://github.com/apache/seatunnel/pull/9273/checks?check_run_id=41710620403
Why not use https://seatunnel.apache.org/docs/2.3.10/transform-v2/table-rename and https://seatunnel.apache.org/docs/2.3.10/transform-v2/field-rename to change the names to lowercase?
1. Complexity in Scenarios with Numerous Fields
For tables containing hundreds of fields, using transform plugins for field renaming becomes extremely cumbersome. Each field requires individual configuration, resulting in verbose and difficult-to-maintain configuration files. For example:
transform {
  FieldRename {
    source_table_name = "table1"
    result_table_name = "table1"
    field_name = {
      FIELD1 = "field1"
      FIELD2 = "field2"
      // ... potentially hundreds of fields
      FIELD500 = "field500"
    }
  }
}
2. Complexity in Multi-Table/Full Database Synchronization Scenarios
In multi-table or full database synchronization scenarios, using transform plugins becomes even more complex. Each table requires separate TableRename and FieldRename configurations, leading to extremely large configuration files. Additionally, if table structures change (e.g., adding new fields), the configuration files must be updated accordingly.
3. Flexibility of Code-Level Optimization
Handling case sensitivity at the code level provides greater flexibility and functional extensibility: it enables automatic table creation when the case insensitivity parameter is set (currently developed, pending evaluation).
4. Consistency with Other Connectors
The Iceberg connector also provides parameters to control case sensitivity, providing a unified user experience.
In fact, SeaTunnel supports using a single transform to solve the case conversion problem for all tables and all fields. For example:
FieldRename {
  plugin_input = "transform1"
  plugin_output = "transform2"
  table_match_regex = ".*"
  convert_case = "LOWER"
}
Please refer to https://seatunnel.apache.org/docs/2.3.10/transform-v2/transform-multi-table
This shouldn't be a problem, since the field case follows the upstream source.
Thank you for suggesting the use of transform-multi-table. We acknowledge that we hadn't fully considered this solution before, and it is indeed an excellent option. However, we still believe that adding a case sensitive parameter to the Doris connector has its unique value and necessity:
Therefore, for simple scenarios, using connector parameters would be ideal; for more complex transformation requirements, the transform-multi-table functionality would be more appropriate. This approach provides users with maximum flexibility while maintaining configuration simplicity. We kindly request your evaluation of the necessity of adding the case-sensitive parameter to the Doris connector.
Purpose of this pull request
#9272
This PR adds a case insensitivity feature to the Doris connector. During data synchronization, especially in migrations from Oracle to Doris, column name mismatches often occur because Oracle stores table and field names in uppercase by default, while Doris typically uses lowercase identifiers. By adding a case_sensitive configuration option, users can control whether column names are treated as case-sensitive, which resolves case differences when migrating between database systems.
Does this PR introduce any user-facing change?
Yes, this PR introduces a new configuration option, case_sensitive, that allows users to control whether the Doris connector is case-sensitive when processing column names. When set to false, the connector automatically converts column names to lowercase, achieving case-insensitive column name matching. This is particularly useful when migrating data to Doris from databases like Oracle that use uppercase identifiers by default.
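The effect on column matching can be illustrated with a small, self-contained sketch. It is not the connector's code: `ColumnMatchSketch` and `unmatched` are hypothetical names, and the sketch assumes the target Doris columns are already lowercase, as in the Oracle migration scenario described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical illustration of case-insensitive column matching. */
final class ColumnMatchSketch {
    private ColumnMatchSketch() {}

    /**
     * Returns the source fields that have no matching target column.
     * When caseSensitive is false, source names are lowercased before lookup.
     */
    static List<String> unmatched(
            String[] sourceFields, Set<String> targetColumns, boolean caseSensitive) {
        List<String> missing = new ArrayList<>();
        for (String field : sourceFields) {
            String candidate = caseSensitive ? field : field.toLowerCase(Locale.ROOT);
            if (!targetColumns.contains(candidate)) {
                missing.add(field);
            }
        }
        return missing;
    }
}
```

For Oracle-style source fields `ID` and `NAME` and Doris columns `id` and `name`, this sketch reports both fields as unmatched when the flag is true and none when it is false.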
How was this patch tested?