[Enhancement](paimon) Support native read of Paimon top-level schema change tables #48723
Conversation
Thank you for your contribution to Apache Doris. Please clearly describe your PR.
run buildall

TeamCity cloud ut coverage result:

TPC-H: Total hot run time: 32681 ms

TPC-DS: Total hot run time: 192362 ms

ClickBench: Total hot run time: 31.4 s

BE UT Coverage Report: Increment line coverage / Increment coverage report
run buildall

run buildall

TeamCity cloud ut coverage result:

TPC-H: Total hot run time: 32633 ms

TPC-DS: Total hot run time: 192050 ms

ClickBench: Total hot run time: 31.36 s
run buildall

TeamCity cloud ut coverage result:

TPC-H: Total hot run time: 32645 ms

TPC-DS: Total hot run time: 184942 ms

ClickBench: Total hot run time: 30.6 s

run buildall

TeamCity cloud ut coverage result:

BE UT Coverage Report: Increment line coverage / Increment coverage report
@@ -38,6 +38,134 @@ PaimonReader::PaimonReader(std::unique_ptr<GenericReader> file_format_reader,
    ADD_CHILD_TIMER(_profile, "DeleteFileReadTime", paimon_profile);
}

/**
sql:
No need for this comment.
public class PaimonSchemaCacheValue extends SchemaCacheValue {

    private List<Column> partitionColumns;

    public PaimonSchemaCacheValue(List<Column> schema, List<Column> partitionColumns) {

    private TableSchema tableSchema;
Looks like we only use this TableSchema to build columnIdToName, no need to store it.
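For reference, a minimal sketch of the reviewer's suggestion, assuming Paimon's TableSchema.fields() with the DataField.id()/name() accessors; the class and method names below are illustrative, not the PR's actual code:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.paimon.schema.TableSchema;
import org.apache.paimon.types.DataField;

// Sketch: build the column-id -> column-name map once in the constructor
// instead of keeping the whole TableSchema alive in the cache value.
public class PaimonSchemaCacheValueSketch {
    private final Map<Integer, String> columnIdToName = new HashMap<>();

    public PaimonSchemaCacheValueSketch(TableSchema tableSchema) {
        List<DataField> fields = tableSchema.fields();
        for (DataField field : fields) {
            columnIdToName.put(field.id(), field.name());
        }
    }

    public Map<Integer, String> getColumnIdToName() {
        return columnIdToName;
    }
}
```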
@@ -154,6 +155,11 @@ protected Optional<String> getSerializedTable() {
        return Optional.of(serializedTable);
    }

    Map<Long, String> getSchemaInfo(Long schemaId) {
Suggested change:
-    Map<Long, String> getSchemaInfo(Long schemaId) {
+    private Map<Long, String> getSchemaInfo(Long schemaId) {
@@ -1273,7 +1273,7 @@ Status FileScanner::_init_expr_ctxes() {
        if (slot_info.is_file_slot) {
            _file_slot_descs.emplace_back(it->second);
            _file_col_names.push_back(it->second->col_name());
-           if (it->second->col_unique_id() > 0) {
+           if (it->second->col_unique_id() >= 0) {
Is this a bug?
No. Iceberg's field unique IDs start from 1, while Paimon/Hudi field unique IDs start from 0.
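For illustration only, a tiny sketch of why the guard was relaxed from > 0 to >= 0; the IDs below are assumptions taken from the comment above, not values read from the libraries:

```java
// Paimon/Hudi assign the first top-level column unique ID 0 while Iceberg
// starts at 1, so a strict "> 0" check silently drops the first Paimon/Hudi
// column's ID mapping.
public class FieldIdGuardSketch {
    static boolean hasUniqueId(int colUniqueId) {
        return colUniqueId >= 0; // previously "> 0"
    }

    public static void main(String[] args) {
        System.out.println(hasUniqueId(0));  // true: first Paimon/Hudi column
        System.out.println(hasUniqueId(1));  // true: first Iceberg column
        System.out.println(hasUniqueId(-1)); // false: no unique ID assigned
    }
}
```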
run buildall

TeamCity cloud ut coverage result:

TPC-H: Total hot run time: 32806 ms

TPC-DS: Total hot run time: 192295 ms

ClickBench: Total hot run time: 31.35 s
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
…able. (#49051) ### What problem does this PR solve? Similar to pr #48723 Problem Summary: 1. Supports native reader reading tables after the top-level schema of hudi is changed, but does not support tables after the internal schema of struct is changed. change internal schema of struct schema(not support, will support in the next PR). 2. Unify the logic of iceberg/paimon/hudi native reader to handle schema change's table.
…ge table. (apache#48723) ### What problem does this PR solve? Problem Summary: Supports native reader reading tables after the top-level schema of paimon is changed, but does not support tables after the internal schema of struct is changed. change top-level schema(support): ```sql --spark sql ALTER TABLE table_name ADD COLUMNS (c1 INT,c2 STRING); ALTER TABLE table_name RENAME COLUMN c0 TO c1; ALTER TABLE table_name DROP COLUMNS (c1, c2); ALTER TABLE table_name ADD COLUMN c INT FIRST; ALTER TABLE table_name ADD COLUMN c INT AFTER b; ALTER TABLE table_name ALTER COLUMN col_a FIRST; ALTER TABLE table_name ALTER COLUMN col_a AFTER col_b; ``` change internal schema of struct schema(not support, will support in the next PR): ```sql --spark sql ALTER TABLE table_name ADD COLUMN v.value.f3 STRING; ALTER TABLE table_name RENAME COLUMN v.f1 to f100; ALTER TABLE table_name DROP COLUMN v.value.f3 ; ALTER TABLE table_name ALTER COLUMN v.col_a FIRST; ```
What problem does this PR solve?
Problem Summary:
Supports the native reader reading Paimon tables whose top-level schema has been changed; tables whose struct-internal schema has been changed are not yet supported.
Changing the top-level schema (supported), e.g. the Spark SQL below:
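```sql
-- spark sql
ALTER TABLE table_name ADD COLUMNS (c1 INT, c2 STRING);
ALTER TABLE table_name RENAME COLUMN c0 TO c1;
ALTER TABLE table_name DROP COLUMNS (c1, c2);
ALTER TABLE table_name ADD COLUMN c INT FIRST;
ALTER TABLE table_name ADD COLUMN c INT AFTER b;
ALTER TABLE table_name ALTER COLUMN col_a FIRST;
ALTER TABLE table_name ALTER COLUMN col_a AFTER col_b;
```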
Changing the internal schema of a struct (not supported yet; will be supported in the next PR), e.g. the Spark SQL below:
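```sql
-- spark sql
ALTER TABLE table_name ADD COLUMN v.value.f3 STRING;
ALTER TABLE table_name RENAME COLUMN v.f1 TO f100;
ALTER TABLE table_name DROP COLUMN v.value.f3;
ALTER TABLE table_name ALTER COLUMN v.col_a FIRST;
```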
Release note
Supports the native reader reading Paimon tables whose top-level schema has been changed.
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merges this PR)