-
Notifications
You must be signed in to change notification settings - Fork 769
feat(query): support create iceberg table with partition and properties #17812
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
eaa5937
to
e40546e
Compare
Rerun ci, wait #17813 merge. |
e142d45
to
1c22e47
Compare
ba0cf39
to
dd6cfe0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 47 files at r1, 20 of 49 files at r3.
Reviewable status: 22 of 57 files reviewed, 4 unresolved discussions (waiting on @BohuTANG, @sundy-li, @TCeason, and @Xuanwo)
src/meta/app/src/schema/table.rs
line 257 at r3 (raw file):
#[derive(serde::Serialize, serde::Deserialize, Clone, Debug, Eq, PartialEq)] pub enum TablePartition { Identity { columns: Vec<String> },
Is it safe to use column name as partition keys, while user might rename a column.
A good issue. I think we should forbidden iceberg table rename col name and return a definite error |
Hi, I’m a bit concerned that we’re planning to store this data in metasrv. Are we intending to use metasrv as an iceberg catalog, or will we support creating iceberg tables as internal tables, similar to what the Fuse engine does? |
Actually we can add the partition info in req not tablemeta. If we do not store this information, we must load table when we use it later |
It depends on our design and our objectives. Typically, we can't assume that we are the only ones writing to or reading from an iceberg table. Users can perform various actions on it, such as renaming column names with Spark. While we can cache this metadata, we can't persist it. |
If so it's better to add these info in req. maybe we need to cost some time and discuss it tomorrow |
ab84fe5
to
a4dcf71
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 47 files at r1, 1 of 38 files at r6, all commit messages.
Reviewable status: 2 of 66 files reviewed, 4 unresolved discussions (waiting on @BohuTANG, @sundy-li, @TCeason, and @Xuanwo)
src/meta/app/src/schema/table.rs
line 257 at r3 (raw file):
Previously, TCeason wrote…
A good issue.
I think we should forbidden iceberg table rename col name and return a definite error
. Like view and stream engine
Then how to deal with the column rename? 🤔
For write : We set the partition in Req not store these info. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 47 files at r1, 1 of 25 files at r2, 23 of 52 files at r5, 37 of 38 files at r6.
Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @BohuTANG, @sundy-li, @TCeason, and @Xuanwo)
src/query/storages/iceberg/src/database.rs
line 201 at r6 (raw file):
}; let table_create_option = if let Some(ref partition) = req.table_partition {
The deep nesting since this line makes the logic hard to understand. It would be better extract these part into another method and flatten the nesting logic.
BTW, the builder
should be created only once.
src/query/service/src/interpreters/interpreter_table_create.rs
line 497 at r6 (raw file):
columns: self.plan.table_partition.clone(), }) },
Use uniformed type for property and partition.
All types should be defined with Option
, or none types with Option
.
Code quote:
table_properties: if self.plan.table_properties.is_empty() {
None
} else {
Some(self.plan.table_properties.clone())
},
table_partition: if self.plan.table_partition.is_empty() {
None
} else {
Some(TablePartition::Identity {
columns: self.plan.table_partition.clone(),
})
},
src/query/sql/src/planner/binder/ddl/table.rs
line 489 at r6 (raw file):
table_partition.push(iceberg_table_partition.to_string()); } }
iceberg_table_partition.into_iter().collect()
would be good enough to convert a BTreeMap to Vec?
Code quote:
let mut table_partition: Vec<String> = Vec::new();
if let Some(iceberg_table_partition) = iceberg_table_partition {
for iceberg_table_partition in iceberg_table_partition {
table_partition.push(iceberg_table_partition.to_string());
}
}
src/meta/app/src/schema/table.rs
line 441 at r6 (raw file):
TablePartition::Identity { columns: vec![] } } }
Why not derive Default for it?
Code quote:
impl Default for TablePartition {
fn default() -> Self {
TablePartition::Identity { columns: vec![] }
}
}
src/query/sql/src/planner/plans/ddl/table.rs
line 56 at r6 (raw file):
pub options: TableOptions, pub table_properties: TableOptions, pub table_partition: Vec<String>,
If partitioning is optional, these two field should be optional.
If the default value, such as vec![]
indicating no-partitioning, everywhere else should use vec![]
to represent a no-partitioning decision.
To me, Option<Vec<String>>
would be more accurate about the purpose.
Code quote:
pub table_properties: TableOptions,
pub table_partition: Vec<String>,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dismissed @drmingdrmer from a discussion.
Reviewable status: 50 of 66 files reviewed, 7 unresolved discussions (waiting on @BohuTANG, @drmingdrmer, @sundy-li, and @Xuanwo)
src/meta/app/src/schema/table.rs
line 441 at r6 (raw file):
Previously, drmingdrmer (张炎泼) wrote…
Why not derive Default for it?
Maybe support other type in the future. So I impl a Default.
src/query/storages/iceberg/src/database.rs
line 201 at r6 (raw file):
Previously, drmingdrmer (张炎泼) wrote…
The deep nesting since this line makes the logic hard to understand. It would be better extract these part into another method and flatten the nesting logic.
BTW, the
builder
should be created only once.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 50 of 66 files reviewed, 7 unresolved discussions (waiting on @BohuTANG, @drmingdrmer, @sundy-li, and @Xuanwo)
src/query/sql/src/planner/binder/ddl/table.rs
line 489 at r6 (raw file):
Previously, drmingdrmer (张炎泼) wrote…
iceberg_table_partition.into_iter().collect()
would be good enough to convert a BTreeMap to Vec?
Done.
add45f4
to
4534df3
Compare
…ERTIES it will be failed
2. iceberg_table_properties rename to table_properties 3. iceberg_partition rename to table_partition
All conversations were resolved, ci passed. Please review this pr. cc @Xuanwo @drmingdrmer @sundy-li |
I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/
Summary
Note:
Add new struct TablePartition. Only support identity transform.
Tests
Type of change
This change is