-
Notifications
You must be signed in to change notification settings - Fork 612
Add support for TABLESAMPLE
pipe operator
#1860
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
src/ast/query.rs
Outdated
@@ -2680,28 +2680,32 @@ pub enum PipeOperator { | |||
full_table_exprs: Vec<ExprWithAliasAndOrderBy>, | |||
group_by_expr: Vec<ExprWithAliasAndOrderBy>, | |||
}, | |||
/// Selects a random sample of rows from the input table. | |||
/// Syntax: `|> TABLESAMPLE <method> (<size> {ROWS | PERCENT})` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the officially supported spec from the paper but doesn't cover everything that is technically supported in this PR. I'm wondering how to best deal with that (see the open question in the PR description).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the doc comment doesn't necessarily need to spell out the full spec. usually its enough to give a rough example of what the statement looks like e.g.
Syntax: `|> TABLESAMPLE BERNOULLI(50)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @hendrikmakait! The changes look reasonable to me overall, left a couple minor comments
src/ast/query.rs
Outdated
} | ||
|
||
impl fmt::Display for PipeOperator { | ||
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result { | ||
match self { | ||
PipeOperator::Select { exprs } => { | ||
write!(f, "SELECT {}", display_comma_separated(exprs.as_slice())) | ||
write!(f, " SELECT {}", display_comma_separated(exprs.as_slice())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I think we'd ideally undo these changes, its better for the operator display to be standalone where possible (i.e. they shouldn't assume surrounding space formatting, rather it should be left up to the caller), it makes it easier to compose nodes in the AST when displaying
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I found a way to fix the formatting for TableSample
in a different way that requires fewer changes and follows this general principle.
src/ast/query.rs
Outdated
@@ -2680,28 +2680,32 @@ pub enum PipeOperator { | |||
full_table_exprs: Vec<ExprWithAliasAndOrderBy>, | |||
group_by_expr: Vec<ExprWithAliasAndOrderBy>, | |||
}, | |||
/// Selects a random sample of rows from the input table. | |||
/// Syntax: `|> TABLESAMPLE <method> (<size> {ROWS | PERCENT})` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the doc comment doesn't necessarily need to spell out the full spec. usually its enough to give a rough example of what the statement looks like e.g.
Syntax: `|> TABLESAMPLE BERNOULLI(50)
@@ -1559,7 +1559,7 @@ impl fmt::Display for TableSampleBucket { | |||
} | |||
impl fmt::Display for TableSample { | |||
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result { | |||
write!(f, " {}", self.modifier)?; | |||
write!(f, "{}", self.modifier)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For better composition of AST building blocks, I remove the whitespace here and introduce it again for the structs that contain a TableSample.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks @hendrikmakait!
cc @alamb
@hendrikmakait could you take a look at the CI failures? |
@iffyio: CI should be fixed now |
I restarted the checks |
🚀 |
Part of #1758
This PR adds support for the
|> TABLESAMPLE ...
pipe operator.Open question:
BigQuery's
TABLESAMPLE
operator does not support the full SQL standard spec. This PR adds support for a superset of the BigQuery syntax which, e.g., includes theREPEATABLE
clause and reuses the existing code for parsingTABLESAMPLE
operators. Is this desired from the perspective of the project or is it preferred to stick closely to the BigQuery syntax? If so, I'm happy to change this.