Skip to content

Add support for TABLESAMPLE pipe operator #1860

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
May 30, 2025

Conversation

hendrikmakait
Copy link
Contributor

Part of #1758

This PR adds support for the |> TABLESAMPLE ... pipe operator.

Open question:

BigQuery's TABLESAMPLE operator does not support the full SQL standard spec. This PR adds support for a superset of the BigQuery syntax which, e.g., includes the REPEATABLE clause and reuses the existing code for parsing TABLESAMPLE operators. Is this desired from the perspective of the project or is it preferred to stick closely to the BigQuery syntax? If so, I'm happy to change this.

src/ast/query.rs Outdated
@@ -2680,28 +2680,32 @@ pub enum PipeOperator {
full_table_exprs: Vec<ExprWithAliasAndOrderBy>,
group_by_expr: Vec<ExprWithAliasAndOrderBy>,
},
/// Selects a random sample of rows from the input table.
/// Syntax: `|> TABLESAMPLE <method> (<size> {ROWS | PERCENT})`
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the officially supported spec from the paper but doesn't cover everything that is technically supported in this PR. I'm wondering how to best deal with that (see the open question in the PR description).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the doc comment doesn't necessarily need to spell out the full spec. usually its enough to give a rough example of what the statement looks like e.g.

Syntax: `|> TABLESAMPLE BERNOULLI(50)

Copy link
Contributor

@iffyio iffyio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @hendrikmakait! The changes look reasonable to me overall, left a couple minor comments

src/ast/query.rs Outdated
}

impl fmt::Display for PipeOperator {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
match self {
PipeOperator::Select { exprs } => {
write!(f, "SELECT {}", display_comma_separated(exprs.as_slice()))
write!(f, " SELECT {}", display_comma_separated(exprs.as_slice()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I think we'd ideally undo these changes, its better for the operator display to be standalone where possible (i.e. they shouldn't assume surrounding space formatting, rather it should be left up to the caller), it makes it easier to compose nodes in the AST when displaying

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I found a way to fix the formatting for TableSample in a different way that requires fewer changes and follows this general principle.

src/ast/query.rs Outdated
@@ -2680,28 +2680,32 @@ pub enum PipeOperator {
full_table_exprs: Vec<ExprWithAliasAndOrderBy>,
group_by_expr: Vec<ExprWithAliasAndOrderBy>,
},
/// Selects a random sample of rows from the input table.
/// Syntax: `|> TABLESAMPLE <method> (<size> {ROWS | PERCENT})`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the doc comment doesn't necessarily need to spell out the full spec. usually its enough to give a rough example of what the statement looks like e.g.

Syntax: `|> TABLESAMPLE BERNOULLI(50)

@@ -1559,7 +1559,7 @@ impl fmt::Display for TableSampleBucket {
}
impl fmt::Display for TableSample {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
write!(f, " {}", self.modifier)?;
write!(f, "{}", self.modifier)?;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For better composition of AST building blocks, I remove the whitespace here and introduce it again for the structs that contain a TableSample.

@hendrikmakait hendrikmakait requested a review from iffyio May 28, 2025 08:31
Copy link
Contributor

@iffyio iffyio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks @hendrikmakait!
cc @alamb

@iffyio
Copy link
Contributor

iffyio commented May 29, 2025

@hendrikmakait could you take a look at the CI failures?

@hendrikmakait
Copy link
Contributor Author

@iffyio: CI should be fixed now

@alamb
Copy link
Contributor

alamb commented May 29, 2025

I restarted the checks

@iffyio iffyio merged commit a8bde39 into apache:main May 30, 2025
9 checks passed
@hendrikmakait hendrikmakait deleted the pipe-operators branch May 30, 2025 07:33
@alamb
Copy link
Contributor

alamb commented Jun 2, 2025

🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants