feat: implement a core tpch table func to generate all data #1
Merged
kevinjqliu (Contributor) reviewed on May 26, 2025 and left a comment:
SELECT * FROM tpch(scale_factor, write_to_disk, path)
I think write_to_disk here can be derived from path. As a user, I would only specify path if I want to write to disk.
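One way to read this suggestion is sketched below: the path becomes optional and the "write to disk" decision is derived from it. `TpchArgs`, its fields, and its validation rules are hypothetical names chosen for illustration; they are not taken from this PR.

```rust
use std::path::PathBuf;

/// Hypothetical argument bundle for a `tpch(...)` table function.
/// Nothing here is taken from the PR; it only illustrates the suggestion.
#[derive(Debug, Clone, PartialEq)]
pub struct TpchArgs {
    pub scale_factor: f64,
    /// `Some(dir)` means "write the generated tables under `dir`";
    /// `None` means "keep everything in memory".
    pub path: Option<PathBuf>,
}

impl TpchArgs {
    /// Builds the arguments from a scale factor and an optional path string.
    /// An empty or whitespace-only path is treated as "no path" (an assumption).
    pub fn new(scale_factor: f64, path: Option<&str>) -> Result<Self, String> {
        if !scale_factor.is_finite() || scale_factor <= 0.0 {
            return Err(format!(
                "scale_factor must be a positive number, got {scale_factor}"
            ));
        }
        let path = path
            .map(str::trim)
            .filter(|p| !p.is_empty())
            .map(PathBuf::from);
        Ok(Self { scale_factor, path })
    }

    /// The separate `write_to_disk` flag becomes redundant: it is implied by `path`.
    pub fn write_to_disk(&self) -> bool {
        self.path.is_some()
    }
}
```

With this shape, `TpchArgs::new(1.0, None)` describes an in-memory run and `TpchArgs::new(1.0, Some("/tmp/tpch"))` describes a run that writes to disk, so callers never have to keep a boolean and a path in sync.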
Another option for the "write to disk" feature might be to use the COPY command.
This allows us to specify the path (location) and other write options, such as the Parquet options, and it aligns with the DuckDB solution described here.
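As a rough sketch of the COPY alternative, the helper below only builds one statement per TPC-H table; it does not execute anything. It assumes the eight tables are already registered under their standard TPC-H names and that the `COPY <table> TO '<file>' STORED AS PARQUET` form is the one used. `copy_statements` and the one-file-per-table layout are hypothetical choices, not part of this PR.

```rust
/// The eight tables defined by the TPC-H schema.
const TPCH_TABLES: [&str; 8] = [
    "nation", "region", "part", "supplier", "partsupp", "customer", "orders", "lineitem",
];

/// Builds one `COPY ... TO ... STORED AS PARQUET` statement per TPC-H table.
/// Hypothetical helper: it assumes each table is registered under its TPC-H name.
/// Single quotes in `dir` are doubled so the path stays a valid SQL string literal.
pub fn copy_statements(dir: &str) -> Vec<String> {
    let dir = dir.trim_end_matches('/').replace('\'', "''");
    TPCH_TABLES
        .iter()
        .map(|table| format!("COPY {table} TO '{dir}/{table}.parquet' STORED AS PARQUET"))
        .collect()
}
```

Each returned string could then be passed to `ctx.sql(...)` in the same way as the queries in the example under review, which keeps the write options in SQL instead of in the table function's argument list.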
Side note: I would love the ability to write multiple Parquet files based on size (e.g. 512 MB). DuckDB has the FILE_SIZE_BYTES option in the COPY command, but I could not find a similar option in DataFusion.
Author (Member):
As of now I am happy with the feature set. I'll wait for another pair of eyes before merging and tagging a new release. Thanks @kevinjqliu
alamb reviewed on Jun 29, 2025:
let sql_df = ctx.sql(&format!("SHOW TABLES;")).await?;
sql_df.show().await?;

let sql_df = ctx.sql(&format!("SELECT * FROM nation LIMIT 5;")).await?;
This change introduces a new table function
SELECT * FROM tpch(scale_factor, write_to_disk, path)
that generates all the individual tables in one go, which allows us to register a single UDTF instead of multiple ones.