feat(bench): add new benchmarking script, harness, and profiling guide #3840
ACTION NEEDED: delta-rs follows the Conventional Commits specification for release automation. The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.
Example profile (not very useful because of how tokio/async code shows up in the profiler, but I wonder if there's a way around that): https://share.firefox.dev/4hcALa8
Benchmark results:
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##             main    #3840      +/-   ##
==========================================
- Coverage   73.99%   73.98%   -0.02%
==========================================
  Files         148      148
  Lines       38904    38882      -22
  Branches    38904    38882      -22
==========================================
- Hits        28788    28767      -21
  Misses       8850     8850
+ Partials     1266     1265       -1
Awesome, I was unaware that DuckDB could also generate TPC-DS data!
Just wondering if we still need the benchmarks to be in a separate crate? I might look into this in a follow-up.
It seems we got rid of some dependency requirements...
Me neither! Super fast on my M4 Pro. And I don't think so, I left it just to be consistent. It is nice to have the benchmark script as a separate binary, though.
Don't approve yet, I want to improve the harness to make sure we're only benchmarking the merge (and not the setup).
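To make the concern concrete, here is a minimal standard-library sketch of timing only the routine under test while leaving per-iteration setup outside the measured window. It is not the harness from this PR and does not use divan's API; `time_without_setup` and its parameters are hypothetical names chosen for illustration.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical helper: runs `setup` before every iteration but only
/// accumulates the time spent inside `routine`.
fn time_without_setup<I, O>(
    iterations: u32,
    mut setup: impl FnMut() -> I,
    mut routine: impl FnMut(I) -> O,
) -> Duration {
    let mut measured = Duration::ZERO;
    for _ in 0..iterations {
        // Setup (e.g. preparing a fresh table) runs outside the timed region.
        let input = setup();

        let start = Instant::now();
        // black_box keeps the compiler from optimizing the result away.
        let output = black_box(routine(input));
        measured += start.elapsed();

        // Dropping the output also happens outside the timed region.
        drop(output);
    }
    measured
}
```

In a merge benchmark, `setup` would stand in for preparing the target table and source data, and `routine` for the merge itself, so the reported time reflects the merge alone.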
Nice, much better
You can generate the TPC-DS dataset yourself by downloading and compiling [the generator](https://www.tpc.org/tpc_documents_current_versions/current_specifications5.asp)
You may need to update the CFLAGS to include `-fcommon` to compile on newer versions of GCC.
This will generate a folder called `tpcds_parquet` containing many parquet files. Place it at `crates/benchmarks/data/tpcds_parquet` (or set `TPCDS_PARQUET_DIR`). Credits to [Xuanwo's Blog](https://xuanwo.io/links/2025/02/duckdb-is-the-best-tpc-data-generator/).
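The directory lookup described in the README excerpt above could be sketched roughly as follows. This is an assumption about how such a lookup might work, not the code in this PR; `resolve_tpcds_dir` and `tpcds_dir_from_env` are hypothetical helpers, and treating an empty value as unset is an illustrative choice.

```rust
use std::path::PathBuf;

/// Default location named in the README text.
const DEFAULT_TPCDS_DIR: &str = "crates/benchmarks/data/tpcds_parquet";

/// Hypothetical helper: uses the override when it is set and non-empty,
/// otherwise falls back to the default directory.
fn resolve_tpcds_dir(override_dir: Option<&str>) -> PathBuf {
    match override_dir {
        Some(dir) if !dir.trim().is_empty() => PathBuf::from(dir),
        _ => PathBuf::from(DEFAULT_TPCDS_DIR),
    }
}

/// Hypothetical helper: reads `TPCDS_PARQUET_DIR` from the environment.
fn tpcds_dir_from_env() -> PathBuf {
    let value = std::env::var("TPCDS_PARQUET_DIR").ok();
    resolve_tpcds_dir(value.as_deref())
}
```

Keeping the resolution logic separate from the environment read makes the fallback easy to check without touching process state.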
https://datafusion.apache.org/blog/2025/04/10/fastest-tpch-generator/
This one is faster than the DuckDB extension.
LGTM! I would only suggest using https://github.com/clflushopt/tpchgen-rs for faster generation.
That's TPC-H, not TPC-DS. Looks like TPC-DS support is still in progress: clflushopt/tpchgen-rs#51
Ah, didn't see that :)
No worries! When the TPC-DS support gets upstreamed, we'll presumably get the ability to generate the data on-the-fly |
Signed-off-by: Abhi Agarwal <[email protected]>
…setup Signed-off-by: Abhi Agarwal <[email protected]>
4281a86 to 74f4110
Description
This redoes the merge-based benchmark in crates/benchmarks, replacing it with divan as a real harness and adding a script that can be used for profiling.
Related Issue(s)
Closes #3839
Documentation
Documentation is included in the updated README.