Open
Description
Here's a link to the repo: https://github.com/AdrianAntico/Benchmarks
I use three datasets: one with 1M rows, one with 10M, and one with 100M. For now I only run a sum aggregation, varying the number of grouping variables and the number of aggregated numeric variables. One main difference between my datasets and the ones used here is that I include a Date type column in all aggregations. Polars also seems to have a harder time with that data type. My results show data.table as the fastest for all queries except the single Date type aggregation, where DuckDB wins.
I copied some of the code from this repo. I'm hoping someone can take a look, because the results were a bit unexpected.
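To make the query shape concrete, here is a minimal sketch in plain Python of the kind of aggregation described above: a sum over a numeric column, grouped by a Date column alone and then by Date plus one more grouping variable. It is not the benchmark code from the repo. The column layout, the helper names (`make_rows`, `sum_by`, `time_query`), and the row count are assumptions made up for illustration, and the timing it prints says nothing about data.table, Polars, or DuckDB.

```python
import random
import time
from collections import defaultdict
from datetime import date, timedelta


def make_rows(n_rows, n_days=365, n_groups=10, seed=42):
    """Build synthetic rows of (Date, group label, numeric value).

    The layout and sizes are hypothetical, not taken from the repo.
    """
    rng = random.Random(seed)
    start = date(2020, 1, 1)
    return [
        (
            start + timedelta(days=rng.randrange(n_days)),
            f"g{rng.randrange(n_groups)}",
            rng.random(),
        )
        for _ in range(n_rows)
    ]


def sum_by(rows, key_idx, value_idx=2):
    """Sum the value column, grouped by the columns listed in key_idx."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[i] for i in key_idx)
        totals[key] += row[value_idx]
    return dict(totals)


def time_query(rows, key_idx, repeats=3):
    """Return the best wall-clock time (seconds) over several runs."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        sum_by(rows, key_idx)
        best = min(best, time.perf_counter() - t0)
    return best


rows = make_rows(10_000)

# Single grouping variable: the Date column only.
date_only = sum_by(rows, key_idx=(0,))

# Two grouping variables: Date plus a categorical column.
date_and_group = sum_by(rows, key_idx=(0, 1))

print(len(date_only), len(date_and_group), time_query(rows, (0,)))
```

Taking the best of several runs reduces noise from warm-up and background load, which matters when comparing engines on the same query.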