I created a benchmark but DuckDB run times are super slow and I'm not sure why #74

Here's a link to the repo: https://github.com/AdrianAntico/Benchmarks

I use 3 datasets: one with a million rows, one with 10M, and one with 100M. I'm currently just running a sum aggregation with a varying number of grouping variables and aggregated numeric variables. One main difference between my datasets and the ones used here is that I use a Date-type column in all aggregations. It also seems that Polars has a harder time with that data type. My results show data.table as the fastest for all queries except the single Date-type aggregation (that is where DuckDB wins).
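To make the query shape concrete, here is a minimal pure-Python sketch of a grouped sum keyed on a Date column. It assumes a hypothetical three-column schema (a `datetime.date`, a string group key, and a float value); the helper names `make_rows`, `group_sum`, and `timed` are illustrative only. It is not the repo's code and not how DuckDB, Polars, or data.table execute the query. It only shows what "varying the number of grouping variables" means.

```python
import datetime as dt
import random
import time
from collections import defaultdict


def make_rows(n, seed=0):
    """Generate n synthetic rows (date, group, value). Hypothetical schema."""
    rng = random.Random(seed)
    start = dt.date(2020, 1, 1)
    rows = []
    for _ in range(n):
        day = start + dt.timedelta(days=rng.randrange(365))
        group = rng.choice(["a", "b", "c"])
        rows.append((day, group, rng.random()))
    return rows


def group_sum(rows, key_idx, val_idx):
    """Equivalent in spirit to: SELECT keys..., SUM(val) FROM t GROUP BY keys..."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[i] for i in key_idx)  # composite grouping key
        totals[key] += row[val_idx]
    return dict(totals)


def timed(fn, *args):
    """Return (result, elapsed seconds) for a single call."""
    t0 = time.perf_counter()
    out = fn(*args)
    return out, time.perf_counter() - t0


if __name__ == "__main__":
    rows = make_rows(100_000)
    # One grouping variable (the Date column) vs. two (Date + group).
    for keys in [(0,), (0, 1)]:
        result, secs = timed(group_sum, rows, keys, 2)
        print(f"keys={keys}: {len(result)} groups in {secs:.3f}s")
```

Adding more key columns increases the number of distinct groups, which is usually what drives the cost differences between engines in this kind of benchmark.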

I copied some of the code from this repo. I'm hoping someone can take a look, because the results were a bit unexpected.
