Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce size of Expr struct #14366

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from
Draft

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Jan 29, 2025

Which issue does this PR close?

Rationale for this change

@waynexia 's comment on #14256 (comment) got me thinking maybe the build time regression had something to do with the size of Expr

So I poked around for ways to reduce the size, and I found that currently Expr is 272 bytes

What changes are included in this PR?

  1. Add a test for the size of Expr
  2. Change Expr::WindowFunction(WindowFunction) --> Expr::WindowFunction(Box<WindowFunction>) -- which drops the size of Exprfrom 272 to112` bytes

Are these changes tested?

functionally by CI

TODO:

  1. test impact on build time
  2. test impact on planning performance benchmarks

Are there any user-facing changes?

@github-actions github-actions bot added the logical-expr Logical plan and expressions label Jan 29, 2025
Comment on lines 2437 to 2439
return Some(idx + input_len);
} else {
None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inconsistent return vs value

@alamb alamb force-pushed the alamb/make_expr_smaller branch from b25ba57 to f5df767 Compare January 29, 2025 21:42
@github-actions github-actions bot added sql SQL Planner optimizer Optimizer rules core Core DataFusion crate substrait Changes to the substrait crate proto Related to proto crate labels Jan 29, 2025
@alamb alamb mentioned this pull request Jan 29, 2025
@alamb
Copy link
Contributor Author

alamb commented Feb 12, 2025

I ran some build benchmarks on a GPC machine and I conclude this change does not improve the build timings

Building main....
+ rm -rf target
+ cargo build --release --timings --lib --quiet

real    4m28.125s
user    36m43.568s
sys     1m19.993s
+ rm -rf target
+ cargo build --release --timings --lib --quiet

real    4m32.743s
user    36m47.328s
sys     1m19.786s
+ rm -rf target
+ cargo build --release --timings --lib --quiet

real    4m33.336s
user    36m50.926s
sys     1m19.246s
+ git reset --hard
HEAD is now at cb5d42e135 Disable extended tests (#14604)
+ git checkout alamb/make_expr_smaller
Switched to branch 'alamb/make_expr_smaller'
Your branch is up to date with 'alamb/alamb/make_expr_smaller'.
+ echo 'Building make_expr_smaller....'
Building make_expr_smaller....
+ rm -rf target
+ cargo build --release --timings --lib --quiet

real    4m30.234s
user    37m15.027s
sys     1m19.688s
+ rm -rf target
+ cargo build --release --timings --lib --quiet

real    4m30.580s
user    37m6.178s
sys     1m19.885s
+ rm -rf target
+ cargo build --release --timings --lib --quiet

real    4m32.293s
user    36m54.063s
sys     1m19.557s
alamb@aal-dev:~/arrow-datafusion$

@crepererum
Copy link
Contributor

TBH I would be surprised if the build performance is affected by a few memcpy calls. I think the question here is rather if our runtime performance changes, potentially due to less stress on the memory allocator.

@alamb
Copy link
Contributor Author

alamb commented Feb 19, 2025

I'll try and find time to run some sql planning benchmarks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate logical-expr Logical plan and expressions optimizer Optimizer rules proto Related to proto crate sql SQL Planner substrait Changes to the substrait crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants