Rewrite inlining pass #1935

Open
wants to merge 7 commits into
base: master
vouillon
Member

No description provided.

@vouillon vouillon force-pushed the inlining branch 3 times, most recently from 840420d to 7b64a79 Compare April 14, 2025 23:08
@vouillon vouillon force-pushed the inlining branch 3 times, most recently from 79446f9 to ba1a622 Compare April 16, 2025 15:48
@vouillon vouillon force-pushed the inlining branch 4 times, most recently from b62b39e to 5cb6652 Compare April 24, 2025 17:49
@vouillon vouillon marked this pull request as ready for review April 24, 2025 17:50
@hhugo
Member

hhugo commented Apr 25, 2025

I've pushed a fixup to the testsuite.
We should check how this PR affects functor-heavy programs (something using core, maybe).
@TyOverby could you test this PR on your side?

@hhugo
Member

hhugo commented Apr 25, 2025

We need a changelog entry

@hhugo
Member

hhugo commented Apr 25, 2025

I'm not certain I read the benchmark correctly.
It seems that partial render table sees a code size increase of 10%, a memory increase of ~50%, and a compilation time increase of 30%, for no runtime improvement.

@hhugo
Member

hhugo commented May 6, 2025

Maybe we can wait for #1962 to get better measurements.

@hhugo
Member

hhugo commented May 7, 2025

We don't have the latest benchmark.
The last result we have shows a runtime regression for ocamlc (maybe some noise?).

With this PR, we seem to double the time spent in inline. We can probably live with that.

@hhugo hhugo force-pushed the inlining branch 2 times, most recently from 10a1ba8 to 6aaf9ad Compare May 7, 2025 21:59
@hhugo
Member

hhugo commented May 9, 2025

There is a large increase in maxresident between node 23 and node 24 (roughly 57%, from 2077648k to 3259072k).
node 23

$ /bin/time node _build/default/compiler/tests-ocaml/match-exception/streams.bc.js
iter_stream with handler case (match) is tail recursive
3.79user 0.52system 0:04.00elapsed 107%CPU (0avgtext+0avgdata 2077648maxresident)k
0inputs+0outputs (0major+510186minor)pagefaults 0swaps

node 24

$ /bin/time node _build/default/compiler/tests-ocaml/match-exception/streams.bc.js
iter_stream with handler case (match) is tail recursive
3.77user 0.92system 0:02.89elapsed 162%CPU (0avgtext+0avgdata 3259072maxresident)k
0inputs+0outputs (0major+825840minor)pagefaults 0swaps

@TyOverby
Collaborator

TyOverby commented May 9, 2025

Apologies for the delay; I didn't see this thread for a while. We should have some test and benchmark results ready for you next week.

@TyOverby
Collaborator

TyOverby commented May 9, 2025

out of curiosity, what was the osx / node-24 issue? consuming too much memory?

@hhugo
Member

hhugo commented May 9, 2025

> Apologies for the delay; I didn't see this thread for a while. We should have some test and benchmark results ready for you next week.

Many improvements landed on master in the past few days. It would be nice to test both the base and the tip of the PR so we can understand the impact of the PR alone.

@hhugo hhugo force-pushed the inlining branch 3 times, most recently from 1e96ca7 to 29f7ef3 Compare May 14, 2025 11:29
@vouillon
Member Author

> out of curiosity, what was the osx / node-24 issue? consuming too much memory?

@TyOverby See my comment above.

@vouillon
Member Author

> I'm not certain I read the benchmark correctly. It seems that partial render table sees a code size increase of 10%, a memory increase of ~50%, and a compilation time increase of 30%, for no runtime improvement.

Right, the aggressive inlining of functors does not really seem to result in any runtime improvement with js_of_ocaml. So it is now enabled only with wasm_of_ocaml.

@hhugo
Member

hhugo commented May 16, 2025

I've pushed commits to only inline (small) functors in o3 with jsoo. Let's wait for the benchmarks

@hhugo hhugo force-pushed the inlining branch 2 times, most recently from 13f7293 to 4ac6713 Compare May 16, 2025 10:58
@hhugo
Member

hhugo commented May 16, 2025

fannkuch_redux and fft seem to take longer now. Can you take a look? Compilation time increases everywhere, but I guess we can live with that given the recent improvements everywhere else.

@vouillon
Member Author

For fft, it's because a function no longer gets inlined: I have reduced the inlining limit from 200 down to 150.
For fannkuch_redux, the function fannkuch is no longer inlined at toplevel, so it is not optimized with the assumption that n = 10.

  let n = 10 in
  let _maxflips, _checksum = fannkuch n in

Inlining small functions makes a significant difference for raytrace.
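
To make the fannkuch_redux point concrete, here is a minimal sketch of why inlining at toplevel matters. It is not the benchmark's code: `sum_of_squares` is a hypothetical stand-in for `fannkuch`, and the "inlined" version is written by hand to show roughly what an optimizer gets to work with once the argument is a known constant.

```ocaml
(* Hypothetical stand-in for a benchmark kernel whose loop bound depends on [n]. *)
let sum_of_squares n =
  let acc = ref 0 in
  for i = 1 to n do
    acc := !acc + (i * i)
  done;
  !acc

(* Not inlined: the callee is compiled once, for an arbitrary [n]. *)
let result_via_call =
  let n = 10 in
  sum_of_squares n

(* Hand-written equivalent of the call after inlining at the call site:
   [n] has been replaced by the constant 10, so the loop bound is known
   statically and later passes can specialize the body. *)
let result_inlined =
  let acc = ref 0 in
  for i = 1 to 10 do
    acc := !acc + (i * i)
  done;
  !acc
```

Both definitions compute the same value; the difference is only in how much the compiler knows about `n` while optimizing the loop.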

@hhugo
Member

hhugo commented May 16, 2025

> For fft, it's because a function no longer gets inlined: I have reduced the inlining limit from 200 down to 150. For fannkuch_redux, the function fannkuch is no longer inlined at toplevel, so it is not optimized with the assumption that n = 10.
>
>   let n = 10 in
>   let _maxflips, _checksum = fannkuch n in
>
> Inlining small functions makes a significant difference for raytrace.

Are you OK to merge in the current state?

@hhugo
Member

hhugo commented May 16, 2025

> Apologies for the delay; I didn't see this thread for a while. We should have some test and benchmark results ready for you next week.

@TyOverby, any update on this?

@TyOverby
Collaborator

We've been trying to import these changes (well, really the base revision, so that we have a good point to compare benchmarks with) and have hit a very large number of conflicts with our internal patches due to the recent PRs that have been merged. I think we're close to being ready to test this PR; my guess is next week.

@vouillon
Member Author

> Are you OK to merge in the current state?

I would prefer to wait for some feedback from Ty.

vouillon and others added 7 commits May 17, 2025 23:19
- We are a lot more aggressive at inlining functor-like functions in
  wasm_of_ocaml, since this may enable further optimizations
- We are more cautious at inlining nested functions, since this can
  result in memory leaks
- We inline a larger class of small functions
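
As an illustration of the first point, here is a small sketch of what "functor-like" inlining can buy. All names (`ORDERED`, `Make_max`, `Int_ord`, `int_max_inlined`) are hypothetical and not taken from the PR. It relies on the fact that an OCaml functor is compiled as a function from a module to a module, so calls through the argument (`O.compare` here) stay indirect until the functor application is inlined. The last definition is written by hand and only approximates what an inliner could produce.

```ocaml
module type ORDERED = sig
  type t
  val compare : t -> t -> int
end

(* A functor: at runtime, a function taking the module [O] as its argument. *)
module Make_max (O : ORDERED) = struct
  (* Inside the functor body, [O.compare] is an unknown function. *)
  let max a b = if O.compare a b >= 0 then a else b
end

module Int_ord = struct
  type t = int
  let compare (a : int) (b : int) = Stdlib.compare a b
end

(* Functor application: a candidate for aggressive inlining. *)
module Int_max = Make_max (Int_ord)

(* Hand-written equivalent of [Int_max.max] once the functor application
   has been inlined: the comparison function is now statically known, which
   may enable further optimizations such as inlining it in turn. *)
let int_max_inlined (a : int) (b : int) =
  if Int_ord.compare a b >= 0 then a else b
```

For example, `Int_max.max 3 7` and `int_max_inlined 3 7` both return the larger argument; only the amount of static information available at the call to `compare` differs.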