Hello,
I was looking through the commits and noticed two in particular: 1c452a9 and 429a82a.
The first one changed some code, and I'd like to ask what exactly was changed and, more importantly, how that improved performance. What was the big bottleneck here?
I understand the code (at least somewhat), but the diffs are a bit mangled and hard to follow.
The second one surprises me as well: how were those aggressive inlining attributes identified as harmful to performance? Did you manually comment them out and re-run the benchmarks? (I doubt it a bit; that would be a somewhat "blind" approach, no?) Or did you inspect the generated asm code and identify some issues there?
Great work on the library! I love it! :)