-
Notifications
You must be signed in to change notification settings - Fork 216
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use vector-of-structs of preds/semi for Lengauer-Tarjan #408
Use vector-of-structs of preds/semi for Lengauer-Tarjan #408
Conversation
I use the following benchmark: dominator_tree_benchmark.cpp On my machine (32 X 1792.7 MHz CPU s with hyper-threading and almost zero Load Average, Ubuntu 20.4) the report is the following (we may use the state after merging #407 as a base-line:
After implementing a "vector-of-structs" solution, the numbers are the following:
Here we can see about 1% speedup for the "large" cases (for CFGs with 186 basic blocks) and about 10% for small ones (Muchnick. fig. 8.18, 8 vertices). I'm thinking what to deal with the |
Maybe a check on a larger graph (up to 1000 or 2000-3000) nodes is needed to ensure there is no regression for large inputs. |
Thanks for trying this change, pity it didn't yield anything significant. I still think it's a better logical design, so I'm happy to proceed with it, although I'd like to make a few style changes. |
@jeremy-murphy Thank you for the suggestion, I've replaced every Also, I added a benchmark for a huge (3000+ nodes) graph, on such graph I see the following situation. The baseline (code from the
With the "cache-friendly" solution:
So, we can see even some performance degradation, up to 3-6%. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your patience!
I'd love some changes about naming, etc and one change that might improve performance.
Thank you!
ac49c66
to
d7d4f42
Compare
I've compared the baseline (the current The command: $ perf stat -B -e cache-references,cache-misses,cycles,instructions,branches,faults,migrations ./dominator_tree_benchmark_20_O3_baseline --benchmark_filter="Huge Inlined Function \(vertex vector\)" Baseline:
The final variant:
The final variant leads to about 1 percentage point more cache misses, from another point of view and I have no idea why, it leads to about 3% fewer branches. I have no answer yet why the final variant with using a vector of structs leads to even a little bit but to more cache misses on our workload (the implementation of the algorithm). My hypothesis is some irregularity in the algorithm itself: when we do not scan every vertex with its triple one by one but jumps from one to another over the half of the array but this is my attempt to just guess the answer only and not a result of any investigation. But the question is interesting on its own, as we can see, not every workload can be made better from the performance point of view just by using the vector of structs or struct of vectors. Anyway I would like thank you for the initial hypothesis to use the vector of structs: it gives me a good task to play with. Also, the average of the subsequent 3 runs for the benchmark: 239734 ns (the final version) vs 236870 ns (baseline), baseline is slightly better, in 1.2% (interesting that the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy with the code change even with the apparent cost to performance. I'm curious whether the same performance change happens across compilers and CPUs. If you have time, please try GCC as well.
I'll wait for you adjacency_matrix changes or a comment from you that you're done before I finalize the review.
This specialization of the adjacency_list class uses different value for `graph_traits< G >::null_vertex()`.
d7d4f42
to
57058f0
Compare
I believe I'm done from my side (when the CI will find no errors). @jeremy-murphy could you have a look again? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just requested a small change in variable name and then it's good to merge.
I've renamed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much for making this improvement to your previous change, I think it's a good improvement in general even if it's not perfect.
Closes #383