Skip to content

Implement OpenMP-based parallelism for C lib #457

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

ibmibmibm
Copy link

@ibmibmibm ibmibmibm commented Mar 19, 2025

Mimics all TBB changes (#445).

Related issues: #229 #184

@ibmibmibm ibmibmibm force-pushed the master branch 3 times, most recently from ae0e29e to 540bd03 Compare March 19, 2025 18:44
@oconnor663
Copy link
Member

I don't love the prospect of maintaining two separate implementations of this, but I'm curious to hear your thoughts about how many callers might prefer/demand to use OpenMP instead of TBB.

Separately, there was some discussion in the TBB PR about how OpenMP might not perform well with the nested/dynamic task spawning strategy that we're using here. @sylvanshade commented that:

...OpenMP is generally not suitable for nested parallelism, which is exactly how the problem is being solved here with Rayon and the oneTBB implementation. Most of the performance recommendations I see regarding OpenMP suggest to avoid nested parallelism at all costs because it's known to be inefficient.

@sneves made some performance measurements that seemed to corroborate @sylvanshade's comment. And when I run the benchmarks you added to blake3_c_rust_bindings (thank you for being so thorough!), it looks like OpenMP is having some trouble:

$ cd c/blake3_c_rust_bindings && cargo +nightly bench --features=tbb,openmp 1024_kib
...
test bench_incremental_1024_kib        ... bench:     117,774.41 ns/iter (+/- 1,743.86) = 8903 MB/s
test bench_openmp_1024_kib             ... bench:     946,225.39 ns/iter (+/- 507,769.54) = 1108 MB/s
test bench_tbb_1024_kib                ... bench:      47,784.35 ns/iter (+/- 6,406.28) = 21944 MB/s

Have you seen anything different in your measurements? Are there other cases we should be testing?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants