I was not able to find the culprit, but on my machine (13900K, 24C/32T) switching from single-threaded to parallel FFT does use multiple CPU cores, though only lightly. The result is a ~3x reduction in time, likely with a lot of wasted compute in the process.
It would be nice to use the CPU cores fully. There is already a short-circuit that skips parallelism when the number of elements on each side is below 256:
https://github.com/sifraitech/rust-kzg/blob/5655cdd039788b1df4d628036fdc705080e000eb/blst-from-scratch/src/fft_fr.rs#L40
Tweaking that threshold in either direction made the time worse for me, though.
I think there must be an opportunity for further performance or, at the very least, efficiency improvements.
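To make the pattern under discussion concrete, here is a minimal sketch of a recursive radix-2 transform that only forks when each half is at least a cutoff size. It is not the rust-kzg code: the toy prime `P`, the `PAR_THRESHOLD` constant, the `fft`/`transform` names, and the use of `std::thread::scope` in place of a work-stealing pool are all assumptions made for illustration.

```rust
use std::thread;

// Toy NTT-friendly prime standing in for the real scalar field (assumption).
const P: u64 = 998_244_353;
// Hypothetical cutoff mirroring the "below 256 elements per side" check.
const PAR_THRESHOLD: usize = 256;

// Modular exponentiation by squaring.
fn pow_mod(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1u64;
    base %= P;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc * base % P;
        }
        base = base * base % P;
        exp >>= 1;
    }
    acc
}

// Recursive radix-2 transform. `data` is read with the given stride and the
// result is written to `out`; `roots[i * roots_stride]` is the twiddle factor.
fn fft(data: &[u64], stride: usize, roots: &[u64], roots_stride: usize,
       out: &mut [u64], parallel: bool) {
    let n = out.len();
    if n == 1 {
        out[0] = data[0];
        return;
    }
    let half = n / 2;
    let (lo, hi) = out.split_at_mut(half);
    if parallel && half >= PAR_THRESHOLD {
        // Large halves: run the even half on a scoped thread, the odd half here.
        thread::scope(|s| {
            s.spawn(|| fft(data, stride * 2, roots, roots_stride * 2, lo, parallel));
            fft(&data[stride..], stride * 2, roots, roots_stride * 2, hi, parallel);
        });
    } else {
        // Small halves: stay sequential to avoid paying the fork overhead.
        fft(data, stride * 2, roots, roots_stride * 2, lo, parallel);
        fft(&data[stride..], stride * 2, roots, roots_stride * 2, hi, parallel);
    }
    // Butterfly step combining the two half-size results.
    for i in 0..half {
        let t = hi[i] * roots[i * roots_stride] % P;
        let u = lo[i];
        lo[i] = (u + t) % P;
        hi[i] = (u + P - t) % P;
    }
}

// Convenience wrapper: builds the twiddle table and runs the transform.
// The input length must be a power of two no larger than 2^23.
fn transform(data: &[u64], parallel: bool) -> Vec<u64> {
    let n = data.len();
    assert!(n.is_power_of_two() && n <= 1 << 23);
    // 3 is a primitive root mod P, so this is a primitive n-th root of unity.
    let w = pow_mod(3, (P - 1) / n as u64);
    let roots: Vec<u64> = std::iter::successors(Some(1u64), |r| Some(r * w % P))
        .take((n / 2).max(1))
        .collect();
    let mut out = vec![0u64; n];
    fft(data, 1, &roots, 1, &mut out, parallel);
    out
}
```

Calling `transform(&data, true)` and `transform(&data, false)` on the same input should give identical output, which makes it easy to time the two paths against each other while varying `PAR_THRESHOLD`. Note that in this sketch the butterfly loop after the fork stays sequential at every level, which may be one place where cores sit idle; whether that applies to the real implementation is something I have not verified.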