Commit bf1989b

Improve performance of par_ilut and improve benchmark (kokkos#2846)
* Improve performance of par_ilut and improve benchmark

  Main change: threshold_select was doing a small piece of par_ilut on host. I assumed based on the code and ginkgo's OMP implementation that the computational cost of this piece would be trivial, but it was actually very expensive. Instead of relying on std::nth_element, just make a copy of values and sort based on the absolute value. This requires a bit more device memory, but is 20 times faster, so I think this is well worth it.

  Secondary change: various improvements to the par_ilut benchmark. The main improvements are the ability to validate the ILU via gmres+luprec and efforts to reduce memory usage so you can benchmark larger matrices.

  Signed-off-by: James Foucar <[email protected]>

* Fix complex

  Signed-off-by: James Foucar <[email protected]>

* Remove resize_no_preserve, realloc works fine

  Signed-off-by: James Foucar <[email protected]>

---------

Signed-off-by: James Foucar <[email protected]>
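The main change above swaps a selection (std::nth_element) for a copy-and-sort by absolute value. The following is a minimal host-only sketch of both approaches in plain standard C++, not the Kokkos Kernels device code from this commit. The function names `threshold_by_sort` and `threshold_by_nth_element`, the use of `double`, and the 0-based `rank` parameter are all hypothetical choices made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not the Kokkos Kernels API): returns the magnitude
// of the entry that has 0-based position `rank` when the values are
// ordered by absolute value. Takes `values` by value so the caller's data
// is left untouched; this is the "copy" the commit message refers to.
double threshold_by_sort(std::vector<double> values, std::size_t rank) {
  assert(rank < values.size());
  // Full sort of the copy, comparing by absolute value.
  std::sort(values.begin(), values.end(),
            [](double a, double b) { return std::abs(a) < std::abs(b); });
  return std::abs(values[rank]);
}

// Hypothetical selection-based counterpart, shown only for comparison.
// std::nth_element places the rank-th element (by absolute value) at
// position `rank` without fully sorting the rest.
double threshold_by_nth_element(std::vector<double> values, std::size_t rank) {
  assert(rank < values.size());
  auto nth = values.begin() + static_cast<std::ptrdiff_t>(rank);
  std::nth_element(values.begin(), nth, values.end(),
                   [](double a, double b) { return std::abs(a) < std::abs(b); });
  return std::abs(*nth);
}
```

Both helpers return the same threshold for the same input. On a single host thread a selection is usually no slower than a full sort, so this sketch does not reproduce the speedup reported in the commit, which presumably comes from keeping the work on the device instead of on the host.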
1 parent d2e1368 commit bf1989b

4 files changed: +346 -131 lines changed