Improve performance of par_ilut and improve benchmark (kokkos#2846)
* Improve performance of par_ilut and improve benchmark
Main change: threshold_select was doing a small piece of
par_ilut on host. Based on the code and Ginkgo's OpenMP
implementation, I assumed the computational cost of this piece would be
trivial, but it was actually very expensive. Instead of relying
on std::nth_element, threshold_select now makes a copy of the values and
sorts it by absolute value. This requires a bit more device memory but is
20 times faster, so I think the trade-off is well worth it.
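For illustration, here is a host-side sketch of the two selection strategies described above, using only the C++ standard library. The function names `threshold_select_sorted` and `threshold_select_nth` are hypothetical and are not the Kokkos Kernels API. The sketch also assumes the threshold is the magnitude of the entry at position `rank` in ascending absolute-value order. The real change runs on device with Kokkos, and this sketch does not reproduce that. On a single thread, std::nth_element is asymptotically cheaper (average linear time versus O(n log n) for a full sort). The speedup quoted above therefore presumably comes from no longer doing this step on host, not from the selection algorithm itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper (not the Kokkos Kernels API): copy the values, sort the
// copy by absolute value, and return the magnitude found at position `rank`.
// Works for real and std::complex scalars because std::abs handles both.
template <typename Scalar>
double threshold_select_sorted(const std::vector<Scalar>& values, std::size_t rank) {
  assert(rank < values.size());
  // Sort a copy so the caller's factor entries are left untouched
  // (this is the extra memory the commit message mentions).
  std::vector<Scalar> copy(values);
  std::sort(copy.begin(), copy.end(), [](const Scalar& a, const Scalar& b) {
    return std::abs(a) < std::abs(b);
  });
  return static_cast<double>(std::abs(copy[rank]));
}

// Hypothetical reference version using std::nth_element: a partial selection
// that only guarantees the element at `rank` is in its sorted position.
template <typename Scalar>
double threshold_select_nth(std::vector<Scalar> values, std::size_t rank) {
  assert(rank < values.size());
  std::nth_element(values.begin(), values.begin() + rank, values.end(),
                   [](const Scalar& a, const Scalar& b) {
                     return std::abs(a) < std::abs(b);
                   });
  return static_cast<double>(std::abs(values[rank]));
}
```

Both functions should return the same threshold for the same input. They differ only in how much of the array they order and in whether they allocate a copy.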
Secondary change: various improvements to the par_ilut benchmark. The
main ones are the ability to validate the ILU via gmres+luprec and
reduced memory usage, so larger matrices can be benchmarked.
Signed-off-by: James Foucar <[email protected]>
* Fix complex
Signed-off-by: James Foucar <[email protected]>
* Remove resize_no_preserve, realloc works fine
Signed-off-by: James Foucar <[email protected]>
---------
Signed-off-by: James Foucar <[email protected]>