A while ago, I changed the timers in Nalu to use wall time, see:
spdomin/Nalu#41
However, when running a KNL/threaded/SIMD heat conduction case, I see a discrepancy between the STKPERF and STK::diag timers.
Moreover, some time spent in the simulation is not accounted for by any timer.
We need to resolve this to get accurate timing benchmarks.
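
One way to put a number on the unaccounted time is to subtract the sum of the individually timed sections from an enclosing wall-clock timer. The sketch below is only an illustration of that bookkeeping: `WallTimers` and `unaccounted` are hypothetical names, not part of Nalu or STK, and it assumes the timed sections are non-overlapping and all lie inside the outer timer.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>

// Hypothetical helper (not Nalu or STK code): accumulates wall time per named section.
class WallTimers {
public:
  using Clock = std::chrono::steady_clock;  // monotonic wall clock

  void start(const std::string& name) { starts_[name] = Clock::now(); }

  void stop(const std::string& name) {
    auto it = starts_.find(name);
    assert(it != starts_.end() && "stop() called without a matching start()");
    totals_[name] += std::chrono::duration<double>(Clock::now() - it->second).count();
  }

  // Record an elapsed time measured elsewhere, in seconds.
  void add(const std::string& name, double seconds) { totals_[name] += seconds; }

  // Accumulated seconds for one section; 0 if the section was never timed.
  double total(const std::string& name) const {
    auto it = totals_.find(name);
    return it == totals_.end() ? 0.0 : it->second;
  }

  // Sum of all sections except the one named `excluded`.
  double sum_except(const std::string& excluded) const {
    double sum = 0.0;
    for (const auto& entry : totals_) {
      if (entry.first != excluded) sum += entry.second;
    }
    return sum;
  }

private:
  std::map<std::string, Clock::time_point> starts_;
  std::map<std::string, double> totals_;
};

// Time inside the outer timer that no inner section claims.
// Assumes the inner sections do not overlap and are nested in `outer`.
double unaccounted(const WallTimers& timers, const std::string& outer) {
  return timers.total(outer) - timers.sum_except(outer);
}
```

Wrapping the whole run in one outer timer and each phase in its own section would then show how large the gap is, which may help narrow down where the two timer systems disagree.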