Description
Reproducer:
program main
  use omp_lib
  integer, parameter :: n = 1000000
  integer(8) :: t1, t2, rate
  real :: a(n), b(n)
  real(8) :: start, end
  b = 1.0
  call system_clock(t1)
  start = omp_get_wtime()
  call work(a, b, n)
  call system_clock(t2, rate)
  end = omp_get_wtime()
  print *, 'system_clock: ', dble(t2-t1)/dble(rate)
  print *, 'omp_get_wtime: ', end-start
contains
  subroutine work(a, b, n)
    real :: a(:), b(:)
    !$omp parallel do
    do i=1,n
      a(i) = sin(b(i))
    end do
    !$omp end parallel do
  end subroutine work
end program main
Compile with: flang-new -fopenmp clock.f90 -O0
Run with OMP_NUM_THREADS=1 ./a.out and with OMP_NUM_THREADS=128 ./a.out
In the multithreaded case, the time computed from system_clock is roughly the wall time multiplied by the number of threads, while omp_get_wtime reports the elapsed wall time. I guess this might be acceptable behavior, but many other compilers return the wall time from system_clock.
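To illustrate why the choice of clock matters, here is a minimal single-threaded C sketch (not taken from the flang runtime; `elapsed_while_sleeping` is a hypothetical helper). It measures the same sleep on a caller-chosen POSIX clock. CLOCK_MONOTONIC advances during the sleep, while CLOCK_PROCESS_CPUTIME_ID barely moves because the process uses no CPU. The same clock sums CPU time over all threads of the process, which would explain a value that scales with the thread count when every thread is busy.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime and nanosleep */
#endif
#include <time.h>

/* Hypothetical helper: seconds that elapse on `clock` while the calling
 * thread sleeps for `ms` milliseconds. */
static double elapsed_while_sleeping(clockid_t clock, long ms) {
    struct timespec t0, t1;
    struct timespec nap = { ms / 1000, (ms % 1000) * 1000000L };

    clock_gettime(clock, &t0);
    nanosleep(&nap, NULL);       /* consumes wall time, almost no CPU time */
    clock_gettime(clock, &t1);

    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) * 1e-9;
}
```

Calling `elapsed_while_sleeping(CLOCK_MONOTONIC, 100)` should return about 0.1, whereas `elapsed_while_sleeping(CLOCK_PROCESS_CPUTIME_ID, 100)` should return a value close to zero.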
I think this might be related to our preference for CLOCK_PROCESS_CPUTIME_ID over other timers.
Note that gcc, for example, tries to use CLOCK_MONOTONIC and then CLOCK_REALTIME: https://github.com/gcc-mirror/gcc/blob/9693459e030977d6e906ea7eb587ed09ee4fddbd/libgfortran/intrinsics/system_clock.c#L39
@rovka, @Leporacanthicus, you've made changes in this code. Would you agree that we should change the order of the ifdefs to match other compilers?
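For concreteness, the reordering could look roughly like the sketch below. It is an illustration only, not a copy of the flang or libgfortran code: `pick_wall_clock` and `wall_count_ns` are hypothetical names, and the real runtimes may apply further checks (for example feature-test macros or runtime fallbacks) that are omitted here.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime */
#endif
#include <time.h>

/* Hypothetical helper: choose a clock id, preferring wall-clock timers
 * and using the CPU-time clock only as a last resort. */
static clockid_t pick_wall_clock(void) {
#if defined(CLOCK_MONOTONIC)
    return CLOCK_MONOTONIC;            /* steady wall clock */
#elif defined(CLOCK_REALTIME)
    return CLOCK_REALTIME;             /* wall clock, may be adjusted */
#elif defined(CLOCK_PROCESS_CPUTIME_ID)
    return CLOCK_PROCESS_CPUTIME_ID;   /* CPU time summed over threads */
#else
#error "no supported POSIX clock found"
#endif
}

/* Hypothetical helper: current count in nanoseconds on the chosen clock,
 * or -1 if the clock cannot be read. */
static long long wall_count_ns(void) {
    struct timespec ts;
    if (clock_gettime(pick_wall_clock(), &ts) != 0)
        return -1;
    return (long long)ts.tv_sec * 1000000000LL + (long long)ts.tv_nsec;
}
```

With this order, two successive calls to `wall_count_ns()` differ by elapsed wall time regardless of how many threads ran in between.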