Description
There is an issue in the OpenMPToolsInterface project when running the TotalView debugger with OpenMP GPU targets. See the program tx_omp_target_generic.c.
See the screenshot initialize.png where openmp/libompd/src/omp-debug.cpp line 146 is failing.
This code used to look like this (and did not fail):
ret = TValue(process_handle->context, "ompd_CudaContextArray").
          cast("ompd_cuda_context_ptr_t", 1).
          getArrayElement(i).
          castBase(ompd_type_long_long).
          getValue(cuda_ctx);
but now looks like
ret = TValue(process_handle->context, "ompd_CudaDeviceDataArray").
          cast("DeviceDataTy", 1).
          getArrayElement(i).
          access("Context").
          castBase(ompd_type_long_long).
          getValue(cuda_ctx);
and fails.
It is failing inside the getArrayElement() call which calls dereference().
The new code was pulled around Oct 11:
git clone https://github.com/OpenMPToolsInterface/llvm-project.git
cd llvm-project
git checkout ompd-tests
I attached some screenshots. I am debugging the program shown in the screenshot “Pragma omp target.png” at line 15. The system has a single CUDA device.
The screenshots show the TotalView debugger as it processes this line.
- Initialize.png
Here the TValue constructor for ompd_CudaDeviceDataArray is being called. In the chained call sequence, getArrayElement(i) is failing. Note that we have one CUDA device on this system, so i=0 in a loop of one iteration.
Line 153 detects the failure and never reaches line 156. Note that the failure actually surfaces in the getValue(cuda_ctx) call, although the ultimate cause is the earlier undetected failure in getArrayElement().
- GetArrayElement.png
Here dereference() is failing but the return code is not being checked. Later on, getSize on line 456 is called and returns success, so getArrayElement() ends up returning ompd_rc_ok. However, note that the ret structure has its symbolAddr fields set to zero, which is not correct.
- Screen Shot Dereference.png
Here line 329 has set errorCode = ompd_rc_unsupported. This is because the call at line 322 returned ret.symbolAddr.address = 0. This in turn was due to tmpAddr.address = 0, as a result of the call at line 316. See ScreenShot Der2.png for the value of tmpAddress at line 322.
- ScreenShot Der2.png
This shows that this.symbolAddr.address was 0x…4280, which resulted in tmpAddress in the third item above getting a zero. Apparently the call at line 316 had a bad symbolAddr.
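To make the failure mode described in the screenshots easier to follow, here is a minimal, self-contained sketch of the pattern. It is not the libompd code: MockValue, Memory, Rc and the addresses used are hypothetical stand-ins invented for illustration. It only models the two behaviours reported above: dereference() flags an error when the pointer it reads is zero, and getArrayElement() checks that error before doing anything else, so a later successful step cannot overwrite it.

```cpp
#include <cstdint>
#include <map>

// Hypothetical stand-ins for illustration only; these are NOT the libompd types.
enum class Rc { ok, unsupported, error };

// Fake target memory: address -> pointer-sized word stored at that address.
using Memory = std::map<std::uint64_t, std::uint64_t>;

struct MockValue {
  const Memory *mem;
  std::uint64_t address;
  Rc rc;

  MockValue(const Memory *m, std::uint64_t a) : mem(m), address(a), rc(Rc::ok) {}

  bool gotError() const { return rc != Rc::ok; }

  // Read the pointer stored at `address` and follow it.
  MockValue dereference() const {
    if (gotError())
      return *this; // errors are sticky: later steps do nothing
    MockValue ret = *this;
    Memory::const_iterator it = mem->find(address);
    if (it == mem->end()) {
      ret.address = 0;
      ret.rc = Rc::error; // the read itself failed
      return ret;
    }
    ret.address = it->second;
    if (ret.address == 0)
      ret.rc = Rc::unsupported; // a null pointer cannot be followed
    return ret;
  }

  // Element i of the array that this value points to.
  MockValue getArrayElement(std::uint64_t i, std::uint64_t elemSize) const {
    if (gotError())
      return *this;
    MockValue ret = dereference();
    if (ret.gotError())
      return ret; // check the dereference result before continuing
    ret.address += i * elemSize;
    return ret;
  }
};
```

With a memory image in which the array pointer is zero, getArrayElement(0, 8) in this sketch returns a value whose gotError() is true, so the failure is visible at the step that caused it instead of at the final read.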
On a side note: TValue Constructor.png
I found this constructor odd since it does not record the typeName string in the object; in this case ompd_CudaDeviceDataArray. See the screenshot. The typeName ended up being 0x0 after construction even though the symbolAddr.segment and symbolAddr.address structure fields were set to the values for ompd_CudaDeviceDataArray.
The test program is:
// Testing generic mode of nvptx devRtl
#include <stdio.h>

#pragma omp declare target
void test_breakpoint() {
  asm("");
}
#pragma omp end declare target

void vec_mult(int N)
{
  int i;
  float p[N], v1[N], v2[N];
  //init(v1, v2, N);
  #pragma omp target map(v1, v2, p)
  {
    test_breakpoint();
    #pragma omp parallel for
    for (i=0; i<N; i++)
    {
      test_breakpoint();
      p[i] = v1[i] * v2[i];
    }
    test_breakpoint();
  }
  //output(p, N);
}

int main() {
  printf("calling vec_mul...\n");
  vec_mult(64);
  printf("done\n");
  return 0;
}