TValue.getArrayElement().dereference() failing on latest OpenMPToolsInterface / ompd-tests branch #21

Open
@elliebinkley


There is an issue in the OpenMPToolsInterface project when running the TotalView debugger on OpenMP GPU target code. See the program tx_omp_target_generic.c.
See the screenshot initialize.png, where the call at openmp/libompd/src/omp-debug.cpp line 146 is failing.
This code used to look like this (and did not fail):

ret = TValue(process_handle->context, "ompd_CudaContextArray").
      cast("ompd_cuda_context_ptr_t",1).
      getArrayElement(i).
      castBase(ompd_type_long_long).
      getValue(cuda_ctx);

but now looks like this

ret = TValue(process_handle->context, "ompd_CudaDeviceDataArray").
      cast("DeviceDataTy",1).
      getArrayElement(i).
      access("Context").
      castBase(ompd_type_long_long).
      getValue(cuda_ctx);

and fails.

It is failing inside the getArrayElement() call which calls dereference().

The new code was pulled around Oct 11:
git clone https://github.com/OpenMPToolsInterface/llvm-project.git
cd llvm-project
git checkout ompd-tests

I attached some screenshots. I am debugging the program shown in the screenshot “Pragma omp target.png” at line 15. The system has a single CUDA device.
The screenshots show the TotalView debugger as it processes this line.

  1. Initialize.png
    Here the TValue chain for ompd_CudaDeviceDataArray is being evaluated. In the cascaded call sequence, getArrayElement(i) is failing. Note that this system has one CUDA device, so the loop runs once with i=0.
    Line 153 detects the failure, so line 156 is never reached. The failure is actually reported by the getValue(cuda_ctx) call, although the root cause is the earlier undetected failure in getArrayElement().
  2. GetArrayElement.png
    Here dereference() is failing, but its return code is not checked. Later, getSize on line 456 is called and returns success, so getArrayElement() ends up returning ompd_rc_ok. However, note that
    the ret structure has its symbolAddr fields set to zero, which is not correct.
  3. Screen Shot Dereference.png
    Here line 329 has set errorCode = ompd_rc_unsupported. This is because the call at line 322 returned ret.symbolAddr.address = 0, which in turn happened because tmpAddr.address was 0 after the call at line 316. See ScreenShot Der2.png for the value of tmpAddress at line 322.
  4. ScreenShot Der2.png
    This shows that this.symbolAddr.address was 0x…4280, which resulted in tmpAddress in item 3 above getting a zero. Apparently the call at line 316 had a bad symbolAddr.
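
To make the failure pattern in items 1 and 2 concrete, here is a minimal stand-alone sketch. It is not the libompd code: Rc, Addr, Value, getSize, elementUnchecked and elementChecked are hypothetical stand-ins that only model the behavior described above, where a failed dereference is masked by a later successful call. The address 0x1000 is an arbitrary illustrative value.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the return codes and address type (not the real OMPD types).
enum Rc { rc_ok = 0, rc_unsupported = 1 };

struct Addr {
  std::uint64_t segment = 0;
  std::uint64_t address = 0;
};

// Hypothetical, heavily simplified model of a value that carries its own error state.
struct Value {
  Rc errorCode = rc_ok;
  Addr symbolAddr;

  // Models a dereference: a loaded pointer of zero is reported as an error.
  Value dereference(std::uint64_t loadedPointer) const {
    Value ret = *this;
    ret.symbolAddr.address = loadedPointer;
    if (loadedPointer == 0)
      ret.errorCode = rc_unsupported;
    return ret;
  }
};

// Stand-in for a size lookup that always succeeds.
Rc getSize(std::uint64_t *size) {
  *size = 8;
  return rc_ok;
}

// Pattern from the report: the dereference result is not checked, and the
// later successful size lookup overwrites the error state.
Value elementUnchecked(const Value &base, std::uint64_t loadedPointer, std::uint64_t i) {
  Value ret = base.dereference(loadedPointer);
  std::uint64_t size = 0;
  ret.errorCode = getSize(&size); // masks the earlier failure
  ret.symbolAddr.address += i * size;
  return ret;
}

// Possible fix: stop as soon as the dereference reports an error.
Value elementChecked(const Value &base, std::uint64_t loadedPointer, std::uint64_t i) {
  Value ret = base.dereference(loadedPointer);
  if (ret.errorCode != rc_ok)
    return ret; // propagate the failure instead of continuing
  std::uint64_t size = 0;
  ret.errorCode = getSize(&size);
  if (ret.errorCode == rc_ok)
    ret.symbolAddr.address += i * size;
  return ret;
}

// Small usage demo; call it from main() to run the checks.
void demo() {
  Value base;
  base.symbolAddr.address = 0x1000;

  // Unchecked variant: reports success although the address is zero.
  Value masked = elementUnchecked(base, 0, 0);
  assert(masked.errorCode == rc_ok);
  assert(masked.symbolAddr.address == 0);

  // Checked variant: the failure is visible at the point where it happens.
  Value reported = elementChecked(base, 0, 0);
  assert(reported.errorCode == rc_unsupported);
}
```

With a check like the one in elementChecked, the error would surface in getArrayElement() itself rather than later in getValue(cuda_ctx). Whether the real fix belongs there or in the code that produces the bad symbolAddr is a separate question.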

On a side note, see TValue Constructor.png:
I found this constructor odd since it does not record the typeName string in the object (in this case ompd_CudaDeviceDataArray). See the screenshot. The typeName ended up being 0x0 after construction, even though the
symbolAddr.segment and symbolAddr.address structure fields were set to the values for ompd_CudaDeviceDataArray.

Larry Burley
Attached screenshots: initialize, GetArrayElement, Pragma omp target issue, Screen Shot Dereference()

The test program is
// Testing generic mode of nvptx devRtl
#include <stdio.h>

#pragma omp declare target
void test_breakpoint() {
  asm("");
}
#pragma omp end declare target

void vec_mult(int N)
{
  int i;
  float p[N], v1[N], v2[N];
  //init(v1, v2, N);
  #pragma omp target map(v1, v2, p)
  {
    test_breakpoint();
    #pragma omp parallel for
    for (i=0; i<N; i++)
    {
      test_breakpoint();
      p[i] = v1[i] * v2[i];
    }
    test_breakpoint();
  }
  //output(p, N);
}
int main() {
  printf("calling vec_mul...\n");
  vec_mult(64);
  printf("done\n");
  return 0;
}
