Panzer: Memory access error with HIP #13668

Open
@kgottiparthi

Description

We get the following error when running our code on Frontier (OLCF). We are not sure where or why the memory access is failing, and we would appreciate any suggestions for mitigating it.

CFL = 2.828e-08; dt = 1.000e-01; Time = 0.0000000000000e+00
| Nonlinear | F 2-Norm | # Linear | R 2-Norm |
0 3.19e-03
Memory access fault by GPU node-4 (Agent handle: 0xa77bbf0) on address 0xffff00000000. Reason: Unknown.
Aborted

rocgdb report:

#0 0x00007ff2e28d9124 in PHX::MDField<Sacado::Fad::Exp::GeneralFad<Sacado::Fad::Exp::DynamicStorage<double, double> > const, panzer::Cell, panzer::Point, panzer::Dim>::operator()<int, int, int> (this=0x7ff2e28fbdb0 <kokkos_impl_hip_constant_memory_buffer+272>,
indices=<error reading variable: Cannot access memory at address 0x2000000000afc>,
indices=<error reading variable: Cannot access memory at address 0x2000000000afc>,
indices=<error reading variable: Cannot access memory at address 0x2000000000afc>)
at libs/Trilinos-install-16/include/Phalanx_MDField.hpp:461
461 return m_view(indices...);

Thank you,
Kalyan
