Description
We get the following error when running our code on Frontier (OLCF). We are not sure where or why the memory access is failing, and we would appreciate any suggestions for diagnosing or mitigating it.
Program output:
CFL = 2.828e-08; dt = 1.000e-01; Time = 0.0000000000000e+00
| Nonlinear | F 2-Norm | # Linear | R 2-Norm |
0 3.19e-03
Memory access fault by GPU node-4 (Agent handle: 0xa77bbf0) on address 0xffff00000000. Reason: Unknown.
Aborted
rocgdb report:
#0 0x00007ff2e28d9124 in PHX::MDField<Sacado::Fad::Exp::GeneralFad<Sacado::Fad::Exp::DynamicStorage<double, double> > const, panzer::Cell, panzer::Point, panzer::Dim>::operator()<int, int, int> (this=0x7ff2e28fbdb0 <kokkos_impl_hip_constant_memory_buffer+272>,
indices=<error reading variable: Cannot access memory at address 0x2000000000afc>,
indices=<error reading variable: Cannot access memory at address 0x2000000000afc>,
indices=<error reading variable: Cannot access memory at address 0x2000000000afc>)
at libs/Trilinos-install-16/include/Phalanx_MDField.hpp:461
461 return m_view(indices...);
Thank you,
Kalyan