When running the GMRES solver with the HIP backend on AMD Instinct MI250X hardware (ROCm 6.4.3),
an assertion is triggered at every iteration of the solver:
accessor/accessor_helper.hpp:57: compute: Assertion `first < static_cast<IndexType>(size[dim_idx])' failed
The assertion fires exactly 4 times per iteration, starting from iteration 0, in a perfectly
systematic way. Despite the assertions, the solver converges correctly.
Environment
Ginkgo version v1.10.0
ROCm version 6.4.3-128
GPU AMD Instinct MI250X (gfx90a)
OS Linux
Build HIP backend, no MPI, no CUDA
Minimal reproducer
auto exec = gko::HipExecutor::create(0, gko::ReferenceExecutor::create());
auto A = share(gko::matrix::Csr<double>::create(exec, gko::dim<2>{N, N}));
auto rhs = gko::matrix::Dense<double>::create(exec, gko::dim<2>(N, 1));
auto u = gko::matrix::Dense<double>::create(exec, gko::dim<2>(N, 1));
auto solver = gko::solver::Gmres<double>::build()
                  .with_criteria(
                      gko::stop::Iteration::build().with_max_iters(2000),
                      gko::stop::ResidualNorm<double>::build()
                          .with_baseline(gko::stop::mode::rhs_norm)
                          .with_reduction_factor(1e-6))
                  .with_preconditioner(gko::preconditioner::Jacobi<double>::build())
                  .on(exec)->generate(A);
solver->apply(rhs, u);
With N = 99792 (99792×99792 sparse CSR matrix, ~7 non-zeros per row).
Actual behavior
4 assertions per iteration, every iteration, from iteration 0:
[LOG] >>> apply started on A LinOp[gko::solver::Gmres<double>,...] ...
accessor/accessor_helper.hpp:57: compute: Assertion `first < static_cast<IndexType>(size[dim_idx])' failed
accessor/accessor_helper.hpp:57: compute: Assertion `first < static_cast<IndexType>(size[dim_idx])' failed
accessor/accessor_helper.hpp:57: compute: Assertion `first < static_cast<IndexType>(size[dim_idx])' failed
accessor/accessor_helper.hpp:57: compute: Assertion `first < static_cast<IndexType>(size[dim_idx])' failed
[LOG] >>> iteration 0 completed ... Stopped the iteration process false
accessor/accessor_helper.hpp:57: compute: Assertion `first < static_cast<IndexType>(size[dim_idx])' failed
...
The solver still converges (has_converged = 1, 175 iterations) but the assertions suggest
incorrect memory access patterns in the HIP kernels.
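To make clear what I think the failing condition means, here is a minimal sketch of a per-dimension bounds check of the same shape as the assertion message. This is not Ginkgo's source: the names indices_in_bounds and checked_linear_index are hypothetical, and the row-major layout is an assumption made only for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not Ginkgo's API): returns true when every index
// lies inside the extent of its dimension.
template <std::size_t Dim>
bool indices_in_bounds(const std::array<std::size_t, Dim>& size,
                       const std::array<long, Dim>& idx)
{
    for (std::size_t d = 0; d < Dim; ++d) {
        if (idx[d] < 0 || idx[d] >= static_cast<long>(size[d])) {
            return false;
        }
    }
    return true;
}

// Hypothetical row-major linearization guarded by an assertion of the
// same form as `first < static_cast<IndexType>(size[dim_idx])`.
template <std::size_t Dim>
std::size_t checked_linear_index(const std::array<std::size_t, Dim>& size,
                                 const std::array<long, Dim>& idx)
{
    std::size_t linear = 0;
    for (std::size_t d = 0; d < Dim; ++d) {
        // Aborts in debug builds when an index reaches or exceeds the extent.
        assert(idx[d] >= 0 && idx[d] < static_cast<long>(size[d]));
        linear = linear * size[d] + static_cast<std::size_t>(idx[d]);
    }
    return linear;
}
```

If the real check has this shape, then at least one index passed to the accessor equals or exceeds the extent of its dimension. Printing the index and extent at the failing call site would show which dimension is overrun.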
Steps to reproduce
- Build Ginkgo v1.10.0 with ROCm 6.4.3 and HIP backend (-DKokkos_ARCH_AMD_GFX90A=ON)
- Create a GMRES solver with Jacobi preconditioner on a HipExecutor
- Call solver->apply(rhs, u) on a sparse system of size ~100k
- Observe assertions firing 4 times per iteration in accessor_helper.hpp:57
In the hope that you can answer my question, thank you very much in advance.