Hi @nbeams @upsj @yhmtsai,
Following up on issues #2013 and #2015 with consolidated reproducer data
across Ginkgo 1.10 and 1.11 on Battlemage G31 (Intel Arc Pro B70 Pro).
Hardware Context
Battlemage G31 is a new hardware generation that is not yet in the Ginkgo
CI matrix. These are the first systematic SYCL preconditioner tests on this
architecture. Block-Jacobi (maxBlockSize>1), ParICT and ISAI each show
distinct, reproducible failure modes in Ginkgo 1.10 and 1.11.
System: Intel Arc Pro B70 Pro (BMG-G31, 32 GB GDDR6), Compute Runtime 26.05
(pinned, 26.14 has known multi-rank issues), oneAPI 2026.0, IGC 2.32.7,
Ubuntu 26.04, kernel 7.0.
Bug 1: BJ(maxBlockSize>1) — find_blocks underflow
Reproducer: GKOCG + BJ(maxBlockSize=2), distributed across 8 ranks,
34M cells, ~4.25M cells per rank.
Failure: the allocation request in dpcpp::jacobi::find_blocks throws:
gko::AllocationError: dpcpp/base/executor.dp.cpp:104:
DPC++: failed to allocate memory block of 18446744073709551615 B
The requested size, 18446744073709551615 = 2^64 - 1, is SIZE_MAX, i.e. (size_t)-1, which points to a size_t underflow in the block-counting arithmetic.
Confirmed in:
- Ginkgo 1.10 with V2 L0 adapter (default for BMG)
- Ginkgo 1.10 with V1 L0 adapter (SYCL_UR_USE_LEVEL_ZERO_V2=0)
- Ginkgo 1.11 (KIT branch ogl_0600_gko110) — bit-identical underflow value
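VRAM peak at crash: 8.4 GB of 27.9 GB available, so this is not a memory-exhaustion issue.
Because the failure reproduces identically with both the V1 and V2 Level Zero adapters, the L0 Unified Runtime layer is ruled out; this places the bug in the integer arithmetic of dpcpp/preconditioner/jacobi/ itself.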
Bug 2: ICT (ParICT) — add_candidates SIGABRT
Reproducer: GKOCG + ICT preconditioner
Failure: SIGABRT in par_ict_factorization::add_candidates, with the stack frame
in factorization/par_ict_kernels.dp.cpp. Confirmed in 1.10 and 1.11.
(The MPICH fix #1875 and the ParILUT threshold fix #1877 in 1.11 do not address
this; they touch a different code path.)
Bug 3 ruled out: GKOGMRES "DEVICE_LOST"
At first this looked like a third bug; further investigation showed it is a
device out-of-memory condition, not a Ginkgo defect:
| Configuration | Peak dedicated VRAM | Outcome |
|---|---|---|
| GMRES+BJ(1) krylovDim=30 | 27.87 GB / 27.9 GB | DEVICE_LOST |
| GMRES+BJ(1) krylovDim=5 | 27.86 GB / 27.9 GB | DEVICE_LOST |
GMRES allocates roughly 26 GB of upfront overhead independent of krylovDim,
exhausting the ~27.9 GB of usable VRAM (32 GB nominal). Suggestion: a clean
gko::AllocationError here would be more helpful than the propagated SYCL
UR_RESULT_ERROR_DEVICE_LOST.
What works on BMG-G31
| Solver | Preconditioner | Status |
|---|---|---|
| GKOCG | BJ(maxBlockSize=1) | ✅ stable, 53 s/step (200-iter cap, no convergence) |
| GKOCG | ISAI sp=1 | ✅ runs, but the iteration diverges on the SPD pressure system |
| Everything else | | ❌ |
A working preconditioner with strong properties (Multigrid, IC, ILU) is absent
on the SYCL backend; see Finding 05 in the repo.
Resources
Full reproducer details, logs, and VRAM traces:
https://github.com/heikogleu-dev/Openfoam13---GPU-Offloading-Intel-B70-Pro/tree/main/findings
Specifically:
- Finding 02 (find_blocks): /findings/02_bj_blocksize_int_underflow.md
- Finding 18 (V2 adapter ruled out): /findings/18_v2_adapter_ruled_out.md
- Finding 21 (preconditioner mapping): /findings/21_preconditioner_mapping_bmg.md
- Finding 22 (VRAM characterization): /findings/22_vram_pressure_gmres_oom.md
Happy to run additional configurations or debug builds if that helps
narrow down Bugs 1 and 2.
— Heiko