
BMG-G31 (Battlemage) SYCL preconditioner failures: BJ(>1) underflow + ICT SIGABRT — confirmed in 1.10 and 1.11 #2018

@heikogleu-dev


Hi @nbeams @upsj @yhmtsai,

Following up on issues #2013 and #2015 with consolidated reproducer data
across Ginkgo 1.10 and 1.11 on Battlemage G31 (Intel Arc Pro B70 Pro).

Hardware Context

Battlemage G31 is a new hardware generation that is not yet part of the Ginkgo
CI matrix. These are the first systematic SYCL preconditioner tests on this
architecture. Block-Jacobi (maxBlockSize>1), ParICT, and ISAI each show a
different, reproducible failure mode in Ginkgo 1.10 and 1.11.

System: Intel Arc Pro B70 Pro (BMG-G31, 32 GB GDDR6), Compute Runtime 26.05
(pinned, 26.14 has known multi-rank issues), oneAPI 2026.0, IGC 2.32.7,
Ubuntu 26.04, kernel 7.0.

Bug 1: BJ(maxBlockSize>1) — find_blocks underflow

Reproducer: GKOCG + BJ(maxBlockSize=2), distributed across 8 ranks,
34M cells, ~4.25M cells per rank.

Failure: an allocation request in dpcpp::jacobi::find_blocks fails with:
gko::AllocationError: dpcpp/base/executor.dp.cpp:104:
DPC++: failed to allocate memory block of 18446744073709551615 B

The requested size is 2^64-1 (SIZE_MAX for a 64-bit size_t), which indicates a
size_t underflow (wraparound below zero) in the block-counting arithmetic.

Confirmed in:

  • Ginkgo 1.10 with V2 L0 adapter (default for BMG)
  • Ginkgo 1.10 with V1 L0 adapter (SYCL_UR_USE_LEVEL_ZERO_V2=0)
  • Ginkgo 1.11 (KIT branch ogl_0600_gko110) — bit-identical underflow value

VRAM peak at crash: 8.4 GB of 27.9 GB available, so this is not a memory
shortage.

Since the failure is identical with both the V1 and V2 Level Zero adapters,
the Unified Runtime L0 layer is ruled out; the bug is in the integer
arithmetic of dpcpp/preconditioner/jacobi/ itself.
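To illustrate why 18446744073709551615 points at arithmetic rather than memory
pressure: std::size_t arithmetic wraps modulo 2^N, so decrementing a count that
is already zero yields SIZE_MAX, which is exactly this value on a 64-bit
platform. The sketch below is a generic illustration only; it does not
reproduce the actual expression in find_blocks, and `wrapped_decrement` and
`checked_sub` are hypothetical names, not Ginkgo functions.

```cpp
#include <cstddef>
#include <limits>
#include <stdexcept>

// Illustration only: decrementing an unsigned count that is already zero.
// std::size_t arithmetic is modulo 2^N, so 0 - 1 wraps to SIZE_MAX instead
// of becoming -1. With a 64-bit size_t this is 18446744073709551615.
std::size_t wrapped_decrement(std::size_t count)
{
    return count - 1;
}

// Hypothetical guard (not a Ginkgo function): the same subtraction, but it
// reports the problem instead of silently wrapping around.
std::size_t checked_sub(std::size_t a, std::size_t b)
{
    if (b > a) {
        throw std::underflow_error(
            "checked_sub: unsigned result would be negative");
    }
    return a - b;
}

// Compile-time check that unsigned wraparound yields the maximum value.
static_assert(std::size_t{0} - 1 == std::numeric_limits<std::size_t>::max(),
              "unsigned subtraction wraps modulo 2^N");
```

The static_assert holds for any width of std::size_t; the literal
18446744073709551615 specifically assumes the 64-bit size_t of x86-64 Linux.
With a guard of this kind in the block-counting path, a zero intermediate count
would surface as a descriptive exception rather than a 16 EiB allocation
request.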

Bug 2: ICT (ParICT) — add_candidates SIGABRT

Reproducer: GKOCG + ICT preconditioner

Failure: SIGABRT in par_ict_factorization::add_candidates, with the stack frame
in factorization/par_ict_kernels.dp.cpp. Confirmed in both 1.10 and 1.11.

(The MPICH fix #1875 and the ParILUT threshold fix #1877 in 1.11 do not
address this; they touch a different code path.)

Bug 3 ruled out: GKOGMRES "DEVICE_LOST"

This initially looked like a third bug, but further investigation showed it is
device out-of-memory, not a Ginkgo defect:

Configuration                 VRAM dedicated peak   Outcome
GMRES+BJ(1), krylovDim=30     27.87 GB / 27.9 GB    DEVICE_LOST
GMRES+BJ(1), krylovDim=5      27.86 GB / 27.9 GB    DEVICE_LOST

GMRES allocates ~26 GB of upfront overhead independent of krylovDim, exhausting
the ~27.9 GB of usable VRAM on the 32 GB card. Suggestion: a clean
gko::AllocationError here would be more helpful than the propagated SYCL error
UR_RESULT_ERROR_DEVICE_LOST.
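The suggestion above amounts to checking a request against the remaining
device memory before allocating and raising a descriptive error, instead of
letting the failure surface as UR_RESULT_ERROR_DEVICE_LOST. The C++ sketch below
shows only that idea in isolation: `ensure_fits` and `DeviceAllocationError` are
hypothetical names, not Ginkgo API, and where `free_bytes` would come from is
left open (Intel's DPC++ device-info extensions include a free-memory query,
but its availability on BMG-G31 with this runtime is an assumption here).

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical exception type standing in for a descriptive allocation error.
// Ginkgo's real gko::AllocationError is not used, to keep the sketch standalone.
class DeviceAllocationError : public std::runtime_error {
public:
    using std::runtime_error::runtime_error;
};

// Hypothetical pre-flight check: throw a descriptive error before the
// allocation is attempted if the request exceeds the free device memory.
void ensure_fits(std::uint64_t requested_bytes, std::uint64_t free_bytes)
{
    if (requested_bytes > free_bytes) {
        throw DeviceAllocationError(
            "device allocation of " + std::to_string(requested_bytes) +
            " B exceeds free device memory (" + std::to_string(free_bytes) +
            " B)");
    }
}
```

A caller would invoke ensure_fits right before the device allocation. The
check is advisory only: other ranks sharing the GPU can consume memory between
the query and the allocation, so a runtime-level error path is still needed.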

What works on BMG-G31

Solver            Preconditioner        Status
GKOCG             BJ(maxBlockSize=1)    ✅ stable, 53 s/step (200-iteration cap, no convergence)
GKOCG             ISAI sp=1             ✅ runs, but diverges mathematically for the SPD pressure system
Everything else

No working preconditioner with strong convergence properties (multigrid, IC,
ILU) is available on the SYCL backend; see Finding 05 in the repo.

Resources

Full reproducer details, logs, and VRAM traces:
https://github.com/heikogleu-dev/Openfoam13---GPU-Offloading-Intel-B70-Pro/tree/main/findings

Specifically:

  • Finding 02 (find_blocks): /findings/02_bj_blocksize_int_underflow.md
  • Finding 18 (L0 V2 adapter ruled out): /findings/18_v2_adapter_ruled_out.md
  • Finding 21 (preconditioner mapping): /findings/21_preconditioner_mapping_bmg.md
  • Finding 22 (VRAM characterization): /findings/22_vram_pressure_gmres_oom.md

Happy to run additional configurations, or debug builds, if that helps narrow
down Bugs 1 and 2.

— Heiko
