
gdb: fix OOB read in makeQChamDiagTerms thread count #42

Open
dadamsncsa wants to merge 1 commit into r-ccs-cms:main from ncsa:fix/gdb-makediag-oob

@dadamsncsa

Summary

MultGDBThrust<ElemT>::makeQChamDiagTerms() resizes the diagonal-Hamiltonian
output hii and launches the diagonal-term kernel using dets_.size(),
which is the storage length (D_size_ * n_dets), not the determinant count.
The kernel therefore runs D_size_ times as many threads as there are
determinants, and the surplus threads index past the end of dets_.

At small problem sizes the OOB read silently lands in adjacent mapped pages
and leaves results bitwise correct (observed on H2O at 1e-3, where results
match the reference bit for bit). At larger sizes the kernel faults with
cudaErrorIllegalAddress. Reproduced on a single NVIDIA GH200 (NCSA DeltaAI,
PrgEnv-nvidia, Cray CC wrapper) on H2O cc-pVDZ at 1e-4 (~2.4M dets).
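
The thread-count arithmetic above can be illustrated with a small host-only sketch. It assumes each thread i reads the contiguous window [i * D_size_, (i + 1) * D_size_) of dets_; the actual access pattern inside the kernel may differ, and count_oob_threads is a hypothetical helper, not part of the code base.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not in the repository): counts the threads whose
// assumed read window [i * d_size, (i + 1) * d_size) extends past the end
// of a storage array holding storage_len words.
std::size_t count_oob_threads(std::size_t storage_len,
                              std::size_t d_size,
                              std::size_t n_threads) {
    assert(d_size > 0);  // a determinant must occupy at least one word
    std::size_t oob = 0;
    for (std::size_t i = 0; i < n_threads; ++i) {
        if ((i + 1) * d_size > storage_len) {
            ++oob;  // this thread would read past the end of the storage
        }
    }
    return oob;
}
```

With 4 determinants of 3 words each (12 words of storage), launching one thread per storage word gives 12 threads, of which 8 read out of bounds under this assumption; launching one thread per determinant gives none.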

Fix

Compute n_dets = dets_.size() / D_size_ and use it for both the resize and
the thrust::for_each_n iteration count.
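
As a rough host-side sketch of the sizing described in this fix (not the actual patch): std::vector stands in for the Thrust device containers, a plain loop stands in for thrust::for_each_n, and the per-determinant value is a placeholder word sum rather than the real diagonal Hamiltonian element. The function name make_diag_terms and its signature are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for makeQChamDiagTerms(): shows only the sizing
// logic. dets holds n_dets determinants of d_size words each, back to back.
std::vector<double> make_diag_terms(const std::vector<std::uint64_t>& dets,
                                    std::size_t d_size) {
    assert(d_size > 0 && dets.size() % d_size == 0);

    // Determinant count, not storage length.
    const std::size_t n_dets = dets.size() / d_size;

    // Output is sized by n_dets (one diagonal term per determinant).
    std::vector<double> hii(n_dets);

    // Iteration count is n_dets as well, so no index exceeds dets.size().
    for (std::size_t i = 0; i < n_dets; ++i) {
        double acc = 0.0;
        for (std::size_t w = 0; w < d_size; ++w) {
            // Placeholder computation: sum of the determinant's words.
            acc += static_cast<double>(dets[i * d_size + w]);
        }
        hii[i] = acc;
    }
    return hii;
}
```

The point of the sketch is that the same n_dets value drives both the output size and the loop bound, so the two cannot drift apart.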

Test plan

  • Bitwise-identical regression on H2O cc-pVDZ 1e-3 (1 GH200)
  • H2O cc-pVDZ 1e-4 (1 GH200): previously faulted; now completes; energy matches reference
  • Fe4S4 small case: matches reference energy

dets_ stores n_dets bit-strings of length D_size_ (total length =
D_size_ * n_dets), but makeQChamDiagTerms() resizes the diagonal-
Hamiltonian output hii and launches its kernel with dets_.size(),
giving D_size_ times too many threads that overrun dets_ and trip
cudaErrorIllegalAddress in MakeQChamDiagTermKernel. Reproduced on
H2O cc-pVDZ at 1e-4 (~2.4M dets) on a single GH200.

Fix: derive n_dets = dets_.size() / D_size_ and use it for both the
resize and the for_each_n count. Bitwise-identical on H2O 1e-3;
unblocks H2O at 1e-4 and beyond.