gdb: fix OOB read in makeQChamDiagTerms thread count #42
Open
dadamsncsa wants to merge 1 commit into r-ccs-cms:main from
`dets_` stores `n_dets` bit-strings of length `D_size_` (total length `D_size_ * n_dets`), but `makeQChamDiagTerms()` resizes the diagonal-Hamiltonian output `hii` and launches its kernel with `dets_.size()`. That launches `D_size_` times as many threads as there are determinants, and the surplus threads overrun `dets_` and trip `cudaErrorIllegalAddress` in `MakeQChamDiagTermKernel`. Reproduced on H2O cc-pVDZ at 1e-4 (~2.4M dets) on a single GH200. Fix: derive `n_dets = dets_.size() / D_size_` and use it for both the resize and the `for_each_n` count. Results are bitwise-identical on H2O 1e-3, and the fix unblocks H2O at 1e-4 and beyond.
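A minimal host-side C++ sketch of the corrected launch geometry follows. Here `std::for_each_n` over an index sequence stands in for `thrust::for_each_n` over a counting iterator. The sketch assumes each determinant occupies `D_size_` consecutive `std::uint64_t` words in `dets_`. The `DiagSketch` struct and its `diag_term()` helper (a popcount placeholder, not the real Hamiltonian element) are hypothetical and are not the code in `MultGDBThrust`.

```cpp
#include <algorithm>
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical host-side stand-in for MultGDBThrust<ElemT>. dets_ is a flat
// array holding n_dets determinants, each stored as D_size_ consecutive words.
struct DiagSketch {
  std::vector<std::uint64_t> dets_;  // length D_size_ * n_dets
  std::size_t D_size_ = 1;           // words per determinant (assumed layout)
  std::vector<double> hii;           // one diagonal element per determinant

  // Hypothetical placeholder for the real diagonal term: counts set bits
  // (occupied orbitals) across the determinant's words.
  static double diag_term(const std::uint64_t* det, std::size_t words) {
    double occ = 0.0;
    for (std::size_t k = 0; k < words; ++k)
      occ += static_cast<double>(std::bitset<64>(det[k]).count());
    return occ;
  }

  void makeDiagTerms() {
    // dets_.size() is the storage length; divide to get the determinant count.
    assert(D_size_ > 0 && dets_.size() % D_size_ == 0);
    const std::size_t n_dets = dets_.size() / D_size_;

    hii.resize(n_dets);  // buggy version: hii.resize(dets_.size())

    // Mirrors thrust::for_each_n over a counting sequence [0, n_dets).
    std::vector<std::size_t> idx(n_dets);
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::for_each_n(idx.begin(), n_dets, [&](std::size_t i) {
      // Determinant i occupies dets_[i*D_size_ .. i*D_size_ + D_size_ - 1],
      // which is in bounds for every i < n_dets.
      hii[i] = diag_term(dets_.data() + i * D_size_, D_size_);
    });
  }
};
```

The divisibility `assert` is an optional extra guard that this PR does not include; it would flag a mis-sized `dets_` before any launch.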
Summary
`MultGDBThrust<ElemT>::makeQChamDiagTerms()` resizes the diagonal-Hamiltonian output `hii` and launches the diagonal-term kernel using `dets_.size()`, which is the storage length (`D_size_ * n_dets`), not the determinant count. This launches `D_size_` times as many threads as there are determinants, and the surplus threads index past the end of `dets_`.

At small problem sizes the OOB read silently lands in adjacent mapped pages and leaves results bitwise correct (caught on H2O 1e-3, where results matched the reference to all bits). At larger sizes the kernel faults with `cudaErrorIllegalAddress`. Reproduced on a single NVIDIA GH200 (NCSA DeltaAI, PrgEnv-nvidia, Cray CC wrapper) on H2O cc-pVDZ at 1e-4 (~2.4M dets).
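The scale dependence can be sketched with launch-geometry arithmetic. Assume thread `i` reads `dets_[i * D_size_ + k]` for `k` in `[0, D_size_)`; this is an assumption about the kernel's indexing, not something taken from the diff. Under that assumption, the buggy launch starts `(D_size_ - 1) * n_dets` threads beyond the last determinant, and the furthest read lands at index `D_size_ * D_size_ * n_dets - 1`, roughly `D_size_` times the allocation length. If thread `i` also writes `hii[i]`, those writes stay in bounds because `hii` was resized to the same oversized `dets_.size()`, which fits the fault being an OOB read. The `launch_stats` helper is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Thread-count arithmetic for the buggy vs. fixed launch, assuming thread i
// reads dets_[i * D_size + k] for k in [0, D_size).
struct LaunchStats {
  std::size_t threads;         // iteration count passed to for_each_n
  std::size_t oob_threads;     // threads whose reads lie entirely past dets_
  std::size_t max_index_read;  // largest element index of dets_ touched
};

LaunchStats launch_stats(std::size_t n_dets, std::size_t D_size, bool buggy) {
  assert(n_dets > 0 && D_size > 0);
  const std::size_t storage = D_size * n_dets;  // dets_.size()
  const std::size_t threads = buggy ? storage : n_dets;
  // Thread i >= n_dets starts reading at i * D_size >= storage.
  const std::size_t oob = threads > n_dets ? threads - n_dets : 0;
  // The last thread reads up to (threads - 1) * D_size + (D_size - 1).
  const std::size_t max_index = threads * D_size - 1;
  return {threads, oob, max_index};
}
```

For example, `launch_stats(1000, 4, true)` reports 4000 threads, 3000 of them out of range, and a maximum read index of 15999 against a 4000-element `dets_`. The fixed launch, `launch_stats(1000, 4, false)`, reports 1000 threads and a maximum index of 3999.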
Fix
Compute `n_dets = dets_.size() / D_size_` and use it for both the resize and the `thrust::for_each_n` iteration count.

Test plan