
Fix a bug in tensornet backend scratch pad allocation in multi-GPU mode #2516

Merged
1tnguyen merged 4 commits into NVIDIA:main from 1tnguyen:tnguyen/tensornet-scratchpad-init-bug
Jan 17, 2025


@1tnguyen
Collaborator

Description

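ScratchDeviceMem allocates its memory on construction, sized according to the memory available at that moment. This is incompatible with the multi-GPU code path (MPI execution), where the CUDA device is selected in the simulator constructor: the scratch pad is a member variable of the simulator class, so it is constructed, and allocates, before the device has been selected. The allocation therefore has to be deferred until after device selection.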

Fixed by adding a separate allocate method, which the simulator backend constructor calls once, after device selection.
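The deferred-allocation pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: FakeRuntime is a hypothetical stand-in for the CUDA runtime (real code would call cudaSetDevice, cudaMemGetInfo and cudaMalloc), ScratchPad and Simulator are illustrative names, and reserving half of the free memory is an arbitrary assumption made only so the example has something to size.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for the CUDA runtime: a "current device" plus a
// table of free bytes per device. Not a real API.
struct FakeRuntime {
  std::vector<std::size_t> freeBytes; // free memory per device
  int current = 0;                    // device 0 is current by default

  explicit FakeRuntime(std::vector<std::size_t> f) : freeBytes(std::move(f)) {}

  void setDevice(int id) { current = id; }

  // Takes `bytes` from the current device and returns the reserved size.
  std::size_t reserve(std::size_t bytes) {
    if (bytes > freeBytes.at(current))
      throw std::runtime_error("out of device memory");
    freeBytes[current] -= bytes;
    return bytes;
  }
};

// Scratch pad whose constructor performs no device work.
class ScratchPad {
public:
  explicit ScratchPad(FakeRuntime &rt) : rt_(rt) {}

  // Sizes the buffer from the free memory of the device that is current
  // at call time. A second call is rejected.
  void allocate() {
    if (allocated_)
      throw std::logic_error("scratch pad already allocated");
    device_ = rt_.current;
    // Assumption for illustration only: reserve half of the free memory.
    size_ = rt_.reserve(rt_.freeBytes.at(device_) / 2);
    allocated_ = true;
  }

  int device() const { return device_; }
  std::size_t size() const { return size_; }
  bool allocated() const { return allocated_; }

private:
  FakeRuntime &rt_;
  int device_ = -1;
  std::size_t size_ = 0;
  bool allocated_ = false;
};

class Simulator {
public:
  Simulator(FakeRuntime &rt, int rank) : scratch_(rt) {
    // Members are constructed before this body runs, so scratch_ already
    // exists here but has not touched any device yet.
    rt.setDevice(rank % static_cast<int>(rt.freeBytes.size()));
    scratch_.allocate(); // allocate only after the device is selected
    assert(scratch_.allocated());
  }

  ScratchPad &scratch() { return scratch_; }

private:
  ScratchPad scratch_;
};
```

With two fake devices, a simulator created for rank 1 ends up with its scratch pad on device 1 and leaves device 0 untouched. Had the allocation happened in the ScratchPad constructor, it would have run while device 0 was still current.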

This bug was introduced in #1865, which changed the scratch pad to be allocated once (as a member variable of the simulator class) rather than on demand, in order to improve performance.

Also adds a unit test for this case, which is executed only when multiple GPUs are available.
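Why such a test has to be gated on the GPU count can be illustrated with a small sketch. Everything here is hypothetical and not taken from the PR's test: Outcome, checkScratchPadFollowsDevice and the scratchDeviceFor callback (which reports the device the scratch pad ended up on for a given selected device) are names invented for this example. A real test would query the device count from the CUDA runtime and use the test framework's own skip mechanism.

```cpp
#include <cassert>
#include <functional>

enum class Outcome { Skipped, Passed, Failed };

// Checks that, for every selectable device, the scratch pad lands on that
// same device. With fewer than two devices the check cannot distinguish a
// correct implementation from a buggy one, so it is skipped.
Outcome checkScratchPadFollowsDevice(
    int deviceCount, const std::function<int(int)> &scratchDeviceFor) {
  assert(scratchDeviceFor); // the callback must be set
  if (deviceCount < 2)
    return Outcome::Skipped;
  for (int id = 0; id < deviceCount; ++id) {
    if (scratchDeviceFor(id) != id)
      return Outcome::Failed;
  }
  return Outcome::Passed;
}
```

An implementation that always allocates on device 0 is reported as Failed with two devices, while the same implementation is indistinguishable from a correct one on a single-GPU machine, which is why the check is skipped there.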

…r we've set the device

Signed-off-by: Thien Nguyen <thiennguyen@nvidia.com>
@1tnguyen 1tnguyen added the "bug fix" label (To be listed under Bug Fixes in the release notes) on Jan 17, 2025

github-actions bot pushed a commit that referenced this pull request Jan 17, 2025
Collaborator

@bmhowe23 bmhowe23 left a comment


LGTM

@1tnguyen 1tnguyen merged commit 9e0b590 into NVIDIA:main Jan 17, 2025
@bettinaheim bettinaheim added this to the release 0.10.0 milestone Mar 12, 2025
annagrin pushed a commit to annagrin/cuda-quantum that referenced this pull request Jun 17, 2025
…de (NVIDIA#2516)

* Fix a bug in default init of scratchpad: it must allocate memory after we've set the device

Signed-off-by: Thien Nguyen <thiennguyen@nvidia.com>

* Add test

Signed-off-by: Thien Nguyen <thiennguyen@nvidia.com>

* Add a check to prevent multiple allocate calls

Signed-off-by: Thien Nguyen <thiennguyen@nvidia.com>

---------

Signed-off-by: Thien Nguyen <thiennguyen@nvidia.com>
Signed-off-by: Anna Gringauze <agringauze@nvidia.com>

Labels

bug fix To be listed under Bug Fixes in the release notes

3 participants