Fix a bug in tensornet backend scratch pad allocation in multi-GPU mode by 1tnguyen · Pull Request #2516 · NVIDIA/cuda-quantum

1tnguyen · 2025-01-17T03:21:24Z

Description

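`ScratchDeviceMem` sizes and allocates its memory at construction time, based on the device memory available at that moment. This is incompatible with the multi-GPU code path (MPI execution), where the CUDA device is selected in the simulator constructor. Because the scratch pad is constructed before that selection happens, it is allocated on whichever device was current at the time rather than on the rank's selected device. The allocation therefore has to be deferred until after the device is selected.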
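To make the ordering problem concrete, here is a minimal, GPU-free C++ sketch. `fake_cuda`, `EagerScratch`, and `EagerSimulator` are hypothetical stand-ins, not CUDA-Q or CUDA APIs. `fake_cuda::setDevice`/`getDevice` play the role of `cudaSetDevice`/`cudaGetDevice`. Data members are constructed before the constructor body runs. As a result, a scratch pad that allocates in its own constructor is bound to the device that was current *before* the rank's device is selected.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the CUDA runtime's "current device" state
// (cudaSetDevice / cudaGetDevice in real code), so the sketch runs without a GPU.
namespace fake_cuda {
inline int currentDevice = 0;
inline void setDevice(int d) { currentDevice = d; }
inline int getDevice() { return currentDevice; }
} // namespace fake_cuda

// Hypothetical eager scratch pad: it sizes and "allocates" in its constructor,
// i.e. on whichever device is current at that moment.
struct EagerScratch {
  int device = -1;
  std::size_t size = 0;
  EagerScratch() : device(fake_cuda::getDevice()), size(1024) {}
};

// Hypothetical simulator holding the scratch pad as a member. The member is
// constructed before the constructor body runs, so the allocation happens
// before the per-rank device is selected.
struct EagerSimulator {
  EagerScratch scratch;
  explicit EagerSimulator(int rank) { fake_cuda::setDevice(rank); }
};
```

For example, with device 0 initially current, `EagerSimulator sim(1);` leaves `fake_cuda::getDevice() == 1` but `sim.scratch.device == 0`. The scratch pad ends up on the wrong GPU.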

Fixed by adding a separate `allocate` method, which the simulator backend constructor calls once, after device selection.
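A minimal sketch of the deferred approach follows, under the same assumptions: `fake_cuda`, `DeferredScratch`, and `DeferredSimulator` are hypothetical, and this is not the actual CUDA-Q code. Constructing the scratch pad does nothing. The simulator first selects its device and only then calls `allocate()` once. The repeated-call guard mirrors the PR's "prevent multiple allocate calls" commit. Throwing is an assumption made here, and the real check may behave differently. In real code, `allocate()` would typically query free memory (e.g. with `cudaMemGetInfo`) and allocate on the now-current device.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical stand-in for cudaSetDevice / cudaGetDevice.
namespace fake_cuda {
inline int currentDevice = 0;
inline void setDevice(int d) { currentDevice = d; }
inline int getDevice() { return currentDevice; }
} // namespace fake_cuda

// Hypothetical deferred scratch pad: construction is cheap and device-agnostic;
// memory is only claimed when allocate() is called explicitly.
class DeferredScratch {
public:
  void allocate() {
    // Guard against a second call (assumed behavior: throw).
    if (allocated_)
      throw std::runtime_error("scratch pad already allocated");
    device_ = fake_cuda::getDevice(); // bind to the device current *now*
    size_ = 1024; // stand-in for a size derived from free device memory
    allocated_ = true;
  }
  bool allocated() const { return allocated_; }
  int device() const { return device_; }
  std::size_t size() const { return size_; }

private:
  bool allocated_ = false;
  int device_ = -1;
  std::size_t size_ = 0;
};

// Hypothetical simulator: select the rank's device first, then allocate.
struct DeferredSimulator {
  DeferredScratch scratch;
  explicit DeferredSimulator(int rank) {
    fake_cuda::setDevice(rank); // 1. select this rank's device
    scratch.allocate();         // 2. only then allocate the scratch pad
  }
};
```

With device 0 initially current, `DeferredSimulator sim(1);` now places the scratch pad on device 1. A second `sim.scratch.allocate()` is rejected instead of silently allocating again.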

This bug was introduced in #1865, which changed the scratch pad to be allocated once (as a member variable of the simulator class) rather than on demand, to improve performance.

Added a unit test for this case, which runs only when multiple GPUs are available.

github-actions · 2025-01-17T04:47:41Z

CUDA Quantum Docs Bot: A preview of the documentation can be found here.

bmhowe23

LGTM

…de (NVIDIA#2516)
* Fix a bug in default init of scratchpad: it must allocate memory after we've set the device
  Signed-off-by: Thien Nguyen <thiennguyen@nvidia.com>
* Add test
  Signed-off-by: Thien Nguyen <thiennguyen@nvidia.com>
* Add a check to prevent multiple allocate calls
  Signed-off-by: Thien Nguyen <thiennguyen@nvidia.com>

---------
Signed-off-by: Thien Nguyen <thiennguyen@nvidia.com>
Signed-off-by: Anna Gringauze <agringauze@nvidia.com>

1tnguyen added 4 commits January 17, 2025 01:43

Fix a bug in default init of scratchpad: it must allocate memory after we've set the device

730908e

Signed-off-by: Thien Nguyen <thiennguyen@nvidia.com>

Merge branch 'main' into tnguyen/tensornet-scratchpad-init-bug

020e2b9

Signed-off-by: Thien Nguyen <thiennguyen@nvidia.com>

Add test

090b407

Signed-off-by: Thien Nguyen <thiennguyen@nvidia.com>

Add a check to prevent multiple allocate calls

12f619c

Signed-off-by: Thien Nguyen <thiennguyen@nvidia.com>

1tnguyen added the "bug fix" label (To be listed under Bug Fixes in the release notes) Jan 17, 2025

1tnguyen requested review from bmhowe23 and schweitzpgi January 17, 2025 03:23

github-actions bot pushed a commit that referenced this pull request Jan 17, 2025

Docs preview for PR #2516.

ee74594

bmhowe23 approved these changes Jan 17, 2025

1tnguyen merged commit 9e0b590 into NVIDIA:main Jan 17, 2025

github-actions bot pushed a commit that referenced this pull request Jan 17, 2025

Cleaning up docs preview for PR #2516.

9dafd8a

bettinaheim added this to the release 0.10.0 milestone Mar 12, 2025

1tnguyen merged 4 commits into NVIDIA:main from 1tnguyen:tnguyen/tensornet-scratchpad-init-bug

3 participants