@kyouma kyouma commented Mar 30, 2025

PyTorch has a bug that is triggered when an optimizer is created for a network that lives on an MPS GPU (pytorch/pytorch#149184), so a workaround is needed.

kyouma changed the title from "Add Apple MPS GPU support" to "Add Apple MPS GPU support for Pytorch" Mar 30, 2025

kyouma commented Mar 30, 2025

Excuse me, I do not know how the build process is organized. The error message says:

RuntimeError: MPS backend out of memory (MPS allocated: 0 bytes, other allocations: 0 bytes, max allowed: 7.93 GB).
Tried to allocate 256 bytes on shared pool.
Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).

Maybe setting the PYTORCH_MPS_HIGH_WATERMARK_RATIO environment variable is enough, or maybe this PyTorch forum thread ("Switching runs-on from macos-latest to macos-13") describes a way to solve the build issue.
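For reference, a minimal sketch of how that variable could be set from Python. It assumes the value has to be in the environment before PyTorch initializes its MPS allocator, so the simplest approach is to set it before the first `import torch`. Only the variable name and the value 0.0 come from the error message above; the rest is illustrative.

```python
import os

# Assumption: the MPS allocator reads this variable when it starts up,
# so it is set here before any `import torch` in the process.
# "0.0" disables the upper limit on MPS allocations; the error message
# itself warns that this may cause system failure.
os.environ["PYTORCH_MPS_HIGH_WATERMARK_RATIO"] = "0.0"

# import torch  # import PyTorch only after the variable is in place
```

In a CI job the same variable could instead be exported in the job's environment, which avoids touching the Python code.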

@lululxvi
Owner

I have no experience with Apple MPS GPUs. The build file is here: https://github.com/lululxvi/deepxde/blob/master/.github/workflows/build.yml

kyouma added 2 commits April 4, 2025 22:23
If the PyTorch optimizer bug is fixed and the line `torch._dynamo.disable()` (which currently raises an exception) is removed, the added code will perform a simple check of whether the MPS GPU can be used.

kyouma commented Apr 4, 2025

Hello. It seems that the GitHub Actions MPS-based environment doesn't support the necessary virtualization technology. As a result, PyTorch can detect the MPS GPU but cannot access it.

I have added a workaround that falls back to the CPU if an exception occurs. For now, the line that works around the PyTorch optimizer bug also serves as the GPU availability check. If the optimizer bug is fixed and that line is removed, running a few neural network layers will verify that MPS actually works.
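The fallback idea can be sketched roughly as follows. This is not the code merged in this PR, only an illustration of the pattern: `select_device` and `mps_probe` are hypothetical names, and the probe simply runs one small layer on the device and lets any exception trigger the CPU fallback.

```python
import warnings


def select_device(preferred, probe, fallback="cpu"):
    """Return `preferred` if `probe(preferred)` succeeds, else `fallback`."""
    try:
        probe(preferred)
    except Exception as exc:  # any failure means the device is unusable
        warnings.warn(
            f"Device '{preferred}' is not usable ({exc}); "
            f"falling back to '{fallback}'."
        )
        return fallback
    return preferred


def mps_probe(device):
    """Hypothetical probe: run one tiny layer on the device."""
    import torch  # imported lazily so select_device has no hard dependency

    if not torch.backends.mps.is_available():
        raise RuntimeError("MPS backend is not available")
    layer = torch.nn.Linear(4, 2).to(device)
    layer(torch.ones(1, 4, device=device))  # raises if the GPU is inaccessible


device = select_device("mps", mps_probe)
```

On a machine where the MPS GPU is visible but inaccessible, as described above for the CI runner, the probe raises and `device` ends up as "cpu".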

For future reference, please see the following links:

Edit:

Setting PYTORCH_MPS_HIGH_WATERMARK_RATIO to 0.0, as suggested by the PyTorch exception message that appeared before this workaround, did not help.

One more link:

@lululxvi lululxvi merged commit 28cb8f0 into lululxvi:master Apr 8, 2025
13 of 14 checks passed
@kyouma kyouma deleted the patch-3 branch May 14, 2025 07:37