Skip to content

Conversation

@goanpeca
Copy link
Collaborator

@goanpeca goanpeca commented Sep 4, 2025

Fix MPS by detaching before moving to cpu and moving to numpy array

Copilot AI review requested due to automatic review settings September 4, 2025 16:06
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes an issue with MPS (Metal Performance Shaders) compatibility by adding detach() calls before moving tensors to CPU in the FID score calculation.

Key Changes

  • Added .detach() calls before .cpu() for mu1, mu2, sigma1, and sigma2 tensors to prevent gradient tracking issues when moving from GPU to CPU

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@github-actions github-actions bot added the module: metrics Metrics module label Sep 4, 2025
@vfdev-5
Copy link
Collaborator

vfdev-5 commented Sep 4, 2025

@goanpeca please post here the error you see without this PR to understand better why we need now to call detach

@goanpeca
Copy link
Collaborator Author

goanpeca commented Sep 4, 2025

@vfdev-5 it was a suggestion from copilot on the first PR to add the detach method before the CPU.

Added .detach() calls before .cpu() for mu1, mu2, sigma1, and sigma2 tensors to prevent gradient tracking issues when
moving from GPU to CPU

Detach the tensor from the computational graph (if it requires gradients):
If the tensor was created with requires_grad=True or is part of a computation where gradients are being tracked, you need to detach it from the computational graph using the .detach() method. This prevents errors related to attempting to convert a tensor that requires gradients to a NumPy array, as NumPy arrays do not have a concept of gradients.

Is this something that could apply to mu1, mu2, sigma1, sigma2?

@goanpeca goanpeca marked this pull request as draft September 4, 2025 22:39
@goanpeca goanpeca force-pushed the fix/mps-test branch 4 times, most recently from 5190667 to 55f3789 Compare September 5, 2025 14:22
@goanpeca
Copy link
Collaborator Author

goanpeca commented Sep 5, 2025

@vfdev-5 the error was something on

tests/ignite/metrics/gan/test_fid.py:51: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
ignite/metrics/gan/fid.py:40: in fid_score
    covmean, _ = scipy.linalg.sqrtm(sigma1.mm(sigma2).cpu(), disp=False)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

...
          if disp is False:
              try:
  >               arg2 = norm(res @ res - A, 'fro')**2 / norm(A, 'fro')
                              ^^^^^^^^^^^^^
  E               TypeError: unsupported operand type(s) for -: 'numpy.ndarray' and 'Tensor'
  
  .venv/lib/python3.11/site-packages/scipy/linalg/_matfuncs.py:547: TypeError

so that is why we had to add a .numpy() call. For the detach see previous message, it was a suggestion from copilot, but if it does not apply, I can revert it!

@goanpeca goanpeca marked this pull request as ready for review September 5, 2025 14:31
Copy link
Collaborator

@vfdev-5 vfdev-5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Gonzalo!

@vfdev-5 vfdev-5 enabled auto-merge September 5, 2025 14:46
@vfdev-5 vfdev-5 added this pull request to the merge queue Sep 5, 2025
Merged via the queue into pytorch:master with commit 1fc3214 Sep 5, 2025
26 checks passed
@goanpeca goanpeca deleted the fix/mps-test branch September 5, 2025 15:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

module: metrics Metrics module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants