I'm trying to build the CUDA extension in this repository using:
pip install --no-build-isolation --no-cache-dir .
Environment
- OS: Windows 10
- Python: 3.12
- PyTorch: 2.10.0+cu128
- CUDA Toolkit: 12.9
- GPU: (e.g. RTX 5060)
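This environment pairs a PyTorch wheel built against CUDA 12.8 (`+cu128`) with a 12.9 toolkit. The build log below reports this only as a minor-version mismatch. The following minimal C++ sketch shows that comparison. It assumes, as the warning's wording suggests, that only a difference in the major version is fatal and a minor difference just produces a warning. It paraphrases the behaviour described in the log and is not PyTorch's actual `cpp_extension` code. `parse_cuda_version` and `compare_cuda_versions` are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Hypothetical parsed form of a "MAJOR.MINOR" CUDA version string.
struct CudaVersion {
  int major;
  int minor;
};

// Split "12.9" into {12, 9}; a string without a dot is treated as MINOR = 0.
CudaVersion parse_cuda_version(const std::string& text) {
  const std::string::size_type dot = text.find('.');
  const int major = std::stoi(text.substr(0, dot));
  const int minor =
      dot == std::string::npos ? 0 : std::stoi(text.substr(dot + 1));
  return {major, minor};
}

enum class MismatchLevel { None, MinorOnly, Major };

// Assumed rule (see lead-in): same major version is tolerated with a
// warning, a different major version would be treated as an error.
MismatchLevel compare_cuda_versions(const std::string& toolkit,
                                    const std::string& torch_built_with) {
  const CudaVersion a = parse_cuda_version(toolkit);
  const CudaVersion b = parse_cuda_version(torch_built_with);
  if (a.major != b.major) return MismatchLevel::Major;
  if (a.minor != b.minor) return MismatchLevel::MinorOnly;
  return MismatchLevel::None;
}
```

For this setup, `compare_cuda_versions("12.9", "12.8")` yields `MismatchLevel::MinorOnly`. That matches the log's "minor version mismatch" warning and suggests the toolkit pairing is not what stops the build.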
PS C:\Users\USER\Desktop\fused-ssim> pip install --no-build-isolation --no-cache-dir .
Defaulting to user installation because normal site-packages is not writeable
Processing ..
Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: torch>=2.0.0 in C:\Users\USER\AppData\Roaming\Python\Python312\site-packages (from fused-ssim==1.0.0) (2.10.0+cu128)
Requirement already satisfied: filelock in C:\Users\USER\AppData\Roaming\Python\Python312\site-packages (from torch>=2.0.0->fused-ssim==1.0.0) (3.24.3)
Requirement already satisfied: typing-extensions>=4.10.0 in C:\Users\USER\AppData\Roaming\Python\Python312\site-packages (from torch>=2.0.0->fused-ssim==1.0.0) (4.15.0)
Requirement already satisfied: sympy>=1.13.3 in C:\Users\USER\AppData\Roaming\Python\Python312\site-packages (from torch>=2.0.0->fused-ssim==1.0.0) (1.14.0)
Requirement already satisfied: networkx>=2.5.1 in C:\Users\USER\AppData\Roaming\Python\Python312\site-packages (from torch>=2.0.0->fused-ssim==1.0.0) (3.6.1)
Requirement already satisfied: jinja2 in C:\Users\USER\AppData\Roaming\Python\Python312\site-packages (from torch>=2.0.0->fused-ssim==1.0.0) (3.1.6)
Requirement already satisfied: fsspec>=0.8.5 in C:\Users\USER\AppData\Roaming\Python\Python312\site-packages (from torch>=2.0.0->fused-ssim==1.0.0) (2026.2.0)
Requirement already satisfied: setuptools in C:\Users\USER\AppData\Roaming\Python\Python312\site-packages (from torch>=2.0.0->fused-ssim==1.0.0) (80.9.0)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in C:\Users\USER\AppData\Roaming\Python\Python312\site-packages (from sympy>=1.13.3->torch>=2.0.0->fused-ssim==1.0.0) (1.3.0)
Requirement already satisfied: MarkupSafe>=2.0 in C:\Users\USER\AppData\Roaming\Python\Python312\site-packages (from jinja2->torch>=2.0.0->fused-ssim==1.0.0) (3.0.3)
Building wheels for collected packages: fused-ssim
Building wheel for fused-ssim (pyproject.toml) ... error
error: subprocess-exited-with-error
× Building wheel for fused-ssim (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [111 lines of output]
Compiling for CUDA.
Detected GPU architecture: sm_120
Compiling for CUDA.
Detected GPU architecture: sm_120
running bdist_wheel
W0222 01:22:15.643000 12300 torch\utils\cpp_extension.py:659] Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
running build
running build_py
copying fused_ssim\__init__.py -> build\lib.win-amd64-cpython-312\fused_ssim
running egg_info
writing fused_ssim.egg-info\PKG-INFO
writing dependency_links to fused_ssim.egg-info\dependency_links.txt
writing requirements to fused_ssim.egg-info\requires.txt
writing top-level names to fused_ssim.egg-info\top_level.txt
reading manifest file 'fused_ssim.egg-info\SOURCES.txt'
adding license file 'LICENSE'
writing manifest file 'fused_ssim.egg-info\SOURCES.txt'
running build_ext
W0222 01:22:15.683000 12300 torch\utils\cpp_extension.py:484] Error checking compiler version for cl: [WinError 2] The system cannot find the file specified.
W0222 01:22:15.692000 12300 torch\utils\cpp_extension.py:525] The detected CUDA version (12.9) has a minor version mismatch with the version that was used to compile PyTorch (12.8). Most likely this shouldn't be a problem.
==================================================
Building with GPU architecture: sm_120
==================================================
building 'fused_ssim_cuda' extension
"C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.41.34120\bin\HostX86\x64\cl.exe" /c /nologo /O2 /W3 /GL /DNDEBUG /MD -IC:\Users\USER\AppData\Roaming\Python\Python312\site-packages\torch\include -IC:\Users\USER\AppData\Roaming\Python\Python312\site-packages\torch\include\torch\csrc\api\include "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.9\include" -IC:\Python312\include -IC:\Python312\Include "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.41.34120\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.41.34120\ATLMFC\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\cppwinrt" /EHsc /Tpext.cpp /Fobuild\temp.win-amd64-cpython-312\Release\ext.obj /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /wd4624 /wd4067 /wd4068 /EHsc -O3 -DFUSED_SSIM_CUDA -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=fused_ssim_cuda /std:c++17
cl : Command line warning D9002 : ignoring unknown option '-O3'
ext.cpp
"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.9\bin\nvcc" -c ssim.cu -o build\temp.win-amd64-cpython-312\Release\ssim.obj -IC:\Users\USER\AppData\Roaming\Python\Python312\site-packages\torch\include -IC:\Users\USER\AppData\Roaming\Python\Python312\site-packages\torch\include\torch\csrc\api\include "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.9\include" -IC:\Python312\include -IC:\Python312\Include "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.41.34120\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.41.34120\ATLMFC\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\cppwinrt" -Xcompiler /EHsc -Xcompiler /wd4068 -Xcompiler /wd4067 -Xcompiler /wd4624 -Xcompiler /wd4190 -Xcompiler /wd4018 -Xcompiler /wd4275 -Xcompiler /wd4267 -Xcompiler /wd4244 -Xcompiler /wd4251 -Xcompiler /wd4819 -Xcompiler /MD -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=base_class_has_different_dll_interface -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -O3 -DFUSED_SSIM_CUDA --maxrregcount=32 --use_fast_math -arch=sm_120 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=fused_ssim_cuda -std=c++17 --use-local-env
C:/Program Files (x86)/Windows Kits/10//include/10.0.22621.0//um\winnt.h(24437): warning #174-D: expression has no effect
(CallbackEnviron);
^
Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
C:/Program Files (x86)/Windows Kits/10//include/10.0.22621.0//um\winuser.h(14668): warning #108-D: signed bit field of length 1
BOOL fBarFocused:1;
^
C:/Program Files (x86)/Windows Kits/10//include/10.0.22621.0//um\winuser.h(14669): warning #108-D: signed bit field of length 1
BOOL fFocused:1;
^
C:/Program Files (x86)/Windows Kits/10//include/10.0.22621.0//um\wincrypt.h(21836): warning #1835-D: attribute "dllimport" does not apply here
typedef __declspec(dllimport) BOOL (__stdcall *PFN_CERT_IS_WEAK_HASH)(
^
C:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\shared\rpcndr.h(730): warning #108-D: signed bit field of length 1
int fInDontFree :1;
^
C:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\shared\rpcndr.h(731): warning #108-D: signed bit field of length 1
int fDontCallFreeInst :1;
^
C:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\shared\rpcndr.h(732): warning #108-D: signed bit field of length 1
int fUnused1 :1;
^
C:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\shared\rpcndr.h(733): warning #108-D: signed bit field of length 1
int fHasReturn :1;
^
C:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\shared\rpcndr.h(734): warning #108-D: signed bit field of length 1
int fHasExtensions :1;
^
C:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\shared\rpcndr.h(735): warning #108-D: signed bit field of length 1
int fHasNewCorrDesc :1;
^
C:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\shared\rpcndr.h(736): warning #108-D: signed bit field of length 1
int fIsIn :1;
^
C:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\shared\rpcndr.h(737): warning #108-D: signed bit field of length 1
int fIsOut :1;
^
C:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\shared\rpcndr.h(738): warning #108-D: signed bit field of length 1
int fIsOicf :1;
^
C:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\shared\rpcndr.h(739): warning #108-D: signed bit field of length 1
int fBufferValid :1;
^
C:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\shared\rpcndr.h(740): warning #108-D: signed bit field of length 1
int fHasMemoryValidateCallback: 1;
^
C:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\shared\rpcndr.h(741): warning #108-D: signed bit field of length 1
int fInFree :1;
^
C:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\shared\rpcndr.h(742): warning #108-D: signed bit field of length 1
int fNeedMCCP :1;
^
C:/Users/USER/AppData/Roaming/Python/Python312/site-packages/torch/include\c10/cuda/CUDACachingAllocator.h(212): error: invalid combination of type specifiers
StreamSegmentSize(cudaStream_t s, bool char, size_t sz)
^
C:/Users/USER/AppData/Roaming/Python/Python312/site-packages/torch/include\c10/cuda/CUDACachingAllocator.h(213): error: type name is not allowed
: stream(s), is_small_pool(char), total_size(sz) {}
^
2 errors detected in the compilation of "ssim.cu".
ssim.cu
error: command 'C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.9\\bin\\nvcc.EXE' failed with exit code 2
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for fused-ssim
Failed to build fused-ssim
error: failed-wheel-build-for-install
× Failed to build installable wheels for some pyproject.toml based projects
╰─> fused-ssim
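The two nvcc errors in `CUDACachingAllocator.h` fit a preprocessor macro collision. The Windows SDK header `rpcndr.h`, which the `#108-D` warnings show was pulled into this same compilation of `ssim.cu`, contains `#define small char`. A constructor parameter named `small` would therefore be rewritten to `bool char` ("invalid combination of type specifiers"), and `is_small_pool(small)` to `is_small_pool(char)` ("type name is not allowed"). The parameter name `small` is an inference from the `is_small_pool(...)` initializer, not something the log shows directly. Below is a minimal, portable C++ sketch of the mechanism. `StreamSegmentSizeSketch` is a hypothetical, simplified stand-in for the real struct, with `void*` standing in for `cudaStream_t` so that it compiles without CUDA or Windows headers. The `#undef` shows one way to neutralise the macro. It is untested against this repository's build.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for what the Windows SDK's rpcndr.h does once <windows.h>
// (or another header that includes it) has been processed.
#define small char

// While the macro is active, a declaration such as
//   StreamSegmentSizeSketch(void* s, bool small, std::size_t sz)
// would be preprocessed into `bool char` and fail to compile.

// Workaround sketch: remove the macro after the Windows headers and
// before any code that uses `small` as an ordinary identifier.
#ifdef small
#undef small
#endif

// Hypothetical, simplified stand-in for the struct in CUDACachingAllocator.h.
struct StreamSegmentSizeSketch {
  void* stream;
  bool is_small_pool;
  std::size_t total_size;

  // With the macro gone, `small` is just a parameter name again.
  StreamSegmentSizeSketch(void* s, bool small, std::size_t sz)
      : stream(s), is_small_pool(small), total_size(sz) {}
};
```

In the real build, the `#undef` only helps if it takes effect after the last inclusion of `rpcndr.h` and before `CUDACachingAllocator.h` is parsed. Because both come in through the Torch and Windows include chain here, the placement that works (if any) would need to be checked against the actual include order.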