
[Bug Fix] Support isfinite/isnan/isinf for float16/bfloat16 on CUDA/HIP #75933

Merged
luotao1 merged 4 commits into PaddlePaddle:develop from youge325:isfinite on Oct 24, 2025

@youge325
Contributor

PR Category

Environment Adaptation

PR Types

Bug fixes

Description

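  • In the GPU-side `Isfinite/Isnan/Isinf` kernels in isfinite_kernel_impl.h, split the "generic floating-point" template into two branches: one accepts only the standard `float/double`, and the other specifically matches `phi::float16` and `phi::bfloat16`. This avoids the case where `std::is_floating_point` returns `false` for these two custom half-precision types and no kernel matches at all, and so completes `isfinite/isnan/isinf` support for half precision on CUDA/HIP.
  • With the separate branch, the corresponding `isfinite/isnan/isinf` device implementations are still the ones called, so the logic is unchanged. `float16/bfloat16` now reach an actual kernel, and the missing-symbol link errors and the runtime "data type not registered" errors no longer occur.
  • Remove the `PADDLE_API` qualifier from the three templates `IsfiniteKernel/IsinfKernel/IsnanKernel`, so that symbols are not exported on template definitions in a header, which can cause duplicate exports or decoration conflicts on Windows.
  • The compile-time error log is as follows: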
[943/2749] Building CUDA object paddle\phi\CMakeFiles\phi.dir\kernels\gpu\isfinite_kernel.cu.obj
FAILED: paddle/phi/CMakeFiles/phi.dir/kernels/gpu/isfinite_kernel.cu.obj
C:\PROGRA~1\NVIDIA~2\CUDA\v13.0\bin\nvcc.exe -forward-unknown-to-host-compiler -DCUDA_TOOLKIT_ROOT_DIR="\"C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v13.0\"" -DCUDA_VERSION_MAJOR=\"13\" -DCUDA_VERSION_MINOR=\"0\" -DCUDNN_MAJOR_VERSION=\"9\" -DEIGEN_STRONG_INLINE=inline -DEIGEN_USE_GPU -DGLOG_NO_ABBREVIATED_SEVERITIES -DGOOGLE_GLOG_DLL_DECL="" -DLAPACK_FOUND -DNOMINMAX -DPADDLE_DISABLE_PROFILER -DPADDLE_DLL_EXPORT -DPADDLE_DLL_INFERENCE -DPADDLE_ON_INFERENCE -DPADDLE_VERSION=0.0.0 -DPADDLE_VERSION_INTEGER=0 -DPADDLE_WITH_AVX -DPADDLE_WITH_CCCL -DPADDLE_WITH_CRYPTO -DPADDLE_WITH_CUDA -DPADDLE_WITH_DNNL -DPADDLE_WITH_MKLML -DPADDLE_WITH_PIP_CUDA_LIBRARIES -DPADDLE_WITH_POCKETFFT -DPADDLE_WITH_SSE3 -DPADDLE_WITH_TENSORRT -DPHI_INNER -DPHI_SHARED -DSTATIC_IR -DTRT_PLUGIN_FP16_AVAILABLE -DUTF8PROC_STATIC -DWIN32_LEAN_AND_MEAN -DYAML_CPP_STATIC_DEFINE -D_XKEYCHECK_H -Dphi_EXPORTS -ID:\Paddle\third_party\cccl\thrust -ID:\Paddle\third_party\cccl\libcudacxx\include -ID:\Paddle\third_party\cccl\cub -ID:\Paddle\build -ID:\Paddle\paddle\fluid\framework\io -ID:\Paddle\patches\thrust -ID:\TensorRT-10.13.3.9\include -ID:\Paddle\build\third_party\install\zlib\include -ID:\Paddle\build\third_party\install -ID:\Paddle\build\third_party\install\gflags\include -ID:\Paddle\build\third_party\install\glog\include -ID:\Paddle\third_party\eigen3 -ID:\Paddle\third_party\threadpool -ID:\Paddle\third_party\dlpack\include -ID:\Paddle\build\third_party\install\xxhash\include -ID:\Paddle\build\third_party\install\warpctc\include -ID:\Paddle\build\third_party\install\warprnnt\include -ID:\Paddle\build\third_party\install\utf8proc\include -ID:\Paddle\build\third_party\install\mklml\include -ID:\Paddle\build\third_party\install\onednn\include -ID:\Paddle\build\third_party\install\protobuf\include -ID:\Paddle\third_party\nlohmann_json\include -ID:\Paddle\build\third_party\install\yaml-cpp\include -ID:\Users\Lenovo\AppData\Local\Programs\Python\Python310\include 
-ID:\Users\Lenovo\AppData\Local\Programs\Python\Python310\Lib\site-packages\numpy\core\include -ID:\Paddle\build\third_party\pybind\src\extern_pybind\include -ID:\Paddle\build\third_party\install\libuv\include -ID:\Paddle\third_party\cccl -ID:\Paddle\build\third_party\install\cryptopp\include -ID:\Paddle\build\third_party\pocketfft\src -ID:\Paddle\build\third_party\dirent\src\extern_dirent\include -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include" -ID:\Paddle -ID:\Paddle\paddle\phi\api\include\compat -ID:\Paddle\paddle\phi\api\include\compat\torch\csrc\api\include -ID:\Paddle\build\..\paddle\fluid\framework\io -D_WINDOWS -Xcompiler=" /W0  /GR /EHsc"  -gencode arch=compute_86,code=sm_86 -Xfatbin -compress-all -w --expt-relaxed-constexpr --expt-extended-lambda -Xcompiler "/wd4244 /wd4267 /wd4819 " -Xcompiler /bigobj -std=c++17  -Xcompiler="/arch:AVX" -Xcompiler="-MT -O2 -Ob2" -DNDEBUG -std=c++17 -MD -MT paddle\phi\CMakeFiles\phi.dir\kernels\gpu\isfinite_kernel.cu.obj -MF paddle\phi\CMakeFiles\phi.dir\kernels\gpu\isfinite_kernel.cu.obj.d -x cu -c D:\Paddle\paddle\phi\kernels\gpu\isfinite_kernel.cu -o paddle\phi\CMakeFiles\phi.dir\kernels\gpu\isfinite_kernel.cu.obj -Xcompiler=-Fdpaddle\phi\CMakeFiles\phi.dir\,-FS
isfinite_kernel.cu
tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.cpp
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(141): error C2440: 'type cast': cannot convert from 'overloaded-function' to 'void (__cdecl *)(const _ZN3phi7float16E *,int64_t,bool *,_ZNSt9enable_ifILb1EvE4typeE *)'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(141): note: None of the functions with this name in scope match the target type
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(141): error C2660: '__cudaGetKernel': function does not take 1 arguments
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\crt/device_functions.h(2930): note: see declaration of '__cudaGetKernel'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(141): note: while trying to match the argument list '(cudaKernel_t *)'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(144): error C2912: explicit specialization 'void phi::__wrapper__device_stub_IsinfCUDAKernel<phi::float16,__int64>(const _ZN3phi7float16E *&,int64_t &,bool *&,_ZNSt9enable_ifILb1EvE4typeE *&)' is not a specialization of a function template
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(156): error C2440: 'type cast': cannot convert from 'overloaded-function' to 'void (__cdecl *)(const _ZN3phi7float16E *,unsigned int,bool *,_ZNSt9enable_ifILb1EvE4typeE *)'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(156): note: None of the functions with this name in scope match the target type
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(156): error C2660: '__cudaGetKernel': function does not take 1 arguments
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\crt/device_functions.h(2930): note: see declaration of '__cudaGetKernel'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(156): note: while trying to match the argument list '(cudaKernel_t *)'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(159): error C2912: explicit specialization 'void phi::__wrapper__device_stub_IsinfCUDAKernel<phi::float16,unsigned int>(const _ZN3phi7float16E *&,unsigned int &,bool *&,_ZNSt9enable_ifILb1EvE4typeE *&)' is not a specialization of a function template
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(171): error C2440: 'type cast': cannot convert from 'overloaded-function' to 'void (__cdecl *)(const _ZN3phi8bfloat16E *,int64_t,bool *,_ZNSt9enable_ifILb1EvE4typeE *)'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(171): note: None of the functions with this name in scope match the target type
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(171): error C2660: '__cudaGetKernel': function does not take 1 arguments
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\crt/device_functions.h(2930): note: see declaration of '__cudaGetKernel'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(171): note: while trying to match the argument list '(cudaKernel_t *)'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(174): error C2912: explicit specialization 'void phi::__wrapper__device_stub_IsinfCUDAKernel<phi::bfloat16,__int64>(const _ZN3phi8bfloat16E *&,int64_t &,bool *&,_ZNSt9enable_ifILb1EvE4typeE *&)' is not a specialization of a function template
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(186): error C2440: 'type cast': cannot convert from 'overloaded-function' to 'void (__cdecl *)(const _ZN3phi8bfloat16E *,unsigned int,bool *,_ZNSt9enable_ifILb1EvE4typeE *)'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(186): note: None of the functions with this name in scope match the target type
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(186): error C2660: '__cudaGetKernel': function does not take 1 arguments
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\crt/device_functions.h(2930): note: see declaration of '__cudaGetKernel'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(186): note: while trying to match the argument list '(cudaKernel_t *)'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(189): error C2912: explicit specialization 'void phi::__wrapper__device_stub_IsinfCUDAKernel<phi::bfloat16,unsigned int>(const _ZN3phi8bfloat16E *&,unsigned int &,bool *&,_ZNSt9enable_ifILb1EvE4typeE *&)' is not a specialization of a function template
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(471): error C2440: 'type cast': cannot convert from 'overloaded-function' to 'void (__cdecl *)(const _ZN3phi7float16E *,int64_t,bool *,_ZNSt9enable_ifILb1EvE4typeE *)'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(471): note: None of the functions with this name in scope match the target type
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(471): error C2660: '__cudaGetKernel': function does not take 1 arguments
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\crt/device_functions.h(2930): note: see declaration of '__cudaGetKernel'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(471): note: while trying to match the argument list '(cudaKernel_t *)'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(474): error C2912: explicit specialization 'void phi::__wrapper__device_stub_IsnanCUDAKernel<phi::float16,__int64>(const _ZN3phi7float16E *&,int64_t &,bool *&,_ZNSt9enable_ifILb1EvE4typeE *&)' is not a specialization of a function template
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(486): error C2440: 'type cast': cannot convert from 'overloaded-function' to 'void (__cdecl *)(const _ZN3phi7float16E *,unsigned int,bool *,_ZNSt9enable_ifILb1EvE4typeE *)'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(486): note: None of the functions with this name in scope match the target type
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(486): error C2660: '__cudaGetKernel': function does not take 1 arguments
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\crt/device_functions.h(2930): note: see declaration of '__cudaGetKernel'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(486): note: while trying to match the argument list '(cudaKernel_t *)'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(489): error C2912: explicit specialization 'void phi::__wrapper__device_stub_IsnanCUDAKernel<phi::float16,unsigned int>(const _ZN3phi7float16E *&,unsigned int &,bool *&,_ZNSt9enable_ifILb1EvE4typeE *&)' is not a specialization of a function template
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(501): error C2440: 'type cast': cannot convert from 'overloaded-function' to 'void (__cdecl *)(const _ZN3phi8bfloat16E *,int64_t,bool *,_ZNSt9enable_ifILb1EvE4typeE *)'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(501): note: None of the functions with this name in scope match the target type
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(501): error C2660: '__cudaGetKernel': function does not take 1 arguments
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\crt/device_functions.h(2930): note: see declaration of '__cudaGetKernel'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(501): note: while trying to match the argument list '(cudaKernel_t *)'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(504): error C2912: explicit specialization 'void phi::__wrapper__device_stub_IsnanCUDAKernel<phi::bfloat16,__int64>(const _ZN3phi8bfloat16E *&,int64_t &,bool *&,_ZNSt9enable_ifILb1EvE4typeE *&)' is not a specialization of a function template
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(516): error C2440: 'type cast': cannot convert from 'overloaded-function' to 'void (__cdecl *)(const _ZN3phi8bfloat16E *,unsigned int,bool *,_ZNSt9enable_ifILb1EvE4typeE *)'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(516): note: None of the functions with this name in scope match the target type
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(516): error C2660: '__cudaGetKernel': function does not take 1 arguments
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\crt/device_functions.h(2930): note: see declaration of '__cudaGetKernel'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(516): note: while trying to match the argument list '(cudaKernel_t *)'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(519): error C2912: explicit specialization 'void phi::__wrapper__device_stub_IsnanCUDAKernel<phi::bfloat16,unsigned int>(const _ZN3phi8bfloat16E *&,unsigned int &,bool *&,_ZNSt9enable_ifILb1EvE4typeE *&)' is not a specialization of a function template
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(711): error C2440: 'type cast': cannot convert from 'overloaded-function' to 'void (__cdecl *)(const _ZN3phi7float16E *,int64_t,bool *,_ZNSt9enable_ifILb1EvE4typeE *)'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(711): note: None of the functions with this name in scope match the target type
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(711): error C2660: '__cudaGetKernel': function does not take 1 arguments
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\crt/device_functions.h(2930): note: see declaration of '__cudaGetKernel'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(711): note: while trying to match the argument list '(cudaKernel_t *)'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(714): error C2912: explicit specialization 'void phi::__wrapper__device_stub_IsfiniteCUDAKernel<phi::float16,__int64>(const _ZN3phi7float16E *&,int64_t &,bool *&,_ZNSt9enable_ifILb1EvE4typeE *&)' is not a specialization of a function template
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(726): error C2440: 'type cast': cannot convert from 'overloaded-function' to 'void (__cdecl *)(const _ZN3phi7float16E *,unsigned int,bool *,_ZNSt9enable_ifILb1EvE4typeE *)'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(726): note: None of the functions with this name in scope match the target type
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(726): error C2660: '__cudaGetKernel': function does not take 1 arguments
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\crt/device_functions.h(2930): note: see declaration of '__cudaGetKernel'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(726): note: while trying to match the argument list '(cudaKernel_t *)'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(729): error C2912: explicit specialization 'void phi::__wrapper__device_stub_IsfiniteCUDAKernel<phi::float16,unsigned int>(const _ZN3phi7float16E *&,unsigned int &,bool *&,_ZNSt9enable_ifILb1EvE4typeE *&)' is not a specialization of a function template
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(741): error C2440: 'type cast': cannot convert from 'overloaded-function' to 'void (__cdecl *)(const _ZN3phi8bfloat16E *,int64_t,bool *,_ZNSt9enable_ifILb1EvE4typeE *)'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(741): note: None of the functions with this name in scope match the target type
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(741): error C2660: '__cudaGetKernel': function does not take 1 arguments
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\crt/device_functions.h(2930): note: see declaration of '__cudaGetKernel'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(741): note: while trying to match the argument list '(cudaKernel_t *)'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(744): error C2912: explicit specialization 'void phi::__wrapper__device_stub_IsfiniteCUDAKernel<phi::bfloat16,__int64>(const _ZN3phi8bfloat16E *&,int64_t &,bool *&,_ZNSt9enable_ifILb1EvE4typeE *&)' is not a specialization of a function template
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(756): error C2440: 'type cast': cannot convert from 'overloaded-function' to 'void (__cdecl *)(const _ZN3phi8bfloat16E *,unsigned int,bool *,_ZNSt9enable_ifILb1EvE4typeE *)'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(756): note: None of the functions with this name in scope match the target type
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(756): error C2660: '__cudaGetKernel': function does not take 1 arguments
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\crt/device_functions.h(2930): note: see declaration of '__cudaGetKernel'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(756): note: while trying to match the argument list '(cudaKernel_t *)'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(759): error C2912: explicit specialization 'void phi::__wrapper__device_stub_IsfiniteCUDAKernel<phi::bfloat16,unsigned int>(const _ZN3phi8bfloat16E *&,unsigned int &,bool *&,_ZNSt9enable_ifILb1EvE4typeE *&)' is not a specialization of a function template
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(893): error C2440: 'type cast': cannot convert from 'overloaded-function' to 'void (__cdecl *)(const _ZN3phi8bfloat16E *,unsigned int,bool *,_ZNSt9enable_ifILb1EvE4typeE *)'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(893): note: None of the functions with this name in scope match the target type
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(893): error C2660: '__cudaRegisterFunction': function does not take 9 arguments
C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v13.0/include\crt/host_runtime.h(246): note: see declaration of '__cudaRegisterFunction'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(893): note: while trying to match the argument list '(void **, char *, const char [123], int, uint3 *, uint3 *, dim3 *, dim3 *, int *)'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(894): error C2440: 'type cast': cannot convert from 'overloaded-function' to 'void (__cdecl *)(const _ZN3phi8bfloat16E *,int64_t,bool *,_ZNSt9enable_ifILb1EvE4typeE *)'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(894): note: None of the functions with this name in scope match the target type
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(894): error C2660: '__cudaRegisterFunction': function does not take 9 arguments
C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v13.0/include\crt/host_runtime.h(246): note: see declaration of '__cudaRegisterFunction'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(894): note: while trying to match the argument list '(void **, char *, const char [123], int, uint3 *, uint3 *, dim3 *, dim3 *, int *)'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(895): error C2440: 'type cast': cannot convert from 'overloaded-function' to 'void (__cdecl *)(const _ZN3phi7float16E *,unsigned int,bool *,_ZNSt9enable_ifILb1EvE4typeE *)'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(895): note: None of the functions with this name in scope match the target type
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(895): error C2660: '__cudaRegisterFunction': function does not take 9 arguments
C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v13.0/include\crt/host_runtime.h(246): note: see declaration of '__cudaRegisterFunction'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(895): note: while trying to match the argument list '(void **, char *, const char [122], int, uint3 *, uint3 *, dim3 *, dim3 *, int *)'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(896): error C2440: 'type cast': cannot convert from 'overloaded-function' to 'void (__cdecl *)(const _ZN3phi7float16E *,int64_t,bool *,_ZNSt9enable_ifILb1EvE4typeE *)'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(896): note: None of the functions with this name in scope match the target type
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(896): error C2660: '__cudaRegisterFunction': function does not take 9 arguments
C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v13.0/include\crt/host_runtime.h(246): note: see declaration of '__cudaRegisterFunction'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(896): note: while trying to match the argument list '(void **, char *, const char [122], int, uint3 *, uint3 *, dim3 *, dim3 *, int *)'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(909): error C2440: 'type cast': cannot convert from 'overloaded-function' to 'void (__cdecl *)(const _ZN3phi8bfloat16E *,unsigned int,bool *,_ZNSt9enable_ifILb1EvE4typeE *)'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(909): note: None of the functions with this name in scope match the target type
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(909): error C2660: '__cudaRegisterFunction': function does not take 9 arguments
C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v13.0/include\crt/host_runtime.h(246): note: see declaration of '__cudaRegisterFunction'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(909): note: while trying to match the argument list '(void **, char *, const char [120], int, uint3 *, uint3 *, dim3 *, dim3 *, int *)'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(910): error C2440: 'type cast': cannot convert from 'overloaded-function' to 'void (__cdecl *)(const _ZN3phi8bfloat16E *,int64_t,bool *,_ZNSt9enable_ifILb1EvE4typeE *)'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(910): note: None of the functions with this name in scope match the target type
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(910): error C2660: '__cudaRegisterFunction': function does not take 9 arguments
C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v13.0/include\crt/host_runtime.h(246): note: see declaration of '__cudaRegisterFunction'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(910): note: while trying to match the argument list '(void **, char *, const char [120], int, uint3 *, uint3 *, dim3 *, dim3 *, int *)'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(911): error C2440: 'type cast': cannot convert from 'overloaded-function' to 'void (__cdecl *)(const _ZN3phi7float16E *,unsigned int,bool *,_ZNSt9enable_ifILb1EvE4typeE *)'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(911): note: None of the functions with this name in scope match the target type
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(911): error C2660: '__cudaRegisterFunction': function does not take 9 arguments
C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v13.0/include\crt/host_runtime.h(246): note: see declaration of '__cudaRegisterFunction'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(911): note: while trying to match the argument list '(void **, char *, const char [119], int, uint3 *, uint3 *, dim3 *, dim3 *, int *)'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(912): error C2440: 'type cast': cannot convert from 'overloaded-function' to 'void (__cdecl *)(const _ZN3phi7float16E *,int64_t,bool *,_ZNSt9enable_ifILb1EvE4typeE *)'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(912): note: None of the functions with this name in scope match the target type
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(912): error C2660: '__cudaRegisterFunction': function does not take 9 arguments
C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v13.0/include\crt/host_runtime.h(246): note: see declaration of '__cudaRegisterFunction'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(912): note: while trying to match the argument list '(void **, char *, const char [119], int, uint3 *, uint3 *, dim3 *, dim3 *, int *)'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(931): error C2440: 'type cast': cannot convert from 'overloaded-function' to 'void (__cdecl *)(const _ZN3phi8bfloat16E *,unsigned int,bool *,_ZNSt9enable_ifILb1EvE4typeE *)'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(931): note: None of the functions with this name in scope match the target type
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(931): error C2660: '__cudaRegisterFunction': function does not take 9 arguments
C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v13.0/include\crt/host_runtime.h(246): note: see declaration of '__cudaRegisterFunction'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(931): note: while trying to match the argument list '(void **, char *, const char [120], int, uint3 *, uint3 *, dim3 *, dim3 *, int *)'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(932): error C2440: 'type cast': cannot convert from 'overloaded-function' to 'void (__cdecl *)(const _ZN3phi8bfloat16E *,int64_t,bool *,_ZNSt9enable_ifILb1EvE4typeE *)'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(932): note: None of the functions with this name in scope match the target type
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(932): error C2660: '__cudaRegisterFunction': function does not take 9 arguments
C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v13.0/include\crt/host_runtime.h(246): note: see declaration of '__cudaRegisterFunction'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(932): note: while trying to match the argument list '(void **, char *, const char [120], int, uint3 *, uint3 *, dim3 *, dim3 *, int *)'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(933): error C2440: 'type cast': cannot convert from 'overloaded-function' to 'void (__cdecl *)(const _ZN3phi7float16E *,unsigned int,bool *,_ZNSt9enable_ifILb1EvE4typeE *)'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(933): note: None of the functions with this name in scope match the target type
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(933): error C2660: '__cudaRegisterFunction': function does not take 9 arguments
C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v13.0/include\crt/host_runtime.h(246): note: see declaration of '__cudaRegisterFunction'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(933): note: while trying to match the argument list '(void **, char *, const char [119], int, uint3 *, uint3 *, dim3 *, dim3 *, int *)'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(934): error C2440: 'type cast': cannot convert from 'overloaded-function' to 'void (__cdecl *)(const _ZN3phi7float16E *,int64_t,bool *,_ZNSt9enable_ifILb1EvE4typeE *)'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(934): note: None of the functions with this name in scope match the target type
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(934): error C2660: '__cudaRegisterFunction': function does not take 9 arguments
C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v13.0/include\crt/host_runtime.h(246): note: see declaration of '__cudaRegisterFunction'
C:\Users\Lenovo\AppData\Local\Temp\tmpxft_00002620_00000000-7_isfinite_kernel.cudafe1.stub.c(934): note: while trying to match the argument list '(void **, char *, const char [119], int, uint3 *, uint3 *, dim3 *, dim3 *, int *)'
[944/2749] Building CUDA object paddle\phi\CMakeFiles\phi.dir\kernels\gpu\inverse_grad_kernel.cu.obj
inverse_grad_kernel.cu
tmpxft_00008b7c_00000000-7_inverse_grad_kernel.cudafe1.cpp
[945/2749] Building CUDA object paddle\phi\CMakeFiles\phi.dir\kernels\gpu\kldiv_loss_grad_kernel.cu.obj
kldiv_loss_grad_kernel.cu
tmpxft_00005cc4_00000000-7_kldiv_loss_grad_kernel.cudafe1.cpp
[946/2749] Building CUDA object paddle\phi\CMakeFiles\phi.dir\kernels\gpu\kldiv_loss_kernel.cu.obj
kldiv_loss_kernel.cu
tmpxft_00000b18_00000000-7_kldiv_loss_kernel.cudafe1.cpp
[947/2749] Building CUDA object paddle\phi\CMakeFiles\phi.dir\kernels\gpu\kron_grad_kernel.cu.obj
kron_grad_kernel.cu
tmpxft_00003e30_00000000-7_kron_grad_kernel.cudafe1.cpp
[948/2749] Building CUDA object paddle\phi\CMakeFiles\phi.dir\kernels\gpu\kthvalue_grad_kernel.cu.obj
kthvalue_grad_kernel.cu
tmpxft_00006d80_00000000-7_kthvalue_grad_kernel.cudafe1.cpp
[949/2749] Building CUDA object paddle\phi\CMakeFiles\phi.dir\kernels\gpu\kron_kernel.cu.obj
kron_kernel.cu
tmpxft_000050d8_00000000-7_kron_kernel.cudafe1.cpp
[950/2749] Building CUDA object paddle\phi\CMakeFiles\phi.dir\kernels\gpu\elementwise_grad_kernel.cu.obj
elementwise_grad_kernel.cu
tmpxft_00006924_00000000-7_elementwise_grad_kernel.cudafe1.cpp
ninja: build stopped: subcommand failed.
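
The two-branch split described above can be sketched in plain host C++. This is a minimal illustration, not the code from the PR: `Half`, `IsCustomHalf`, `IsInfValue`, `IsInfLoop`, and `CheckExample` are hypothetical names, `Half` is a stand-in for a type such as `phi::float16`, and the real kernels are `__global__` functions that call Paddle's own device implementations.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for a custom half type such as phi::float16.
// It only stores raw IEEE 754 binary16 bits.
struct Half {
  std::uint16_t bits;
};

// A class type is never a standard floating-point type, so a kernel
// constrained only by std::is_floating_point cannot accept it.
static_assert(!std::is_floating_point<Half>::value,
              "custom half types fail std::is_floating_point");

// Hypothetical trait marking the custom half-precision types.
template <typename T>
struct IsCustomHalf : std::false_type {};
template <>
struct IsCustomHalf<Half> : std::true_type {};

// Branch 1: standard floating-point types (float, double).
template <typename T,
          typename std::enable_if<std::is_floating_point<T>::value,
                                  int>::type = 0>
bool IsInfValue(T x) {
  return std::isinf(x);
}

// Branch 2: custom half types, selected by the dedicated trait.
template <typename T,
          typename std::enable_if<IsCustomHalf<T>::value, int>::type = 0>
bool IsInfValue(T x) {
  // binary16 infinity: exponent bits all ones, mantissa zero, any sign.
  return (x.bits & 0x7FFFu) == 0x7C00u;
}

// Host loop standing in for the element-wise GPU kernel.
template <typename T>
void IsInfLoop(const T* in, std::int64_t n, bool* out) {
  for (std::int64_t i = 0; i < n; ++i) {
    out[i] = IsInfValue(in[i]);
  }
}

// Small self-check showing that both branches are reachable.
inline void CheckExample() {
  const Half values[3] = {{0x3C00}, {0x7C00}, {0xFC00}};  // 1.0, +inf, -inf
  bool flags[3] = {false, false, false};
  IsInfLoop(values, 3, flags);
  assert(!flags[0] && flags[1] && flags[2]);
  assert(IsInfValue(INFINITY) && !IsInfValue(1.0));
}
```

Because each overload carries its own constraint, every element type selects exactly one candidate. The sketch places the constraint in a defaulted template parameter, which is an assumption. The failing signatures in the log above instead end in a `std::enable_if<true, void>::type *` function parameter.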

@paddle-bot

paddle-bot Bot commented Oct 18, 2025

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@paddle-bot paddle-bot Bot added the contributor External developers label Oct 18, 2025
@youge325
Contributor Author

/re-run all-failed

@youge325
Contributor Author

/re-run all-failed

@youge325
Contributor Author

/re-run all-failed

@luotao1 luotao1 added the HappyOpenSource (Happy Open Source event issues and PRs) label Oct 21, 2025
Contributor

@risemeup1 risemeup1 left a comment

LGTM

@luotao1 luotao1 merged commit f29f693 into PaddlePaddle:develop Oct 24, 2025
52 of 53 checks passed
@youge325 youge325 deleted the isfinite branch October 24, 2025 09:53

Labels

contributor External developers, HappyOpenSource (Happy Open Source event issues and PRs)

5 participants