Single Channel GaussianBlur over 23x23 kernels fails on Windows #5464

Open
@malfet

🐛 Describe the bug

See https://app.circleci.com/pipelines/github/pytorch/vision/14951/workflows/dd6ed737-f31d-4baa-8f42-9e69d55650aa/jobs/1205316

For some reason the same code works with CUDA 11.1 but fails with 11.3, which makes me strongly suspect a bug on the cuDNN side.

cuda-memcheck reports an invalid memory access in the `cudnnConvolutionForward` call:
(C:\Users\circleci\project\env) C:\Users\circleci\project\test>"c:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin\cuda-memcheck.exe" pytest test_transforms_tensor.py -k test_gaussian_blur[1-meth_kwargs1
========= CUDA-MEMCHECK
================================================================================== test session starts ===================================================================================
platform win32 -- Python 3.8.12, pytest-6.2.5, py-1.11.0, pluggy-1.0.0
rootdir: C:\Users\circleci\project, configfile: pytest.ini
plugins: cov-3.0.0, mock-3.6.1
collected 3066 items / 3064 deselected / 2 selected

test_transforms_tensor.py .FE                                                                                                                                                       [100%]

========================================================================================= ERRORS =========================================================================================
______________________________________________________________ ERROR at teardown of test_gaussian_blur[1-meth_kwargs1-cuda] ______________________________________________________________
Traceback (most recent call last):
  File "C:\Users\circleci\project\test\conftest.py", line 104, in prevent_leaking_rng
    torch.cuda.set_rng_state(cuda_rng_state)
  File "C:\Users\circleci\project\env\lib\site-packages\torch\cuda\random.py", line 64, in set_rng_state
    _lazy_call(cb)
  File "C:\Users\circleci\project\env\lib\site-packages\torch\cuda\__init__.py", line 155, in _lazy_call
    callable()
  File "C:\Users\circleci\project\env\lib\site-packages\torch\cuda\random.py", line 62, in cb
    default_generator.set_state(new_state_copy)
RuntimeError: CUDA error: unspecified launch failure
======================================================================================== FAILURES ========================================================================================
________________________________________________________________________ test_gaussian_blur[1-meth_kwargs1-cuda] _________________________________________________________________________
Traceback (most recent call last):
  File "C:\Users\circleci\project\test\test_transforms_tensor.py", line 963, in test_gaussian_blur
    _test_class_op(
  File "C:\Users\circleci\project\test\test_transforms_tensor.py", line 85, in _test_class_op
    _test_transform_vs_scripted_on_batch(f, scripted_fn, batch_tensors)
  File "C:\Users\circleci\project\test\test_transforms_tensor.py", line 36, in _test_transform_vs_scripted_on_batch
    transformed_batch = transform(batch_tensors)
  File "C:\Users\circleci\project\env\lib\site-packages\torch\nn\modules\module.py", line 1111, in _call_impl
    return forward_call(*input, **kwargs)
  File "c:\users\circleci\project\torchvision\transforms\transforms.py", line 1817, in forward
    return F.gaussian_blur(img, self.kernel_size, [sigma, sigma])
  File "c:\users\circleci\project\torchvision\transforms\functional.py", line 1326, in gaussian_blur
    output = F_t.gaussian_blur(t_img, kernel_size, sigma)
  File "c:\users\circleci\project\torchvision\transforms\functional_tensor.py", line 774, in gaussian_blur
    img = conv2d(img, kernel, groups=img.shape[-3])
RuntimeError: CUDA error: unspecified launch failure
================================================================================ short test summary info =================================================================================
ERROR test_transforms_tensor.py::test_gaussian_blur[1-meth_kwargs1-cuda] - RuntimeError: CUDA error: unspecified launch failure
FAILED test_transforms_tensor.py::test_gaussian_blur[1-meth_kwargs1-cuda] - RuntimeError: CUDA error: unspecified launch failure
================================================================= 1 failed, 1 passed, 3064 deselected, 1 error in 35.57s =================================================================
========= Invalid __shared__ read of size 4
=========     at 0x00001d10 in volta_scudnn_128x32_3dconv_fprop_xregs_large_nn_v1
=========     by thread (95,0,0) in block (24,0,0)
=========     Address 0x0000250c is out of bounds
=========     Device Frame:volta_scudnn_128x32_3dconv_fprop_xregs_large_nn_v1 (volta_scudnn_128x32_3dconv_fprop_xregs_large_nn_v1 : 0x1d10)
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:C:\Windows\system32\DriverStore\FileRepository\nv_dispswi.inf_amd64_8fb2f986cb3224d8\nvcuda64.dll [0x76888]
=========     Host Frame:C:\Windows\system32\DriverStore\FileRepository\nv_dispswi.inf_amd64_8fb2f986cb3224d8\nvcuda64.dll [0x76bb1]
=========     Host Frame:C:\Windows\system32\DriverStore\FileRepository\nv_dispswi.inf_amd64_8fb2f986cb3224d8\nvcuda64.dll [0x7b0da]
=========     Host Frame:C:\Windows\system32\DriverStore\FileRepository\nv_dispswi.inf_amd64_8fb2f986cb3224d8\nvcuda64.dll (cuProfilerStop + 0x11cc6a) [0x33d9ea]
=========     Host Frame:C:\Windows\system32\DriverStore\FileRepository\nv_dispswi.inf_amd64_8fb2f986cb3224d8\nvcuda64.dll [0x17069d]
=========     Host Frame:C:\Windows\system32\DriverStore\FileRepository\nv_dispswi.inf_amd64_8fb2f986cb3224d8\nvcuda64.dll (cuProfilerStop + 0xf0c72) [0x3119f2]
=========     Host Frame:C:\Windows\system32\DriverStore\FileRepository\nv_dispswi.inf_amd64_8fb2f986cb3224d8\nvcuda64.dll [0x38bdb]
=========     Host Frame:C:\Windows\system32\DriverStore\FileRepository\nv_dispswi.inf_amd64_8fb2f986cb3224d8\nvcuda64.dll [0x390af]
=========     Host Frame:C:\Windows\system32\DriverStore\FileRepository\nv_dispswi.inf_amd64_8fb2f986cb3224d8\nvcuda64.dll [0x39394]
=========     Host Frame:C:\Windows\system32\DriverStore\FileRepository\nv_dispswi.inf_amd64_8fb2f986cb3224d8\nvcuda64.dll (cuLaunchKernel + 0x234) [0x20fc44]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\cudnn_cnn_infer64_8.dll [0x3896]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\cudnn_cnn_infer64_8.dll [0x26fd]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\cudnn_cnn_infer64_8.dll (cask_cudnn::getPlatform + 0xe9) [0x1d54529]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\cudnn_cnn_infer64_8.dll (cask_cudnn::transformTensor + 0x1bc1) [0x1dc0651]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\cudnn_cnn_infer64_8.dll (cask_cudnn::transformTensor + 0xbe6a) [0x1dca8fa]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\cudnn_cnn_infer64_8.dll (cask_cudnn::ConvDgradShader::isSplitK + 0x49b) [0x1ddcd9b]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\cudnn_cnn_infer64_8.dll (cudnn::backend::Descriptor::initialize_internal + 0x618e) [0x5c67ce]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\cudnn_cnn_infer64_8.dll (cudnn::backend::Descriptor::initialize_internal + 0x6eb1) [0x5c74f1]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\cudnn_cnn_infer64_8.dll (cudnn::cnn::EngineInterface::execute + 0x7e) [0x4e163e]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\cudnn_cnn_infer64_8.dll (cudnn::cnn::EngineContainer<1012,113664>::execute_internal_impl + 0x2a) [0x54f27a]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\cudnn_cnn_infer64_8.dll (cudnn::cnn::EngineInterface::execute + 0x7e) [0x4e163e]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\cudnn_cnn_infer64_8.dll (cask_cudnn::TensorDesc::operator== + 0x2d2) [0x544612]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\cudnn_cnn_infer64_8.dll (cudnn::cnn::EngineContainer<1,4096>::execute_internal_impl + 0xd241) [0x55c4d1]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\cudnn_cnn_infer64_8.dll (cudnn::cnn::EngineInterface::execute + 0x7e) [0x4e163e]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\cudnn_cnn_infer64_8.dll (cudnn::backend::execute + 0x103f) [0x54eebf]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\cudnn_cnn_infer64_8.dll (cudnn::backend::Tensor::Tensor + 0x18b6) [0x5ab246]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\cudnn_cnn_infer64_8.dll (cudnn::backend::Tensor::Tensor + 0xbe1) [0x5aa571]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\cudnn_cnn_infer64_8.dll (cudnn::cnn::convolutionForward + 0x10b) [0x65609b]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\cudnn_cnn_infer64_8.dll (cudnnConvolutionForward + 0x331) [0x657081]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\torch_cuda_cpp.dll (at::native::cudnn_convolution_transpose + 0x4263) [0x48863]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\torch_cuda_cpp.dll (at::native::cudnn_convolution_transpose + 0x83a7) [0x4c9a7]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\torch_cuda_cpp.dll (at::native::cudnn_convolution_transpose + 0x7736) [0x4bd36]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\torch_cuda_cpp.dll (at::native::cudnn_convolution_transpose + 0x1ae5) [0x460e5]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\torch_cuda_cpp.dll (at::native::cudnn_convolution_transpose + 0x752f) [0x4bb2f]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\torch_cuda_cpp.dll (at::native::cudnn_convolution_add_relu + 0x16ec) [0x43fec]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\torch_cuda_cpp.dll (at::native::cudnn_convolution + 0xc5) [0x428e5]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\torch_cuda_cu.dll (at::cuda::view_as_real + 0x14adc) [0x456680c]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\torch_cuda_cu.dll (at::cuda::bucketize_outf + 0x3df7a) [0x450361a]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\torch_cpu.dll (at::_ops::cudnn_convolution::call + 0x242) [0x70175b2]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\torch_cpu.dll (at::native::_convolution + 0xf5e) [0x692064e]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\torch_cpu.dll (at::compositeexplicitautograd::xlogy_ + 0x40e) [0x72d1bee]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\torch_cpu.dll (at::compositeexplicitautograd::bmm + 0x1a1ed) [0x72975bd]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\torch_cpu.dll (at::_ops::_convolution::call + 0x2d6) [0x6d5a226]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\torch_cpu.dll (at::native::convolution + 0x164) [0x6928914]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\torch_cpu.dll (at::compositeexplicitautograd::xlogy_ + 0xc6b) [0x72d244b]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\torch_cpu.dll (at::compositeexplicitautograd::bmm + 0x1a2ca) [0x729769a]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\torch_cpu.dll (at::TensorMaker::make_tensor + 0x88e49) [0x6d40779]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\torch_cpu.dll (at::_ops::convolution::redispatch + 0x123) [0x6dc39a3]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\torch_cpu.dll (torch::autograd::GraphRoot::apply + 0x157b1) [0x7bfd851]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\torch_cpu.dll (torch::autograd::GraphRoot::apply + 0xc6c8) [0x7bf4768]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\torch_cpu.dll (at::_ops::convolution::call + 0x26f) [0x6d71b6f]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\torch_cpu.dll (at::native::conv2d + 0x1be) [0x69277be]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\torch_cpu.dll (at::compositeimplicitautograd::where + 0x1db4) [0x73ac8e4]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\torch_cpu.dll (at::compositeimplicitautograd::broadcast_to + 0x2a7a3) [0x738d953]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\torch_cpu.dll (at::_ops::conv2d::call + 0x219) [0x70ba239]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\torch_cpu.dll (at::conv2d + 0x64) [0x67106d4]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\torch_python.dll (torch::FunctionSignature::operator= + 0x1096fc) [0x14d77c]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\torch_python.dll (torch::FunctionSignature::operator= + 0x12f7ab) [0x17382b]
=========     Host Frame:C:\Users\circleci\project\env\python38.dll (PyMethodDef_RawFastCallKeywords + 0x410) [0x126fe0]
=========     Host Frame:C:\Users\circleci\project\env\python38.dll (PyObject_MakeTpCall + 0x106) [0x125fa6]
=========     Host Frame:C:\Users\circleci\project\env\python38.dll (PyEval_GetFuncDesc + 0x408) [0x2036b8]
=========
...
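
The Python traceback above ends at `conv2d(img, kernel, groups=img.shape[-3])` with a single-channel input and a 23x23 kernel. A minimal sketch of that call outside the test suite might look like the following. The kernel is built with the usual normalized-Gaussian recipe; the helper names, the batch and image sizes, and `sigma=2.0` are assumptions for illustration rather than values taken from the CI job, and there is no guarantee that this triggers the same failure.

```python
import math


def gaussian_kernel1d(kernel_size, sigma):
    """Normalized 1D Gaussian sampled at kernel_size points centered on 0."""
    half = (kernel_size - 1) * 0.5
    pdf = [math.exp(-0.5 * ((i - half) / sigma) ** 2) for i in range(kernel_size)]
    total = sum(pdf)
    return [p / total for p in pdf]


def gaussian_kernel2d(kernel_size, sigma):
    """Separable 2D Gaussian: outer product of two 1D kernels (sums to 1)."""
    k1d = gaussian_kernel1d(kernel_size, sigma)
    return [[ky * kx for kx in k1d] for ky in k1d]


def run_conv(device, kernel_size=23, sigma=2.0):
    """Run a single-channel depthwise conv2d with a kernel_size x kernel_size kernel.

    Assumed shapes: a batch of 4 single-channel 32x32 images (hypothetical,
    not taken from the failing test).
    """
    import torch
    import torch.nn.functional as F

    k = torch.tensor(gaussian_kernel2d(kernel_size, sigma), dtype=torch.float32, device=device)
    kernel = k[None, None]  # shape (out_channels=1, in_channels/groups=1, 23, 23)

    img = torch.rand(4, 1, 32, 32, device=device)
    pad = kernel_size // 2
    # Reflect-pad so the output keeps the input's spatial size.
    img = F.pad(img, [pad, pad, pad, pad], mode="reflect")

    out = F.conv2d(img, kernel, groups=img.shape[-3])
    if device == "cuda":
        # CUDA errors surface asynchronously; synchronize to raise them here.
        torch.cuda.synchronize()
    return tuple(out.shape)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed; only the kernel helpers are usable")
    else:
        device = "cuda" if torch.cuda.is_available() else "cpu"
        print(device, run_conv(device))
```

Running the same script against the CUDA 11.1 and 11.3 builds, or with `torch.backends.cudnn.enabled = False`, would be one way to check whether the failure follows the cuDNN code path.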

cc @peterjc123 @nbcsm @guyang3532 @maxluk @gunandrose4u @mszhanyi @vfdev-5 @datumbox
