Segfaults with cudnn>=9.11 for pre-Turing devices (<=sm_70) #124

Description

@h-vetinari

When preparing the pytorch v2.8 release, @mgorny ran into a bunch of segfaults. After some painful debugging against the 2.7 branch (diffing the environment against the last known passing run, recreating that passing run, then relaxing the constraints again one by one), we concluded that we need to pin cudnn <9.11; we still don't know why the segfaults occur though.

The segfaults looked like this:

........................................................................ [ 23%]
Fatal Python error: Segmentation fault

Thread 0x00007f295ffff640 (most recent call first):
  <no Python frame>

Thread 0x00007f2b48cc6640 (most recent call first):

and the pytest-xdist failure summary:

=================================== FAILURES ===================================
____________________________ test/test_autograd.py _____________________________
[gw0] linux -- Python 3.13.5 $PREFIX/bin/python3.13
worker 'gw0' crashed while running 'test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_rnn_backward_to_input_but_not_parameters_cuda'
_____________________________ test/test_modules.py _____________________________
[gw1] linux -- Python 3.13.5 $PREFIX/bin/python3.13
worker 'gw1' crashed while running 'test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_BatchNorm1d_eval_mode_cuda_float32'
_______________________________ test/test_nn.py ________________________________
[gw2] linux -- Python 3.13.5 $PREFIX/bin/python3.13
worker 'gw2' crashed while running 'test/test_nn.py::TestNNDeviceTypeCUDA::test_CTCLoss_cudnn_cuda'
_____________________________ test/test_modules.py _____________________________
[gw3] linux -- Python 3.13.5 $PREFIX/bin/python3.13
worker 'gw3' crashed while running 'test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_BatchNorm1d_eval_mode_cuda_float64'
_______________________________ test/test_nn.py ________________________________
[gw4] linux -- Python 3.13.5 $PREFIX/bin/python3.13
worker 'gw4' crashed while running 'test/test_nn.py::TestNN::test_RNN_change_dropout'
================== xdist: maximum crashed workers reached: 4 ===================

Note the "maximum crashed workers reached" line: pytest-xdist aborted after four crashed workers, so there are likely many more affected tests.
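To see what else fails beyond the xdist limit, one can rerun the crashed tests individually (test IDs copied from the summary above). The snippet below is only a convenience sketch: it assumes it is run from the directory containing pytorch's test/ folder, inside the affected environment.

```python
# Convenience sketch (not part of the original report): rerun each crashed test in
# its own subprocess, so a segfault in one test doesn't hide the results of the rest.
import subprocess
import sys

CRASHED = [
    "test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_rnn_backward_to_input_but_not_parameters_cuda",
    "test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_BatchNorm1d_eval_mode_cuda_float32",
    "test/test_nn.py::TestNNDeviceTypeCUDA::test_CTCLoss_cudnn_cuda",
    "test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_BatchNorm1d_eval_mode_cuda_float64",
    "test/test_nn.py::TestNN::test_RNN_change_dropout",
]

for test_id in CRASHED:
    ret = subprocess.run([sys.executable, "-m", "pytest", test_id]).returncode
    # a segfault typically shows up as a negative exit code (the terminating signal)
    print(f"{test_id}: exit code {ret}")
```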

One way for the @conda-forge/cudnn folks to test this would be to install pytorch v2.7.1 (which only carries a cudnn >=9.10.1.4,<10.0a0 constraint) and then run the test suite against a newer cudnn. For pytorch v2.8.0, you'd have to destructively alter the environment (e.g. copy a newer cudnn into $PREFIX), because the package metadata won't let the solver install a newer cudnn alongside it.
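Alternatively, the cuDNN code paths that the crashed tests touch can be poked at directly. The following is a minimal sketch and not a guaranteed reproducer: it assumes a CUDA-enabled pytorch in an environment where a newer cudnn is present (installed next to v2.7.1, or copied into $PREFIX for v2.8.0), and simply runs a cuDNN RNN backward, a BatchNorm1d eval-mode forward, and a CTC loss that should dispatch to cuDNN.

```python
# Minimal sketch (not a guaranteed reproducer): directly exercises the cuDNN paths
# that the crashed tests touch: RNN backward, BatchNorm1d in eval mode, CTCLoss.
import torch
import torch.nn.functional as F

assert torch.cuda.is_available() and torch.backends.cudnn.is_available()
dev = torch.device("cuda")

# cuDNN RNN forward + backward (cf. test_rnn_backward_to_input_but_not_parameters_cuda)
rnn = torch.nn.RNN(8, 16, batch_first=True).to(dev)
x = torch.randn(4, 10, 8, device=dev, requires_grad=True)
out, _ = rnn(x)
out.sum().backward()

# BatchNorm1d in eval mode (cf. test_cpu_gpu_parity_nn_BatchNorm1d_eval_mode_cuda_*)
bn = torch.nn.BatchNorm1d(8).to(dev).eval()
bn(torch.randn(4, 8, device=dev))

# CTC loss; int32 CPU targets and full-length inputs are the usual conditions for
# pytorch to choose the cuDNN kernel (cf. test_CTCLoss_cudnn_cuda)
log_probs = torch.randn(50, 4, 20, device=dev).log_softmax(2)
targets = torch.randint(1, 20, (4, 30), dtype=torch.int32)
input_lengths = torch.full((4,), 50, dtype=torch.int32)
target_lengths = torch.full((4,), 30, dtype=torch.int32)
loss = F.ctc_loss(log_probs, targets, input_lengths, target_lengths)

torch.cuda.synchronize()
print("no crash; ctc loss =", float(loss))
```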

Obviously we want to get rid of the upper bound ASAP: other feedstocks building against newer cudnn 9.x will pick up >=9.x,<10 run-exports, which makes them incompatible with a pytorch v2.8 that pins cudnn <9.11.

Sidenote: @carterbox mentioned an ABI break between pytorch v2.7.0 and v2.7.1, related to v2.7.1 having been built against the pybind v3 ABI. As far as I understand, that only affects packages building on top of pytorch, not pytorch itself, so I don't think it's the cause here; I mention it only for completeness.
