RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

When training using any of the example configurations from the documentation I get the error:
"RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED"

**Reproducing**
For example running:
`python main.py --network_type rnn --dataset wikitext
`

**My system configuration**
CUDA 10.1
Python 3.7.3
PyTorch 1.1.0
Arch Linux
GPU: RTX 2070

Other PyTorch applications work just fine.

**Full output** (from pipenv environment):
```
% python main.py --network_type rnn --dataset wikitext                                                                    oliver@oliver
2019-06-14 16:30:31,585:INFO::[*] Make directories : logs/wikitext_2019-06-14_16-30-31
2019-06-14 16:30:49,909:INFO::regularizing:
2019-06-14 16:30:54,743:INFO::# of parameters: 169,315,278
2019-06-14 16:30:54,834:INFO::[*] MODEL dir: logs/wikitext_2019-06-14_16-30-31
2019-06-14 16:30:54,834:INFO::[*] PARAM path: logs/wikitext_2019-06-14_16-30-31/params.json
Traceback (most recent call last):
  File "main.py", line 54, in <module>
    main(args)
  File "main.py", line 34, in main
    trnr.train()
  File "/home/oliver/code/ENAS-pytorch/trainer.py", line 222, in train
    self.train_shared(dag=dag)
  File "/home/oliver/code/ENAS-pytorch/trainer.py", line 305, in train_shared
    dags)
  File "/home/oliver/code/ENAS-pytorch/trainer.py", line 251, in get_loss
    output, hidden, extra_out = self.shared(inputs, dag, hidden=hidden)
  File "/home/oliver/.local/share/virtualenvs/ENAS-pytorch-kjHs_kjH/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/oliver/code/ENAS-pytorch/models/shared_rnn.py", line 235, in forward
    logit, hidden = self.cell(x_t, hidden, dag)
  File "/home/oliver/code/ENAS-pytorch/models/shared_rnn.py", line 354, in cell
    output = self.batch_norm(output)
  File "/home/oliver/.local/share/virtualenvs/ENAS-pytorch-kjHs_kjH/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/oliver/.local/share/virtualenvs/ENAS-pytorch-kjHs_kjH/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py", line 83, in forward
    exponential_average_factor, self.eps)
  File "/home/oliver/.local/share/virtualenvs/ENAS-pytorch-kjHs_kjH/lib/python3.7/site-packages/torch/nn/functional.py", line 1697, in batch_norm
    training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED
```

**Debugging**
Debugging the parameters passed to batch_norm I found that the following parameters are all on cuda-device: input, weight, bias, running_mean, running_var. Which is all reasonable.
The remaining vars are reasonable as well.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED #44

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED #44

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions