LSTM training fails on a single GPU, but not with multiple GPUs #338

With the latest versions of EDDL (1.2.0) and ECVL (1.1.0), I get a CUDA error when training the model on a single GPU. I have no problems when using 2 or 4 GPUs. The error occurs systematically at the beginning of the third epoch and does not seem to depend on the batch size. It also does not depend on the memory consumption parameter ("full_mem", "mid_mem" or "low_mem"); I tried all of them. The GPU is an NVIDIA V100. With previous versions of the libraries, this error did not occur (but I was using a different GPU).

```
.Traceback (most recent call last):
  File "C01_2_rec_mod_edll.py", line 98, in <module>
    fire.Fire({
  File "/root/miniconda3/envs/eddl/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/root/miniconda3/envs/eddl/lib/python3.8/site-packages/fire/core.py", line 466, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/root/miniconda3/envs/eddl/lib/python3.8/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "C01_2_rec_mod_edll.py", line 46, in train
    rec_mod.train()
  File "/mnt/datasets/uc5/UC5_pipeline_forked/src/eddl_lib/recurrent_module.py", line 289, in train
    eddl.train_batch(rnn, [cnn_visual, thresholded], [Y])
  File "/root/miniconda3/envs/eddl/lib/python3.8/site-packages/pyeddl/eddl.py", line 435, in train_batch
    return _eddl.train_batch(net, in_, out)
RuntimeError: [CUDA ERROR]: invalid argument (1) raised in delete_tensor | (check_cuda)
```

The code is not yet available in the repository. Please let me know what details I can add.
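Since the training code is not published, the device configurations described in this report (1, 2 or 4 GPUs, each with the three memory settings) can be enumerated as in the sketch below. This is not the actual training code: `gpu_mask` and `configurations` are hypothetical helpers, a 4-GPU machine is assumed, and the commented `eddl.CS_GPU(mask, mem=mem)` call assumes the usual pyeddl convention of passing a 0/1 mask of devices.

```python
from itertools import product

# Memory settings mentioned in the report.
MEM_LEVELS = ("full_mem", "mid_mem", "low_mem")


def gpu_mask(n_used, n_total=4):
    """Return a 0/1 mask selecting the first n_used of n_total GPUs.

    Hypothetical helper; n_total=4 is an assumption about the machine.
    """
    if not 1 <= n_used <= n_total:
        raise ValueError("n_used must be between 1 and n_total")
    return [1] * n_used + [0] * (n_total - n_used)


def configurations(gpu_counts=(1, 2, 4), n_total=4):
    """Enumerate (mask, mem) pairs for the GPU counts and memory settings tried."""
    return [
        (gpu_mask(n, n_total), mem)
        for n, mem in product(gpu_counts, MEM_LEVELS)
    ]


if __name__ == "__main__":
    for mask, mem in configurations():
        # Assumption: with pyeddl installed, the compute service would be
        # created roughly as
        #   cs = eddl.CS_GPU(mask, mem=mem)
        # and passed to eddl.build(...). Only the single-GPU masks are
        # reported to fail.
        print(mask, mem)
```

Running one training job per printed pair would make it easier to confirm that only the single-GPU masks trigger the error, regardless of the memory setting.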
