
Troubleshooting

Known issues with workarounds:

Out of Memory Errors

Occasionally, your machine might run out of GPU memory during model training or while running tests. To fix this, adjust some hyperparameters in your configuration JSON. For example:

  • Lower the batch size by using a smaller value for train.batch_size. This reduces the number of samples being processed at once.
  • Reduce the number of workers by lowering dataset.num_workers. This increases the time training and evaluation take, but the machine is less likely to run out of memory since fewer worker processes run in parallel.
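
For reference, the two settings above might appear in the configuration JSON like this. The exact key nesting depends on your project's config schema, and the values are illustrative, not recommendations:

```json
{
  "train": {
    "batch_size": 16
  },
  "dataset": {
    "num_workers": 2
  }
}
```

Halving train.batch_size roughly halves the activation memory per step; lowering dataset.num_workers mainly reduces host-side memory and parallel data-loading pressure.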

NSS

  • Use fewer recurrent samples by decreasing the value of model.recurrent_samples.
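
If your configuration JSON uses the same nesting convention as the settings above, this option might look like the following. The nesting and value are illustrative:

```json
{
  "model": {
    "recurrent_samples": 4
  }
}
```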

Slow model training in WSL2 or Windows

A common cause of training running slower than expected in WSL2 or Windows is the GPU running out of dedicated device memory and falling back to slower shared system memory. To address this, reduce memory usage as described in the Out of Memory Errors section.