Description
I really like this tutorial on custom datasets. However, the torch.utils.data.DataLoader class is only briefly mentioned in it:
However, we are losing a lot of features by using a simple for loop to iterate over the data. In particular, we are missing out on:
- Batching the data
- Shuffling the data
- Loading the data in parallel using multiprocessing workers.
torch.utils.data.DataLoader is an iterator which provides all these features. Parameters used below should be clear. One parameter of interest is collate_fn. You can specify how exactly the samples need to be batched using collate_fn. However, default collate should work fine for most use cases.
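The features listed in the quoted passage (batching, shuffling, worker processes, and a custom collate_fn) can be illustrated with a minimal sketch. It assumes a hypothetical ToySequenceDataset whose samples are 1-D tensors of different lengths. The default collate stacks samples with torch.stack, so it fails on such samples; this is a typical case where a custom collate_fn is needed. The hypothetical pad_collate below pads each batch with torch.nn.utils.rnn.pad_sequence.

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader, Dataset


class ToySequenceDataset(Dataset):
    """Hypothetical dataset: item i is a 1-D tensor of length i + 1 plus a 0/1 label."""

    def __init__(self, n_items: int = 10):
        self.n_items = n_items

    def __len__(self) -> int:
        return self.n_items

    def __getitem__(self, idx: int):
        seq = torch.arange(idx + 1, dtype=torch.float32)  # variable-length sample
        label = idx % 2
        return seq, label


def pad_collate(batch):
    """Hypothetical collate_fn: pad sequences to the longest one in the batch.

    `batch` is a list of (seq, label) tuples as returned by __getitem__.
    The default collate would raise here, because torch.stack needs equal sizes.
    """
    seqs, labels = zip(*batch)
    lengths = torch.tensor([len(s) for s in seqs])          # original lengths
    padded = pad_sequence(list(seqs), batch_first=True)     # zero-padded, shape (B, max_len)
    return padded, lengths, torch.tensor(labels)


loader = DataLoader(
    ToySequenceDataset(10),
    batch_size=4,           # batching
    shuffle=True,           # reshuffle the indices every epoch
    num_workers=0,          # > 0 loads batches in worker processes; on spawn-based
                            # platforms (Windows, macOS) iterate under
                            # `if __name__ == "__main__":` in that case
    collate_fn=pad_collate,
)

for padded, lengths, labels in loader:
    print(padded.shape, lengths.tolist(), labels.tolist())
```

Returning the original lengths alongside the padded tensor is one common design choice, since downstream code (for example masking) usually needs to know where the padding starts.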
I am aware of this issue and this issue, but neither has led to a tutorial.
I am happy to write a tutorial on building custom data loaders with the torch.utils.data.DataLoader class, focusing on how to use its parameters, especially num_workers and collate_fn. Also, I am not sure whether it is possible to inherit from the torch.utils.data.DataLoader class in the same way as from torch.utils.data.Dataset, so I would appreciate some guidance on this.
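Since DataLoader is an ordinary Python class, subclassing it appears to work much like subclassing Dataset. The sketch below is one possible pattern under that assumption, not an established API: a hypothetical DeviceDataLoader forwards all arguments to DataLoader and overrides __iter__ to move every batch to a device. Overrides like this run in the main process; worker processes (num_workers > 0) only execute the dataset's __getitem__ and the collate_fn.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


class DeviceDataLoader(DataLoader):
    """Hypothetical DataLoader subclass that moves each batch to `device`."""

    def __init__(self, *args, device="cpu", **kwargs):
        # All standard arguments (batch_size, shuffle, num_workers, collate_fn, ...)
        # are passed straight through to DataLoader.
        super().__init__(*args, **kwargs)
        self.device = torch.device(device)  # a custom attribute, not a DataLoader one

    def __iter__(self):
        # Reuse DataLoader's own iteration (sampling, workers, collation),
        # then post-process each collated batch in the main process.
        for batch in super().__iter__():
            yield [t.to(self.device) for t in batch]


features = torch.randn(10, 3)
targets = torch.arange(10)
loader = DeviceDataLoader(
    TensorDataset(features, targets),
    batch_size=4,
    shuffle=True,
    device="cuda" if torch.cuda.is_available() else "cpu",
)

for x, y in loader:
    print(x.shape, x.device, y.tolist())
```

Recent PyTorch versions refuse to reassign attributes such as batch_size or sampler after a DataLoader has been constructed, so configuration in a subclass belongs in __init__. Wrapping an existing loader in a plain generator function would achieve the same result without inheritance; which pattern suits a tutorial better is a judgment call.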
This would be my first tutorial, so some guidance on formatting would be very helpful.
cc @suraj813 @sekyondaMeta @svekars @carljparker @NicolasHug @kit1980 @subramen