Tutorial on custom dataloaders (NOT datasets) #1010

Open
@mhdadk

Description

I really like this tutorial on custom datasets. However, the torch.utils.data.DataLoader class is only briefly mentioned in it:

However, we are losing a lot of features by using a simple for loop to iterate over the data. In particular, we are missing out on:

  • Batching the data
  • Shuffling the data
  • Loading the data in parallel using multiprocessing workers.

torch.utils.data.DataLoader is an iterator which provides all these features. The parameters used below should be clear. One parameter of interest is collate_fn. You can specify exactly how the samples need to be batched using collate_fn. However, the default collate should work fine for most use cases.
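To make the quoted parameters concrete, here is a minimal sketch of batching, shuffling, and a custom collate_fn; the toy TensorDataset and the collate function are illustrative, not from the tutorial:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset: 10 (feature, label) pairs.
dataset = TensorDataset(
    torch.arange(10, dtype=torch.float32).unsqueeze(1),  # features, shape (10, 1)
    torch.arange(10),                                    # labels, shape (10,)
)

# A collate_fn receives a list of individual samples and returns one batch.
# This one stacks features and labels manually, mimicking the default collate.
def collate(samples):
    xs, ys = zip(*samples)
    return torch.stack(xs), torch.stack(ys)

# batch_size and shuffle cover the first two bullets; num_workers > 0 would
# enable parallel loading (kept at 0 here so the snippet runs anywhere).
loader = DataLoader(dataset, batch_size=4, shuffle=True,
                    num_workers=0, collate_fn=collate)

for x, y in loader:
    print(x.shape, y.shape)  # batches of up to 4 samples
```

With 10 samples and batch_size=4, the loader yields two full batches and one final batch of 2 (drop_last defaults to False).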

I am aware of this issue and this issue, but neither has led to a tutorial.

I am happy to make a tutorial on custom dataloaders using the torch.utils.data.DataLoader class, focusing on how to interface with its parameters, especially num_workers and collate_fn. Also, I am not sure whether it is possible to inherit from the torch.utils.data.DataLoader class the way one does with torch.utils.data.Dataset, so I would appreciate some guidance on this.
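For what it's worth, DataLoader is a plain Python class and can be subclassed; a minimal sketch (the class name and pinned defaults are hypothetical, chosen only to illustrate the pattern):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical subclass: pins some constructor defaults before delegating
# to DataLoader.__init__. Callers can still override them via **kwargs.
class ShuffledLoader(DataLoader):
    def __init__(self, dataset, **kwargs):
        kwargs.setdefault("batch_size", 4)
        kwargs.setdefault("shuffle", True)
        super().__init__(dataset, **kwargs)

dataset = TensorDataset(torch.zeros(8, 3))  # 8 samples of shape (3,)
loader = ShuffledLoader(dataset)
```

Note that most DataLoader customization (sampling order, batching) is usually done through its sampler, batch_sampler, and collate_fn arguments rather than through inheritance, which may be why examples of subclassing are rare.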

This would be my first ever tutorial, so some guidance on formatting would be greatly appreciated.

cc @suraj813 @sekyondaMeta @svekars @carljparker @NicolasHug @kit1980 @subramen

Metadata

Labels

  • 60_min_blitz — Cleanups for the 60 min blitz tutorial: https://pytorch.org/tutorials/beginner/deep_learning_60min_b
  • advanced
  • docathon-h2-2023
  • enhancement
