Tutorial on custom dataloaders (NOT datasets) #1010

I really like [this](https://pytorch.org/tutorials/beginner/data_loading_tutorial.html#iterating-through-the-dataset) tutorial on custom datasets. However, the `torch.utils.data.DataLoader` class is only briefly mentioned in it:

> However, we are losing a lot of features by using a simple for loop to iterate over the data. In particular, we are missing out on:
>
> * Batching the data
> * Shuffling the data
> * Load the data in parallel using multiprocessing workers.
>
> `torch.utils.data.DataLoader` is an iterator which provides all these features. Parameters used below should be clear. One parameter of interest is `collate_fn`. You can specify how exactly the samples need to be batched using `collate_fn`. However, default collate should work fine for most use cases.
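For reference, the three features from that quote map onto the `batch_size`, `shuffle`, and `num_workers` arguments. A minimal sketch, assuming a toy dataset (`SquaresDataset` is a hypothetical name made up for illustration, not something from the tutorial) and relying on the default collate function:

```python
import torch
from torch.utils.data import Dataset, DataLoader


class SquaresDataset(Dataset):
    """Hypothetical toy dataset: item i is the pair (i, i**2)."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        x = torch.tensor([float(idx)])
        y = torch.tensor([float(idx) ** 2])
        return x, y


dataset = SquaresDataset(10)

# batch_size groups samples into batches, shuffle reorders them each epoch,
# and num_workers > 0 would load samples in worker processes
# (0 means everything is loaded in the main process).
loader = DataLoader(dataset, batch_size=4, shuffle=True, num_workers=0)

# The default collate function stacks the per-sample tensors along a new
# leading batch dimension.
batches = list(loader)
for x_batch, y_batch in batches:
    print(x_batch.shape, y_batch.shape)
```

With 10 samples and `batch_size=4`, the last batch is smaller than the others unless `drop_last=True` is passed.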

I am aware of this [issue](https://github.com/pytorch/tutorials/issues/78) and this [issue](https://github.com/pytorch/tutorials/issues/735), but neither has led to a tutorial.

I am happy to write a tutorial on custom dataloaders built on the `torch.utils.data.DataLoader` class, focusing on how to work with its parameters, especially `num_workers` and `collate_fn`. I am also not sure whether it is possible to subclass `torch.utils.data.DataLoader` the way one subclasses `torch.utils.data.Dataset`, so I would appreciate some guidance on this.
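To make the scope concrete, here is a rough sketch of the kind of example such a tutorial could contain: a custom `collate_fn` that pads variable-length samples, plus a thin `DataLoader` subclass that uses it by default. The names `VariableLengthDataset`, `pad_collate`, and `PaddedDataLoader` are hypothetical and made up for illustration. `DataLoader` is an ordinary Python class, so subclassing it works in this sketch, but whether that is the recommended pattern is exactly the guidance I am asking for.

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import Dataset, DataLoader


class VariableLengthDataset(Dataset):
    """Hypothetical dataset whose samples are 1-D tensors of differing length."""

    def __init__(self, lengths):
        self.lengths = lengths

    def __len__(self):
        return len(self.lengths)

    def __getitem__(self, idx):
        n = self.lengths[idx]
        # Sample idx is the sequence 1, 2, ..., n.
        return torch.arange(1, n + 1, dtype=torch.float32)


def pad_collate(samples):
    """Pad a list of 1-D tensors to a common length and keep the true lengths.

    The default collate cannot stack tensors of different sizes, which is
    the usual reason to supply a custom collate_fn.
    """
    lengths = torch.tensor([len(s) for s in samples])
    padded = pad_sequence(samples, batch_first=True, padding_value=0.0)
    return padded, lengths


class PaddedDataLoader(DataLoader):
    """Hypothetical subclass that uses pad_collate unless told otherwise."""

    def __init__(self, dataset, **kwargs):
        kwargs.setdefault("collate_fn", pad_collate)
        super().__init__(dataset, **kwargs)


dataset = VariableLengthDataset([3, 1, 4, 2])

# num_workers=0 keeps loading in the main process. With num_workers > 0 on
# platforms that spawn worker processes, this part of the script should sit
# under an `if __name__ == "__main__":` guard.
loader = PaddedDataLoader(dataset, batch_size=2, shuffle=False, num_workers=0)

batches = list(loader)
for padded, lengths in batches:
    print(padded.shape, lengths.tolist())
```

Passing `collate_fn=pad_collate` directly to a plain `DataLoader` gives the same batches; the subclass only saves repeating that argument.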

This would be my first tutorial, so some guidance on formatting would be very helpful.

cc @suraj813 @sekyondaMeta @svekars @carljparker @NicolasHug @kit1980 @subramen
