Commit 299f2c2

[DLMED] improve doc-string of partition_dataset (#2109)
Signed-off-by: Nic Ma <[email protected]>
1 parent ca26e51 commit 299f2c2

1 file changed: +21 -0 lines changed

monai/data/utils.py

Lines changed: 21 additions & 0 deletions
@@ -753,6 +753,27 @@ def partition_dataset(
     And it can split the dataset based on specified ratios or evenly split into `num_partitions`.
     Refer to: https://github.com/pytorch/pytorch/blob/master/torch/utils/data/distributed.py.

+    Note:
+        It also can be used to partition dataset for ranks in distributed training.
+        For example, partition dataset before training and use `CacheDataset`, every rank trains with its own data.
+        It can avoid duplicated caching content in each rank, but will not do global shuffle before every epoch:
+
+        .. code-block:: python
+
+            data_partition = partition_dataset(
+                data=train_files,
+                num_partitions=dist.get_world_size(),
+                shuffle=True,
+                even_divisible=True,
+            )[dist.get_rank()]
+
+            train_ds = SmartCacheDataset(
+                data=data_partition,
+                transform=train_transforms,
+                replace_rate=0.2,
+                cache_num=15,
+            )
+
     Args:
         data: input dataset to split, expect a list of data.
         ratios: a list of ratio number to split the dataset, like [8, 1, 1].
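
The per-rank partitioning described in the added note can be illustrated without MONAI or PyTorch. The following is a minimal sketch, not MONAI's implementation: `sketch_partition` is a hypothetical stand-in, the strided split and the padding-by-repetition behaviour for `even_divisible` are assumptions loosely modelled on the `DistributedSampler` linked in the docstring, and the loop over `rank` stands in for separate processes that would each call `dist.get_rank()`.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sketch_partition(
    data: Sequence[T],
    num_partitions: int,
    shuffle: bool = False,
    seed: int = 0,
    even_divisible: bool = False,
) -> List[List[T]]:
    """Hypothetical illustration of splitting `data` into `num_partitions` lists."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")

    indices = list(range(len(data)))
    if shuffle:
        # A fixed seed gives every caller the same permutation, so the
        # partitions computed independently on each rank stay consistent.
        random.Random(seed).shuffle(indices)

    n = len(indices)
    if even_divisible and n > 0 and n % num_partitions != 0:
        # Assumption: pad by repeating items from the start of the
        # (shuffled) order until the length divides evenly.
        pad = num_partitions - n % num_partitions
        indices += [indices[i % n] for i in range(pad)]

    # Strided split: partition r takes items r, r + P, r + 2P, ...
    return [
        [data[i] for i in indices[rank::num_partitions]]
        for rank in range(num_partitions)
    ]


# Simulate four ranks, each computing the full split and keeping its own part.
train_files = [f"img_{i}.nii.gz" for i in range(10)]
world_size = 4
partitions = [
    sketch_partition(
        train_files, world_size, shuffle=True, seed=0, even_divisible=True
    )[rank]
    for rank in range(world_size)
]
```

Because every rank shuffles with the same seed, each one derives the same global order and keeps a different slice of it, which is what lets a per-rank cache hold only that rank's data. The trade-off named in the note still applies: the split is fixed before training, so no global reshuffle happens between epochs.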