Flower Datasets (flwr-datasets) is a library to quickly and easily create datasets for federated learning, federated evaluation, and federated analytics. It was created by the Flower Labs team that also created Flower: A Friendly Federated AI Framework.
Tip: For complete documentation, including API docs, how-to guides, and tutorials, visit the Flower Datasets Documentation. For full FL examples, see the Flower Examples page.
For a complete installation guide, visit the Flower Datasets Documentation.
pip install flwr-datasets[vision]
Flower Datasets library supports:
- downloading datasets - choose the dataset from Hugging Face's datasets,
- partitioning datasets - customize the partitioning scheme,
- creating centralized datasets - leave parts of the dataset unpartitioned (e.g. for centralized evaluation).
Because it uses Hugging Face's datasets library under the hood, Flower Datasets integrates with the following popular formats/frameworks:
- Hugging Face,
- PyTorch,
- TensorFlow,
- NumPy,
- Pandas,
- JAX,
- Arrow.
Create custom partitioning schemes or choose from the implemented partitioning schemes:
- Partitioner (the abstract base class): Partitioner
- IID partitioning: IidPartitioner(num_partitions)
- Dirichlet partitioning: DirichletPartitioner(num_partitions, partition_by, alpha)
- Distribution partitioning: DistributionPartitioner(distribution_array, num_partitions, num_unique_labels_per_partition, partition_by, preassigned_num_samples_per_label, rescale)
- InnerDirichlet partitioning: InnerDirichletPartitioner(partition_sizes, partition_by, alpha)
- Pathological partitioning: PathologicalPartitioner(num_partitions, partition_by, num_classes_per_partition, class_assignment_mode)
- Natural ID partitioning: NaturalIdPartitioner(partition_by)
- Size-based partitioning (the abstract base class for partitioners that divide the data based on the number of samples): SizePartitioner
- Linear partitioning: LinearPartitioner(num_partitions)
- Square partitioning: SquarePartitioner(num_partitions)
- Exponential partitioning: ExponentialPartitioner(num_partitions)
- more to come in future releases (contributions are welcome).
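To give a feel for how the size-based schemes differ, here is a minimal standalone sketch that derives partition sizes from a weight function of the partition ID. It does not call flwr-datasets. The helper name partition_sizes, the 1-based weighting, and the rule for distributing the rounding remainder are assumptions made for illustration and may not match the library's implementation.

```python
import math


def partition_sizes(num_samples, num_partitions, weight_fn):
    """Split num_samples into sizes proportional to weight_fn(partition_id + 1).

    Hypothetical helper for illustration only, not part of flwr-datasets.
    """
    weights = [weight_fn(pid + 1) for pid in range(num_partitions)]
    total = sum(weights)
    # Floor each share, then hand the samples lost to flooring to the last partition.
    sizes = [int(num_samples * w / total) for w in weights]
    sizes[-1] += num_samples - sum(sizes)
    return sizes


# Sizes grow linearly, quadratically, and exponentially with the partition ID.
linear = partition_sizes(1000, 4, lambda x: x)
square = partition_sizes(1000, 4, lambda x: x ** 2)
exponential = partition_sizes(1000, 4, math.exp)

print("linear:     ", linear)
print("square:     ", square)
print("exponential:", exponential)
```

In all three cases the sizes sum to the dataset size. The faster the weight function grows, the more of the data ends up in the highest-numbered partitions.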
Figure: Comparison of Partitioning Schemes on CIFAR10
PS: This plot was generated using a library function (see the flwr_datasets.visualization package for more).
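The kind of difference between partitioning schemes can also be illustrated without the library. The sketch below uses only the Python standard library and synthetic labels to compare per-partition label counts for an IID-style split and a Dirichlet-style split. The function names (iid_split, dirichlet_split, label_counts) are hypothetical, and the Dirichlet logic is a simplified version of the common per-label approach, not the code behind DirichletPartitioner.

```python
import random
from collections import Counter


def iid_split(labels, num_partitions, seed=0):
    """Shuffle all sample indices and deal them out evenly."""
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    rng.shuffle(indices)
    return [indices[pid::num_partitions] for pid in range(num_partitions)]


def dirichlet_split(labels, num_partitions, alpha, seed=0):
    """For each label, divide its samples across partitions using
    proportions drawn from a symmetric Dirichlet(alpha) distribution."""
    rng = random.Random(seed)
    by_label = {}
    for idx, label in enumerate(labels):
        by_label.setdefault(label, []).append(idx)

    partitions = [[] for _ in range(num_partitions)]
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        # A Dirichlet sample is a vector of gamma draws normalized to sum to 1.
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(num_partitions)]
        total = sum(draws)
        start, cumulative = 0, 0.0
        for pid, draw in enumerate(draws):
            cumulative += draw / total
            is_last = pid == num_partitions - 1
            end = len(indices) if is_last else round(cumulative * len(indices))
            partitions[pid].extend(indices[start:end])
            start = end
    return partitions


def label_counts(labels, partitions):
    """Count how many samples of each label every partition holds."""
    return [Counter(labels[i] for i in part) for part in partitions]


# Synthetic data: 1000 samples, 10 balanced classes.
labels = [i % 10 for i in range(1000)]
iid_counts = label_counts(labels, iid_split(labels, num_partitions=5))
dirichlet_counts = label_counts(
    labels, dirichlet_split(labels, num_partitions=5, alpha=0.3)
)

for pid, (iid, skewed) in enumerate(zip(iid_counts, dirichlet_counts)):
    print(pid, "iid:", sorted(iid.items()), "dirichlet:", sorted(skewed.items()))
```

Smaller alpha values concentrate each label in fewer partitions, while larger values bring the Dirichlet-style split closer to the IID one.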
Flower Datasets exposes the FederatedDataset abstraction to represent the dataset needed for federated learning/evaluation/analytics. It has two powerful methods that let you handle the dataset preprocessing: load_partition(partition_id, split) and load_split(split).
Here's a basic quickstart example of how to partition the MNIST dataset:
from flwr_datasets import FederatedDataset
from flwr_datasets.partitioners import IidPartitioner
# The train split of the MNIST dataset will be partitioned into 100 partitions
partitioner = IidPartitioner(num_partitions=100)
fds = FederatedDataset("ylecun/mnist", partitioners={"train": partitioner})
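# Load partition 0 of the partitioned train split
partition = fds.load_partition(0)
# Load the whole, unpartitioned test split (e.g. for centralized evaluation)
centralized_data = fds.load_split("test")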
For more details, please refer to the specific how-to guides or tutorials. They showcase customization and more advanced features.