Commit c57a5bd

docs(datasets): Add flwr-datasets reference (#6520)
Co-authored-by: jafermarq <javier@flower.ai>
1 parent c26dad1 commit c57a5bd

5 files changed: +170 additions, -6 deletions


datasets/docs/source/conf.py

Lines changed: 1 addition & 0 deletions
@@ -54,6 +54,7 @@
     "sphinx.ext.graphviz",
     "sphinxarg.ext",
     "myst_parser",
+    "sphinx_click",
     "sphinx_copybutton",
     "sphinx_design",
     "sphinxcontrib.mermaid",
Lines changed: 126 additions & 0 deletions
@@ -0,0 +1,126 @@
.. |context_link| replace:: ``Context``

.. _context_link: https://flower.ai/docs/framework/ref-api/flwr.app.Context.html

.. |clientapp_link| replace:: ``ClientApp``

.. _clientapp_link: https://flower.ai/docs/framework/ref-api/flwr.clientapp.ClientApp.html

Generate Demo Data for SuperNodes
=================================

In Flower simulations, datasets are downloaded and partitioned on the fly.
This is convenient for prototyping, but production deployments require
SuperNodes to have pre-existing data on disk. Pre-existing data ensures
immediate startup, persistence across restarts, and a setup that mirrors
real-world federated AI, where each node owns its local data.

Flower Datasets lets you generate pre-partitioned datasets for deployment
prototyping with the Flower Datasets CLI. Because the partitions are
materialized on disk ahead of time, each SuperNode can read from its designated
partition, just as it would in production.

.. note::

    This guide is intended for generating demo data for testing deployments. For
    production deployments, ensure that each SuperNode has access to its own
    local data partition.


Using the Flower Datasets CLI
-----------------------------

The ``flwr-datasets create`` command enables you to download a dataset,
partition it, and save each partition to disk in a single step. For complete
details on all available options, see the :doc:`ref-api-cli`.

For example, to generate demo data from the
`MNIST dataset <https://huggingface.co/datasets/ylecun/mnist>`_ with five
partitions and store the result in the ``./demo_data`` directory (it will be
created if it doesn't exist), run the following command in your terminal:

.. code-block:: bash

    # flwr-datasets create <dataset> --num-partitions <n> --out-dir <dir>
    flwr-datasets create ylecun/mnist --num-partitions 5 --out-dir demo_data

    # The output will look similar to this:
    Saving the dataset (1/1 shards): 100%|████████████| 12000/12000 [00:00<00:00, 3085.94 examples/s]
    Saving the dataset (1/1 shards): 100%|████████████| 12000/12000 [00:00<00:00, 4006.59 examples/s]
    Saving the dataset (1/1 shards): 100%|████████████| 12000/12000 [00:00<00:00, 4001.21 examples/s]
    Saving the dataset (1/1 shards): 100%|████████████| 12000/12000 [00:00<00:00, 4010.60 examples/s]
    Saving the dataset (1/1 shards): 100%|████████████| 12000/12000 [00:00<00:00, 3990.48 examples/s]
    🎊 Created 5 partitions for 'ylecun/mnist' in '/path/to/demo_data'

The above command generates the following directory structure:

.. code-block:: text

    demo_data/
    ├── partition_0/
    │   ├── data-00000-of-00001.arrow
    │   ├── dataset_info.json
    │   └── state.json
    ...
    └── partition_4/
        ├── data-00000-of-00001.arrow
        ├── dataset_info.json
        └── state.json
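
Before starting a SuperNode, it can be useful to check that a partition
directory matches the layout above. The following minimal sketch uses only the
Python standard library. The helper name ``missing_partition_files`` is
hypothetical and not part of Flower Datasets; it assumes the layout shown above,
and it matches ``data-*.arrow`` with a glob because larger datasets may be saved
in more than one shard.

.. code-block:: python

    from pathlib import Path


    def missing_partition_files(partition_dir: str) -> list[str]:
        """Return the expected entries missing from one saved partition.

        Hypothetical helper (not part of Flower Datasets). It assumes the layout
        shown above: one or more ``data-*.arrow`` shards plus two JSON files.
        """
        path = Path(partition_dir)
        if not path.is_dir():
            # The partition directory itself does not exist
            return [str(path)]
        missing = []
        # Arrow data may be split into several shards, so match them with a glob
        if not any(path.glob("data-*.arrow")):
            missing.append("data-*.arrow")
        for name in ("dataset_info.json", "state.json"):
            if not (path / name).is_file():
                missing.append(name)
        return missing


    if __name__ == "__main__":
        # Check every partition created under ./demo_data
        for partition in sorted(Path("demo_data").glob("partition_*")):
            problems = missing_partition_files(str(partition))
            print(partition.name, "OK" if not problems else f"missing: {problems}")

An empty list means the partition has every entry the sketch looks for; it does
not validate the contents of the files.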


Using Generated Demo Data in SuperNodes
---------------------------------------

Once you have generated the partitions, each SuperNode can be configured to
load its designated partition. The recommended approach is to pass the
partition path as a node configuration parameter when starting the SuperNode.

Passing the Data Path to a SuperNode
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Use the ``--node-config`` flag to specify the path to the partition when
launching a SuperNode. In the example below, the key ``data-path`` is arbitrary
and chosen for illustration only; you can use any key that suits your
application, as long as your ClientApp reads the same key from its node
configuration.

.. code-block:: bash

    flower-supernode \
        --insecure \
        --node-config="data-path=/path/to/demo_data/partition_0"
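
Each SuperNode is started in the same way; only the partition path changes. The
minimal sketch below builds one argument list per partition without launching
anything. The helper name ``supernode_commands`` is hypothetical, and it assumes
the ``data-path`` key and the ``partition_<id>`` layout used in this guide.
Depending on your deployment, each SuperNode may need additional options (for
example, distinct ports when several SuperNodes share one machine).

.. code-block:: python

    import shlex
    from pathlib import Path


    def supernode_commands(out_dir: str, num_partitions: int) -> list[list[str]]:
        """Build one ``flower-supernode`` argument list per partition.

        Hypothetical helper: it only assembles the flags shown above and assumes
        partitions are stored as ``<out_dir>/partition_<id>``.
        """
        base = Path(out_dir).resolve()
        commands = []
        for partition_id in range(num_partitions):
            data_path = base / f"partition_{partition_id}"
            commands.append(
                [
                    "flower-supernode",
                    "--insecure",
                    # Same value as in the shell example; the shell strips the quotes
                    f"--node-config=data-path={data_path}",
                ]
            )
        return commands


    if __name__ == "__main__":
        for command in supernode_commands("demo_data", num_partitions=5):
            # Print a shell-safe version that can be pasted into a terminal
            print(shlex.join(command))

Building argument lists instead of shell strings avoids quoting issues: the
double quotes in the shell command above are consumed by the shell, so the list
form omits them, and ``shlex.join`` adds quoting back only where a path needs it.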


Loading the Dataset in Your ClientApp
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In your |clientapp_link|_, you can access the configured data path through the
|context_link|_ and load the dataset with the ``load_from_disk`` function from
the Hugging Face ``datasets`` library:

.. code-block:: python
    :emphasize-lines: 12,15

    from flwr.app import Context, Message
    from flwr.clientapp import ClientApp
    from datasets import load_from_disk

    app = ClientApp()


    @app.train()
    def train(msg: Message, context: Context) -> Message:
        """Train the model on local data."""
        # Retrieve the data path from node configuration
        dataset_path = context.node_config["data-path"]

        # Load the partition from disk
        partition = load_from_disk(dataset_path)

        # Use the dataset for training
        # ...
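
If a SuperNode is started without the ``data-path`` entry,
``context.node_config["data-path"]`` fails with a bare ``KeyError``. As a
minimal sketch, a small helper can fail early with a clearer message. The name
``resolve_data_path`` is hypothetical; it assumes only that ``node_config``
behaves like a dictionary and uses the ``data-path`` key from the example above.
Because it depends only on the standard library, it can be tested without a
running SuperNode.

.. code-block:: python

    from collections.abc import Mapping
    from pathlib import Path


    def resolve_data_path(node_config: Mapping[str, object], key: str = "data-path") -> Path:
        """Return the partition directory configured for this SuperNode.

        Hypothetical helper: pass ``context.node_config`` from your ClientApp.
        """
        if key not in node_config:
            # Fail early with a hint on how to set the missing entry
            raise KeyError(
                f"node config has no '{key}' entry; start the SuperNode with "
                f'--node-config="{key}=/path/to/partition"'
            )
        path = Path(str(node_config[key])).expanduser()
        if not path.is_dir():
            raise FileNotFoundError(f"partition directory not found: {path}")
        return path

Inside ``train``, the lookup would then become
``dataset_path = str(resolve_data_path(context.node_config))`` before calling
``load_from_disk``.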


.. tip::

    For a complete guide on how to run Flower SuperNodes, refer to the
    `Deployment Runtime Documentation <https://flower.ai/docs/framework/how-to-run-flower-with-deployment-engine.html>`_.


datasets/docs/source/index.rst

Lines changed: 6 additions & 6 deletions
@@ -45,19 +45,19 @@ Problem-oriented how-to guides show step-by-step how to achieve a specific goal.
     how-to-use-with-numpy
     how-to-use-with-local-data
     how-to-disable-enable-progress-bar
+    how-to-generate-demo-data-for-deployment
 
 References
 ~~~~~~~~~~
 
 Information-oriented API reference and other reference material.
 
-.. autosummary::
-    :toctree: ref-api
-    :template: autosummary/module.rst
-    :caption: API reference
-    :recursive:
+.. toctree::
+    :titlesonly:
+    :maxdepth: 2
+    :caption: References
 
-    flwr_datasets
+    reference
 
 .. toctree::
     :maxdepth: 1
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
###############################
Flower Datasets CLI reference
###############################

****************
Basic Commands
****************

.. _flwr-datasets-apiref:

``flwr-datasets`` CLI
======================

.. click:: flwr_datasets.cli.app:typer_click_object
    :prog: flwr-datasets create
    :nested: full

datasets/docs/source/reference.rst

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
###########
Reference
###########

************
References
************

.. autosummary::
    :toctree: ref-api
    :template: autosummary/module.rst
    :caption: API reference
    :recursive:

    flwr_datasets

.. toctree::
    :maxdepth: 2

    ref-api-cli

