Commit a96d337

Add Dask engine to dataset generation functions (#404)
1 parent 0322ed9 commit a96d337

7 files changed: +298 -76 lines changed
docs/source/conf.py

Lines changed: 1 addition & 0 deletions
@@ -199,6 +199,7 @@
 intersphinx_mapping = {
     "python": ("https://docs.python.org/3.8", None),
     "pandas": ("https://pandas.pydata.org/pandas-docs/stable/", None),
+    "dask": ("https://docs.dask.org/en/stable/", None),
     "tables": ("https://www.pytables.org/", None),
     "numpy": ("https://numpy.org/doc/stable/", None),
     "networkx": ("https://networkx.org/documentation/stable/", None),

docs/source/simulated_populations/index.rst

Lines changed: 8 additions & 0 deletions
@@ -137,3 +137,11 @@ or United States), unzip the contents to the desired location on your computer.
 Once you've unzipped the simulated population data, you can pass the directory
 path to the :code:`source` parameter of the :ref:`dataset generation functions
 <dataset_generation_functions>` to generate large-scale datasets!
+
+If you're using one of the larger populations, you'll also want to take a look at the
+:code:`engine` parameter.
+By default, pseudopeople generates datasets using Pandas, which does not fully parallelize
+across cores and requires the entire dataset to fit into RAM.
+However, by passing "dask" to the :code:`engine` parameter, you can run the dataset
+generation on a Dask cluster, which can spill data to disk and even distribute
+the computation across multiple computers!
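
Reviewer note: the documentation added above describes the :code:`engine` parameter only in prose. A minimal sketch of what a call might look like follows. It assumes that `generate_decennial_census` is one of the dataset generation functions that accepts `engine`; the wrapper name `generate_census_with_dask` is hypothetical and exists only for illustration.

```python
from typing import Any


def generate_census_with_dask(source: str) -> Any:
    """Generate a decennial census dataset with the Dask engine.

    `source` is the directory of an unzipped simulated population,
    as described in the documentation text above.
    """
    # Imported lazily so this module loads even when pseudopeople
    # (or its optional Dask dependency) is not installed.
    import pseudopeople as psp

    # Assumption: generate_decennial_census accepts the `engine` parameter
    # introduced by this commit; "dask" is the value named in the docs.
    return psp.generate_decennial_census(source=source, engine="dask")
```

Omitting `engine` keeps the default Pandas behaviour described in the added text, so existing callers should be unaffected.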

setup.py

Lines changed: 4 additions & 1 deletion
@@ -51,10 +51,12 @@
     "jupyter",
 ]
 
+dask_requirements = ["dask"]
+
 test_requirements = [
     "pytest",
     "pytest-mock",
-]
+] + dask_requirements
 
 lint_requirements = [
     "black==22.3.0",
@@ -109,6 +111,7 @@
             + test_requirements
             + interactive_requirements
             + lint_requirements,
+            "dask": dask_requirements,
         },
         # entry_points="""
         #     [console_scripts]
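
Reviewer note: the effect of the setup.py change is that Dask becomes an optional extra and is also pulled in for tests. The sketch below reproduces just that list composition. Only `dask_requirements`, `test_requirements`, and the `"dask"` key come from the diff; the `"test"` key and the trimmed-down `extras_require` dict are assumptions for illustration, since the full mapping is not visible in the hunk.

```python
# Lists as they appear in the diff after the change.
dask_requirements = ["dask"]
test_requirements = [
    "pytest",
    "pytest-mock",
] + dask_requirements

# Hypothetical, trimmed-down mapping: the real extras_require has more keys,
# and the "test" key name is assumed rather than shown in the hunk.
extras_require = {
    "test": test_requirements,
    "dask": dask_requirements,
}

print(extras_require["test"])
```

With an extra defined this way, users would presumably opt in with something like `pip install "pseudopeople[dask]"`, assuming the distribution is published under the name `pseudopeople`.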
