@@ -24,6 +24,7 @@ Currently, yt is able to perform the following actions in parallel:
2424* Halo analysis (:ref: `halo-analysis `)
2525* Volume rendering (:ref: `volume_rendering `)
2626* Isocontours & flux calculations (:ref: `extracting-isocontour-information `)
27+ * Parallelization over data containers (:ref: `_Data-objects `)
2728
2829This list covers just about every action yt can take! Additionally, almost all
2930scripts will benefit from parallelization with minimal modification. The goal
@@ -219,6 +220,7 @@ The following operations use chunk decomposition:
219220* Derived Quantities (see :ref: `derived-quantities `)
220221* 1-, 2-, and 3-D profiles (see :ref: `generating-profiles-and-histograms `)
221222* Isocontours & flux calculations (see :ref: `surfaces `)
223+ * Parallelization over data containers using ``piter `` (see :ref: `_Data-objects `)
222224
223225Parallelization over Multiple Objects and Datasets
224226++++++++++++++++++++++++++++++++++++++++++++++++++
@@ -229,8 +231,81 @@ independently to several different objects or datasets, a so-called
229231task, yt can do that easily. See the sections below on
230232:ref: `parallelizing-your-analysis ` and :ref: `parallel-time-series-analysis `.
231233
232- Use of ``piter() ``
233- ^^^^^^^^^^^^^^^^^^
234+ Use of ``piter() `` on a data container
235+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
236+
237+ You can parallelize I/O on any data container using the
238+ :func: `~yt.data_objects.selection_objects.data_selection_objects.YTSelectionContainer.piter ` function.
239+ It dispatches all the chunks that make up the data container to different processors, and gives you the ability to store results from each chunk.
240+
241+ Take the following example
242+
243+ .. code-block :: python
244+
245+ import yt
246+
247+ yt.enable_parallelism()
248+
249+ ds = yt.load(" output_00080" )
250+ ad = ds.all_data()
251+
252+ # Each processor reads a different chunk
253+ my_storage = {}
254+ for sto, chunk in ad.piter(storage = my_storage):
255+ sto.result = {}
256+ sto.result[" gas" , " density" ] = chunk[" gas" , " density" ]
257+
258+ my_storage # now contains one entry per chunk
259+ my_storage[0 ][" gas" , " density" ] # contains the density for chunk 0
260+ ...
261+ my_storage[15 ][" gas" , " density" ] # contains the density for chunk 15 (out of 15)
262+
263+
264+ Sometimes, it is also useful to combine the results. This is implemented using an optional ``reduction `` keyword argument to the
265+ :func: `~yt.data_objects.selection_objects.data_selection_objects.YTSelectionContainer.piter ` function.
266+ The following reductions are implemented:
267+ - ``None `` (default): no reduction, ``my_storage `` will contain one entry per chunk
268+ - ``"cat" ``: concatenate the results from all chunks, ``my_storage `` will contain a flattened list of all results
269+ - ``"cat_on_root" ``: concatenate the results from all chunks, but only on the root process. The other processes will have ``None `` in ``my_storage ``
270+ - ``"sum" ``, ``"min" `` or ``"max" ``: reduce the results from all chunks using the specified operation, ``my_storage `` will contain the reduced result on all processes
271+
272+ .. code-block :: python
273+
274+ # Gathering everything on root
275+ my_storage = {}
276+ for sto, chunk in ad.piter(storage = my_storage, reduction = " cat_on_root" ):
277+ sto.result = {}
278+ sto.result[" gas" , " density" ] = chunk[" gas" , " density" ]
279+
280+ # On root:
281+ if yt.is_root():
282+ my_storage[" gas" , " density" ] == ad[" gas" , " density" ] # True for all entries
283+ else :
284+ my_storage[" gas" , " density" ] is None
285+
286+ # Gather everything on all processes
287+ my_storage = {}
288+ for sto, chunk in ad.piter(storage = my_storage, reduction = " cat" ):
289+ sto.result = {}
290+ sto.result[" gas" , " density" ] = chunk[" gas" , " density" ]
291+
292+ my_storage[" gas" , " density" ] == ad[
293+ " gas" , " density"
294+ ] # True for all entries on all processes
295+
296+ # Summing over all chunks
297+ my_storage = {}
298+ for sto, chunk in ad.piter(storage = my_storage, reduction = " sum" ):
299+ sto.result = {}
300+ sto.result[" gas" , " total_mass" ] = chunk[" gas" , " cell_mass" ].sum()
301+ sto.result[" gas" , " total_volume" ] = chunk[" gas" , " cell_volume" ].sum()
302+
303+ my_storage[" gas" , " total_mass" ] # contains the total mass
304+ my_storage[" gas" , " total_volume" ] # contains the total volume
305+
306+
307+ Use of ``piter() `` over objects and datasets
308+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
234309
235310If you use parallelism over objects or datasets, you will encounter
236311the :func: `~yt.data_objects.time_series.DatasetSeries.piter ` function.
0 commit comments