Open
Description
Aim
Turn ParallelAnalysisBase into a custom dask collection (https://docs.dask.org/en/latest/custom-collections.html).
Current syntax
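import dask
import MDAnalysis as mda
import pmda.density
from MDAnalysisTests.datafiles import TPR, XTC  # assumed source of the example files
u = mda.Universe(TPR, XTC)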
ow = u.select_atoms("name OW")
D = pmda.density.DensityAnalysis(ow, delta=1.0)
# Option one
D.run(n_blocks=2, n_jobs=2)
# Option two
D.prepare_jobs(n_blocks=2)
D.compute(n_jobs=2) # or dask.compute(D)
# furthermore
dask.compute(D_1, D_2, D_3, D_4...) # D_x as an individual analysis job.
Implementation
class ParallelAnalysisBase(DaskMethodsMixin)
- self.prepare_jobs() works as the first half of the old self.run(), i.e. it creates a dask graph. The difference is that ParallelAnalysisBase also stores the graph (and the keys) itself.
- self.run() checks whether the jobs are prepared and then runs them.
- As a dask custom collection, self.compute() or dask.compute(ParallelAnalysisBase) will
  - first generate a list of results (as saved in self._keys) from self._dsk (the dask graph)
  - then run self.__dask_postpersist__(), which concludes and rebuilds the results (self._post_reduce), a.k.a. the second half of the old self.run().
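The split described above can be sketched with a toy collection. This is only an illustration under stated assumptions, not pmda code: `BlockSumAnalysis`, `_block_sum` and `_combine` are hypothetical names, a plain sum stands in for the per-block trajectory analysis, and the reduction is hooked in through `__dask_postcompute__`, which is the hook dask's `compute()` path calls.

```python
from dask.base import DaskMethodsMixin, tokenize
from dask.threaded import get as threaded_get


def _block_sum(values):
    # Per-block work: stands in for analysing the frames of one block.
    return sum(values)


def _combine(block_results):
    # Reduction step: stands in for the second half of the old run().
    return sum(block_results)


class BlockSumAnalysis(DaskMethodsMixin):
    """Hypothetical stand-in for ParallelAnalysisBase."""

    def __init__(self, values):
        self._values = list(values)
        self._dsk = None    # dask graph, filled in by prepare_jobs()
        self._keys = None   # one output key per block
        self.results = None

    def prepare_jobs(self, n_blocks=2):
        # First half of the old run(): split the work and build the graph.
        n_blocks = max(1, min(n_blocks, len(self._values)))
        blocks = [self._values[i::n_blocks] for i in range(n_blocks)]
        name = "block-sum-" + tokenize(self._values, n_blocks)
        self._dsk = {(name, i): (_block_sum, block)
                     for i, block in enumerate(blocks)}
        self._keys = list(self._dsk)
        return self

    def run(self, n_blocks=2):
        # Prepare the jobs if that has not happened yet, then compute.
        if self._dsk is None:
            self.prepare_jobs(n_blocks)
        self.results = self.compute()
        return self

    # --- dask custom collection protocol ---
    def __dask_graph__(self):
        return self._dsk

    def __dask_keys__(self):
        return self._keys

    @staticmethod
    def __dask_optimize__(dsk, keys, **kwargs):
        return dsk  # no graph optimization in this sketch

    __dask_scheduler__ = staticmethod(threaded_get)

    def __dask_postcompute__(self):
        # dask passes the list of per-key results to this function.
        return _combine, ()


analysis = BlockSumAnalysis(range(10)).prepare_jobs(n_blocks=3)
total = analysis.compute()
print(total)
```

Keeping the graph and keys on the object is what lets `prepare_jobs()` and `compute()` be called separately; `__dask_postpersist__` is omitted here because it is only needed for `persist()`.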
Advantage
- The possibility to run multiple analyses at the same time. This is useful when, e.g., we have dozens of short simulations that can each utilize only one core.
- The dask graph can be visualized with self.visualize().
- A possible API that can be extended to complex analyses (by building complex dask graphs).
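The first advantage can be illustrated with plain `dask.delayed` jobs as a stand-in, since the proposed collection does not exist yet. The function `mean_of` and the `simulations` data are made up for the example; each job represents one small analysis that needs only a single core.

```python
import dask
from dask import delayed


def mean_of(values):
    # Stand-in for a short, single-core analysis of one simulation.
    return sum(values) / len(values)


# Hypothetical per-simulation data; real jobs would wrap trajectory analyses.
simulations = {
    "sim_a": [1.0, 2.0, 3.0],
    "sim_b": [10.0, 20.0],
    "sim_c": [5.0],
}

jobs = [delayed(mean_of)(values) for values in simulations.values()]

# One call schedules all independent jobs together.
results = dask.compute(*jobs, scheduler="threads", num_workers=3)
print(dict(zip(simulations, results)))
```

With the custom collection in place, the same call would take the analysis objects directly, as in `dask.compute(D_1, D_2, ...)`.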
Benchmark
TODO
Illustration
TODO