
Turn ParallelAnalysisBase into a dask custom collection #135

@yuxuanzhuang


Aim

Turn ParallelAnalysisBase into a custom dask collection (https://docs.dask.org/en/latest/custom-collections.html).

Current syntax

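```python
import dask
import MDAnalysis as mda
import pmda.density
from MDAnalysisTests.datafiles import TPR, XTC  # example topology/trajectory files

u = mda.Universe(TPR, XTC)
ow = u.select_atoms("name OW")
D = pmda.density.DensityAnalysis(ow, delta=1.0)

# Option one: build and execute the dask graph in a single call
D.run(n_blocks=2, n_jobs=2)

# Option two: build the dask graph first, then compute it
D.prepare_jobs(n_blocks=2)
D.compute(n_jobs=2)   # or dask.compute(D)

# Furthermore: compute several prepared analyses together,
# each D_x being an individual analysis job
dask.compute(D_1, D_2, D_3, D_4)
```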

Implementation

  • ParallelAnalysisBase inherits from dask's mixin: class ParallelAnalysisBase(DaskMethodsMixin)
  • self.prepare_jobs() works as the first half of the old self.run(), i.e. it creates a dask graph. The difference is that ParallelAnalysisBase also stores the graph (and its keys) itself.
  • self.run() checks whether the jobs have been prepared and then runs them.
  • As a dask custom collection, self.compute() or dask.compute(ParallelAnalysisBase) will
    • first compute the list of results for the keys saved in self._keys from the dask graph self._dsk;
    • then apply the finalizer returned by self.__dask_postcompute__(), which concludes and rebuilds the results (self._post_reduce), a.k.a. the second half of the old self.run().
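The split described above can be sketched with dask's documented collection protocol. This is a minimal, self-contained toy rather than pmda code: `ToyParallelAnalysis`, `_single_block` and `_conclude` are hypothetical names, the "trajectory" is a plain list of numbers, and the per-block analysis is just a sum. `prepare_jobs()` builds and stores the graph (the first half of the old `run()`), and `__dask_postcompute__` hands dask the reduction (the second half) to apply to the block results.

```python
import dask
from dask.base import DaskMethodsMixin, tokenize
from dask.threaded import get as threaded_get


def _single_block(frames):
    # Hypothetical per-block analysis: sum a per-frame observable.
    return sum(frames)


def _conclude(block_results):
    # Hypothetical reduction ("second half of the old run()"):
    # combine the per-block results into the final answer.
    return sum(block_results)


class ToyParallelAnalysis(DaskMethodsMixin):
    """Hypothetical stand-in for ParallelAnalysisBase as a dask collection."""

    def __init__(self, frames):
        self._frames = list(frames)
        self._dsk = None   # dask graph, filled in by prepare_jobs()
        self._keys = None  # output keys, one per block
        self.result = None

    def prepare_jobs(self, n_blocks=2):
        # First half of the old run(): build the graph and store it on self.
        name = "block-" + tokenize(self._frames, n_blocks)
        size = -(-len(self._frames) // n_blocks)  # ceiling division
        self._dsk, self._keys = {}, []
        for i in range(n_blocks):
            key = (name, i)
            chunk = self._frames[i * size:(i + 1) * size]
            self._dsk[key] = (_single_block, chunk)  # task: _single_block(chunk)
            self._keys.append(key)
        return self

    # --- dask collection protocol ---
    def __dask_graph__(self):
        return self._dsk

    def __dask_keys__(self):
        return self._keys

    @staticmethod
    def __dask_optimize__(dsk, keys, **kwargs):
        return dsk  # no graph optimization

    # Default scheduler used by compute(): dask's threaded scheduler.
    __dask_scheduler__ = staticmethod(threaded_get)

    def __dask_postcompute__(self):
        # dask calls self._finalize(list_of_block_results) after computing.
        return self._finalize, ()

    def _finalize(self, block_results):
        self.result = _conclude(block_results)
        return self


D = ToyParallelAnalysis(range(10)).prepare_jobs(n_blocks=3)
print(D.compute().result)  # 45

(D2,) = dask.compute(ToyParallelAnalysis(range(4)).prepare_jobs(n_blocks=2))
print(D2.result)  # 6
```

Because `DaskMethodsMixin` supplies `compute()`, `persist()` and `visualize()`, the same object works with both `D.compute()` and `dask.compute(D)`. A fuller version would also implement `__dask_postpersist__`, which this sketch omits, so that `persist()` works as well.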

Advantage

  • Multiple analyses can be run at the same time. This is useful when, e.g., we have dozens of short simulations that can each utilize only one core.
  • The dask graph can be visualized with self.visualize().
  • It offers a possible API for extending to complex analyses (by building complex dask graphs).
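The first advantage can be illustrated without pmda: when several dask collections are passed to a single `dask.compute` call, their graphs are merged and scheduled together. The sketch below uses `dask.delayed` objects as stand-ins for individual analyses; `analyse_simulation` and the synthetic data are hypothetical. Under the proposal, prepared ParallelAnalysisBase objects would be passed to `dask.compute` in the same way.

```python
import dask


@dask.delayed
def analyse_simulation(values):
    # Hypothetical single-core analysis of one short simulation:
    # here, just the mean of a per-frame observable.
    return sum(values) / len(values)


# Hypothetical data standing in for dozens of short simulations.
simulations = [[float(i + j) for j in range(5)] for i in range(24)]

# Each delayed object is a dask collection; nothing runs yet.
jobs = [analyse_simulation(sim) for sim in simulations]

# A single dask.compute call merges all graphs and schedules them together,
# so independent single-core jobs can occupy all available workers.
means = dask.compute(*jobs, scheduler="threads")
print(means[0], means[-1])  # 2.0 25.0
```

Each of these objects also gets a `visualize()` method from dask, which needs the optional graphviz package.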

Benchmark

TODO

Illustration

TODO
