Open
Description
Expected behaviour
Analysis of a Universe with an on-the-fly transformation scales reasonably well with the number of cores.
Actual behaviour
Scaling is very poor, even with only two cores.
Code
import MDAnalysis as mda
from MDAnalysis import transformations as trans
from pmda.rms.rmsd import RMSD as parallel_rmsd

# `files` holds the paths to the PDB and trajectory files (not shown here)
u = mda.Universe(files['PDB'], files['LONG_TRAJ'])  # 9000 frames
fit_trans = trans.fit_rot_trans(u.atoms, u.atoms)
u.trajectory.add_transformations(fit_trans)

n_jobs = [1, 2, 4, 8, 16, 32, 64]
rmsd = parallel_rmsd(u.atoms, u.atoms)
for nj in n_jobs:
    rmsd.run(n_blocks=nj, n_jobs=nj)  # timed with timeit
Reason
Some Transformations call numpy.dot, which is itself multi-threaded. When several workers run at once, each of them spawns its own set of threads, so the cores are oversubscribed.
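To make the oversubscription concrete, here is a rough back-of-the-envelope sketch. It assumes that each worker's numpy.dot spawns one thread per core by default (this depends on the BLAS backend and its configuration) and uses the 32 cores of the benchmarking machine. The function name is hypothetical and only for illustration.

```python
def oversubscription_factor(n_jobs, threads_per_job, n_cores):
    """Ratio of threads requested to cores available (> 1 means oversubscribed)."""
    return (n_jobs * threads_per_job) / n_cores


n_cores = 32  # assumption: physical cores of the benchmarking machine

for nj in [1, 2, 4, 8, 16, 32, 64]:
    # Assumed default: every worker lets numpy.dot use one thread per core.
    default = oversubscription_factor(nj, n_cores, n_cores)
    # With the thread count pinned to 1, each worker uses a single thread.
    limited = oversubscription_factor(nj, 1, n_cores)
    print(f"n_jobs={nj:2d}  default={default:5.1f}x  limited={limited:5.2f}x")
```

Under these assumptions the machine is already oversubscribed by a factor of two at n_jobs=2, which would be consistent with the poor scaling seen with only two cores.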
Possible solution
- Define NUM_THREADS=1 for numpy (https://docs.dask.org/en/latest/array-best-practices.html#avoid-oversubscribing-threads). Surprisingly, this is faster even for serial (single-core) runs.
- Use cupy (https://cupy.dev/) to leverage the GPU (replacing only the numpy.dot operation of the Transformation).
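A minimal sketch of the first option, assuming the thread count is controlled through the environment variables named in the linked Dask page (OMP_NUM_THREADS, MKL_NUM_THREADS, OPENBLAS_NUM_THREADS). Which of these actually takes effect depends on the BLAS backend numpy was built against. The helper name limit_blas_threads is hypothetical.

```python
import os

# Environment variables read by common OpenMP/BLAS backends.
THREAD_ENV_VARS = ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS")


def limit_blas_threads(n_threads=1):
    """Set the thread-count variables for the current process."""
    for var in THREAD_ENV_VARS:
        os.environ[var] = str(n_threads)


# Call this before numpy (or anything that imports numpy, such as
# MDAnalysis) is imported: the backends typically read these variables
# when the library is first loaded.
limit_blas_threads(1)
```

After this call, importing MDAnalysis and pmda and running the benchmark as in the code above should leave each worker with a single numpy thread. Exporting the same variables in the shell before starting Python is an equivalent alternative.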
Benchmarking result
- Benchmarking system:
  - AMD EPYC 7551 32-Core Processor
  - RTX 2080 Ti
  - cephfs file system
Current version of MDAnalysis:
(run python -c "import MDAnalysis as mda; print(mda.__version__)") 2.0.0 dev
(run python -c "import pmda; print(pmda.__version__)")
(run python -c "import dask; print(dask.__version__)")