RedistributeCPU is Very Slow & Tiling Discussion #4893

@ax3l

RedistributeCPU, even apart from optimization bugs like #4892, is generally very slow: it is usually among the top three most expensive functions in CPU runs of WarpX and ImpactX. At its core, it is a sorting function.

I think we should investigate the following optimizations:

Generally

  • Can this function use better memory access patterns?
  • Can this function benefit from vectorization?
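
To make the memory-access question concrete, here is a minimal sketch of redistribution treated as a counting sort by tile index, which reads the particles in two linear passes and writes each tile's particles contiguously. This is not the AMReX implementation: `Particle`, `tile_of`, and `sort_by_tile` are hypothetical names, and the 1D domain with equal-width tiles is an assumption made for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical 1D particle for illustration; not the AMReX particle type.
struct Particle {
    double x;
    int id;
};

// Map a position to a tile index in [0, num_tiles), clamping out-of-range values.
inline int tile_of(double x, double lo, double hi, int num_tiles) {
    int t = static_cast<int>((x - lo) / (hi - lo) * num_tiles);
    if (t < 0) t = 0;
    if (t >= num_tiles) t = num_tiles - 1;
    return t;
}

// Stable counting sort of particles by tile index.
// On return, offsets has num_tiles + 1 entries and tile t owns
// the output range [offsets[t], offsets[t + 1]).
std::vector<Particle> sort_by_tile(const std::vector<Particle>& in,
                                   double lo, double hi, int num_tiles,
                                   std::vector<std::size_t>& offsets) {
    // Pass 1: histogram of particles per tile (shifted by one slot).
    offsets.assign(static_cast<std::size_t>(num_tiles) + 1, 0);
    for (const Particle& p : in) {
        ++offsets[tile_of(p.x, lo, hi, num_tiles) + 1];
    }
    // Exclusive prefix sum turns counts into start offsets.
    for (int t = 0; t < num_tiles; ++t) {
        offsets[t + 1] += offsets[t];
    }
    // Pass 2: scatter each particle to the next free slot of its tile.
    std::vector<std::size_t> cursor(offsets.begin(), offsets.end() - 1);
    std::vector<Particle> out(in.size());
    for (const Particle& p : in) {
        out[cursor[tile_of(p.x, lo, hi, num_tiles)]++] = p;
    }
    return out;
}
```

The histogram and prefix-sum passes are cheap and streaming; the scatter pass is the part whose access pattern depends on how disordered the particles already are, so that is where profiling would likely be most informative.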

Single Node, Single Thread

Single Node, Multiple threads

  • Can this function use a special pass for runs with a single MPI rank?
  • With a single MPI rank, can this function benefit from redistributing only between OpenMP tiles on

Tiling in General

Is (spatially distributed) tiling on CPUs really the best way to use OpenMP threads in AMReX? Has spatially overlapping parallelization been tried? For example, all particles stay in the same spatial box and are tiled only by index; each thread's deposition and gather buffers then have the same size as the box, and the deposition buffers are aggregated after deposition.
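
A minimal sketch of the index-tiled idea, assuming a 1D box and nearest-grid-point deposition: particles are split into contiguous index chunks, each chunk deposits into its own full-size buffer, and the buffers are summed afterwards. The function name `deposit_index_tiled` and its parameters are hypothetical and not part of AMReX. The OpenMP pragma is optional; without OpenMP support the loop simply runs serially.

```cpp
#include <cstddef>
#include <vector>

// Deposit weights w[i] at positions x[i] onto a 1D grid of num_cells cells
// covering [lo, hi). Particles are tiled by index, not by position.
std::vector<double> deposit_index_tiled(const std::vector<double>& x,
                                        const std::vector<double>& w,
                                        double lo, double hi,
                                        int num_cells, int num_chunks) {
    const std::size_t n = x.size();
    // One private buffer per chunk, each the size of the whole box.
    std::vector<std::vector<double>> local(
        static_cast<std::size_t>(num_chunks),
        std::vector<double>(static_cast<std::size_t>(num_cells), 0.0));

    #pragma omp parallel for
    for (int c = 0; c < num_chunks; ++c) {
        // Contiguous index range owned by chunk c.
        const std::size_t begin = n * static_cast<std::size_t>(c) / num_chunks;
        const std::size_t end = n * static_cast<std::size_t>(c + 1) / num_chunks;
        for (std::size_t i = begin; i < end; ++i) {
            int cell = static_cast<int>((x[i] - lo) / (hi - lo) * num_cells);
            if (cell < 0) cell = 0;
            if (cell >= num_cells) cell = num_cells - 1;
            local[c][cell] += w[i];  // no race: buffer is private to chunk c
        }
    }

    // Aggregate the private buffers into the final grid.
    std::vector<double> grid(static_cast<std::size_t>(num_cells), 0.0);
    for (int c = 0; c < num_chunks; ++c) {
        for (int j = 0; j < num_cells; ++j) {
            grid[j] += local[c][j];
        }
    }
    return grid;
}
```

In this scheme no particle ever moves between tiles, so there is nothing to redistribute within the box. The cost shifts to memory (one box-sized buffer per chunk) and to the final reduction, which is the trade-off the question above is really asking about.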
