RedistributeCPU is Very Slow & Tiling Discussion #4893

`RedistributeCPU`, even setting aside optimization bugs like #4892, is generally very slow: it is usually among the top three functions in CPU profiles of WarpX and ImpactX. It is a sorting function.

I think we should investigate the following optimizations:

## Generally

* Can this function use better memory access patterns?
* Can this function benefit from vectorization?
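
To make the memory-access question concrete, here is a minimal sketch of a stable counting sort that bins particle indices by destination tile in two linear passes. It is not the AMReX implementation; `bin_by_tile`, the `tile_of_particle` array, and the dense tile numbering are assumptions made for illustration. The point is that both passes stream through contiguous arrays, and the resulting permutation can be applied to each SoA component separately.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not an AMReX API): returns a permutation that groups
// particle indices by destination tile, keeping the original order within
// each tile (stable counting sort).
std::vector<std::size_t> bin_by_tile(const std::vector<int>& tile_of_particle,
                                     int num_tiles)
{
    assert(num_tiles > 0);

    // Pass 1: histogram of particles per tile, shifted by one slot.
    std::vector<std::size_t> offsets(static_cast<std::size_t>(num_tiles) + 1, 0);
    for (int t : tile_of_particle) {
        assert(t >= 0 && t < num_tiles);
        ++offsets[static_cast<std::size_t>(t) + 1];
    }

    // Running sum: offsets[t] becomes the first output slot of tile t.
    std::partial_sum(offsets.begin(), offsets.end(), offsets.begin());

    // Pass 2: scatter each particle index into the next free slot of its tile.
    std::vector<std::size_t> perm(tile_of_particle.size());
    for (std::size_t i = 0; i < tile_of_particle.size(); ++i) {
        const auto t = static_cast<std::size_t>(tile_of_particle[i]);
        perm[offsets[t]++] = i;
    }
    return perm;
}
```

For example, destination tiles `{2, 0, 1, 0, 2}` with three tiles give the permutation `{1, 3, 2, 0, 4}`: tile 0 receives particles 1 and 3, tile 1 receives particle 2, and tile 2 receives particles 0 and 4.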

## Single Node, Single Thread

* This should be a no-op, see #4892
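
A sketch of what such a no-op fast path could look like, under stated assumptions: `Layout`, `particles_to_move`, and the two callables are hypothetical names, not AMReX types, and boundary handling (periodic wrapping, particles leaving the domain) is assumed to happen elsewhere. With one rank and one tile there is nowhere for a particle to go, so the destination lookup is never evaluated.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of the decomposition (not an AMReX type).
struct Layout {
    int num_ranks;
    int num_tiles;
};

// Returns the indices of particles whose destination tile differs from their
// current tile. current_tile(i) and dest_tile(i) are caller-supplied lookups.
template <class CurFn, class DestFn>
std::vector<std::size_t> particles_to_move(const Layout& layout,
                                           std::size_t num_particles,
                                           CurFn current_tile,
                                           DestFn dest_tile)
{
    assert(layout.num_ranks >= 1 && layout.num_tiles >= 1);

    // Fast path: a single rank with a single tile has no other destination,
    // so skip the per-particle work entirely.
    if (layout.num_ranks == 1 && layout.num_tiles == 1) {
        return {};
    }

    std::vector<std::size_t> moving;
    for (std::size_t i = 0; i < num_particles; ++i) {
        if (dest_tile(i) != current_tile(i)) {
            moving.push_back(i);
        }
    }
    return moving;
}
```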

## Single Node, Multiple threads

* Can this function use a special pass when running on a single MPI rank?
* With a single MPI rank, can this function redistribute only between OpenMP tiles on
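
One possible shape for a single-rank pass is sketched below, assuming no MPI traffic is needed at all. Everything here is hypothetical: `Particle`, `Tile`, the 1D `tile_index` mapping, and the clamping of out-of-range positions are illustration-only choices, not AMReX behavior. The pass runs in two phases so that each OpenMP thread writes only to data it owns: first every source tile sorts its leaving particles into per-destination buckets, then every destination tile collects the buckets addressed to it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Particle { double x; int id; };   // hypothetical minimal particle
using Tile = std::vector<Particle>;

// Assumed 1D mapping: tile t owns [t * tile_width, (t + 1) * tile_width).
// Out-of-range positions are clamped to the edge tiles (an assumption).
inline int tile_index(double x, double tile_width, int num_tiles)
{
    int t = static_cast<int>(x / tile_width);
    if (t < 0) { t = 0; }
    if (t >= num_tiles) { t = num_tiles - 1; }
    return t;
}

// Moves particles between tiles of one rank; no send/receive buffers needed.
void redistribute_local(std::vector<Tile>& tiles, double tile_width)
{
    assert(tile_width > 0.0);
    const int nt = static_cast<int>(tiles.size());

    // outgoing[src][dst] holds particles leaving tile src for tile dst.
    std::vector<std::vector<Tile>> outgoing(nt, std::vector<Tile>(nt));

    // Phase 1: each thread touches only tiles[src] and outgoing[src].
    #pragma omp parallel for
    for (int src = 0; src < nt; ++src) {
        Tile& tile = tiles[src];
        std::size_t keep = 0;
        for (std::size_t i = 0; i < tile.size(); ++i) {
            const int dst = tile_index(tile[i].x, tile_width, nt);
            if (dst == src) {
                tile[keep++] = tile[i];            // compact stayers in place
            } else {
                outgoing[src][dst].push_back(tile[i]);
            }
        }
        tile.resize(keep);
    }

    // Phase 2: each thread appends only to tiles[dst].
    #pragma omp parallel for
    for (int dst = 0; dst < nt; ++dst) {
        for (int src = 0; src < nt; ++src) {
            const Tile& in = outgoing[src][dst];
            tiles[dst].insert(tiles[dst].end(), in.begin(), in.end());
        }
    }
}
```

The `nt * nt` bucket table keeps the sketch lock-free at the cost of extra memory; without OpenMP enabled the pragmas are ignored and the code runs serially with the same result.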

# Tiling in General

Is (spatially distributed) tiling on CPUs really the best way to use OpenMP threads in AMReX? Has spatially overlapping parallelization been tried? For example: all particles stay in the same spatial box and are tiled only by index; deposition and gather buffers are then the same size as the box and are aggregated after deposition, etc.
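
The index-tiled idea can be sketched as follows, under heavy assumptions: a 1D box, nearest-grid-point deposition, positions already inside `[0, num_cells * dx)`, and a hypothetical function name `deposit_index_tiled`. Each chunk of the particle index range deposits into its own box-sized buffer, and the buffers are summed afterwards, so particles never move between tiles.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: particles are split by index into num_chunks ranges;
// every chunk owns a private buffer covering the whole box.
std::vector<double> deposit_index_tiled(const std::vector<double>& x,
                                        const std::vector<double>& w,
                                        std::size_t num_cells,
                                        double dx,
                                        int num_chunks)
{
    assert(x.size() == w.size() && num_cells > 0 && dx > 0.0 && num_chunks > 0);
    const std::size_t n = x.size();
    const auto nc = static_cast<std::size_t>(num_chunks);

    // One full-size buffer per chunk: memory grows as num_chunks * num_cells.
    std::vector<std::vector<double>> local(nc, std::vector<double>(num_cells, 0.0));

    #pragma omp parallel for
    for (int c = 0; c < num_chunks; ++c) {
        const auto uc = static_cast<std::size_t>(c);
        const std::size_t begin = n * uc / nc;
        const std::size_t end = n * (uc + 1) / nc;
        for (std::size_t i = begin; i < end; ++i) {
            // Nearest-grid-point deposition; assumes x[i] >= 0.
            const std::size_t cell =
                std::min(static_cast<std::size_t>(x[i] / dx), num_cells - 1);
            local[uc][cell] += w[i];
        }
    }

    // Aggregate the per-chunk buffers into the final grid.
    std::vector<double> rho(num_cells, 0.0);
    for (std::size_t c = 0; c < nc; ++c) {
        for (std::size_t cell = 0; cell < num_cells; ++cell) {
            rho[cell] += local[c][cell];
        }
    }
    return rho;
}
```

The trade-off this makes visible: no redistribution between tiles is required, but memory and reduction cost scale with the number of chunks times the box size, and particles within a chunk are not spatially sorted, so cache locality on the grid may be worse than with spatial tiles.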
