Conversation

@ax3l ax3l commented Nov 25, 2025

A container for WarpX on GPU on Perlmutter without MPI support.

Used as the base image for SYNAPSE for now: BLAST-AI-ML/synapse#279

To Do

cd ~/src/warpx/Examples/Physics_applications/laser_acceleration

# executable
podman-hpc run --rm --gpu -v $PWD:/opt/pwd -it registry.nersc.gov/m558/superfacility/warpx-perlmutter-nompi:25.11 warpx.rz /opt/pwd/inputs_base_rz

# Python
podman-hpc run --rm --gpu -v $PWD:/opt/pwd -it registry.nersc.gov/m558/superfacility/warpx-perlmutter-nompi:25.11 /opt/pwd/inputs_test_rz_laser_acceleration_picmi.py
  • Publish as registry.nersc.gov/m558/superfacility/warpx-perlmutter-nompi:25.11 (a publish sketch follows below)
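
For the publish item, here is a minimal sketch of how the image could be pushed to the NERSC registry, assuming podman-hpc passes the standard podman login/tag/push subcommands through; the exact login flow and project path are not confirmed by this PR and should be checked against the NERSC container registry docs.

# log in to the NERSC registry (hypothetical; adjust to the actual NERSC login flow)
podman-hpc login registry.nersc.gov

# tag the locally built image with the registry path, then push it
podman-hpc tag warpx-perlmutter-nompi registry.nersc.gov/m558/superfacility/warpx-perlmutter-nompi:25.11
podman-hpc push registry.nersc.gov/m558/superfacility/warpx-perlmutter-nompi:25.11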

@ax3l ax3l requested a review from RemiLehe November 25, 2025 22:18
@ax3l ax3l added the backend: cuda, component: documentation, and machine / system labels on Nov 25, 2025
@ax3l ax3l force-pushed the doc-pm-container-nompi branch from bb7aaab to f79db47 on November 26, 2025 07:49
@ax3l ax3l changed the title from "[WIP] Doc: WarpX no-MPI Perlmutter Container" to "Doc: WarpX no-MPI Perlmutter Container" on Nov 27, 2025
@ax3l ax3l force-pushed the doc-pm-container-nompi branch from 3774e65 to 8673cc3 on November 27, 2025 01:33
@ax3l ax3l marked this pull request as ready for review November 27, 2025 01:33
@ax3l ax3l force-pushed the doc-pm-container-nompi branch from 8673cc3 to 866cc24 on November 27, 2025 01:33
@ax3l ax3l requested a review from EZoni November 27, 2025 01:35
A container for WarpX on GPU on Perlmutter without MPI support.

Add HDF5.
@ax3l ax3l force-pushed the doc-pm-container-nompi branch from b5ea900 to fc341b7 on November 29, 2025 04:39
EZoni commented Dec 3, 2025

Amazing, thanks for this PR (and for #6389, which I had missed).

Here's the diff between the MPI Containerfile and the no-MPI Containerfile:

2c2
< #   podman build --build-arg NJOBS=6 -t warpx-perlmutter .
---
> #   podman build --build-arg NJOBS=6 -t warpx-perlmutter-nompi .
5c5
< # WarpX executables are installed in /usr/bin/
---
> # WarpX executables are installed in /opt/warpx/bin
9c9
< #   podman-hpc run --rm --gpu --cuda-mpi --nccl -v $PWD:/opt/pwd -it registry.nersc.gov/m558/superfacility/warpx-perlmutter:latest warpx.2d /opt/pwd/inputs_base_2d
---
> #   podman-hpc run --rm --gpu -v $PWD:/opt/pwd -it registry.nersc.gov/m558/superfacility/warpx-perlmutter-nompi:latest warpx.2d /opt/pwd/inputs_base_2d
13c13
< #     podman run ... ${INPUTS} ${GPU_AWARE_MPI}" \
---
> #     podman run ... ${INPUTS}" \
52,53d51
< # Perlmutter MPICH: cray-mpich/8.1.30 based on MPICH 3.4a2
< # Ubuntu 22.04 ships MPICH 4.0, so we build from source
73a72
>         libhdf5-dev             \
81,109d79
< # Install MPICH 3.4 support from source
< #   after https://docs.nersc.gov/development/containers/shifter/how-to-use/#using-mpi-in-shifter
< # Note:
< #   w/o GPUdirect here, because we will swap out the MPI binaries on Perlmutter using a Podman-HPC
< #   plugin (--mpi/--cuda-mpi). This only works if we do NOT build GPUdirect here, this we skip:
< #     --with-ch4-shmmods=posix,gpudirect
< # Perlmutter MPICH: cray-mpich/8.1.30 based on MPICH 3.4a2
< # TODO: install libfabric from source and depend on it (or expose where embedded libfabric gets installed)
< ARG mpich=3.4.3
< ARG mpich_prefix=mpich-$mpich
< 
< RUN \
<     curl -Lo $mpich_prefix.tar.gz https://www.mpich.org/static/downloads/$mpich/$mpich_prefix.tar.gz && \
<     tar xzf $mpich_prefix.tar.gz                                            && \
<     cd $mpich_prefix                                                        && \
<     FFLAGS=-fallow-argument-mismatch FCFLAGS=-fallow-argument-mismatch         \
<       ./configure                                                              \
<         --disable-fortran                                                      \
<         --prefix=/opt/warpx                                                    \
<         --with-device=ch4:ofi                                               && \
<     make -j ${NJOBS}                                                        && \
<     make install                                                            && \
<     make clean                                                              && \
<     cd ..                                                                   && \
<     rm -rf ${mpich_prefix}*
< 
< RUN /sbin/ldconfig
< # ENV MPICH_GPU_SUPPORT_ENABLED=1
< 
131d100
< # TODO: install libfabric from source and depend on it (or expose where MPICH-embedded libfabric gets installed)
145a115
>           -DADIOS2_USE_MPI=OFF     \
212a183
>         openpmd-viewer   \
244c215
<         -DWarpX_DIMS="2;RZ"         \
---
>         -DWarpX_DIMS="1;2;RZ;3"     \
245a217
>         -DWarpX_MPI=OFF             \
257a230
>     ln -s /opt/warpx/bin/warpx.1d* /opt/warpx/bin/warpx.1d  && \
259c232,233
<     ln -s /opt/warpx/bin/warpx.rz* /opt/warpx/bin/warpx.rz
---
>     ln -s /opt/warpx/bin/warpx.rz* /opt/warpx/bin/warpx.rz  && \
>     ln -s /opt/warpx/bin/warpx.3d* /opt/warpx/bin/warpx.3d
275a250
>         libhdf5-103-1       \

I was wondering if we could/should smooth out some of the differences (a possible parametrized-build sketch follows this list), e.g.:

  • -DWarpX_DIMS="2;RZ" in the MPI Containerfile vs. -DWarpX_DIMS="1;2;RZ;3" in the no-MPI Containerfile: Why don't we build all dims in both cases?
  • openpmd-viewer is only installed in the no-MPI Containerfile: Why isn't it installed in the MPI case as well?
  • # WarpX executables are installed in /usr/bin/ in the MPI Containerfile vs. # WarpX executables are installed in /opt/warpx/bin in the no-MPI Containerfile: Do we install to different paths, or is one of the comments simply outdated?
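
To illustrate the kind of convergence I have in mind, here is a minimal sketch assuming the two Containerfiles were merged into a single parametrized one; WARPX_MPI and WARPX_DIMS are illustrative build-arg names that do not exist in this PR.

# hypothetical: one Containerfile, two image builds that differ only in build args
podman build --build-arg NJOBS=6 --build-arg WARPX_MPI=ON  --build-arg WARPX_DIMS="1;2;RZ;3" -t warpx-perlmutter .
podman build --build-arg NJOBS=6 --build-arg WARPX_MPI=OFF --build-arg WARPX_DIMS="1;2;RZ;3" -t warpx-perlmutter-nompi .

Inside such a Containerfile, the args would feed -DWarpX_MPI=${WARPX_MPI} and -DWarpX_DIMS="${WARPX_DIMS}", so the CMake step would stay identical across both images.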

