Comment:
I have a Lenovo ThinkStation with a 13th Gen Intel(R) Core(TM) i9-13900 (24 cores) and 32 GB of RAM, running Ubuntu 22.04.5 LTS. The computer name is "newton" in the messages below.
Brief summary: I can run the SIESTA code in serial mode only. When I try to use MPI to run it in parallel (with 2 to 24 processes), memory usage grows until I get an out-of-memory error. I desperately want to run in parallel because, well obviously, it will be many times faster. I am modeling a quartz/polyethylene interface with 756 atoms, so molecular dynamics or CG energy minimization can take weeks in serial but mere days in parallel.
More details: I installed Siesta using Conda and included MPI parallelization support:
conda install -c conda-forge "siesta=*=*openmpi*"
This was done on a literally out-of-the-box new machine, so the requisite libraries, etc. were also added via conda (within the SIESTA install...I did not install them separately). When I run the following command within the siesta environment created by conda:
(siesta) derrick@newton:~/Siesta_test_calcs/small_sirich2_finish_serial$ mpirun -np 24 siesta < quartz_2x1x2_orichhydCG.fdf > orich_small_CGout_Newton_par24
The job will eventually terminate with an error like:
prterun noticed that process rank 22 with PID 4539 on node newton exited on
signal 9 (Killed).
The rank and PID change when I try runs with different numbers of processes (e.g. -np 2), but the end is always the same. If I watch the system monitor, the memory usage just keeps expanding until it exceeds the machine's limits. In serial, this test run of a smaller system uses only about 2 GB of RAM and runs to completion with no errors. For example:
siesta < quartz_2x1x2_orichhydCG.fdf > orich_small_CGout_Newton_serial
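In case it helps, here is the generic snippet I used from a second terminal to sample per-process memory while a run is going (nothing SIESTA-specific; it just reads the resident set size from ps for a given command name):

```shell
# Print PID and resident set size (RSS, in kB) for every process
# whose command name matches the first argument.
sample_rss() {
    ps -C "$1" -o pid=,rss=
}

# Example: sample all running siesta ranks (prints a note if none are up).
sample_rss siesta || echo "no siesta processes running"
```

Running this every few seconds during a parallel job is how I watched each rank's RSS climb steadily until the OOM kill.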
I'm quite familiar with running SIESTA and see this behavior no matter what type or size simulation I am running. But if you think the fdf files or others may help you understand the situation, I'll provide them.
I was hoping there are some flags I could use when issuing the mpirun command, or perhaps I could re-install with different options to avoid this issue. Maybe it's a quirk of my particular processor or computer architecture?
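For reference, the kinds of variations I had in mind look like the following. These are untested guesses on my part: the binding flags are standard Open MPI options, and I believe recent SIESTA versions also accept the input file as a command-line argument instead of stdin redirection.

```shell
# Fewer ranks, explicit core binding, and a binding report for diagnosis
# (standard Open MPI options):
mpirun -np 8 --bind-to core --report-bindings \
    siesta < quartz_2x1x2_orichhydCG.fdf > orich_small_CGout_Newton_par8

# Avoid stdin redirection entirely by passing the .fdf as an argument
# (only works if the installed SIESTA version supports it):
mpirun -np 8 siesta quartz_2x1x2_orichhydCG.fdf > orich_small_CGout_Newton_par8
```

If flags like these are a dead end, I'm equally happy to hear that the fix is a different build configuration.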
Thanks for your consideration and let me know if there is any more data you might need to help me answer the question.
Derrick