I'm observing a significant slowdown when parallelizing UltraNest across multiple nodes. I conducted a scaling study using a toy problem with UltraNest. While I observed speedup for intra-node parallelization, performance dropped notably when using more than one node. For example, with two nodes (256 cores), UltraNest took > 1.5h to complete, compared to just 14 minutes when using a single node (128 cores). I wonder if this is due to an error in my parallelization implementation or an internal issue within UltraNest.
On the cluster, I use the following PBS script to schedule the job across 2 nodes:
#PBS -l select=2:mpiprocs=256
#PBS -l walltime=20:00:00
.......
module load gcc/13.2.0
module load mpich/4.2.0-gcc-13.2.0
module load anaconda3/2024.02
conda activate espei_un
mpiexec -np 256 python UN_toy.py
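One thing that may be worth double-checking is whether the resource request matches the `mpiexec -np 256` launch. The sketch below assumes PBS Professional semantics, where `select=N:...` asks for N chunks and `mpiprocs` applies to each chunk rather than to the whole job. The helper name `total_mpi_ranks` is hypothetical, and the default of 1 for a missing `mpiprocs` is also an assumption. It is a sanity check on the arithmetic, not a diagnosis of the slowdown.

```python
def total_mpi_ranks(select_spec):
    """Total MPI ranks implied by a PBS 'select' string, e.g. '2:mpiprocs=256'.

    Assumption: PBS Professional semantics, where each '+'-separated part is
    'N:key=value:...' (N chunks) and mpiprocs counts ranks per chunk.
    """
    total = 0
    for chunk in select_spec.split("+"):
        fields = chunk.split(":")
        # A leading integer is the chunk count; without it, one chunk is assumed.
        count = int(fields[0]) if fields[0].isdigit() else 1
        resources = dict(f.split("=", 1) for f in fields if "=" in f)
        # Assumption: a chunk without mpiprocs contributes one rank.
        total += count * int(resources.get("mpiprocs", 1))
    return total


requested = total_mpi_ranks("2:mpiprocs=256")  # select line from the job script
launched = 256                                 # value passed to mpiexec -np
print(f"slots requested: {requested}, ranks launched: {launched}")
```

Under these assumptions the request asks for 256 rank slots in each of the two chunks, while the nodes have 128 cores each, so a form such as `select=2:ncpus=128:mpiprocs=128` may be closer to the intended layout. Whether this explains the slowdown depends on how the scheduler and `mpiexec` actually place the ranks.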
# UN_toy.py
import numpy as np
import scipy.stats
from numpy import sin, pi
import ultranest
import ultranest.stepsampler
import matplotlib.pyplot as plt


def multi_sin(x, params):
    """Sum of sinusoids (amplitude A, period B, phase C) plus a constant offset."""
    total = 0
    n = len(params) // 3
    for i in range(n):
        A, B, C = params[i*3:i*3+3]
        total += A * sin(2*pi * (x/B + C))
    total += params[-1]  # the last parameter is the offset
    return total


def transform_to_prior(cube):
    """Map the unit cube to independent Gaussian priors via the inverse CDF."""
    params = cube.copy()
    for i in range(len(params)):
        params[i] = norm_list[i].ppf(params[i])
    return params


def log_likelihood(params):
    """Gaussian log-likelihood with unit noise variance (constant term dropped)."""
    y_guess = multi_sin(t, params)
    return -0.5 * np.sum((y - y_guess)**2)


paramnames = [f"param{i}" for i in range(1, 23)]  # 'param1' ... 'param22'

rng = np.random.default_rng(seed=42)

# True parameter values: 7 sinusoids x (A, B, C), plus the offset appended below.
initial_value = [4.2, 3, 0,
                 1.2, 1.2, 1.2,
                 0.9, 1.8, 1.1,
                 1.4, 1.5, 1.6,
                 2.5, 1.2, 1.5,
                 3.4, 1.8, 1.4,
                 4.2, 1.5, 1.6]
initial_value.append(1.0)

# Synthetic data: 60 points with unit Gaussian noise around the true curve.
t = rng.uniform(0, 5, size=60)
y = rng.normal(multi_sin(t, initial_value), 1)

# One unit-width Gaussian prior per parameter, centered on its true value.
norm_list = []
for item in initial_value:
    norm_list.append(scipy.stats.norm(loc=item, scale=1))

sampler = ultranest.ReactiveNestedSampler(
    paramnames,
    log_likelihood,
    transform_to_prior,
    log_dir="scalibility_UN_toy_22para_step44_2node",
)

nsteps = 44  # 2 * len(paramnames)
sampler.stepsampler = ultranest.stepsampler.SliceSampler(
    nsteps=nsteps,
    generate_direction=ultranest.stepsampler.generate_mixture_random_direction,
)

result = sampler.run(min_num_live_points=400)
sampler.print_results()
sampler.plot()
The UN_toy.py script contains a simple sinusoidal regression, similar to the example demonstrated at https://johannesbuchner.github.io/UltraNest/example-sine-highd.html, but extended to a 22-dimensional problem. The Python code listed above is the actual file.
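To help separate a launch or placement problem from something internal to UltraNest, a small diagnostic can be run with the same `mpiexec` line in place of `UN_toy.py`. The following is a minimal sketch, assuming `mpi4py` is the MPI binding available in the `espei_un` environment. The file name `check_ranks.py` and both function names are hypothetical. It only reports how many ranks landed on each host and does not measure communication cost.

```python
# check_ranks.py (hypothetical file name)
import socket
from collections import Counter


def ranks_per_host(hostnames):
    """Count how many MPI ranks reported each hostname."""
    return dict(Counter(hostnames))


def gather_hostnames():
    """Return (rank, hostnames); hostnames is a list on rank 0 and None elsewhere.

    Falls back to a single process when mpi4py cannot be imported, which is
    itself a useful signal that the run would not be MPI-parallel.
    """
    try:
        from mpi4py import MPI
    except ImportError:
        return 0, [socket.gethostname()]
    comm = MPI.COMM_WORLD
    names = comm.gather(socket.gethostname(), root=0)
    return comm.Get_rank(), names


if __name__ == "__main__":
    rank, names = gather_hostnames()
    if rank == 0:
        for host, count in sorted(ranks_per_host(names).items()):
            print(f"{host}: {count} ranks")
```

Launched as `mpiexec -np 256 python check_ranks.py`, an even layout would show two hosts with 128 ranks each. A single host carrying all 256 ranks, or 256 separate one-rank reports, would point at the job setup rather than at the sampler.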