Description
Hi,
This is an issue very specific to the experiments for my thesis/application:
As I want to perform a lot of individual solves for my thesis, I run multiple Julia instances (each using multiple workers and threads) on a pretty beefy machine. This seems to be faster than running them sequentially, because parallelism can't be exploited everywhere. It apparently worked fine until the RoME v0.13 release (I think). Now the solver just hangs in all but one instance (at a different pose every time I try).
This is the last thing that gets written to stdout (aka my logs ;)):
Solve Progress: approx max 1752, at iter 711 Time: 0:01:50
[ Info: CSM-5 Clique 47 finished
Solve Progress: approx max 1752, at iter 716 Time: 0:01:51
I run each Julia instance in its own screen session, like this:
screen -S <sessionname>
Then I fire up my evaluation script, which reads my dataset from a MAT-file and takes some other arguments (the command gets auto-generated by my MATLAB frontend):
julia -t auto -p 8 --project=Masterarbeit/svn/julia -J ~/.julia/sysimage_RoME.so -- Masterarbeit/svn/julia/mmiSAM/evaluation/solve2DIncremental.jl --path tmp --file sigmaTrajTf-0-001-0-000-0-005_sigmaLmBearingAndRanging-0-004-0-001_wrongRatio-1-000_minDist-0-200_maxDist-2-000.mat --trajKeys tf --lmKeys bearingAndRanging --startPoseIdx 1 --endPoseIdx 150 --startPoseVal "[7.15793412168699;3.38794983135837;-1.36840501375598]" --plotSaveFinal 1 --plotSaveIter 1 --nRuns 1 --suffix variables --useMsgLikelihoods 0 --nullhypo 0.000000 --tukey 15.000000 --nKernels 100 --spreadNH 3.000000 --inflation 3.000000
I will resort to running things sequentially (as any online application would), but especially for tuning parameters like spreadNH or inflation, running things in parallel was very useful, as I don't have access to an infinite number of machines ;) As multiprocess/multithread performance increased in the last releases, I suspect that the instances get in each other's way somewhere (maybe not even directly in RoME, but in Julia in general), e.g. some lock that does not get released.
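For reference, a minimal sketch of the kind of parameter sweep described above, written in Python purely for illustration. It is not the actual MATLAB frontend. The names `BASE_CMD`, `build_command`, `run_one` and `sweep`, the concurrency limit and the timeout are all hypothetical; only the `--spreadNH` and `--inflation` flags and the script path are taken from the command above, and the dataset arguments are left out. The per-run timeout is an assumption added so that a hung instance is reported instead of blocking the whole sweep.

```python
import itertools
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Hypothetical base command: the flags mirror the invocation shown above,
# the dataset-specific arguments are omitted for brevity.
BASE_CMD = [
    "julia", "-t", "auto", "-p", "8",
    "--project=Masterarbeit/svn/julia",
    "--", "Masterarbeit/svn/julia/mmiSAM/evaluation/solve2DIncremental.jl",
]


def build_command(spread_nh, inflation):
    """Return the argument list for one solve with the given tuning values."""
    return BASE_CMD + [
        "--spreadNH", f"{spread_nh:.6f}",
        "--inflation", f"{inflation:.6f}",
    ]


def run_one(cmd, timeout_s, runner=subprocess.run):
    """Run one solve and report 'hung' if it exceeds the timeout."""
    try:
        result = runner(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "hung"
    return "ok" if result.returncode == 0 else "failed"


def sweep(spread_values, inflation_values, max_parallel=2,
          timeout_s=3600, runner=subprocess.run):
    """Run every (spreadNH, inflation) pair, at most max_parallel at a time."""
    grid = list(itertools.product(spread_values, inflation_values))
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        statuses = pool.map(
            lambda pair: run_one(build_command(*pair), timeout_s, runner),
            grid,
        )
        # Consume the results while the pool is still open.
        return dict(zip(grid, statuses))
```

Calling `sweep([2.0, 3.0, 4.0], [2.0, 3.0])` (example values only) would return a dictionary mapping each parameter pair to `"ok"`, `"failed"` or `"hung"`. Setting `max_parallel=1` gives the sequential fallback.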
Best,
Leo