Description
I can't find this in older issues, but shouldn't all of the recon-all calls include the new -parallel switch before -openmp? I don't see it being used in the log files, and it might speed things up further...
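For concreteness, here is a minimal Python sketch of the invocation being proposed. It uses a hypothetical helper, `build_recon_all_cmd`, that only assembles the argument list and does not run anything. The ordering follows the release notes: `-parallel` at the end of the command line, optionally followed by `-openmp <num>`. The subject name `bert` and the input file `bert.nii.gz` are placeholders.

```python
import shlex
from typing import List, Optional


def build_recon_all_cmd(subject: str,
                        input_vol: Optional[str] = None,
                        threads: Optional[int] = None) -> List[str]:
    """Hypothetical helper: build a recon-all argv with the parallel flags last."""
    cmd = ["recon-all", "-s", subject]
    if input_vol is not None:
        cmd += ["-i", input_vol]
    cmd.append("-all")
    # -parallel enables both the OpenMP (fine-grained) and per-hemisphere (coarse) modes
    cmd.append("-parallel")
    if threads is not None:
        if threads < 1:
            raise ValueError("threads must be >= 1")
        # -openmp <num> goes after -parallel and overrides the default of 4 threads
        cmd += ["-openmp", str(threads)]
    return cmd


if __name__ == "__main__":
    # Placeholder subject and input; print a shell-quoted command line
    print(shlex.join(build_recon_all_cmd("bert", "bert.nii.gz", threads=8)))
```

The returned list could be handed to `subprocess.run` on a machine with FreeSurfer installed. Keeping it as a list avoids shell-quoting problems with subject names or paths.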
https://surfer.nmr.mgh.harvard.edu/fswiki/ReleaseNotes
Parallelization: a new flag was introduced which enables two forms of compute parallelization that significantly reduce the runtime. As a point of reference, using a new-ish workstation (2015+), the recon-all -all runtime is just under 3 hours. When the -parallel flag is specified at the end of the recon-all command line, it will enable 'fine-grained' parallelized code, making use of OpenMP, embedded in many of the binaries, namely affecting mri_em_register and mri_ca_register. By default, it instructs the binaries to use 4 processors (cores), meaning 4 threads will run in parallel in some operations (manifested in 'top' by mri_ca_register, for example, showing 400% CPU utilization). This can be overridden by including the flag -openmp <num> after -parallel, where <num> is the number of processors you'd like to use (e.g. 8 if you have an 8-core machine). Note that this parallelization was introduced in v5.3, but many new routines were OpenMP-parallelized in v6.

The other form of parallelization, a 'coarse' form, also enabled when the -parallel flag is specified, is such that during the stages where left and right hemispheric data is processed, each hemi binary is run separately (and in parallel, manifesting itself in 'top' as two instances of mris_sphere, for example). Note that a couple of the hemi stages (e.g. mris_sphere) make use of a tiny amount of OpenMP code, which means that for brief periods, as many as 8 cores are utilized (2 binaries running code that each make use of 4 threads). In general, though, a 4-core machine can easily handle those periods.

Be aware that if you enable this -parallel flag on instances of recon-all running through a job scheduler (like a cluster), it may not make your System Administrator happy if you do not pre-allocate a sufficient number of cores for your job, as you will be taking cycles from other cores that may be running jobs belonging to other cluster users.
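As a rough sanity check on the core counts quoted above, this sketch computes the brief peak during the hemisphere stages under -parallel: two hemisphere binaries, each using the OpenMP thread count. The helper name is hypothetical. It assumes every OpenMP thread keeps one core busy, so it gives an upper bound rather than a measured figure.

```python
DEFAULT_OPENMP_THREADS = 4  # default per the release notes
CONCURRENT_HEMIS = 2        # lh and rh binaries run side by side under -parallel


def peak_busy_cores(openmp_threads: int = DEFAULT_OPENMP_THREADS) -> int:
    """Hypothetical helper: upper bound on busy cores during hemisphere stages."""
    if openmp_threads < 1:
        raise ValueError("openmp_threads must be >= 1")
    # Each hemisphere binary may briefly run openmp_threads OpenMP threads at once
    return CONCURRENT_HEMIS * openmp_threads


if __name__ == "__main__":
    for n in (4, 8):
        print(f"-openmp {n}: up to {peak_busy_cores(n)} busy cores")
```

With the default of 4 threads, this reproduces the "as many as 8 cores" figure from the release notes. Because those peaks are brief, the number of cores to request from a scheduler is a judgment call between the thread count and this bound.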