Build performance does not scale to many cores/threads #6432

Open
@krasznaa

Description

Explain what you would like to see improved

I know that this is very much a first-world problem, but it has been bugging me for a while. The build of ROOT using its CMake setup does not scale well to many-core systems at all. 😦

This is a snapshot of how ROOT 6.20/08 used my system's resources during its build:

[Screenshot: resource usage over time during the ROOT 6.20/08 build]

The build starts pretty much at the left-hand side of the timeline, and lasts until pretty much the right-hand side of it.

As you can see, the build starts out very well. Building LLVM scales perfectly to 64 threads, and I believe it would scale well even beyond that. But once the LLVM build is done, many bottlenecks show up. First there is a big bottleneck with building libCling and rootcling, and after that the build of libRIO also takes a surprising amount of time. The rest of the build is stuck waiting for all of these.

Towards the end things improve a bit, as many libraries/source files can once more build in parallel. But even then, the build very rarely manages to make use of all of the available cores.

Optional: share how it could be improved

From a quick glance, ROOT's CMake configuration seems to set up far too many unnecessary dependencies between its build targets. As far as I can see, most of the issues arise from how the dictionary generation is set up.

In ATLAS I use the following code to set up the generation of dictionary source files:

https://gitlab.cern.ch/atlas/atlasexternals/-/blob/master/Build/AtlasCMake/modules/AtlasDictionaryFunctions.cmake

And that provides much better behaviour, mainly because in ATLAS's setup dictionary generation does not need to wait for anything. Even if the library that a dictionary is being produced for depends on a number of upstream libraries, the dictionary for that library can be generated before those upstream libraries have finished building. In practice this means that the start of any ATLAS software build is dominated by dictionary generation, as GNU Make and Ninja both prefer to run those build steps first (since they have no dependencies of their own).
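To give a rough idea of the pattern (this is only a minimal sketch, not the actual ATLAS macro; the target and file names are hypothetical, and rootcling is assumed to come from an already installed ROOT): the dictionary source is produced by a custom command that only depends on its input files, so Ninja can schedule it right away.

# Minimal sketch, not the actual ATLAS code. MyLib, UpstreamLib and the
# MyLibDict.h / MyLibLinkDef.h inputs are hypothetical names.
add_custom_command( OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/MyLibDict.cxx
   COMMAND rootcling -f ${CMAKE_CURRENT_BINARY_DIR}/MyLibDict.cxx
           -I${CMAKE_CURRENT_SOURCE_DIR}
           ${CMAKE_CURRENT_SOURCE_DIR}/MyLibDict.h
           ${CMAKE_CURRENT_SOURCE_DIR}/MyLibLinkDef.h
   DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/MyLibDict.h
           ${CMAKE_CURRENT_SOURCE_DIR}/MyLibLinkDef.h
   VERBATIM )

# The generated source is compiled and linked as part of the library, so that
# part still respects the upstream dependencies, but the rootcling invocation
# itself has nothing to wait for.
add_library( MyLib SHARED MyLib.cxx ${CMAKE_CURRENT_BINARY_DIR}/MyLibDict.cxx )
target_link_libraries( MyLib UpstreamLib )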

The reason I blame the dictionary generation code is that building regular C/C++ code with Ninja scales very well to many cores. Even when a project has many small libraries, Ninja can start compiling object files before all of the libraries that they depend on have finished building. (In ATLAS's offline software the very end of a build is taken up purely by library/executable linking steps.)
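As a tiny illustration (hypothetical targets, nothing ROOT specific): with a layout like the one below, and as long as no generated headers are involved, the Ninja generator lets the object files of B compile in parallel with those of A; only the link step of B has to wait for libA.

# Hypothetical example: B links against A, but Ninja can still compile
# b1.cxx / b2.cxx while A is being built; only linking libB waits for libA.
add_library( A SHARED a1.cxx a2.cxx )
add_library( B SHARED b1.cxx b2.cxx )
target_link_libraries( B PUBLIC A )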

To Reproduce

Unfortunately you need a pretty powerful machine to see this... But once you have one, just do something similar to what I did:

cmake -G Ninja -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_STANDARD=17 \
   -Dall=ON -Dbuiltin_gsl=ON -Dbuiltin_freetype=ON -Dbuiltin_lzma=ON -Dbuiltin_veccore=ON \
   -DXROOTD_ROOT_DIR=~/software/xrootd/4.12.2/x86_64-ubuntu2004-gcc9-opt \
   -DTBB_ROOT_DIR=~/software/oneTBB/2020.2/x86_64-ubuntu2004-gcc9-opt \
   -DCMAKE_INSTALL_PREFIX=~/software/root/6.20.08/x86_64-ubuntu2004-gcc9-opt ../root-6.20.08/
ninja

Setup

As mentioned earlier, I used ROOT 6.20/08 for this particular test, but the behaviour has been like this for as long as I can remember. I performed the build on Ubuntu 20.04 with GCC 9, but that should make little difference to the overall behaviour.

Additional context

N/A
