Description
BenchExec currently supports machines with several CPUs, CPUs with several NUMA nodes, NUMA nodes with several physical cores, and physical cores with several virtual cores. However, modern large CPUs add another layer: some cores of a NUMA node share an L3 cache with each other, but that cache is separate from the L3 cache of the other cores on the same NUMA node.
Example architectures:
- http://developer.amd.com/wp-content/resources/56420.pdf (page 15)
- https://nas.nasa.gov/assets/nas/pdf/ams/2021/AMS_20210720_Hogan-ONeill.pdf (slides 11 and 17)
For performance and determinism, core allocation on such machines should take the cache architecture into account, so that a run is preferably assigned cores that share a cache.
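To make the intended preference concrete, here is a minimal sketch. It is not BenchExec's actual allocation code: the function name `allocate_cores` and its parameters are hypothetical, and it assumes the groups of cores that share a cache are already known. It only hands out runs that fit entirely within one such group.

```python
def allocate_cores(cache_groups, cores_per_run):
    """Greedily assign cores to runs so that each run stays within one cache group.

    cache_groups: list of lists of core ids; each inner list shares a cache
                  (hypothetical input, assumed to be known already).
    cores_per_run: number of cores every run should get.
    Returns a list of core lists, one per run that could be placed.
    """
    if cores_per_run < 1:
        raise ValueError("cores_per_run must be positive")

    # Work on copies so that the caller's data is not modified.
    free = [list(group) for group in cache_groups]
    runs = []
    while True:
        # Only groups with enough free cores can host a complete run.
        candidates = [group for group in free if len(group) >= cores_per_run]
        if not candidates:
            break
        # Prefer the smallest group that still fits, to limit fragmentation.
        group = min(candidates, key=len)
        runs.append(group[:cores_per_run])
        del group[:cores_per_run]
    return runs
```

With two cache groups of four cores each, `allocate_cores([[0, 1, 2, 3], [4, 5, 6, 7]], 3)` places only two runs, because the leftover cores 3 and 7 do not share a cache. A real implementation would also have to respect the existing CPU, NUMA, and hyperthreading layers.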
We can read this information from /sys/devices/system/cpu/cpuX/topology/die_cpus_list (docs), which is next to the files we use for retrieving the other topology information.
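Reading such a file could look roughly like the following sketch. The helper names `parse_cpu_list` and `group_cores_by_file` are hypothetical and not part of BenchExec, and the parser assumes the file only contains plain comma-separated ids and ranges such as `0-3,8-11`.

```python
from collections import defaultdict
from pathlib import Path


def parse_cpu_list(text):
    """Parse a kernel CPU list such as "0-3,8-11" into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty file or trailing comma
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def group_cores_by_file(cores, filename, sys_cpu_dir="/sys/devices/system/cpu"):
    """Group the given core ids by the content of a per-core topology file.

    Cores whose file lists the same set of CPUs end up in the same group.
    sys_cpu_dir is a parameter only so that the function can be tried out
    against a fake directory tree.
    """
    groups = defaultdict(list)
    for core in cores:
        path = Path(sys_cpu_dir) / f"cpu{core}" / "topology" / filename
        siblings = tuple(parse_cpu_list(path.read_text()))
        groups[siblings].append(core)
    return [sorted(group) for group in groups.values()]
```

On a Linux machine whose kernel provides the file, `group_cores_by_file(range(8), "die_cpus_list")` would return one list of core ids per distinct value found. The file is missing on older kernels, so real code would need a fallback for that case.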
We should cross-check with hwloc-ls and hwloc-info that we get it right.