Reduce initialization cost of ucs_init library constructor #11211

@fsimonis

Description

The library constructor ucs_init is problematic because it adds a fixed startup delay to every executable that transitively links to libucs.

I suggest a lazy initialization strategy that initializes on first use, which avoids paying the constructor cost in processes that never call into the library.
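To illustrate what I mean by initializing on first use, here is a minimal sketch built on `pthread_once`. All names (`lib_do_init`, `lib_ensure_init`, `lib_api_call`, `lib_get_init_count`) are hypothetical and not part of UCX; the sketch only shows the general pattern, not how UCX would have to structure it.

```c
#include <pthread.h>

/* Hypothetical names throughout: none of these symbols exist in UCX. */

static pthread_once_t lib_init_once = PTHREAD_ONCE_INIT;
static int lib_init_count = 0; /* how many times the expensive setup ran */

/* Stand-in for the expensive work a library constructor would otherwise do. */
static void lib_do_init(void)
{
    lib_init_count++;
}

/* Runs lib_do_init exactly once, on the first call from any thread. */
static void lib_ensure_init(void)
{
    pthread_once(&lib_init_once, lib_do_init);
}

/* Every public entry point initializes lazily before doing real work. */
int lib_api_call(int x)
{
    lib_ensure_init();
    return x + 1;
}

/* Exposed only so the behaviour can be inspected. */
int lib_get_init_count(void)
{
    return lib_init_count;
}
```

A process that links the library but never calls `lib_api_call` never runs `lib_do_init`, which is the case my test executable is in. On older glibc versions this may need `-pthread` at link time.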

This is observable with version 1.19.0 (Ubuntu 25.10 questing) and, to my understanding, the problem gets even worse with #11112.

Motivation:

I have a test executable that transitively depends on libucs.so via OpenMPI, without using any related functionality.
We run our roughly 1k tests (and growing) in isolation using ctest, which calls the executable at least once per test because some tests run under mpirun.

I measure an overhead of 42 ms per invocation. With 1k tests, that is a lower bound of 42 seconds spent in ucs_init for each of our 10+ test configurations.

MRE

I measure the invocation overhead by linking the library to a do-nothing main as follows:

$ echo 'int main() {}' > nothing.c
$ gcc nothing.c -c -o nothing.o
$ gcc -Wl,--no-as-needed -lucs nothing.o -o nothing
$ ldd nothing
        linux-vdso.so.1 (0x00007cc9738de000)
        libucs.so.0 => /lib/x86_64-linux-gnu/libucs.so.0 (0x00007cc973831000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007cc973400000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007cc973724000)
        libucm.so.0 => /lib/x86_64-linux-gnu/libucm.so.0 (0x00007cc973705000)
        /lib64/ld-linux-x86-64.so.2 (0x00007cc9738e0000)
$ hyperfine -w 1 ./nothing
Benchmark 1: ./nothing
  Time (mean ± σ):      42.2 ms ±   0.2 ms    [User: 20.4 ms, System: 21.8 ms]
  Range (min … max):    41.9 ms …  42.9 ms    69 runs

Sanity check with perf record -Fmax -g ./nothing:

[Image: perf report screenshot]

The perf records suggest that the time is spent in:

deadline = ucm_get_time() + ucm_global_opts.bistro_grace_duration;
while (ucm_get_time() < deadline) {
    sched_yield();
}

However, the grace duration appears to be 5 ms in 1.19, which doesn't match the measured 42 ms: sched_yield() accounts for 79% of samples in the perf trace, which would be roughly 33 ms if samples track wall time.
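To check how long such a yield loop takes in isolation, a standalone sketch of the same pattern could look like the following. It uses `clock_gettime(CLOCK_MONOTONIC)` in place of `ucm_get_time()`, and the names `now_ms` and `spin_yield_for` are hypothetical, so it only approximates what the library does.

```c
#define _POSIX_C_SOURCE 200809L
#include <sched.h>
#include <time.h>

/* Hypothetical helper: monotonic clock reading in milliseconds. */
static double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1e3 + ts.tv_nsec / 1e6;
}

/* Yields the CPU until grace_ms milliseconds have passed.
 * Returns the elapsed time in ms and, if yields is non-NULL,
 * stores the number of sched_yield() calls made. */
double spin_yield_for(double grace_ms, unsigned long *yields)
{
    double start = now_ms();
    double deadline = start + grace_ms;
    unsigned long n = 0;

    while (now_ms() < deadline) {
        sched_yield();
        n++;
    }
    if (yields != NULL) {
        *yields = n;
    }
    return now_ms() - start;
}
```

Calling `spin_yield_for(5.0, &n)` and printing the result shows whether a single 5 ms grace period stays near 5 ms on a given machine. If it does, the 42 ms would suggest the loop is entered more than once or with a larger duration, but that is a guess on my part.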
