Commit 0f5b6c5

M9.R.37.4+5: link nix-glibc ld.so.cache to /etc + narrow LD_LIBRARY_PATH
M9.R.37.4: symlink nix-store glibc's etc/ld.so.cache to /etc/ld.so.cache

The from-source binaries on the live ISO carry PT_INTERPs like /nix/store/xx7cm72...-glibc-2.40-66/lib/ld-linux-x86-64.so.2. That ld-linux is compiled with its ld.so.cache path baked in as /nix/store/xx7cm72...-glibc-2.40-66/etc/ld.so.cache (run `strings` on it to confirm). Our M9.R.37.3 ldconfig populates the Debian system cache at /etc/ld.so.cache, but the nix-store loader never looks there: it opens the cache at its own compiled-in path, gets ENOENT, and skips the ld.so.cache step of its library search entirely.

Concretely: mkfs.ext4's loader walked LD_LIBRARY_PATH (no libext2fs.so.2), then DT_RUNPATH (no libext2fs.so.2 either; the recipe's RPATH-patcher missed its own sister-lib dir), then tried to open ld.so.cache at the nix-store glibc path, got ENOENT, gave up, and exited 127 with "libext2fs.so.2: cannot open shared object file".

Fix: for each nix-store glibc dir on the stage, chmod u+w its etc/ subdir, then drop a symlink at etc/ld.so.cache -> /etc/ld.so.cache. Every loader, Debian or nix, now consults the same cache the system ldconfig wrote, and libext2fs.so.2 (cached at the absolute e2fsprogs install-mirror path) resolves cleanly.

M9.R.37.5: narrow LD_LIBRARY_PATH from ~600 to ~5 entries

The launcher's wholesale /nix/store/*/lib walk put ~600 directories on LD_LIBRARY_PATH. Every dlopen() inside the installer's QProcess children iterated all 600 before falling through to DT_RUNPATH / ld.so.cache, a measured cost of ~600ms per shared-lib lookup. Multiplied across the dozens of fontconfig, Qt plugin, nim-runtime, and subprocess-startup lookups, this added 30-60s of pure ld.so churn to every install attempt and contributed to the M9.R.36 "silent wedge" symptom (the installer wasn't deadlocked, just heavily I/O-bound on ld.so probes).

The only libraries the launcher actually needs to surface via LD_LIBRARY_PATH are the ones the repro binary dlopens by bare leaf name through Nim's {.dynlib: "libname".} pragma:

  * libclingo.so       (libs/repro_solver/clingo_bindings.nim)
  * libsqlite3.so(.0)  (libs/repro_local_store/sqlite3_binding.nim)

Every other library is reachable via either embedded RPATH (the M9.R.14f recipe-side RPATH-patcher bakes the closure into every ELF) or ld.so.cache (M9.R.37.3 + M9.R.37.4 made the cache reachable from every PT_INTERP). Filter the LD_LIBRARY_PATH walk to include ONLY dirs that ship libclingo.so, libsqlite3.so, or libsqlite3.so.0.
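
The M9.R.37.5 filter can be tried in isolation with a sketch along these lines. The function name `collect_dynlib_dirs` and its store-root argument are hypothetical, introduced only so the logic can run against a scratch directory instead of the real /nix/store; the launcher itself builds `_repro_nix_libs` inline.

```shell
#!/bin/sh
# Hypothetical helper (not part of stage-de-rootfs.sh): print a
# colon-separated list of "$1"/*/lib dirs that ship one of the libraries
# the repro binary loads by bare leaf name.
# Note: POSIX sh has no `local`, so these variables are global.
collect_dynlib_dirs() {
    store_root="$1"
    dynlib_dirs=""
    for d in "$store_root"/*/lib; do
        [ -d "$d" ] || continue
        # Skip glibc dirs so the system glibc stays canonical.
        case "$d" in
            "$store_root"/*-glibc-*/lib) continue ;;
        esac
        if [ -e "$d/libclingo.so" ] || [ -e "$d/libsqlite3.so" ] || \
           [ -e "$d/libsqlite3.so.0" ]; then
            # Append with ':' only when the list is already non-empty.
            dynlib_dirs="${dynlib_dirs:+$dynlib_dirs:}$d"
        fi
    done
    printf '%s\n' "$dynlib_dirs"
}
```

Called as `collect_dynlib_dirs /nix/store`, it should print only the handful of directories that belong on LD_LIBRARY_PATH. A store with no matching directory yields an empty line.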
1 parent eb71b71 commit 0f5b6c5

1 file changed

Lines changed: 54 additions & 1 deletion

recipes/reproos-iso/scripts/stage-de-rootfs.sh

@@ -622,14 +622,36 @@ _repro_qpa_plugins=""
 # after Qt init"). Use a subshell's exit code instead: if the first
 # entry of the expansion exists, the subshell succeeds; otherwise it
 # fails. $@ is untouched.
+#
+# M9.R.37.5 -- be SURGICAL about which dirs go on LD_LIBRARY_PATH. The
+# previous wholesale ``/nix/store/*/lib`` walk added ~600 dirs to
+# LD_LIBRARY_PATH; every dlopen() inside the installer's QProcess
+# children then had to iterate all 600 before falling through to
+# ld.so.cache. The libs the ``repro`` binary dlopen()s at
+# runtime (libclingo / libsqlite3) are well-known leaf names hit via
+# Nim's ``{.dynlib: const-string.}`` pragma, so we only need their
+# specific dirs on LD_LIBRARY_PATH. Every OTHER library the binaries
+# need is already resolvable via either embedded RPATH or ld.so.cache
+# (M9.R.37.3 + M9.R.37.4 made the cache reachable from every PT_INTERP).
+#
+# This dramatically narrows LD_LIBRARY_PATH from ~600 entries to
+# a handful, slashing each dlopen()'s syscall cost from ~600 ENOENT
+# probes to ~5.
 for d in /nix/store/*/lib; do
     [ -d "$d" ] || continue
     # Skip glibc dirs -- Debian system glibc must remain canonical so
     # every Debian binary in the live ISO chain keeps working.
     case "$d" in
         /nix/store/*-glibc-*/lib) continue ;;
     esac
-    if ( set -- "$d"/*.so*; [ -e "$1" ] ); then
+    # M9.R.37.5: include ONLY dirs that ship a library the ``repro``
+    # binary's Nim {.dynlib: "..."} pragma resolves by bare leaf name:
+    #   * libclingo.so       (libs/repro_solver/.../clingo_bindings.nim)
+    #   * libsqlite3.so(.0)  (libs/repro_local_store/.../sqlite3_binding.nim)
+    # plus any sqlite3 successor name (the binding tries _64 / _32
+    # variants on Windows only; libsqlite3.so covers POSIX).
+    if [ -e "$d/libclingo.so" ] || [ -e "$d/libsqlite3.so" ] || \
+       [ -e "$d/libsqlite3.so.0" ]; then
         if [ -z "$_repro_nix_libs" ]; then
             _repro_nix_libs="$d"
         else
@@ -922,6 +944,37 @@ mkdir -p "$STAGE_DIR/etc/ld.so.conf.d"
     echo "/usr/lib64"
 } > "$STAGE_DIR/etc/ld.so.conf.d/zz-reproos-overlay.conf"
 
+# M9.R.37.4 -- symlink every nix-store glibc's hard-coded
+# ``etc/ld.so.cache`` path to ``/etc/ld.so.cache`` so the
+# from-source-built binaries' nix-store PT_INTERPs find the cache the
+# Debian system ldconfig writes. Without this, every binary with PT_INTERP
+# ``/nix/store/<hash>-glibc-X.Y/lib/ld-linux-x86-64.so.2`` reads
+# ``/nix/store/<hash>-glibc-X.Y/etc/ld.so.cache`` (the path is baked into
+# ld-linux at compile time -- ``strings`` it to confirm) which doesn't
+# exist on our stage, and the library-search fall-through to ld.so.cache
+# fails. Concretely: ``mkfs.ext4`` failed at exec-time with
+# ``libext2fs.so.2: cannot open shared object file`` because its PT_INTERP
+# pointed at the ``xx7cm72...-glibc-2.40-66`` ld-linux which reads
+# ``/nix/store/xx7cm72.../etc/ld.so.cache`` (ENOENT), bypassing the
+# Debian system cache at ``/etc/ld.so.cache``.
+#
+# Fix: for each nix-store glibc dir on the stage, ``chmod u+w`` its
+# ``etc/`` subdir then drop a symlink at
+# ``etc/ld.so.cache -> /etc/ld.so.cache``. Now every loader -- nix
+# or Debian -- reads the SAME cache the system ldconfig wrote.
+for glibc_etc in "$STAGE_DIR"/nix/store/*-glibc-*/etc; do
+    [ -d "$glibc_etc" ] || continue
+    glibc_dir="$(dirname "$glibc_etc")"
+    chmod u+w "$glibc_etc" 2>/dev/null || true
+    if [ -e "$glibc_etc/ld.so.cache" ] && [ ! -L "$glibc_etc/ld.so.cache" ]; then
+        rm -f "$glibc_etc/ld.so.cache"
+    fi
+    if [ ! -L "$glibc_etc/ld.so.cache" ]; then
+        ln -s /etc/ld.so.cache "$glibc_etc/ld.so.cache"
+    fi
+    echo "[stage-de-rootfs] linked $glibc_etc/ld.so.cache -> /etc/ld.so.cache"
+done
+
 chroot_ldconfig="$STAGE_DIR/sbin/ldconfig"
 if [ -x "$chroot_ldconfig" ]; then
     # M9.R.37.3 -- ``chroot $STAGE_DIR /sbin/ldconfig`` requires root
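
A post-stage sanity check for the M9.R.37.4 links might look like the sketch below. It is not part of this commit: the function name `check_loader_cache_links` is hypothetical, and it assumes `readlink` is available on the build host (it ships with coreutils and busybox, but older POSIX editions do not require it).

```shell
#!/bin/sh
# Hypothetical check (not in the commit): confirm that every nix-store
# glibc on the stage has etc/ld.so.cache symlinked to /etc/ld.so.cache.
# Returns 0 when all links are in place, 1 otherwise.
check_loader_cache_links() {
    stage_dir="$1"
    rc=0
    for glibc_etc in "$stage_dir"/nix/store/*-glibc-*/etc; do
        [ -d "$glibc_etc" ] || continue
        link="$glibc_etc/ld.so.cache"
        # -L tests the link itself, so a target that is dangling on the
        # build host (it only resolves inside the live ISO) still passes.
        if [ -L "$link" ] && [ "$(readlink "$link")" = "/etc/ld.so.cache" ]; then
            echo "ok: $link"
        else
            echo "not linked: $link" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

Running `check_loader_cache_links "$STAGE_DIR"` after the linking loop would flag any glibc whose cache path was left unlinked, for example because the `chmod u+w` silently failed. Like the loop in the diff, it skips glibc store paths that have no `etc/` directory at all.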
