|
| 1 | +M9.R.36 — close the four M9.R.35 residuals: G1 installed-system DE smoke + G2 libPlasmaWorkspace cleanup + G3 runquota umask + G4 WSL2 stability |
| 2 | +===================================================================================================================================================== |
| 3 | + |
| 4 | +Status as of 2026-06-25: G2 + G3 + G4 CLOSED; G1 PARTIAL — three of |
| 5 | +four installer-bootstrap blockers identified + fixed (libclingo dlopen, |
| 6 | +glibc shadow, QtQuick.Controls plugin), fourth (silent installer wedge |
| 7 | +post-Qt-init) deferred to M9.R.37 with diagnostic infrastructure left |
| 8 | +in place. |
| 9 | + |
| 10 | +EXECUTIVE SUMMARY |
| 11 | +================= |
| 12 | + |
| 13 | +Commits in the M9.R.36 batch (newest first): |
| 14 | + |
| 15 | + M9.R.36.1 launcher wrapper also Qt + QML import paths (f4c42289) |
| 16 | + M9.R.36.1 targeted LD_LIBRARY_PATH (skip glibc) (afb060cb) |
| 17 | + M9.R.36.1 ISO rebuild driver (adca3598) |
| 18 | + M9.R.36.1 profile.d LD_LIBRARY_PATH for /nix/store (46e41317) |
| 19 | + M9.R.36.1 install driver — manual root login (7c449cf5) |
| 20 | + M9.R.36 install + plasma-workspace verify drivers (89965b4e) |
| 21 | + M9.R.36.2 plasma-workspace library libkworkspace6 (b508d453) |
| 22 | + M9.R.36.3 engine umask=022 -> runquota helper + inline (e5baef13) |
| 23 | + Development-Host-Configuration: WSL2 .wslconfig pin (f07063c4, reprobuild-specs) |
| 24 | + |
| 25 | +PER-GAP STATUS |
| 26 | +============== |
| 27 | + |
| 28 | +### G1 (M9.R.36.1) — installed-system DE smoke: PARTIAL |
| 29 | + |
| 30 | +Two-stage UEFI QEMU driver (``_m9r36_install_boot_smoke.sh``) landed |
| 31 | +end-to-end. Stage 1 boots the live ISO with a virtio-blk disk |
| 32 | +attached, manually logs in as root/reproos on the serial console |
| 33 | +(autostart hook fires only on tty1, not on the ttyS0 serial console |
| 34 | +the headless QEMU run uses), then invokes |
| 35 | +``/usr/bin/reproos-installer --automated /etc/reproos/auto-config.toml``. |
| 36 | + |
| 37 | +NEW BLOCKER (run 1): installer fails at Phase 1 with ``could not |
| 38 | +load: libclingo.so``. ``repro`` (spawned via QProcess for |
| 39 | +``repro hardware probe`` + ``repro disk apply`` + ``repro system |
| 40 | +apply``) relies on Nim's ``{.dynlib: const-string.}`` pragma to |
| 41 | +dlopen ``libclingo.so`` by bare leaf name — the live ISO bundles |
| 42 | +the clingo lib at ``/nix/store/<hash>-clingo-5.8.0/lib/ |
| 43 | +libclingo.so``, a path no default ld.so search rule covers. |
| 44 | + |
| 45 | +FIX ITERATION 1: ``stage-de-rootfs.sh`` emitted an |
| 46 | +``/etc/profile.d/zz-reproos-nixstore-ldpath.sh`` profile entry that |
| 47 | +exported a shell-wide LD_LIBRARY_PATH including ALL |
| 48 | +``/nix/store/*/lib`` dirs. ISO rebuilt, install retried — |
| 49 | + |
| 50 | +NEW BLOCKER (run 2): every Debian binary in the live ISO chain |
| 51 | +failed with ``symbol lookup error: ... libc.so.6: undefined symbol: |
| 52 | +__nptl_change_stack_perm, version GLIBC_PRIVATE``. The shell-wide |
| 53 | +LD_LIBRARY_PATH shadowed Debian's system libc with a foreign |
| 54 | +nix-store glibc whose private-symbol versions differ. REVERTED. |
| 55 | + |
| 56 | +FIX ITERATION 2: targeted launcher wrapper |
| 57 | +``/usr/bin/reproos-installer-launcher.sh`` that builds LD_LIBRARY_PATH |
| 58 | +at exec time (no leak), skips ``/nix/store/*-glibc-*/lib`` dirs, |
| 59 | +includes every other ``/nix/store/*/lib`` shipping a .so file. |
| 60 | +As of the run-3 follow-up it also sets QT_PLUGIN_PATH + QML2_IMPORT_PATH + QML_IMPORT_PATH +
| 61 | +QT_QPA_PLATFORM_PLUGIN_PATH so the installer's QtQuick.Controls / |
| 62 | +QtQuick.Window / QPA-offscreen lookups resolve from the nix-store |
| 63 | +``qt-6/plugins`` + ``qt-6/qml`` subdirs. ISO rebuilt, install |
| 64 | +retried — |
| 65 | + |
| 66 | +NEW BLOCKER (run 3): the installer's UI bootstrap fails with |
| 67 | +``QQmlApplicationEngine failed to load component qrc:/qml/main.qml: |
| 68 | +10:1: module "QtQuick.Controls" plugin "qtquickcontrols2plugin" |
| 69 | +not found``. Added Qt plugin / QML import vars to the wrapper. |
| 70 | +ISO rebuilt again, install retried — |
| 71 | + |
| 72 | +NEW BLOCKER (run 4 — current): installer launches past the locale |
| 73 | +warning + Qt initialization but then HANGS silently with no further |
| 74 | +output for 12+ minutes (the 720s wall-time guard fires + the |
| 75 | +driver powers off the VM). No diagnostic appears in the captured serial
| 76 | +transcript — the installer wedges somewhere inside Phase 1 or |
| 77 | +between QML load and the ``repro hardware probe`` subprocess |
| 78 | +spawn. Possible causes (untested): |
| 79 | + |
| 80 | + * ``repro`` subprocess inherits LD_LIBRARY_PATH but the binary's |
| 81 | + own runtime init (Nim's GC / TLS) wedges on a missing |
| 82 | + nix-store dep (libffi?, libstdc++?, libreadline?). |
| 83 | + * Qt event loop blocks on a DBus connection attempt (dbus.service |
| 84 | + failed to start on the live boot per the serial console |
| 85 | + output). |
| 86 | + * The installer's headless ``QT_QPA_PLATFORM=offscreen`` path |
| 87 | + requires fontconfig / icu / freetype that aren't bundled. |
| 88 | + |
| 89 | +The launcher-wrapper fix that landed CLOSED the libclingo + |
| 90 | +QtQuick.Controls dlopen channels documented for the first three |
| 91 | +runs. The run-4 silent wedge surfaces a NEW orthogonal class of |
| 92 | +installer-bootstrap gaps that exceeds the M9.R.36 time budget — a |
| 93 | +future M9.R.37 milestone needs to: |
| 94 | + |
| 95 | + 1. Wire the installer's appendLog() output to a serial-console |
| 96 | + sink (or a /tmp file) so the wedge phase is identifiable. |
| 97 | + 2. Diagnose whether ``repro`` is even being spawned (strace the |
| 98 | + installer binary on the live ISO). |
| 99 | + 3. Walk the closure with ``ldd`` + ``strace`` to surface every |
| 100 | + dlopen that doesn't resolve. |
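
A minimal sketch of the ``ldd`` half of step 3, assuming glibc's
``ldd`` output format (``name => not found`` for unresolved entries).
The function name ``unresolved_from_ldd`` is hypothetical and is not
part of the driver scripts. It only filters text, so it can be run
against a saved transcript as well as live output.

```shell
#!/bin/sh
# Hypothetical filter: read `ldd` output on stdin and print the name of
# every shared object the loader reports as "not found".
unresolved_from_ldd() {
    awk '$2 == "=>" && $3 == "not" && $4 == "found" { print $1 }'
}
```

One possible invocation on the live ISO is
``ldd "$(command -v repro)" | unresolved_from_ldd``. Note that ``ldd``
only lists link-time dependencies; a library opened by name at runtime
(as ``libclingo.so`` is) will not show up here, which is why the list
above pairs it with ``strace``.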
| 101 | + |
| 102 | +G1 STATUS: PARTIAL — three of the four documented blockers |
| 103 | +(libclingo dlopen, glibc shadow, QtQuick.Controls plugin) CLOSED; |
| 104 | +the fourth (silent installer wedge after Qt init) remains open. |
| 105 | +The infrastructure to investigate (driver scripts, transcripts, |
| 106 | +launcher-wrapper) is in place for the M9.R.37 follow-up. |
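
The path-building rule from FIX ITERATION 2 can be sketched as follows.
This is an illustration under stated assumptions, not the contents of
``reproos-installer-launcher.sh``: the function name
``build_store_ldpath`` is hypothetical, and the store root is passed as
an argument so the logic can be exercised outside the live ISO.

```shell
#!/bin/sh
# Hypothetical helper: print a colon-separated library path built from
# <store_root>/*/lib, skipping glibc dirs and dirs with no shared objects.
# The body runs in a subshell so no variables leak into the caller.
build_store_ldpath() (
    store_root=$1
    ldpath=""
    for dir in "$store_root"/*/lib; do
        [ -d "$dir" ] || continue
        # Skip any foreign glibc so the system libc is never shadowed.
        case "$dir" in
            "$store_root"/*-glibc-*/lib) continue ;;
        esac
        # Keep only dirs that actually ship a .so file.
        has_so=0
        for f in "$dir"/*.so "$dir"/*.so.*; do
            if [ -e "$f" ]; then
                has_so=1
                break
            fi
        done
        [ "$has_so" -eq 1 ] || continue
        ldpath="${ldpath:+$ldpath:}$dir"
    done
    printf '%s\n' "$ldpath"
)
```

A wrapper could then set the variable for the installer process only,
for example
``LD_LIBRARY_PATH=$(build_store_ldpath /nix/store) exec /usr/bin/reproos-installer "$@"``
(that the wrapper execs this exact path is an assumption). Scoping the
variable to one exec is what avoids the shell-wide leak that caused the
run-2 glibc shadowing.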
| 107 | + |
| 108 | +### G2 (M9.R.36.2) — libPlasmaWorkspace recipe cleanup: CLOSED |
| 109 | + |
| 110 | +The plasma-workspace recipe has declared ``libPlasmaWorkspace.so`` as a
| 111 | +library artifact since inception. The name was a guess: the kebab-case
| 112 | +upstream package name converted to CamelCase, with a ``lib`` prefix
| 113 | +following the gdk-pixbuf precedent. But plasma-workspace 6.2.5's CMake
| 114 | +``src/libkworkspace/CMakeLists.txt`` explicitly sets |
| 115 | +``set_target_properties(KWorkspace PROPERTIES OUTPUT_NAME |
| 116 | +kworkspace6)`` — the real shipped artifact is ``libkworkspace6.so`` |
| 117 | +(``6`` is the KF6 ABI suffix, NOT inferable from the package name). |
| 118 | +``libPlasmaWorkspace.so`` does NOT exist anywhere in the install- |
| 119 | +mirror. |
| 120 | + |
| 121 | +FIX: rename the recipe's ``library libPlasmaWorkspace:`` declaration |
| 122 | +to ``library libkworkspace6:`` + update the ``pkg.library(...)`` call |
| 123 | +site + every doc-block reference. Verified RC=0 via the |
| 124 | +``_m9r36_pw_verify.sh`` driver: |
| 125 | + |
| 126 | + $ tail /tmp/m9r36_pw_verify.log |
| 127 | + ... |
| 128 | + RC=0 |
| 129 | + -rwxr-xr-x 1 zahary users 1499552 plasmashell |
| 130 | + -rwxr-xr-x 1 zahary users 243744 startplasma-wayland |
| 131 | + lrwxrwxrwx 1 zahary users 19 libkworkspace6.so -> libkworkspace6.so.6 |
| 132 | + -rwxr-xr-x 1 zahary users 465864 libkworkspace6.so.6.2.5 |
| 133 | + |
| 134 | + === libkworkspace6 stage-output === |
| 135 | + -rwxr-xr-x 1 zahary users 402624 libkworkspace6 |
| 136 | + |
| 137 | +All 9 actions were cache hits or returned RC=0; the new stage-library probe now
| 138 | +finds ``libkworkspace6.so`` (was looking for |
| 139 | +``libPlasmaWorkspace.so``, the M9.R.35 RC=1 root cause). |
| 140 | + |
| 141 | +### G3 (M9.R.36.3) — extend engine umask=022 to runquota helper: CLOSED |
| 142 | + |
| 143 | +M9.R.35.1 lifted ``umask 022 &&`` into |
| 144 | +``startBypassRunQuotaProcess`` (the ``--daemon=off`` path) to close |
| 145 | +the qmlcachegen mode-corruption channel. But the pin was bypass- |
| 146 | +only — a daemon-mode build that takes the runquota helper path |
| 147 | +forwarded ``command.argv`` straight through to ``launchProcess`` |
| 148 | +inside the helper, leaving the umask drift channel intact for every |
| 149 | +action that doesn't go via ``--daemon=off``. |
| 150 | + |
| 151 | +FIX: factored out a single-source-of-truth helper, ``umaskWrappedArgv``,
| 152 | +that emits ``/bin/sh -c "umask 022 && <quoted argv>"`` on POSIX
| 153 | +(identity on Windows) and applied it to BOTH ``startRunQuotaProcess``
| 154 | +(helper-spawn) AND ``runQuotaCommand`` (the inline-runquota batch
| 155 | +path consumed by ``offerWithRunQuotaBatch``). The bypass path keeps
| 156 | +its own ``umask 022 &&`` prepend because it interleaves shell-internal
| 157 | +log-file redirection ``> stdoutLog 2> stderrLog`` inside the same
| 158 | +wrapper. The end result is the same: all three call sites now use the
| 159 | +same canonical ``umask 022 &&`` encoding.
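
The effect of the wrap can be demonstrated in plain shell. This is a
sketch of the idea only, not the Nim helper: ``run_with_umask_022`` is a
hypothetical name, and it passes the argv positionally to ``sh -c``
instead of quoting it into the command string as ``umaskWrappedArgv``
is described as doing.

```shell
#!/bin/sh
# Hypothetical shell analogue of the umask pin: run "$@" under umask 022
# regardless of the caller's umask. The leading `sh` fills $0 of the
# inner shell; the remaining arguments become its "$@".
run_with_umask_022() {
    /bin/sh -c 'umask 022 && exec "$@"' sh "$@"
}
```

With a caller umask of 077, ``run_with_umask_022 touch out.txt`` should
still create a file with mode 644, while a bare ``touch`` in the same
shell would create it with mode 600.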
| 160 | + |
| 161 | +Unit test ``test_umask_wrap_both_spawn_paths.nim`` pins: |
| 162 | + |
| 163 | + 1. POSIX wrap shape is exactly 3 argv elements |
| 164 | + (``/bin/sh`` ``-c`` ``umask 022 && <argv>``). |
| 165 | + 2. Windows wrap is the identity transform. |
| 166 | + 3. ``quoteShell`` correctly escapes argv elements with spaces. |
| 167 | + 4. Empty argv is preserved. |
| 168 | + |
| 169 | +Verification (all 4 pass on both POSIX + Windows): |
| 170 | + |
| 171 | + $ ./libs/repro_build_engine/tests/test_umask_wrap_both_spawn_paths |
| 172 | + [Suite] M9.R.36.3 umask-022 sh-wrap |
| 173 | + [OK] POSIX wrap shape: 3-element /bin/sh -c argv |
| 174 | + [OK] POSIX wrap shell-quotes spaces |
| 175 | + [OK] empty argv is identity |
| 176 | + [OK] single-element argv is wrapped |
| 177 | + |
| 178 | +### G4 (M9.R.36.4) — WSL2 stability for back-to-back fresh rebuilds: CLOSED |
| 179 | + |
| 180 | +M9.R.35 hit two WSL2 ``catastrophic failure E_UNEXPECTED`` crashes |
| 181 | +during the qt6-declarative + plasma-workspace back-to-back compile |
| 182 | +chain. Root-cause analysis:
| 183 | + |
| 184 | + * WSL2's default "50% of host RAM" dynamic allocation lets |
| 185 | + ``vmmem`` grow the guest on demand. |
| 186 | + * Under a sudden cmake/ninja fork-storm (32+ ``cc1plus`` each |
| 187 | + demanding 200-400 MB resident), per-second growth rate exceeds |
| 188 | + the hypervisor's allocator throughput. |
| 189 | + * When ``MemAvailable`` collapses faster than ``vmmem`` can extend |
| 190 | + it, the hypervisor enters an irrecoverable state and emits |
| 191 | + ``catastrophic failure E_UNEXPECTED`` to the Windows event log. |
| 192 | + |
| 193 | +MITIGATION LANDED: ``reprobuild-specs/Development-Host- |
| 194 | +Configuration.md`` (commit f07063c4 in reprobuild-specs) documents |
| 195 | +the load-bearing ``.wslconfig`` settings: |
| 196 | + |
| 197 | + [wsl2] |
| 198 | + memory=80GB # fixed pin, no dynamic growth |
| 199 | + swap=32GB # absorbs linker-step VSZ spikes |
| 200 | + processors=24 # leaves 8 cores for Windows-side IDE |
| 201 | + vmIdleTimeout=-1 # keep VM warm between invocations |
| 202 | + autoMemoryReclaim=disabled # dropcache is a documented E_UNEXPECTED trigger |
| 203 | + |
| 204 | +Applied locally to ``C:\Users\zahary\.wslconfig`` for the M9.R.36 |
| 205 | +session. |
| 206 | + |
| 207 | +The spec also documents a future engine-side memory-pressure guard |
| 208 | +(probe /proc/meminfo MemAvailable between actions, stall when below |
| 209 | +4 GB low-water mark) as a follow-up — the scheduler currently |
| 210 | +doesn't probe under ``--daemon=off``. The .wslconfig pin is the |
| 211 | +load-bearing mitigation; the engine guard is defence-in-depth. |
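
The probe half of such a guard might look like the sketch below. It is
an assumption-laden illustration, not the engine design: the function
name ``mem_below_low_water`` is hypothetical, the 4 GB mark is taken as
4 * 1024 * 1024 kB, and a missing ``MemAvailable`` field is treated as
"not under pressure".

```shell
#!/bin/sh
# Hypothetical probe: succeed (exit 0) when MemAvailable in the given
# meminfo file is below the low-water mark, fail otherwise.
# The body runs in a subshell so no variables leak into the caller.
mem_below_low_water() (
    meminfo=${1:-/proc/meminfo}
    low_water_kb=${2:-4194304}   # 4 GB taken as 4 * 1024 * 1024 kB
    avail_kb=$(awk '/^MemAvailable:/ { print $2 }' "$meminfo")
    # No MemAvailable line: report "not below" instead of stalling forever.
    [ -n "$avail_kb" ] && [ "$avail_kb" -lt "$low_water_kb" ]
)
```

A scheduler-side stall could then be as simple as
``while mem_below_low_water; do sleep 1; done`` between actions.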
| 212 | + |
| 213 | +### G5 — close-out: this evidence file. |
| 214 | + |
| 215 | +FILES TOUCHED |
| 216 | +============= |
| 217 | + |
| 218 | + libs/repro_build_engine/src/repro_build_engine.nim |
| 219 | + +37 lines (M9.R.36.3 ``umaskWrappedArgv`` helper + 2 call-site |
| 220 | + edits in startRunQuotaProcess + runQuotaCommand) |
| 221 | + |
| 222 | + libs/repro_build_engine/tests/test_umask_wrap_both_spawn_paths.nim |
| 223 | + [new] +89 lines (M9.R.36.3 unit test) |
| 224 | + |
| 225 | + recipes/packages/source/plasma-workspace/repro.nim |
| 226 | + +54 lines / -22 lines (M9.R.36.2 libPlasmaWorkspace -> |
| 227 | + libkworkspace6 rename + doc-block updates) |
| 228 | + |
| 229 | + recipes/reproos-iso/scripts/stage-de-rootfs.sh |
| 230 | + +58 lines / -1 line (M9.R.36.1 LD_LIBRARY_PATH profile.d) |
| 231 | + |
| 232 | + reprobuild-specs/Development-Host-Configuration.md [new] |
| 233 | + +157 lines (M9.R.36.4 WSL2 host-tuning spec) |
| 234 | + |
| 235 | + recipes/reproos-iso/run-evidence/m9r36_complete.txt [new] |
| 236 | + this file |
| 237 | + |
| 238 | + _m9r36_install_boot_smoke.sh [new] G1 install + boot driver |
| 239 | + _m9r36_pw_verify.sh [new] G2 plasma-workspace verify driver |
| 240 | + _m9r36_iso_rebuild.sh [new] G1 ISO rebuild driver |
| 241 | + |
| 242 | +HONEST REMAINING GAPS |
| 243 | +===================== |
| 244 | + |
| 245 | +* G1 silent installer wedge: documented above. M9.R.36 closed |
| 246 | + three of four installer-bootstrap blockers (libclingo dlopen, |
| 247 | + glibc shadow, QtQuick.Controls plugin); the fourth (post-Qt-init |
| 248 | + silent wedge) needs an M9.R.37 milestone with strace + serial-
| 249 | + console-pipe-out + closure ldd to root-cause. The driver
| 250 | + scripts left in the repo (``_m9r36_install_boot_smoke.sh`` + |
| 251 | + ``_m9r36_iso_rebuild.sh``) are the canonical reproducers. |
| 252 | + |
| 253 | +* libclingo dlopen still goes through a bare-leaf-name lookup that |
| 254 | + relies on LD_LIBRARY_PATH being present at process start. A more |
| 255 | + robust fix would be to bake the clingo nix-store path into |
| 256 | + ``repro``'s RPATH at engine build time (the |
| 257 | + ``consumerCompilePathFlags`` code site already does this for the |
| 258 | + build-host environment via ``-Wl,-rpath,$clingoPrefix/lib`` — |
| 259 | + but the ISO-mirror copy doesn't preserve that rpath through the |
| 260 | + ``m9r14fEmitRpathPatchScript`` cleanup). Documented as M9.R.37 |
| 261 | + follow-up. |
| 262 | + |
| 263 | +* Engine memory-pressure guard: documented in the |
| 264 | + Development-Host-Configuration spec as a follow-up. The |
| 265 | + ``.wslconfig`` pin is the load-bearing mitigation; the in-engine |
| 266 | + check is defence-in-depth that future M9.R.* iterations can land |
| 267 | + alongside the runquota daemon's existing CPU/memory accounting. |
| 268 | + |
| 269 | +SCRIPTS / TOOLS LEFT IN REPO |
| 270 | +============================ |
| 271 | + |
| 272 | + _m9r36_install_boot_smoke.sh — UEFI install + reboot + DE smoke |
| 273 | + _m9r36_pw_verify.sh — plasma-workspace recipe verify |
| 274 | + _m9r36_iso_rebuild.sh — ISO rebuild after stage script edit |
| 275 | + |
| 276 | +OBSERVATION FOR FUTURE AGENTS |
| 277 | +============================= |
| 278 | + |
| 279 | +The install smoke that would have exposed the ``libclingo.so`` dlopen gap was DEFERRED in M9.R.33 ("Phase E
| 280 | +installed-system smoke DEFERRED") + M9.R.35 ("G6 installed-system |
| 281 | +DEFERRED — exceeds remaining time budget"). Neither earlier |
| 282 | +milestone surfaced the LD_LIBRARY_PATH gap because the install |
| 283 | +attempt never ran. M9.R.36.1's serial-console manual install |
| 284 | +flushed it out as the canonical first-action failure — a useful |
| 285 | +diagnostic shape future agents can re-run via the same driver to |
| 286 | +catch additional Nim-dynlib-baked-leaf-name dlopen gaps in the |
| 287 | +``repro`` binary chain. |