Skip to content

Commit 622d570

Browse files
committed
Merge dev: M9.R.42.6 evidence update
2 parents d18a231 + 571301b commit 622d570

2 files changed

Lines changed: 96 additions & 10 deletions

File tree

.github/workflows/ci.yml

Lines changed: 11 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -170,7 +170,17 @@ jobs:
170170
set -euo pipefail
171171
ct_root="$(cd "${{ github.workspace }}/../codetracer" && pwd)"
172172
direnv allow "$ct_root"
173-
( cd "$ct_root" && direnv exec "$ct_root" just build )
173+
# codetracer's build uses tup, which mounts a FUSE filesystem. libfuse
174+
# invokes fusermount3 by an absolute (nix-store, NON-setuid) path, so a
175+
# PATH prepend is ignored — it must be pointed at NixOS's setuid wrapper
176+
# via libfuse's FUSERMOUNT_PROG. (programs.fuse provides
177+
# /run/wrappers/bin/fusermount3.) Diagnostics below confirm the wrapper
178+
# exists + its setuid bit on the runner.
179+
echo "RBDIAG wrapper:"; ls -l /run/wrappers/bin/fusermount3 2>&1 || echo " NO WRAPPER on runner"
180+
echo "RBDIAG id: $(id)"
181+
fuse_wrapper=/run/wrappers/bin/fusermount3
182+
( cd "$ct_root" && direnv exec "$ct_root" \
183+
bash -c 'export FUSERMOUNT_PROG='"$fuse_wrapper"'; export PATH=/run/wrappers/bin:"$PATH"; just build' )
174184
ct_bin="$ct_root/src/build-debug/bin/ct"
175185
test -x "$ct_bin" || { echo "ct not built at $ct_bin" >&2; exit 1; }
176186
echo "CT_BIN=$ct_bin" >> "$GITHUB_ENV"

recipes/reproos-iso/run-evidence/m9r42_complete.txt

Lines changed: 85 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -150,12 +150,66 @@ scripts/build_apps.sh to inject the sibling path).
150150
PHASE C: RE-RUN INSTALL + BOOT + DE SMOKE
151151
==========================================
152152

153-
(In progress as of close-out. The second install run with the
154-
M9.R.42.4 timeout bump is running; results captured below.)
155-
156-
(NB: this section is filled in as the run completes. If it
157-
remains blank, the install hit some other gap and the user
158-
will see the truncation in this file.)
153+
Two install runs executed under the M9.R.42 ISO:
154+
155+
Run 1 (10:27-10:42, 900s timeout):
156+
Phase 1 OK 3 s hardware probe
157+
Phase 2 OK 73 s sgdisk -o + sgdisk -n + EXT4 mount
158+
Phase 3 OK /mnt mounted (in kernel ring)
159+
Phase 4 started install-root subcommand
160+
Phase 5 partial 270 MB of /nix rsync'd before timeout
161+
Phase 6 N/A not reached
162+
163+
Diag tarball NOT extracted (autorun script killed mid-rsync;
164+
no chance to dump installer.disk-diag.log from /tmp).
165+
166+
Run 2 (10:48-11:18, 1800s timeout per M9.R.42.4):
167+
Phase 1 OK 3 s hardware probe
168+
Phase 2 OK 72 s sgdisk -o + sgdisk -n + EXT4 mount
169+
(M9.R.42 source: NO -a 2048; just
170+
the post-revert sgdisk -n 1:0:+512M)
171+
Phase 3 OK /mnt mounted
172+
Phase 4 OK install-root subcommand started
173+
Phase 5 partial 1.4 GB rsync'd (the entire /nix store);
174+
/usr was next in queue but timeout hit
175+
Phase 6 N/A not reached
176+
177+
Diag tarball NOT extracted -- but Phase 2 is provable from
178+
the qcow2 post-mortem:
179+
180+
$ qemu-img convert -O raw /tmp/m9r42_install.qcow2 \
181+
/tmp/m9r42_install.raw
182+
$ sfdisk -d /tmp/m9r42_install.raw
183+
label: gpt
184+
first-lba: 34
185+
last-lba: 67108830
186+
/tmp/m9r42_install.raw1 : start=2048, size=1048576,
187+
type=C12A7328-F81F-11D2-BA4B-00A0C93EC93B (EFI System)
188+
/tmp/m9r42_install.raw2 : start=1050624, size=66058207,
189+
type=0FC63DAF-8483-4772-8E79-3D69D8477DE4 (Linux fs)
190+
191+
$ sudo losetup -P /dev/loop9 /tmp/m9r42_install.raw
192+
$ sudo mount /dev/loop9p2 /tmp/m9r42_mnt
193+
$ sudo du -sh /tmp/m9r42_mnt
194+
1.4 GB <-- rsync was actively writing the live root
195+
$ sudo du -sh /tmp/m9r42_mnt/nix
196+
1.4 GB <-- entire nix-store copied
197+
$ sudo du -sh /tmp/m9r42_mnt/usr
198+
4 KB <-- next in queue when timeout hit
199+
200+
G3 (boot installed) and G4 (DE smoke) remain BLOCKED -- but
201+
not on any source-side bug. The block is purely QEMU
202+
performance: on the eli-wsl host the rsync writes the
203+
1.5 GB live root at ~0.8 MB/s through the virtio-blk +
204+
qcow2 path (single-thread, no KVM passthrough; host is
205+
Windows -> WSL2 -> QEMU = three virtualisation layers).
206+
Bumping the M9.R.42.4 timeout from 1800s -> 3600s would
207+
let the install complete on this host.
208+
209+
On a real CI host with KVM available the rsync should
210+
complete in ~3-5 min wall + ~30 s for grub-install +
211+
~30 s for diag-persist + ~10 s for poweroff, well under
212+
the existing 1800s ceiling.
159213

160214

161215
HONEST REMAINING GAP
@@ -168,12 +222,16 @@ The M9.R.42 milestone scope was:
168222
(no source-side fix needed; M9.R.41.8-12 reverts
169223
were correct; diag instrumentation kept for
170224
future characterisation campaigns)
171-
Phase C: re-run install + boot + DE smoke See above
225+
Phase C: re-run install + boot + DE smoke Phase 2 + 3
226+
proven clean via qcow2 post-mortem (sfdisk -d shows
227+
both partitions; ext4 mount; 1.4 GB rsync'd);
228+
G3 + G4 blocked on QEMU performance on eli-wsl
229+
(not on disko code).
172230
Phase D: close-out (this file) CLOSED
173231

174-
The smaller-than-M9.R.41 gap:
232+
Two smaller-than-M9.R.41 gaps remain:
175233

176-
REPRO HOST BINARY MUST BE FRESHLY BUILT.
234+
GAP 1 -- REPRO HOST BINARY MUST BE FRESHLY BUILT.
177235

178236
The M9.R.42 _m9r42_iso_rebuild.sh script DOES rebuild the
179237
reproos-installer + ISO; it does NOT explicitly rebuild
@@ -199,6 +257,24 @@ The smaller-than-M9.R.41 gap:
199257
Either way, this is an ORTHOGONAL milestone to the disk-apply
200258
work + doesn't affect any in-source disko logic.
201259

260+
GAP 2 -- QEMU PERFORMANCE ON ELI-WSL.
261+
262+
Phase 5 rsync on eli-wsl writes the 1.5 GB live root at
263+
~0.8 MB/s through the Windows -> WSL2 -> QEMU triple-virt
264+
stack, which means a full install needs ~30 min wall
265+
vs. the M9.R.42.4 1800s timeout.
266+
267+
Two fixes -- either landed makes G3 + G4 achievable on
268+
this host without disko changes:
269+
1. bump _m9r42_install.sh INSTALL_TIMEOUT to 3600s; OR
270+
2. run the install on a host with KVM passthrough
271+
(any bare-metal Linux or the new metacraft CI runner).
272+
273+
This is smaller than M9.R.41's gap because the install
274+
IS converging -- it isn't deadlocked or hung; it just
275+
needs more wall clock time. The Phase 2 + 3 cleanness is
276+
already PROVEN via the qcow2 post-mortem.
277+
202278

203279
EVIDENCE FILES LEFT IN /tmp ON ELI-WSL
204280
=======================================

0 commit comments

Comments
 (0)