Commit d28b18f

fix(os-rv64): split firstboot and agent readiness markers
1 parent d566edb commit d28b18f

10 files changed

Lines changed: 63 additions & 45 deletions

packages/chip/scripts/check_os_rv64_chip_boot_contract.py

Lines changed: 4 additions & 1 deletion
@@ -267,7 +267,10 @@ def run_check(args: argparse.Namespace) -> dict[str, object]:
     agent_start_pos = marker_position(first_boot, "systemctl start")
     add_if(
         findings,
-        ready_pos is not None and agent_start_pos is not None and ready_pos < agent_start_pos,
+        "elizaos-ready" in first_boot
+        and ready_pos is not None
+        and agent_start_pos is not None
+        and ready_pos < agent_start_pos,
         "elizaos_ready_marker_before_agent_start",
         "`elizaos-ready` is emitted before the first-boot script attempts to start elizaos-agent.service",
         f"{rel(FIRST_BOOT)} READY_LINE offset={ready_pos} systemctl_start offset={agent_start_pos}",
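The effect of the added `"elizaos-ready" in first_boot` guard can be illustrated with a minimal Python sketch. It assumes `marker_position` behaves like `str.find` but returns `None` on a miss, and that `ready_pos` is located by searching for the `READY_LINE` token (suggested by the evidence string, but not shown in this hunk). Both are assumptions, and `ready_before_agent_start` is a hypothetical helper rather than the repository's code.

```python
from typing import Optional


def marker_position(text: str, marker: str) -> Optional[int]:
    """Hypothetical stand-in: offset of the first match, or None if absent."""
    pos = text.find(marker)
    return pos if pos >= 0 else None


def ready_before_agent_start(first_boot: str) -> bool:
    """True only when the legacy `elizaos-ready` literal precedes `systemctl start`."""
    ready_pos = marker_position(first_boot, "READY_LINE")  # assumed search token
    agent_start_pos = marker_position(first_boot, "systemctl start")
    return (
        "elizaos-ready" in first_boot
        and ready_pos is not None
        and agent_start_pos is not None
        and ready_pos < agent_start_pos
    )


# Two toy script bodies: the pre-commit marker name and the renamed one.
legacy = 'READY_LINE="elizaos-ready instance=x"\nsystemctl start elizaos-agent.service\n'
split = 'FIRSTBOOT_READY_LINE="elizaos-firstboot-ready instance=x"\nsystemctl start elizaos-agent.service\n'

print(ready_before_agent_start(legacy))
print(ready_before_agent_start(split))
```

Under these assumptions the renamed variable `FIRSTBOOT_READY_LINE` still contains the token `READY_LINE`, so the position check alone would keep matching; the extra literal test is what stops the finding from firing once the script emits `elizaos-firstboot-ready` instead.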

packages/os/linux/variants/elizaos-debian-riscv64/STATUS.md

Lines changed: 4 additions & 4 deletions
@@ -27,8 +27,8 @@ to see exactly what landed.
2727
| Piece | Commit | Owns |
2828
|---------------------------------------|----------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
2929
| 1. Build (Wave 4 live-build) | `c4656f1810` | `Dockerfile`, `build.sh`, `auto/config`, `config/package-lists/elizaos.list.chroot`, `config/hooks/normal/0010-elizaos-agent.hook.chroot`, `config/hooks/normal/0020-grub-efi-riscv64.hook.binary`, `config/includes.binary/extlinux/extlinux.conf`, `manifest.json.template`. End-to-end `lb config``lb build` → verify → checksum → manifest pipeline against Debian Trixie riscv64. |
30-
| 2. Boot (qemu-virt harness) | `ebf816ea14` | `Makefile` (boot targets), `scripts/qemu_virt_boot.sh`, `scripts/qemu_virt_smoke.py`, `scripts/test_qemu_virt_smoke.py`. Wraps `qemu-system-riscv64 -M virt`, emits an evidence JSON conforming to schema `eliza.os.linux.qemu_virt_boot.v1`, greps the serial transcript for the literal marker `elizaos-ready`. |
31-
| 3. Userland (Wave 2B systemd bootstrap)| `31bd8f13ba` | `config/hooks/normal/0030-elizaos-userland.hook.chroot`, `config/includes.chroot/etc/systemd/system/elizaos-{agent,first-boot}.service`, `config/includes.chroot/usr/lib/elizaos/first-boot.sh`, `config/package-lists/elizaos-runtime.list.chroot`, `docs/userland-startup.md`. Creates the `elizaos` system user, the state + config dirs, and writes the `elizaos-ready` line on `/dev/ttyS0`. |
30+
| 2. Boot (qemu-virt harness) | `ebf816ea14` | `Makefile` (boot targets), `scripts/qemu_virt_boot.sh`, `scripts/qemu_virt_smoke.py`, `scripts/test_qemu_virt_smoke.py`. Wraps `qemu-system-riscv64 -M virt`, emits an evidence JSON conforming to schema `eliza.os.linux.qemu_virt_boot.v1`, greps the serial transcript for the literal marker `elizaos-firstboot-ready`. |
31+
| 3. Userland (Wave 2B systemd bootstrap)| `31bd8f13ba` | `config/hooks/normal/0030-elizaos-userland.hook.chroot`, `config/includes.chroot/etc/systemd/system/elizaos-{agent,first-boot}.service`, `config/includes.chroot/usr/lib/elizaos/first-boot.sh`, `config/package-lists/elizaos-runtime.list.chroot`, `docs/userland-startup.md`. Creates the `elizaos` system user, the state + config dirs, and writes the `elizaos-firstboot-ready` line on `/dev/ttyS0`. |
3232
| 4. Gate (e2e runbook + release-check) | `cc10b9f001` | `Makefile` (release-check targets), `docs/e2e-qemu-virt.md`, `scripts/check_release_manifest.py`, `scripts/test_check_release_manifest.py`. Fail-closed validator against `packages/os/release/schema/elizaos-os-release-manifest.schema.json`. BLOCKED informational by default; FAIL under `--strict`. |
3333

3434
## Happy-path command sequence
@@ -82,9 +82,9 @@ make -C packages/os/linux/variants/elizaos-debian-riscv64 release-check-test
8282
| Row | Why it is BLOCKED | Owner / cross-link |
8383
|------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
8484
| `lb build` run | Multi-hour build, multi-GB pull from `deb.debian.org` Trixie riscv64 mirror. Cannot run from inside an interactive sub-agent; not committed as a binary artifact. Recipe in [`docs/e2e-qemu-virt.md`](docs/e2e-qemu-virt.md) step 2. | builder host or CI; external dependency = Debian Trixie riscv64 mirror availability |
85-
| `qemu-virt-boot` transcript | Requires the artifact above + `qemu-system-riscv64` on the host. Until the transcript exists, the `qemu-virt-boot` evidence row in `manifest.json.template` stays `status: missing` and `release-check` reports BLOCKED. No transcript is committed; no `boot_completed: true` is fabricated. | this variant's `qemu_virt_boot.sh` + `qemu_virt_smoke.py`; consumes systemd `elizaos-ready` line from piece 3 |
85+
| `qemu-virt-boot` transcript | Requires the artifact above + `qemu-system-riscv64` on the host. Until the transcript exists, the `qemu-virt-boot` evidence row in `manifest.json.template` stays `status: missing` and `release-check` reports BLOCKED. No transcript is committed; no `boot_completed: true` is fabricated. | this variant's `qemu_virt_boot.sh` + `qemu_virt_smoke.py`; consumes systemd `elizaos-firstboot-ready` line from piece 3 |
8686
| `grub-efi-riscv64-boot` | Hook `config/hooks/normal/0020-grub-efi-riscv64.hook.binary` stages `BOOTRISCV64.EFI` + `grub.cfg` but no boot transcript is captured. Same external dependency chain as the qemu-virt row. | chip-side BSP recipes: [`packages/chip/docs/sw/u-boot/README.md`](../../../../chip/docs/sw/u-boot/README.md), [`packages/chip/docs/android/riscv-bringup.md`](../../../../chip/docs/android/riscv-bringup.md) |
87-
| elizaOS agent binary | First-boot unit can write `elizaos-ready` even when the agent is absent, but `/opt/elizaos/STATUS_LATER_AGENT_BINARY` stays present until the agent installer hook replaces the placeholder with a real `/opt/elizaos/bin/elizaos`. Until then `elizaos-agent.service` stays `failed (ExecStart not found)`. | elizaOS agent-release pipeline (`packages/os/linux/agent/`); not in this variant's scope |
87+
| elizaOS agent binary | First-boot unit can write `elizaos-firstboot-ready` even when the agent is absent, but `/opt/elizaos/STATUS_LATER_AGENT_BINARY` stays present until the agent installer hook replaces the placeholder with a real `/opt/elizaos/bin/elizaos`. Until then first boot enables `elizaos-agent.service` but does not start it and must not emit `elizaos-agent-ready`. | elizaOS agent-release pipeline (`packages/os/linux/agent/`); not in this variant's scope |
8888
| `u-boot-extlinux-boot` | `not-required` for the GRUB EFI qemu-virt artifact. Staged for distroboot follow-up evidence only; not in scope for the current wave. | chip BSP / U-Boot recipe ([`packages/chip/docs/sw/u-boot/README.md`](../../../../chip/docs/sw/u-boot/README.md)) |
8989
| `hardware-board-boot` | No silicon. No physical board. No host hardware available to this repo. `not-required` for the qemu-virt artifact; for any hardware variant it stays BLOCKED until the chip board bring-up team produces a transcripted boot on real silicon. No hardware claim is made by any of the four landing commits. | chip board bring-up team (`packages/chip/docs/evidence/linux/`) |
9090

packages/os/linux/variants/elizaos-debian-riscv64/config/includes.chroot/usr/lib/elizaos/first-boot.sh

Lines changed: 18 additions & 7 deletions
@@ -8,9 +8,9 @@
 # 2. Create /var/lib/elizaos (state) and /etc/elizaos (config) with the
 # right ownership.
 # 3. Generate a stable instance UUID at /etc/elizaos/instance-id.
-# 4. Emit a single `elizaos-ready instance=<uuid>` line on the serial
-# console (/dev/ttyS0) so the qemu-virt harness can detect boot
-# completion deterministically.
+# 4. Emit a single `elizaos-firstboot-ready instance=<uuid>` line on the
+# serial console (/dev/ttyS0) so the qemu-virt harness can detect OS
+# first-boot completion deterministically.
 # 5. Enable elizaos-agent.service and start it when the agent binary exists.
 # 6. Disable this first-boot service so it does not run again.
 #
@@ -80,17 +80,17 @@ fi
 # Best-effort: if /dev/ttyS0 is not present (e.g. on a board without a
 # virt-style 16550), fall back to the kernel printk path so the line
 # still ends up in dmesg.
-READY_LINE="elizaos-ready instance=${INSTANCE_UUID}"
+FIRSTBOOT_READY_LINE="elizaos-firstboot-ready instance=${INSTANCE_UUID}"
 if [ -w "${SERIAL_CONSOLE}" ]; then
     log "emitting ready marker on ${SERIAL_CONSOLE}"
-    printf '%s\n' "${READY_LINE}" > "${SERIAL_CONSOLE}"
+    printf '%s\n' "${FIRSTBOOT_READY_LINE}" > "${SERIAL_CONSOLE}"
 elif [ -w /dev/kmsg ]; then
     log "serial console ${SERIAL_CONSOLE} not writable; using /dev/kmsg"
-    printf '%s\n' "${READY_LINE}" > /dev/kmsg
+    printf '%s\n' "${FIRSTBOOT_READY_LINE}" > /dev/kmsg
 else
     log "WARN: neither ${SERIAL_CONSOLE} nor /dev/kmsg writable; ready marker only in journal"
 fi
-log "${READY_LINE}"
+log "${FIRSTBOOT_READY_LINE}"

 # 5. Enable the agent. Early RISC-V images intentionally ship with a
 # STATUS_LATER placeholder until the agent binary is packaged, so do not
@@ -102,6 +102,17 @@ if [ -x "${AGENT_BIN}" ]; then
     timeout 10s systemctl start --no-block elizaos-agent.service || {
         log "WARN: elizaos-agent.service failed to queue"
     }
+    if timeout 30s sh -c 'until systemctl is-active --quiet elizaos-agent.service; do sleep 1; done'; then
+        AGENT_READY_LINE="elizaos-agent-ready instance=${INSTANCE_UUID}"
+        if [ -w "${SERIAL_CONSOLE}" ]; then
+            printf '%s\n' "${AGENT_READY_LINE}" > "${SERIAL_CONSOLE}"
+        elif [ -w /dev/kmsg ]; then
+            printf '%s\n' "${AGENT_READY_LINE}" > /dev/kmsg
+        fi
+        log "${AGENT_READY_LINE}"
+    else
+        log "WARN: elizaos-agent.service did not become active within 30s"
+    fi
 else
     log "agent binary missing at ${AGENT_BIN}; leaving elizaos-agent.service enabled but not started"
 fi
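The bounded wait added in this hunk (`timeout 30s sh -c 'until ...; do sleep 1; done'`) is a poll-until-deadline pattern. A minimal Python sketch of the same idea follows. `wait_until` and `agent_ready_line` are hypothetical names introduced only for illustration, and the `is_active` callable stands in for `systemctl is-active --quiet elizaos-agent.service`.

```python
import time
from typing import Callable, Optional


def wait_until(predicate: Callable[[], bool], timeout_s: float, interval_s: float = 1.0) -> bool:
    """Poll `predicate` until it returns True or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


def agent_ready_line(
    instance_uuid: str,
    is_active: Callable[[], bool],
    timeout_s: float = 30.0,
    interval_s: float = 1.0,
) -> Optional[str]:
    """Return the agent marker line only if the service became active in time."""
    if wait_until(is_active, timeout_s, interval_s):
        return f"elizaos-agent-ready instance={instance_uuid}"
    # Mirrors the shell branch that only logs a WARN and emits no marker.
    return None
```

The point of the pattern is that the agent marker is emitted only after liveness has been observed, so a transcript containing `elizaos-agent-ready` cannot come from a service that was merely queued.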

packages/os/linux/variants/elizaos-debian-riscv64/docs/e2e-qemu-virt.md

Lines changed: 10 additions & 8 deletions
@@ -140,9 +140,11 @@ The boot harness writes a JSON evidence file conforming to the
 ```

 The transcript log must additionally contain the literal marker
-`elizaos-ready` — the elizaOS first-boot unit prints that line once the
-agent is up. If the marker is missing the release-manifest gate (Step 4)
-reports FAIL, not PASS.
+`elizaos-firstboot-ready` — the elizaOS first-boot unit prints that line once
+OS userland initialization has completed. Agent liveness is a separate
+`elizaos-agent-ready` marker and is not proven by the qemu-virt release gate.
+If the first-boot marker is missing the release-manifest gate (Step 4) reports
+FAIL, not PASS.

 Expected duration: **2 – 6 min** for an end-to-end boot once the
 guest kernel is cached. Allocate at least 2 GB of guest RAM and one
@@ -173,7 +175,7 @@ What the validator does:
 5. Asserts `boot_completed === true` and that `iso_sha256` matches
    the `sha256` on the parent manifest entry.
 6. Asserts the transcript referenced by `transcript_path` contains
-   the literal `elizaos-ready` marker.
+   the literal `elizaos-firstboot-ready` marker.
 7. Aggregates the result. `STATUS: BLOCKED` is informational;
    `STATUS: FAIL` is a release blocker.
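Validator steps 5 and 6 above can be sketched in Python as follows. This is a hedged illustration, not the contents of `check_release_manifest.py`: it assumes the evidence file is a flat JSON object with `boot_completed`, `iso_sha256` and `transcript_path` keys, that `transcript_path` can be opened as given, and `check_boot_evidence` is a hypothetical function name. The `FAIL` strings are copied from the runbook's failure table.

```python
import json
from pathlib import Path

REQUIRED_TRANSCRIPT_MARKER = "elizaos-firstboot-ready"


def check_boot_evidence(evidence_path: Path, manifest_sha256: str) -> str:
    """Return "PASS" or the first FAIL reason for one qemu-virt evidence file."""
    evidence = json.loads(evidence_path.read_text(encoding="utf-8"))

    # Step 5: the boot must have completed and the image hash must match.
    if evidence.get("boot_completed") is not True:
        return "FAIL: qemu-virt boot did not complete"
    if evidence.get("iso_sha256") != manifest_sha256:
        return "FAIL: iso_sha256 mismatch between manifest and evidence"

    # Step 6: the transcript must contain the literal first-boot marker.
    transcript = Path(evidence["transcript_path"]).read_text(
        encoding="utf-8", errors="replace"
    )
    if REQUIRED_TRANSCRIPT_MARKER not in transcript:
        return f"FAIL: transcript missing required marker: {REQUIRED_TRANSCRIPT_MARKER}"
    return "PASS"
```

Note that a transcript carrying only the pre-commit `elizaos-ready instance=<uuid>` line fails this check, because `elizaos-firstboot-ready` is not a substring of it; harness, first-boot script and validator therefore have to change together, as they do in this commit.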

@@ -223,9 +225,9 @@ Other external dependencies that this runbook does **not** unblock:
   riscv64 set is incomplete, `lb build` exits non-zero. The validator
   has no way to distinguish a mirror outage from a real config break;
   the failing `lb build` log is the source of truth.
-- **elizaOS agent binary publication.** The first-boot unit needs a
-  published agent. Until that ships, the `elizaos-ready` marker will
-  never appear and Step 3 stays BLOCKED.
+- **elizaOS agent binary publication.** The first-boot marker can appear before
+  the agent exists. Until the RV64 image packages `/opt/elizaos/bin/elizaos`,
+  no `elizaos-agent-ready` marker or agent health evidence should be claimed.
 - **Real board boot.** The qemu-virt boot is necessary but not
   sufficient. The `hardware-board-boot` row stays BLOCKED until the
   chip team produces a transcripted boot on real silicon.
@@ -239,7 +241,7 @@ Other external dependencies that this runbook does **not** unblock:
 | Evidence row missing the JSON file | `BLOCKED: evidence file not present: <path>` |
 | `boot_completed=false` in evidence | `FAIL: qemu-virt boot did not complete` |
 | `iso_sha256` mismatch | `FAIL: iso_sha256 mismatch between manifest and evidence` |
-| Transcript missing `elizaos-ready` | `FAIL: transcript missing required marker: elizaos-ready` |
+| Transcript missing `elizaos-firstboot-ready` | `FAIL: transcript missing required marker: elizaos-firstboot-ready` |
 | Schema OK, every row collected, marker present | `PASS: release manifest gate ok` |

 `PASS` from this runbook means the qemu-virt half of the promotion

packages/os/linux/variants/elizaos-debian-riscv64/docs/userland-startup.md

Lines changed: 10 additions & 10 deletions
@@ -29,7 +29,7 @@ elizaos-first-boot.service (Type=oneshot, RemainAfterExit=yes)
 ├─ create /var/lib/elizaos (0750, elizaos:elizaos)
 ├─ create /etc/elizaos (0750, root:elizaos)
 ├─ generate /etc/elizaos/instance-id (UUIDv4)
-├─ write "elizaos-ready instance=<uuid>" to /dev/ttyS0
+├─ write "elizaos-firstboot-ready instance=<uuid>" to /dev/ttyS0
 ├─ systemctl enable + start elizaos-agent.service
 ├─ touch /var/lib/elizaos/.first-boot-complete
 └─ systemctl disable elizaos-first-boot.service
@@ -67,14 +67,14 @@ The ordering guarantees that:
 The first-boot script writes a single line to `/dev/ttyS0`:

 ```
-elizaos-ready instance=<uuid>
+elizaos-firstboot-ready instance=<uuid>
 ```

 The `<uuid>` matches the contents of `/etc/elizaos/instance-id` and is
 generated once on the first successful boot. This is the **only**
 ready signal the qemu-virt harness depends on; do not relocate it,
-reformat it, or emit additional `elizaos-ready` lines from any other
-unit. The harness greps for `^elizaos-ready instance=` on the captured
+reformat it, or emit additional `elizaos-firstboot-ready` lines from any other
+unit. The harness greps for `^elizaos-firstboot-ready instance=` on the captured
 serial transcript.

 If `/dev/ttyS0` is not writable (e.g. on a real RISC-V dev board with
@@ -96,13 +96,13 @@ binary. Instead, the chroot hook
 drops a marker file at
 `/opt/elizaos/STATUS_LATER_AGENT_BINARY`.

-This separates two qualitatively different boot outcomes:
+This separates first-boot completion from agent liveness:

-| Outcome | `/opt/elizaos/STATUS_LATER_AGENT_BINARY` | `elizaos-ready` on `ttyS0` | `elizaos-agent.service` state |
-|---|---|---|---|
-| Boot fine, agent missing | present | **yes** | `failed` (ExecStart not found) |
-| Boot fine, agent live | **absent** | **yes** | `active (running)` |
-| Boot broken | irrelevant | **no** | irrelevant |
+| Outcome | `/opt/elizaos/STATUS_LATER_AGENT_BINARY` | `elizaos-firstboot-ready` on `ttyS0` | `elizaos-agent-ready` on `ttyS0` | `elizaos-agent.service` state |
+|---|---|---|---|---|
+| Boot fine, agent missing | present | **yes** | **no** | not started |
+| Boot fine, agent live | **absent** | **yes** | **yes** | `active (running)` |
+| Boot broken | irrelevant | **no** | **no** | irrelevant |

 When the build pipeline gains the ability to install a real elizaOS
 release artifact, the hook that lays the binary down at

packages/os/linux/variants/elizaos-debian-riscv64/scripts/check_release_manifest.py

Lines changed: 5 additions & 4 deletions
@@ -18,7 +18,7 @@
 in the default mode and exit 1 under ``--strict``.
 * ``FAIL`` — release blocker: schema mismatch, ``iso_sha256``
 mismatch, ``boot_completed=false``, or a missing
-``elizaos-ready`` marker in the transcript. Always
+``elizaos-firstboot-ready`` marker in the transcript. Always
 exit 1 regardless of ``--strict``.

 The validator deliberately uses the same vocabulary as the chip readiness
@@ -57,9 +57,10 @@
 # ``missing`` rows stay informational BLOCKED.
 PROMOTED_STATUSES: frozenset[str] = frozenset({"candidate", "published"})

-# Marker the elizaOS first-boot unit prints once the agent is up. The qemu-virt
-# boot transcript must contain this literal string for the gate to PASS.
-REQUIRED_TRANSCRIPT_MARKER = "elizaos-ready"
+# Marker the elizaOS first-boot unit prints once OS userland initialization
+# completes. Agent liveness is intentionally tracked by a separate
+# ``elizaos-agent-ready`` marker and is outside this qemu-virt release gate.
+REQUIRED_TRANSCRIPT_MARKER = "elizaos-firstboot-ready"
 GRUB_TRANSCRIPT_MARKERS: tuple[str, ...] = (
     "GNU GRUB",
     "Booting `elizaOS Live (RISC-V 64)'",
