fix(os): polish elizaOS live demo #7803

Merged
lalalune merged 87 commits into develop from nubs/elizaos-live-prod-hardening-20260519
May 20, 2026

@NubsCarson
Member

@NubsCarson NubsCarson commented May 19, 2026

Summary

This keeps the elizaOS Live / USB-demo branch current with develop and hardens the USB installer, live demo docs, runtime packaging validation, and OS demo surface.

  • Polishes the elizaOS Live visual/app path while preserving the normal GNOME/Tails desktop stack.
  • Keeps the bundled elizaOS/Milady app as the auto-starting home surface.
  • Hardens USB installer planning/execution with backend-owned planId, exact local-origin backend handling, destructive-write gating, plan expiry, live/root USB refusal, device revalidation, and explicit target confirmation.
  • Adds safe USB installer proof coverage: fake-media write flow, Playwright desktop/mobile wizard smoke, and a Linux scsi_debug virtual block-device write/readback test.
  • Adds CI coverage for the USB browser E2E path and conditionally runs the Linux virtual block-device proof when the runner kernel provides scsi_debug.
  • Keeps the PR branch merged with current origin/develop using normal merges, not force pushes.
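As an illustration of the installer hardening listed above (backend-owned planId, plan expiry, live/root USB refusal, device revalidation, explicit target confirmation), here is a minimal sketch. It is not the PR's implementation: `PlanStore`, `UsbDevice`, `checkExecute`, and the five-minute lifetime are all hypothetical names and values chosen for the example.

```typescript
// Hypothetical sketch only: none of these names or values come from the PR.
import { randomUUID } from "node:crypto";

interface UsbDevice {
  path: string; // e.g. "/dev/sdb"
  serial: string;
  sizeBytes: number;
  isLiveOrRoot: boolean; // true if the device backs the running system
}

interface InstallPlan {
  planId: string; // minted by the backend, never supplied by the client
  device: UsbDevice;
  expiresAt: number; // epoch milliseconds
}

type ExecuteResult = { ok: true } | { ok: false; reason: string };

const PLAN_TTL_MS = 5 * 60 * 1000; // assumed plan lifetime

class PlanStore {
  private plans = new Map<string, InstallPlan>();

  // Planning step: refuse live/root media up front and record an expiry.
  createPlan(device: UsbDevice, now: number = Date.now()): InstallPlan {
    if (device.isLiveOrRoot) {
      throw new Error("refusing to plan a write to the live/root device");
    }
    const plan: InstallPlan = {
      planId: randomUUID(),
      device,
      expiresAt: now + PLAN_TTL_MS,
    };
    this.plans.set(plan.planId, plan);
    return plan;
  }

  // Execution gate: every check must pass before a destructive write starts.
  checkExecute(
    planId: string,
    confirmedPath: string,
    current: UsbDevice | undefined,
    now: number = Date.now(),
  ): ExecuteResult {
    const plan = this.plans.get(planId);
    if (!plan) return { ok: false, reason: "unknown plan" };
    if (now > plan.expiresAt) {
      this.plans.delete(planId);
      return { ok: false, reason: "plan expired" };
    }
    if (confirmedPath !== plan.device.path) {
      return { ok: false, reason: "target not confirmed" };
    }
    // Revalidate: the device seen now must be the one that was planned.
    if (
      !current ||
      current.isLiveOrRoot ||
      current.serial !== plan.device.serial ||
      current.sizeBytes !== plan.device.sizeBytes
    ) {
      return { ok: false, reason: "device changed" };
    }
    this.plans.delete(planId); // plans are single-use in this sketch
    return { ok: true };
  }
}
```

In this sketch the client only ever holds the opaque planId, so it cannot swap the target between planning and execution; a real backend would likely also bind the plan to the image being written.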

Current Source Base

Updated 2026-05-20.

  • PR head: 5de7b20881b9731405a80d33e04866d3f2541373
  • Latest merged origin/develop: c73f1768b68ea72b5df83efeeaadea49f812555f
  • Branch was pushed normally; no force-push was used.
  • GitHub reports the PR is mergeable; remaining status is check-dependent.

Latest Local Validation

Ran on the merged PR head before pushing the latest commits.

bun run verify:cloud
bun run test:cloud
bun run --cwd packages/os/usb-installer test
bun run --cwd packages/os/usb-installer typecheck
bun run --cwd packages/os/usb-installer lint
bun run --cwd packages/os/usb-installer build
bun run --cwd packages/os/usb-installer test:e2e
bun run --cwd packages/os/usb-installer test:linux-virtual-usb
bun run --cwd packages/os/usb-installer test -- src/__tests__/linux-virtual-block-device-e2e.test.ts
git diff --check

Results:

  • Cloud verify passed.
  • Cloud package-wide unit tests passed: 266 tests across 28 files.
  • USB installer unit/integration suite passed: 9 files, 80 tests, 1 opt-in virtual-device skip.
  • USB installer typecheck, build, and expanded Biome lint passed.
  • Playwright e2e passed: 6 tests across desktop/mobile render and mocked guarded wizard flow.
  • Linux virtual USB proof passed locally with ELIZAOS_USB_TEST_SCSI_DEBUG=1: disposable scsi_debug block device, real lsblk, sudo -n dd, sync, readback SHA-256 match, cleanup verified with scsi_debug unloaded.
  • The same virtual test is skipped by default without the opt-in env, matching CI behavior on runners without scsi_debug.
  • git diff --check passed.
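The "readback SHA-256 match" step in the virtual USB proof can be pictured with the sketch below. It assumes the image and the bytes read back from the device fit in memory, which a real test would probably avoid by streaming; `sha256Hex` and `readbackMatches` are hypothetical helper names, not taken from the test file.

```typescript
// Hypothetical sketch of a write/readback integrity check.
import { createHash } from "node:crypto";

// Hex-encoded SHA-256 digest of a byte buffer.
function sha256Hex(data: Uint8Array): string {
  return createHash("sha256").update(data).digest("hex");
}

// The device is usually larger than the image, so only the first
// image.length bytes read back from it are compared.
function readbackMatches(image: Uint8Array, deviceBytes: Uint8Array): boolean {
  if (deviceBytes.length < image.length) return false;
  const readback = deviceBytes.subarray(0, image.length);
  return sha256Hex(image) === sha256Hex(readback);
}
```

Comparing digests rather than raw buffers mirrors the wording of the result above and leaves a short value that can be logged as proof.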

Current GitHub Status

  • PR remains draft while GitHub checks rerun on 5de7b20881.
  • The previous cloud CI failures were caused by package-wide Bun mock leakage; this branch now isolates those mocks and validates verify:cloud and test:cloud locally.
  • The earlier OS release CI failure was caused by the GitHub runner's Azure kernel lacking the scsi_debug module; CI now runs the proof only when the module exists and emits a notice otherwise.
  • The earlier merge conflict was resolved by merging origin/develop@c73f1768b6 and taking the newer upstream GitHub live-artifact validator workflow shape.
  • Skipped fork/security/manual gates may still appear as expected skips.
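The gating described above (opt-in env locally, module presence in CI, a notice instead of a failure) could look roughly like the following. Only the ELIZAOS_USB_TEST_SCSI_DEBUG variable comes from this PR description; `decideVirtualUsbProof`, the `ProofDecision` type, and the notice strings are assumptions made for illustration, and how module availability is detected is left to the caller.

```typescript
// Hypothetical sketch: decide whether to run the scsi_debug proof.
type ProofDecision = { run: true } | { run: false; notice: string };

function decideVirtualUsbProof(
  env: Record<string, string | undefined>,
  scsiDebugAvailable: boolean, // assumed to be probed by the caller
): ProofDecision {
  // Skipped by default: the proof needs sudo and a disposable block device.
  if (env.ELIZAOS_USB_TEST_SCSI_DEBUG !== "1") {
    return { run: false, notice: "virtual USB proof skipped: opt-in env not set" };
  }
  // Some runner kernels do not ship the module; report it, do not fail.
  if (!scsiDebugAvailable) {
    return { run: false, notice: "virtual USB proof skipped: scsi_debug not available" };
  }
  return { run: true };
}
```

Returning a notice rather than throwing keeps a missing kernel module from being reported as a product failure.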

Remaining Hardware/Product Gaps

Still needed before making hardware/product claims:

  • Repeat guarded physical USB flash/readback for a final ISO built from the current head.
  • Boot that USB on real hardware.
  • Validate real USB Persistent Storage create/unlock/delete behavior.
  • Validate privacy/direct networking behavior for app, renderer, embedded browser, OAuth, and external web surfaces.
  • Production release still needs signed image manifests, signed privileged helpers, real updater/rollback infrastructure, SBOM/provenance, recovery policy, and formal inherited Tails sudoers review.

@coderabbitai
Contributor

coderabbitai Bot commented May 19, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1f2c31ae-cef5-4329-a151-a1a1e12d4ace

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


sourcePath && fs.existsSync(sourcePath)
? fs.readFileSync(sourcePath, "utf8")
: fs.readFileSync(filePath, "utf8").replace(
/export \{\n handleWalletRoutes,\n type WalletAddressesSnapshot,\n type WalletRouteContext,\n type WalletRouteDependencies,\n type WalletRpcReadinessSnapshot,\n\} from "@elizaos\/plugin-wallet";/,

Spaces are hard to count. Use {2}.

Suggested change
/export \{\n handleWalletRoutes,\n type WalletAddressesSnapshot,\n type WalletRouteContext,\n type WalletRouteDependencies,\n type WalletRpcReadinessSnapshot,\n\} from "@elizaos\/plugin-wallet";/,
/export \{\n {2}handleWalletRoutes,\n {2}type WalletAddressesSnapshot,\n {2}type WalletRouteContext,\n {2}type WalletRouteDependencies,\n {2}type WalletRpcReadinessSnapshot,\n\} from "@elizaos\/plugin-wallet";/,
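To see what the suggestion changes, the sketch below runs the `{2}` form of the pattern against a sample export block. The sample text assumes two-space indentation in the source file, inferred from the suggestion itself; the variable names are illustrative only.

```typescript
// Sample source text, assuming the export block is indented with two spaces.
const source = [
  "export {",
  "  handleWalletRoutes,",
  "  type WalletAddressesSnapshot,",
  "  type WalletRouteContext,",
  "  type WalletRouteDependencies,",
  "  type WalletRpcReadinessSnapshot,",
  '} from "@elizaos/plugin-wallet";',
].join("\n");

// The suggested pattern: " {2}" states the indentation width explicitly
// instead of relying on a run of literal spaces that is easy to miscount.
const pattern =
  /export \{\n {2}handleWalletRoutes,\n {2}type WalletAddressesSnapshot,\n {2}type WalletRouteContext,\n {2}type WalletRouteDependencies,\n {2}type WalletRpcReadinessSnapshot,\n\} from "@elizaos\/plugin-wallet";/;

// Matches the two-space source exactly.
const matchesTwoSpaces = pattern.test(source);

// A one-space variant of the same text no longer matches.
const oneSpaceSource = source.replace(/\n {2}/g, "\n ");
const matchesOneSpace = pattern.test(oneSpaceSource);

console.log({ matchesTwoSpaces, matchesOneSpace });
```

Both forms match the same strings when the literal spaces are counted correctly; the quantifier only makes the intended width visible in review.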

@NubsCarson force-pushed the nubs/elizaos-live-prod-hardening-20260519 branch from bf275a2 to 8287f9a on May 19, 2026 13:04
@NubsCarson force-pushed the nubs/elizaos-live-prod-hardening-20260519 branch 2 times, most recently from 3f2c48b to e68c797, on May 19, 2026 15:44
@github-actions
Contributor

LifeOps Multi-Tier Benchmark

Suite: smoke — Tiers requested: large,frontier

large

LifeOps Multi-Tier Benchmark

Tier: large
Suite: smoke

frontier

LifeOps Multi-Tier Benchmark

Tier: frontier
Suite: smoke

Artifacts: lifeops-multi-tier-large-26108946361, lifeops-multi-tier-frontier-26108946361

@github-actions
Contributor

LifeOps Benchmark — eliza

Run ID: lifeops-eliza-26108946520

LifeOps Benchmark

Model: gpt-oss-120b
Judge: claude-opus-4-7
Scenarios: 25
pass@1: 0.000
pass@k: 0.000
Total cost: $0.0000

Full artifacts: see the lifeops-run-eliza-26108946520 upload on this run.

@github-actions
Contributor

LifeOps Benchmark — hermes

Run ID: lifeops-hermes-26108946520

LifeOps Benchmark

Model: gpt-oss-120b
Judge: claude-opus-4-7
Scenarios: 25
pass@1: 0.320
pass@k: 0.320
Total cost: $0.8595

Full artifacts: see the lifeops-run-hermes-26108946520 upload on this run.
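For reading these benchmark comments: if each scenario is attempted once and pass@1 is simply the fraction of scenarios that passed (an assumption, since the benchmark's scoring code is not shown in this thread), the rate converts back to a whole number of scenarios. The helper name below is hypothetical.

```typescript
// Hypothetical helper: recover the passing-scenario count from a pass@1 rate,
// assuming pass@1 = passed / scenarios with one attempt per scenario.
function passedScenarios(passAt1: number, scenarios: number): number {
  // Round to absorb floating-point error in the product.
  return Math.round(passAt1 * scenarios);
}

// Under that assumption, the hermes run above (pass@1 0.320 over 25
// scenarios) corresponds to 8 passing scenarios.
const hermesPassed = passedScenarios(0.32, 25);
console.log(hermesPassed);
```

The rates reported in this thread all map to whole counts out of 25, which is consistent with the one-attempt assumption but does not confirm it.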

@github-actions
Contributor

LifeOps Benchmark — eliza

Run ID: lifeops-eliza-26111733572

LifeOps Benchmark

Model: gpt-oss-120b
Judge: claude-opus-4-7
Scenarios: 25
pass@1: 0.000
pass@k: 0.000
Total cost: $0.0000

Full artifacts: see the lifeops-run-eliza-26111733572 upload on this run.

@github-actions
Contributor

LifeOps Benchmark — hermes

Run ID: lifeops-hermes-26111733572

LifeOps Benchmark

Model: gpt-oss-120b
Judge: claude-opus-4-7
Scenarios: 25
pass@1: 0.320
pass@k: 0.320
Total cost: $0.7427

Full artifacts: see the lifeops-run-hermes-26111733572 upload on this run.

@github-actions
Contributor

LifeOps Multi-Tier Benchmark

Suite: smoke — Tiers requested: large,frontier

large — cancelled

frontier

LifeOps Multi-Tier Benchmark

Tier: frontier
Suite: smoke

Artifacts: lifeops-multi-tier-large-26111733575, lifeops-multi-tier-frontier-26111733575

@github-actions
Contributor

LifeOps Benchmark — eliza

Run ID: lifeops-eliza-26111935181

LifeOps Benchmark

Model: gpt-oss-120b
Judge: claude-opus-4-7
Scenarios: 25
pass@1: 0.000
pass@k: 0.000
Total cost: $0.0000

Full artifacts: see the lifeops-run-eliza-26111935181 upload on this run.

@github-actions
Contributor

LifeOps Multi-Tier Benchmark

Suite: smoke — Tiers requested: large,frontier

large

LifeOps Multi-Tier Benchmark

Tier: large
Suite: smoke

frontier

LifeOps Multi-Tier Benchmark

Tier: frontier
Suite: smoke

Artifacts: lifeops-multi-tier-large-26111935148, lifeops-multi-tier-frontier-26111935148

@github-actions
Contributor

LifeOps Benchmark — hermes

Run ID: lifeops-hermes-26111935181

LifeOps Benchmark

Model: gpt-oss-120b
Judge: claude-opus-4-7
Scenarios: 25
pass@1: 0.280
pass@k: 0.280
Total cost: $0.7702

Full artifacts: see the lifeops-run-hermes-26111935181 upload on this run.

@NubsCarson force-pushed the nubs/elizaos-live-prod-hardening-20260519 branch from 6ad2187 to 80d16c3 on May 19, 2026 17:03
@github-actions
Contributor

LifeOps Multi-Tier Benchmark

Suite: smoke — Tiers requested: large,frontier

large — failure

frontier — failure

Artifacts: lifeops-multi-tier-large-26112591664, lifeops-multi-tier-frontier-26112591664

@github-actions
Contributor

LifeOps Benchmark — eliza

Run ID: lifeops-eliza-26113068019

LifeOps Benchmark

Model: gpt-oss-120b
Judge: claude-opus-4-7
Scenarios: 25
pass@1: 0.000
pass@k: 0.000
Total cost: $0.0000

Full artifacts: see the lifeops-run-eliza-26113068019 upload on this run.

@github-actions
Contributor

LifeOps Multi-Tier Benchmark

Suite: smoke — Tiers requested: large,frontier

large

LifeOps Multi-Tier Benchmark

Tier: large
Suite: smoke

frontier

LifeOps Multi-Tier Benchmark

Tier: frontier
Suite: smoke

Artifacts: lifeops-multi-tier-large-26113067938, lifeops-multi-tier-frontier-26113067938

@github-actions
Contributor

LifeOps Benchmark — hermes

Run ID: lifeops-hermes-26113068019

LifeOps Benchmark

Model: gpt-oss-120b
Judge: claude-opus-4-7
Scenarios: 25
pass@1: 0.240
pass@k: 0.240
Total cost: $0.8883

Full artifacts: see the lifeops-run-hermes-26113068019 upload on this run.

@github-actions
Contributor

LifeOps Multi-Tier Benchmark

Suite: smoke — Tiers requested: large,frontier

large

LifeOps Multi-Tier Benchmark

Tier: large
Suite: smoke

frontier

LifeOps Multi-Tier Benchmark

Tier: frontier
Suite: smoke

Artifacts: lifeops-multi-tier-large-26113303728, lifeops-multi-tier-frontier-26113303728

@github-actions
Contributor

LifeOps Benchmark — eliza

Run ID: lifeops-eliza-26113304058

LifeOps Benchmark

Model: gpt-oss-120b
Judge: claude-opus-4-7
Scenarios: 25
pass@1: 0.000
pass@k: 0.000
Total cost: $0.0000

Full artifacts: see the lifeops-run-eliza-26113304058 upload on this run.

@github-actions
Contributor

LifeOps Benchmark — hermes

Run ID: lifeops-hermes-26113304058

LifeOps Benchmark

Model: gpt-oss-120b
Judge: claude-opus-4-7
Scenarios: 25
pass@1: 0.360
pass@k: 0.360
Total cost: $0.7672

Full artifacts: see the lifeops-run-hermes-26113304058 upload on this run.

@github-actions
Contributor

LifeOps Benchmark — eliza

Run ID: lifeops-eliza-26113537126

LifeOps Benchmark

Model: gpt-oss-120b
Judge: claude-opus-4-7
Scenarios: 25
pass@1: 0.000
pass@k: 0.000
Total cost: $0.0000

Full artifacts: see the lifeops-run-eliza-26113537126 upload on this run.

@github-actions
Contributor

LifeOps Multi-Tier Benchmark

Suite: smoke — Tiers requested: large,frontier

large

LifeOps Multi-Tier Benchmark

Tier: large
Suite: smoke

frontier

LifeOps Multi-Tier Benchmark

Tier: frontier
Suite: smoke

Artifacts: lifeops-multi-tier-large-26113537111, lifeops-multi-tier-frontier-26113537111

…-20260519' into nubs/elizaos-live-prod-hardening-20260519

# Conflicts:
#	bun.lock
#	plugins/plugin-local-inference/package.json
@github-actions
Contributor

LifeOps Benchmark — eliza

Run ID: lifeops-eliza-26131315370

LifeOps Benchmark

Model: gpt-oss-120b
Judge: claude-opus-4-7
Scenarios: 25
pass@1: 0.000
pass@k: 0.000
Total cost: $0.0000

Full artifacts: see the lifeops-run-eliza-26131315370 upload on this run.

@github-actions
Contributor

LifeOps Multi-Tier Benchmark

Suite: smoke — Tiers requested: large,frontier

large

LifeOps Multi-Tier Benchmark

Tier: large
Suite: smoke

frontier

LifeOps Multi-Tier Benchmark

Tier: frontier
Suite: smoke

Artifacts: lifeops-multi-tier-large-26131315296, lifeops-multi-tier-frontier-26131315296

@github-actions
Contributor

LifeOps Benchmark — hermes

Run ID: lifeops-hermes-26131315370

LifeOps Benchmark

Model: gpt-oss-120b
Judge: claude-opus-4-7
Scenarios: 25
pass@1: 0.280
pass@k: 0.280
Total cost: $0.8431

Full artifacts: see the lifeops-run-hermes-26131315370 upload on this run.

@NubsCarson
Member Author

Latest refresh pushed as d3eb80c11ebeb72739a9ad59cada31b8a682ed9f.

I merged current origin/develop through d32b6a446185e77267f1e51005a1796de244b43b using normal merges. The root cause of the earlier CI failure was @elizaos/plugin-local-inference failing with tsup: command not found. The latest upstream now builds that package through bun run build.ts, so I resolved the branch conflict in favor of that cleaner path rather than carrying a redundant tsup dependency.

Local validation on this head:

bun install --frozen-lockfile
bun run --cwd plugins/plugin-local-inference build
bun run --cwd plugins/plugin-local-inference format:check
git diff --check

Results: frozen install passed, local-inference build passed via build.ts, local-inference format check passed over 339 files, and whitespace check passed. I also ran bun run build:core on the immediately preceding merge head and it passed 38/38 tasks including local-inference. CI is rerunning on the new pushed head now.

@github-actions
Contributor

LifeOps Multi-Tier Benchmark

Suite: smoke — Tiers requested: large,frontier

large

LifeOps Multi-Tier Benchmark

Tier: large
Suite: smoke

frontier

LifeOps Multi-Tier Benchmark

Tier: frontier
Suite: smoke

Artifacts: lifeops-multi-tier-large-26131514370, lifeops-multi-tier-frontier-26131514370

@github-actions
Contributor

LifeOps Benchmark — eliza

Run ID: lifeops-eliza-26131514374

LifeOps Benchmark

Model: gpt-oss-120b
Judge: claude-opus-4-7
Scenarios: 25
pass@1: 0.000
pass@k: 0.000
Total cost: $0.0000

Full artifacts: see the lifeops-run-eliza-26131514374 upload on this run.

@github-actions
Contributor

LifeOps Benchmark — hermes

Run ID: lifeops-hermes-26131514374

LifeOps Benchmark

Model: gpt-oss-120b
Judge: claude-opus-4-7
Scenarios: 25
pass@1: 0.280
pass@k: 0.280
Total cost: $0.7985

Full artifacts: see the lifeops-run-hermes-26131514374 upload on this run.

@github-actions
Contributor

LifeOps Multi-Tier Benchmark

Suite: smoke — Tiers requested: large,frontier

large

LifeOps Multi-Tier Benchmark

Tier: large
Suite: smoke

frontier

LifeOps Multi-Tier Benchmark

Tier: frontier
Suite: smoke

Artifacts: lifeops-multi-tier-large-26131832038, lifeops-multi-tier-frontier-26131832038

@github-actions
Contributor

LifeOps Benchmark — eliza

Run ID: lifeops-eliza-26131831984

LifeOps Benchmark

Model: gpt-oss-120b
Judge: claude-opus-4-7
Scenarios: 25
pass@1: 0.000
pass@k: 0.000
Total cost: $0.0000

Full artifacts: see the lifeops-run-eliza-26131831984 upload on this run.

@github-actions
Contributor

LifeOps Benchmark — hermes

Run ID: lifeops-hermes-26131831984

LifeOps Benchmark

Model: gpt-oss-120b
Judge: claude-opus-4-7
Scenarios: 25
pass@1: 0.280
pass@k: 0.280
Total cost: $0.8346

Full artifacts: see the lifeops-run-hermes-26131831984 upload on this run.

@github-actions
Contributor

LifeOpsBench (Python) — smoke

Run ID: 26131832087
Result file: lifeops_gpt-oss-120b_20260519_235506.json

pass@1: 0.000
pass@k: 0.000
agent_cost_usd: $0.0000
eval_cost_usd: $0.0000
total_cost_usd: $0.0000
total_latency_ms: 0
scenarios_run: 492
scenarios_skipped (cost / timeout): 0

Full artifacts: lifeops-smoke-26131832087 upload on this run.

@github-actions
Contributor

LifeOps Multi-Tier Benchmark

Suite: smoke — Tiers requested: large,frontier

large

LifeOps Multi-Tier Benchmark

Tier: large
Suite: smoke

frontier

LifeOps Multi-Tier Benchmark

Tier: frontier
Suite: smoke

Artifacts: lifeops-multi-tier-large-26133276380, lifeops-multi-tier-frontier-26133276380

@github-actions
Contributor

LifeOps Benchmark — hermes

Run ID: lifeops-hermes-26133276413

LifeOps Benchmark

Model: gpt-oss-120b
Judge: claude-opus-4-7
Scenarios: 25
pass@1: 0.200
pass@k: 0.200
Total cost: $0.9491

Full artifacts: see the lifeops-run-hermes-26133276413 upload on this run.

@github-actions
Contributor

LifeOps Benchmark — eliza

Run ID: lifeops-eliza-26133276413

LifeOps Benchmark

Model: gpt-oss-120b
Judge: claude-opus-4-7
Scenarios: 25
pass@1: 0.000
pass@k: 0.000
Total cost: $0.0000

Full artifacts: see the lifeops-run-eliza-26133276413 upload on this run.

@github-actions
Contributor

LifeOps Multi-Tier Benchmark

Suite: smoke — Tiers requested: large,frontier

large

LifeOps Multi-Tier Benchmark

Tier: large
Suite: smoke

frontier

LifeOps Multi-Tier Benchmark

Tier: frontier
Suite: smoke

Artifacts: lifeops-multi-tier-large-26133481047, lifeops-multi-tier-frontier-26133481047

@github-actions
Contributor

LifeOps Benchmark — eliza

Run ID: lifeops-eliza-26133480970

LifeOps Benchmark

Model: gpt-oss-120b
Judge: claude-opus-4-7
Scenarios: 25
pass@1: 0.000
pass@k: 0.000
Total cost: $0.0000

Full artifacts: see the lifeops-run-eliza-26133480970 upload on this run.

@github-actions
Contributor

LifeOps Benchmark — hermes

Run ID: lifeops-hermes-26133480970

LifeOps Benchmark

Model: gpt-oss-120b
Judge: claude-opus-4-7
Scenarios: 25
pass@1: 0.200
pass@k: 0.200
Total cost: $0.9158

Full artifacts: see the lifeops-run-hermes-26133480970 upload on this run.

@github-actions
Contributor

LifeOpsBench (Python) — smoke

Run ID: 26133481046
Result file: lifeops_gpt-oss-120b_20260520_003829.json

pass@1: 0.000
pass@k: 0.000
agent_cost_usd: $0.0000
eval_cost_usd: $0.0000
total_cost_usd: $0.0000
total_latency_ms: 0
scenarios_run: 492
scenarios_skipped (cost / timeout): 0

Full artifacts: lifeops-smoke-26133481046 upload on this run.

@lalalune merged commit 5b27001 into develop on May 20, 2026
45 of 47 checks passed
@lalalune deleted the nubs/elizaos-live-prod-hardening-20260519 branch on May 20, 2026 00:59