fix(os): polish elizaOS live demo by NubsCarson · Pull Request #7803 · elizaOS/eliza

NubsCarson · 2026-05-19T12:58:27Z

Summary

This keeps the elizaOS Live / USB-demo branch current with develop and hardens the USB installer, live demo docs, runtime packaging validation, and OS demo surface.

Polishes the elizaOS Live visual/app path while preserving the normal GNOME/Tails desktop stack.
Keeps the bundled elizaOS/Milady app as the auto-starting home surface.
Hardens USB installer planning/execution with backend-owned planId, exact local-origin backend handling, destructive-write gating, plan expiry, live/root USB refusal, device revalidation, and explicit target confirmation.
Adds safe USB installer proof coverage: fake-media write flow, Playwright desktop/mobile wizard smoke, and a Linux scsi_debug virtual block-device write/readback test.
Adds CI coverage for the USB browser E2E path and conditionally runs the Linux virtual block-device proof when the runner kernel provides scsi_debug.
Keeps the PR branch merged with current origin/develop using normal merges, not force pushes.
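
The guarded write flow described in the summary can be sketched roughly as follows. This is a minimal illustration, not the installer's actual code: `PlanStore`, `WritePlan`, `DeviceInfo`, and every field name are hypothetical, and a real backend would mint an unguessable random planId instead of a counter.

```typescript
// Hypothetical sketch of a backend-owned write plan with expiry,
// explicit target confirmation, device revalidation, and live/root refusal.
interface DeviceInfo {
  path: string;
  serial: string;
  removable: boolean;
  hostsRunningSystem: boolean; // true for the live/root USB device
}

interface WritePlan {
  planId: string;
  devicePath: string;
  deviceSerial: string;
  expiresAt: number; // epoch milliseconds
}

type Verdict = { ok: true } | { ok: false; reason: string };

class PlanStore {
  private plans = new Map<string, WritePlan>();
  private counter = 0;
  private ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // The backend creates the planId; the client never supplies one.
  createPlan(device: DeviceInfo, now: number): WritePlan {
    this.counter += 1;
    const plan: WritePlan = {
      planId: `plan-${this.counter}`, // assumption: real code would use a random id
      devicePath: device.path,
      deviceSerial: device.serial,
      expiresAt: now + this.ttlMs,
    };
    this.plans.set(plan.planId, plan);
    return plan;
  }

  // Gate the destructive write: every check must pass, and a plan is single-use.
  authorizeWrite(
    planId: string,
    confirmedPath: string,
    current: DeviceInfo | undefined,
    now: number,
  ): Verdict {
    const plan = this.plans.get(planId);
    if (!plan) return { ok: false, reason: "unknown plan" };
    if (now >= plan.expiresAt) {
      this.plans.delete(planId);
      return { ok: false, reason: "plan expired" };
    }
    if (confirmedPath !== plan.devicePath) {
      return { ok: false, reason: "target confirmation mismatch" };
    }
    // Revalidate: the device seen now must be the one that was planned.
    if (!current || current.path !== plan.devicePath || current.serial !== plan.deviceSerial) {
      return { ok: false, reason: "device changed since planning" };
    }
    if (current.hostsRunningSystem) return { ok: false, reason: "refusing live/root device" };
    if (!current.removable) return { ok: false, reason: "device is not removable" };
    this.plans.delete(planId);
    return { ok: true };
  }
}
```

Passing `now` in explicitly keeps the expiry check deterministic under test; a caller would normally pass `Date.now()`.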

Current Source Base

Updated 2026-05-20.

PR head: 5de7b20881b9731405a80d33e04866d3f2541373
Latest merged origin/develop: c73f1768b68ea72b5df83efeeaadea49f812555f
Branch was pushed normally; no force-push was used.
GitHub reports the PR is mergeable; remaining status is check-dependent.

Latest Local Validation

Ran on the merged PR head before pushing the latest commits.

bun run verify:cloud
bun run test:cloud
bun run --cwd packages/os/usb-installer test
bun run --cwd packages/os/usb-installer typecheck
bun run --cwd packages/os/usb-installer lint
bun run --cwd packages/os/usb-installer build
bun run --cwd packages/os/usb-installer test:e2e
bun run --cwd packages/os/usb-installer test:linux-virtual-usb
bun run --cwd packages/os/usb-installer test -- src/__tests__/linux-virtual-block-device-e2e.test.ts
git diff --check

Results:

Cloud verify passed.
Cloud package-wide unit tests passed: 266 tests across 28 files.
USB installer unit/integration suite passed: 9 files, 80 tests, 1 opt-in virtual-device skip.
USB installer typecheck, build, and expanded Biome lint passed.
Playwright e2e passed: 6 tests across desktop/mobile render and mocked guarded wizard flow.
Linux virtual USB proof passed locally with ELIZAOS_USB_TEST_SCSI_DEBUG=1: disposable scsi_debug block device, real lsblk, sudo -n dd, sync, readback SHA-256 match, cleanup verified with scsi_debug unloaded.
The same virtual test is skipped by default without the opt-in env, matching CI behavior on runners without scsi_debug.
git diff --check passed.
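
The readback SHA-256 check mentioned in the results can be illustrated with a short sketch. The function names are hypothetical and the real test reads from a block device; here both the image and the device contents are plain byte arrays, and only the first image-length bytes of the device are compared because a device is usually larger than the image written to it.

```typescript
import { createHash } from "node:crypto";

// Hex-encoded SHA-256 digest of a byte array.
function sha256Hex(data: Uint8Array): string {
  return createHash("sha256").update(data).digest("hex");
}

// Hypothetical helper: true when the start of the device matches the image.
function readbackMatches(image: Uint8Array, deviceBytes: Uint8Array): boolean {
  if (deviceBytes.length < image.length) return false; // device too small to hold the image
  const readback = deviceBytes.subarray(0, image.length);
  return sha256Hex(image) === sha256Hex(readback);
}
```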

Current GitHub Status

PR remains draft while GitHub checks rerun on 5de7b20881.
The previous cloud CI failures were package-wide Bun mock leakage; this branch now isolates those mocks and validates verify:cloud + test:cloud locally.
The earlier OS release CI failure was caused by the Azure kernel on GitHub's runners lacking the scsi_debug module; CI now runs the proof only when the module exists and emits a notice otherwise.
The earlier merge conflict was resolved by merging origin/develop@c73f1768b6 and taking the newer upstream GitHub live-artifact validator workflow shape.
Skipped fork/security/manual gates may still appear as expected skips.
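
The conditional behavior of the virtual USB proof can be sketched as a small decision function. Only the ELIZAOS_USB_TEST_SCSI_DEBUG variable comes from this PR; the function name, the notice wording, and the idea of passing module availability in as a boolean are assumptions made for illustration. How the workflow actually detects the module is not shown here.

```typescript
type ProofDecision = { run: true } | { run: false; notice: string };

// Hypothetical gate: run the scsi_debug proof only when it is opted in
// and the kernel module is available; otherwise skip with a notice.
function decideVirtualUsbProof(
  env: Record<string, string | undefined>,
  scsiDebugAvailable: boolean,
): ProofDecision {
  if (env["ELIZAOS_USB_TEST_SCSI_DEBUG"] !== "1") {
    return { run: false, notice: "virtual USB proof skipped: opt-in env not set" };
  }
  if (!scsiDebugAvailable) {
    return { run: false, notice: "virtual USB proof skipped: scsi_debug module not available" };
  }
  return { run: true };
}
```

Returning a notice instead of failing keeps runners without the module green while still recording that the proof did not run.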

Remaining Hardware/Product Gaps

Still needed before making hardware/product claims:

Repeat guarded physical USB flash/readback for a final ISO built from the current head.
Boot that USB on real hardware.
Validate real USB Persistent Storage create/unlock/delete behavior.
Validate privacy/direct networking behavior for app, renderer, embedded browser, OAuth, and external web surfaces.
Production release still needs signed image manifests, signed privileged helpers, real updater/rollback infrastructure, SBOM/provenance, recovery policy, and formal inherited Tails sudoers review.

coderabbitai · 2026-05-19T12:58:47Z

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


codefactor-io · 2026-05-19T12:59:34Z

+    sourcePath && fs.existsSync(sourcePath)
+      ? fs.readFileSync(sourcePath, "utf8")
+      : fs.readFileSync(filePath, "utf8").replace(
+          /export \{\n  handleWalletRoutes,\n  type WalletAddressesSnapshot,\n  type WalletRouteContext,\n  type WalletRouteDependencies,\n  type WalletRpcReadinessSnapshot,\n\} from "@elizaos\/plugin-wallet";/,


Spaces are hard to count. Use {2}.

Suggested change

-          /export \{\n  handleWalletRoutes,\n  type WalletAddressesSnapshot,\n  type WalletRouteContext,\n  type WalletRouteDependencies,\n  type WalletRpcReadinessSnapshot,\n\} from "@elizaos\/plugin-wallet";/,
+          /export \{\n {2}handleWalletRoutes,\n {2}type WalletAddressesSnapshot,\n {2}type WalletRouteContext,\n {2}type WalletRouteDependencies,\n {2}type WalletRpcReadinessSnapshot,\n\} from "@elizaos\/plugin-wallet";/,
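
To show that the suggested `{2}` quantifier matches the same text as two literal spaces, here is a small check. The export list is shortened to two names for readability; that shortening is an assumption of this sketch, not the pattern used in the PR.

```typescript
// A shortened stand-in for the export block the real pattern targets.
const twoSpaceSource = [
  "export {",
  "  handleWalletRoutes,",
  "  type WalletAddressesSnapshot,",
  '} from "@elizaos/plugin-wallet";',
].join("\n");

// Original style: indentation written as two literal spaces.
const literalSpaces =
  /export \{\n  handleWalletRoutes,\n  type WalletAddressesSnapshot,\n\} from "@elizaos\/plugin-wallet";/;

// Suggested style: the same indentation written as an explicit count.
const countedSpaces =
  /export \{\n {2}handleWalletRoutes,\n {2}type WalletAddressesSnapshot,\n\} from "@elizaos\/plugin-wallet";/;

// Both patterns accept exactly two spaces of indentation and nothing else.
function matchesBoth(source: string): boolean {
  return literalSpaces.test(source) && countedSpaces.test(source);
}
```

The two forms are equivalent to the regex engine; the counted form only makes the intended indentation width visible to a reader.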

github-actions · 2026-05-19T15:59:22Z

LifeOps Multi-Tier Benchmark

Suite: smoke — Tiers requested: large,frontier

`large`

LifeOps Multi-Tier Benchmark

Tier: large
Suite: smoke

`frontier`

LifeOps Multi-Tier Benchmark

Tier: frontier
Suite: smoke

Artifacts: lifeops-multi-tier-large-26108946361, lifeops-multi-tier-frontier-26108946361

github-actions · 2026-05-19T15:59:24Z

LifeOps Benchmark — `eliza`

Run ID: lifeops-eliza-26108946520

LifeOps Benchmark

Model: gpt-oss-120b
Judge: claude-opus-4-7
Scenarios: 25
pass@1: 0.000
pass@k: 0.000
Total cost: $0.0000

Full artifacts: see the lifeops-run-eliza-26108946520 upload on this run.

github-actions · 2026-05-19T16:00:02Z

LifeOps Benchmark — `hermes`

Run ID: lifeops-hermes-26108946520

LifeOps Benchmark

Model: gpt-oss-120b
Judge: claude-opus-4-7
Scenarios: 25
pass@1: 0.320
pass@k: 0.320
Total cost: $0.8595

Full artifacts: see the lifeops-run-hermes-26108946520 upload on this run.

github-actions · 2026-05-19T16:50:09Z

LifeOps Benchmark — `eliza`

Run ID: lifeops-eliza-26111733572

LifeOps Benchmark

Model: gpt-oss-120b
Judge: claude-opus-4-7
Scenarios: 25
pass@1: 0.000
pass@k: 0.000
Total cost: $0.0000

Full artifacts: see the lifeops-run-eliza-26111733572 upload on this run.

github-actions · 2026-05-19T16:50:49Z

LifeOps Benchmark — `hermes`

Run ID: lifeops-hermes-26111733572

LifeOps Benchmark

Model: gpt-oss-120b
Judge: claude-opus-4-7
Scenarios: 25
pass@1: 0.320
pass@k: 0.320
Total cost: $0.7427

Full artifacts: see the lifeops-run-hermes-26111733572 upload on this run.

github-actions · 2026-05-19T16:52:11Z

LifeOps Multi-Tier Benchmark

Suite: smoke — Tiers requested: large,frontier

`large` — cancelled

`frontier`

LifeOps Multi-Tier Benchmark

Tier: frontier
Suite: smoke

Artifacts: lifeops-multi-tier-large-26111733575, lifeops-multi-tier-frontier-26111733575

github-actions · 2026-05-19T16:53:59Z

LifeOps Benchmark — `eliza`

Run ID: lifeops-eliza-26111935181

LifeOps Benchmark

Model: gpt-oss-120b
Judge: claude-opus-4-7
Scenarios: 25
pass@1: 0.000
pass@k: 0.000
Total cost: $0.0000

Full artifacts: see the lifeops-run-eliza-26111935181 upload on this run.

github-actions · 2026-05-19T16:55:31Z

LifeOps Multi-Tier Benchmark

Suite: smoke — Tiers requested: large,frontier

`large`

LifeOps Multi-Tier Benchmark

Tier: large
Suite: smoke

`frontier`

LifeOps Multi-Tier Benchmark

Tier: frontier
Suite: smoke

Artifacts: lifeops-multi-tier-large-26111935148, lifeops-multi-tier-frontier-26111935148

github-actions · 2026-05-19T16:55:36Z

LifeOps Benchmark — `hermes`

Run ID: lifeops-hermes-26111935181

LifeOps Benchmark

Model: gpt-oss-120b
Judge: claude-opus-4-7
Scenarios: 25
pass@1: 0.280
pass@k: 0.280
Total cost: $0.7702

Full artifacts: see the lifeops-run-hermes-26111935181 upload on this run.

github-actions · 2026-05-19T17:05:36Z

LifeOps Multi-Tier Benchmark

Suite: smoke — Tiers requested: large,frontier

`large` — failure

`frontier` — failure

Artifacts: lifeops-multi-tier-large-26112591664, lifeops-multi-tier-frontier-26112591664

github-actions · 2026-05-19T17:15:04Z

LifeOps Benchmark — `eliza`

Run ID: lifeops-eliza-26113068019

LifeOps Benchmark

Model: gpt-oss-120b
Judge: claude-opus-4-7
Scenarios: 25
pass@1: 0.000
pass@k: 0.000
Total cost: $0.0000

Full artifacts: see the lifeops-run-eliza-26113068019 upload on this run.

github-actions · 2026-05-19T17:15:46Z

LifeOps Multi-Tier Benchmark

Suite: smoke — Tiers requested: large,frontier

`large`

LifeOps Multi-Tier Benchmark

Tier: large
Suite: smoke

`frontier`

LifeOps Multi-Tier Benchmark

Tier: frontier
Suite: smoke

Artifacts: lifeops-multi-tier-large-26113067938, lifeops-multi-tier-frontier-26113067938

github-actions · 2026-05-19T17:17:02Z

LifeOps Benchmark — `hermes`

Run ID: lifeops-hermes-26113068019

LifeOps Benchmark

Model: gpt-oss-120b
Judge: claude-opus-4-7
Scenarios: 25
pass@1: 0.240
pass@k: 0.240
Total cost: $0.8883

Full artifacts: see the lifeops-run-hermes-26113068019 upload on this run.

github-actions · 2026-05-19T17:19:58Z

LifeOps Multi-Tier Benchmark

Suite: smoke — Tiers requested: large,frontier

`large`

LifeOps Multi-Tier Benchmark

Tier: large
Suite: smoke

`frontier`

LifeOps Multi-Tier Benchmark

Tier: frontier
Suite: smoke

Artifacts: lifeops-multi-tier-large-26113303728, lifeops-multi-tier-frontier-26113303728

github-actions · 2026-05-19T17:20:03Z

LifeOps Benchmark — `eliza`

Run ID: lifeops-eliza-26113304058

LifeOps Benchmark

Model: gpt-oss-120b
Judge: claude-opus-4-7
Scenarios: 25
pass@1: 0.000
pass@k: 0.000
Total cost: $0.0000

Full artifacts: see the lifeops-run-eliza-26113304058 upload on this run.

github-actions · 2026-05-19T17:20:49Z

LifeOps Benchmark — `hermes`

Run ID: lifeops-hermes-26113304058

LifeOps Benchmark

Model: gpt-oss-120b
Judge: claude-opus-4-7
Scenarios: 25
pass@1: 0.360
pass@k: 0.360
Total cost: $0.7672

Full artifacts: see the lifeops-run-hermes-26113304058 upload on this run.

github-actions · 2026-05-19T17:24:14Z

LifeOps Benchmark — `eliza`

Run ID: lifeops-eliza-26113537126

LifeOps Benchmark

Model: gpt-oss-120b
Judge: claude-opus-4-7
Scenarios: 25
pass@1: 0.000
pass@k: 0.000
Total cost: $0.0000

Full artifacts: see the lifeops-run-eliza-26113537126 upload on this run.

github-actions · 2026-05-19T17:24:18Z

LifeOps Multi-Tier Benchmark

Suite: smoke — Tiers requested: large,frontier

`large`

LifeOps Multi-Tier Benchmark

Tier: large
Suite: smoke

`frontier`

LifeOps Multi-Tier Benchmark

Tier: frontier
Suite: smoke

Artifacts: lifeops-multi-tier-large-26113537111, lifeops-multi-tier-frontier-26113537111

github-actions · 2026-05-19T23:25:08Z

LifeOps Benchmark — `eliza`

Run ID: lifeops-eliza-26131315370

LifeOps Benchmark

Model: gpt-oss-120b
Judge: claude-opus-4-7
Scenarios: 25
pass@1: 0.000
pass@k: 0.000
Total cost: $0.0000

Full artifacts: see the lifeops-run-eliza-26131315370 upload on this run.

github-actions · 2026-05-19T23:25:48Z

LifeOps Multi-Tier Benchmark

Suite: smoke — Tiers requested: large,frontier

`large`

LifeOps Multi-Tier Benchmark

Tier: large
Suite: smoke

`frontier`

LifeOps Multi-Tier Benchmark

Tier: frontier
Suite: smoke

Artifacts: lifeops-multi-tier-large-26131315296, lifeops-multi-tier-frontier-26131315296

github-actions · 2026-05-19T23:26:50Z

LifeOps Benchmark — `hermes`

Run ID: lifeops-hermes-26131315370

LifeOps Benchmark

Model: gpt-oss-120b
Judge: claude-opus-4-7
Scenarios: 25
pass@1: 0.280
pass@k: 0.280
Total cost: $0.8431

Full artifacts: see the lifeops-run-hermes-26131315370 upload on this run.

NubsCarson · 2026-05-19T23:28:28Z

Latest refresh pushed as d3eb80c11ebeb72739a9ad59cada31b8a682ed9f.

I merged current origin/develop through d32b6a446185e77267f1e51005a1796de244b43b using normal merges. The earlier CI root cause was @elizaos/plugin-local-inference failing with tsup: command not found. Latest upstream now routes that package through bun run build.ts, so I resolved the branch conflict in favor of that cleaner path rather than carrying a redundant tsup dependency.

Local validation on this head:

bun install --frozen-lockfile
bun run --cwd plugins/plugin-local-inference build
bun run --cwd plugins/plugin-local-inference format:check
git diff --check

Results: frozen install passed, local-inference build passed via build.ts, local-inference format check passed over 339 files, and whitespace check passed. I also ran bun run build:core on the immediately preceding merge head and it passed 38/38 tasks including local-inference. CI is rerunning on the new pushed head now.

github-actions · 2026-05-19T23:29:50Z

LifeOps Multi-Tier Benchmark

Suite: smoke — Tiers requested: large,frontier

`large`

LifeOps Multi-Tier Benchmark

Tier: large
Suite: smoke

`frontier`

LifeOps Multi-Tier Benchmark

Tier: frontier
Suite: smoke

Artifacts: lifeops-multi-tier-large-26131514370, lifeops-multi-tier-frontier-26131514370

github-actions · 2026-05-19T23:29:54Z

LifeOps Benchmark — `eliza`

Run ID: lifeops-eliza-26131514374

LifeOps Benchmark

Model: gpt-oss-120b
Judge: claude-opus-4-7
Scenarios: 25
pass@1: 0.000
pass@k: 0.000
Total cost: $0.0000

Full artifacts: see the lifeops-run-eliza-26131514374 upload on this run.

github-actions · 2026-05-19T23:30:55Z

LifeOps Benchmark — `hermes`

Run ID: lifeops-hermes-26131514374

LifeOps Benchmark

Model: gpt-oss-120b
Judge: claude-opus-4-7
Scenarios: 25
pass@1: 0.280
pass@k: 0.280
Total cost: $0.7985

Full artifacts: see the lifeops-run-hermes-26131514374 upload on this run.

github-actions · 2026-05-19T23:38:18Z

LifeOps Multi-Tier Benchmark

Suite: smoke — Tiers requested: large,frontier

`large`

LifeOps Multi-Tier Benchmark

Tier: large
Suite: smoke

`frontier`

LifeOps Multi-Tier Benchmark

Tier: frontier
Suite: smoke

Artifacts: lifeops-multi-tier-large-26131832038, lifeops-multi-tier-frontier-26131832038

github-actions · 2026-05-19T23:38:28Z

LifeOps Benchmark — `eliza`

Run ID: lifeops-eliza-26131831984

LifeOps Benchmark

Model: gpt-oss-120b
Judge: claude-opus-4-7
Scenarios: 25
pass@1: 0.000
pass@k: 0.000
Total cost: $0.0000

Full artifacts: see the lifeops-run-eliza-26131831984 upload on this run.

github-actions · 2026-05-19T23:39:16Z

LifeOps Benchmark — `hermes`

Run ID: lifeops-hermes-26131831984

LifeOps Benchmark

Model: gpt-oss-120b
Judge: claude-opus-4-7
Scenarios: 25
pass@1: 0.280
pass@k: 0.280
Total cost: $0.8346

Full artifacts: see the lifeops-run-hermes-26131831984 upload on this run.

github-actions · 2026-05-19T23:55:07Z

LifeOpsBench (Python) — smoke

Run ID: 26131832087
Result file: lifeops_gpt-oss-120b_20260519_235506.json

metric	value
pass@1	0.000
pass@k	0.000
agent_cost_usd	$0.0000
eval_cost_usd	$0.0000
total_cost_usd	$0.0000
total_latency_ms	0
scenarios_run	492
scenarios_skipped (cost / timeout)	0

Full artifacts: lifeops-smoke-26131832087 upload on this run.

github-actions · 2026-05-20T00:18:54Z

LifeOps Multi-Tier Benchmark

Suite: smoke — Tiers requested: large,frontier

`large`

LifeOps Multi-Tier Benchmark

Tier: large
Suite: smoke

`frontier`

LifeOps Multi-Tier Benchmark

Tier: frontier
Suite: smoke

Artifacts: lifeops-multi-tier-large-26133276380, lifeops-multi-tier-frontier-26133276380

github-actions · 2026-05-20T00:20:31Z

LifeOps Benchmark — `hermes`

Run ID: lifeops-hermes-26133276413

LifeOps Benchmark

Model: gpt-oss-120b
Judge: claude-opus-4-7
Scenarios: 25
pass@1: 0.200
pass@k: 0.200
Total cost: $0.9491

Full artifacts: see the lifeops-run-hermes-26133276413 upload on this run.

github-actions · 2026-05-20T00:20:47Z

LifeOps Benchmark — `eliza`

Run ID: lifeops-eliza-26133276413

LifeOps Benchmark

Model: gpt-oss-120b
Judge: claude-opus-4-7
Scenarios: 25
pass@1: 0.000
pass@k: 0.000
Total cost: $0.0000

Full artifacts: see the lifeops-run-eliza-26133276413 upload on this run.

github-actions · 2026-05-20T00:24:46Z

LifeOps Multi-Tier Benchmark

Suite: smoke — Tiers requested: large,frontier

`large`

LifeOps Multi-Tier Benchmark

Tier: large
Suite: smoke

`frontier`

LifeOps Multi-Tier Benchmark

Tier: frontier
Suite: smoke

Artifacts: lifeops-multi-tier-large-26133481047, lifeops-multi-tier-frontier-26133481047

github-actions · 2026-05-20T00:25:19Z

LifeOps Benchmark — `eliza`

Run ID: lifeops-eliza-26133480970

LifeOps Benchmark

Model: gpt-oss-120b
Judge: claude-opus-4-7
Scenarios: 25
pass@1: 0.000
pass@k: 0.000
Total cost: $0.0000

Full artifacts: see the lifeops-run-eliza-26133480970 upload on this run.

github-actions · 2026-05-20T00:26:26Z

LifeOps Benchmark — `hermes`

Run ID: lifeops-hermes-26133480970

LifeOps Benchmark

Model: gpt-oss-120b
Judge: claude-opus-4-7
Scenarios: 25
pass@1: 0.200
pass@k: 0.200
Total cost: $0.9158

Full artifacts: see the lifeops-run-hermes-26133480970 upload on this run.

github-actions · 2026-05-20T00:40:32Z

LifeOpsBench (Python) — smoke

Run ID: 26133481046
Result file: lifeops_gpt-oss-120b_20260520_003829.json

metric	value
pass@1	0.000
pass@k	0.000
agent_cost_usd	$0.0000
eval_cost_usd	$0.0000
total_cost_usd	$0.0000
total_latency_ms	0
scenarios_run	492
scenarios_skipped (cost / timeout)	0

Full artifacts: lifeops-smoke-26133481046 upload on this run.

github-actions Bot added ui build Docs Tests core labels May 19, 2026

codefactor-io Bot reviewed May 19, 2026

NubsCarson force-pushed the nubs/elizaos-live-prod-hardening-20260519 branch from bf275a2 to 8287f9a May 19, 2026 13:04

github-actions Bot added the plugins label May 19, 2026

NubsCarson force-pushed the nubs/elizaos-live-prod-hardening-20260519 branch 2 times, most recently from 3f2c48b to e68c797 May 19, 2026 15:44

NubsCarson force-pushed the nubs/elizaos-live-prod-hardening-20260519 branch from 6ad2187 to 80d16c3 May 19, 2026 17:03

NubsCarson added 3 commits May 19, 2026 23:21

fix(local-inference): declare tsup build dependency

9a24677

Merge remote-tracking branch 'origin/develop' into nubs/elizaos-live-prod-hardening-20260519

aba194c

Merge remote-tracking branch 'origin/nubs/elizaos-live-prod-hardening-20260519' into nubs/elizaos-live-prod-hardening-20260519

ccb4499

# Conflicts: # bun.lock # plugins/plugin-local-inference/package.json

Merge remote-tracking branch 'origin/develop' into nubs/elizaos-live-prod-hardening-20260519

d3eb80c

docs(os): refresh usb installer proof handoff

b880e3b

NubsCarson added 3 commits May 19, 2026 23:57

fix(cloud): isolate package-wide unit mocks

e8158a5

fix(os): harden usb installer write flow

5f0880a

Merge remote-tracking branch 'origin/develop' into HEAD

ad6d6f2

# Conflicts: # .github/workflows/test.yml

fix(os): make virtual usb proof conditional in ci

5de7b20

lalalune merged commit 5b27001 into develop May 20, 2026
45 of 47 checks passed

lalalune deleted the nubs/elizaos-live-prod-hardening-20260519 branch May 20, 2026 00:59

NubsCarson mentioned this pull request May 20, 2026

fix(test): stabilize cloud mock e2e stack #7824

Merged
