fix: harden cloud E2E and elizaOS USB live path #7825
Updated this PR on top of current
Latest local validation on
I did not use the earlier standalone BlueBubbles route invocation as the gate for this update; the broader
Revalidated after the branch was rebased onto latest
Current PR head:
Latest local validation:

Summary
- … `container-control-plane` sidecar instead of the stale in-process control-plane mock.
- … `NODE_ENV=test` / `CLOUD_E2E=1` only.
- … `duplex: "half"`), closes local DB pools before stopping PGlite, and moves best-effort agent API-key revocation outside the sandbox-delete transaction.
- … `waitUntil` batches deterministically, destroys already-streaming responses on late errors, and forcibly closes memory-sandbox keep-alive sockets.
- … `.img` artifacts, refuses direct ISO-to-USB writes by default, keeps the internal Tails GPT partition name needed by Persistent Storage while exposing the `ELIZAOS` filesystem label, and includes `sudo` for inherited Persistent Storage hooks.
- … `/api/voice/onboarding/profile/start` returns a partial session without prompts.

Root Cause
Mock-stack E2E was still wired to stale harness assumptions: wrong repo-root resolution, an in-process control-plane mock, Wrangler-dependent cloud-api startup, and teardown races against live PGlite sockets. After the move to the real sidecar path, deprovisioning also exposed slow best-effort API-key cleanup inside the sandbox-delete DB transaction.

The elizaOS Live USB proof exposed separate release-path issues:
- Raw ISO writes do not create the USB layout Tails expects for persistence.
- The generated USB image had branded the internal GPT partition name in a way that failed Tails' persistence eligibility guard.
- Inherited Persistent Storage hooks still call `/usr/bin/sudo`.
- Onboarding assumed the voice-profile endpoint always returned a complete prompt list.

Validation
Cloud validation from the prior commits on this branch:
- `bunx @biomejs/biome check packages/cloud-shared/src/lib/services/memory-sandbox-provider.ts packages/scripts/cloud/admin/dev/cloud-api-e2e-server.mjs`
- `bun run --cwd packages/cloud-shared typecheck`
- `bun run --cwd packages/cloud-api typecheck`
- `bun run --cwd packages/test/cloud-e2e typecheck`
- `CI=true MOCK_REDIS=1 MOCK_HETZNER_LATENCY=0 MOCK_HETZNER_ACTION_MS=30 CONTROL_PLANE_TICK_MS=50 DATABASE_URL=pglite://./.eliza-ci/.pgdata HCLOUD_TOKEN=test-token CONTAINER_CONTROL_PLANE_TOKEN=test-token CRON_SECRET=test-cron-secret timeout 15m bun run cloud:e2e`: 4 passed in 24.2s
- `timeout 8m bun run test:cloud`: 279 passed, 0 failed

Current elizaOS Live/USB/onboarding validation:
- `bash -n scripts/usb-write.sh tails/auto/scripts/create-usb-image-from-iso`
- `ELIZAOS_STATIC_SOURCE_ONLY=1 ./scripts/static-smoke.sh`
- `bunx @biomejs/biome check packages/ui/src/api/client-voice-profiles.ts packages/ui/src/api/client-voice-profiles.test.ts packages/ui/src/components/onboarding/VoicePrefixSteps.tsx`
- `bun run --cwd packages/ui test src/api/client-voice-profiles.test.ts`: 21 passed
- … `scripts/usb-write.sh` refuses direct ISO writes before writing.
- … `sudo` greeter crash on the persistence-enabled reboot path. This PR fixes that in source, but a fresh ISO plus `.img` rebuild is still required before the current HEAD can be called final and USB-ready.
- `git diff --check`

Remaining Release Gates
- … `.img` from this exact branch head.
- … `.img`.

Greptile Summary
This PR hardens the cloud mock-stack E2E harness in three ways. It replaces the stale in-process control-plane mock with the real `container-control-plane` sidecar. It introduces a guarded in-memory sandbox provider for test environments. It also adds a Node-hosted Worker fetch adapter, so CI exercises the actual router, DB queue, and sidecar forwarder without Wrangler.
- … (`stack.ts`, `cloud-api-e2e-server.mjs`, `memory-sandbox-provider.ts`): swaps in the real control-plane sidecar, adds hop-by-hop header filtering, fixes `duplex: "half"` for Node body forwarding, deterministically awaits `waitUntil` batches, and destroys streaming responses on late errors.
- … (`eliza-sandbox.ts`, `api-keys.ts`): moves best-effort API-key revocation outside the delete transaction and combines `findByName` + `deleteByName` into one `RETURNING` delete call.
- … (`linux-backend.ts`, `client-voice-profiles.ts`): adds `mountinfo`-based system-disk detection and normalises malformed capture-session responses with a fallback instead of passing unvalidated server data to UI state.

Confidence Score: 5/5
Safe to merge — all changed paths have been validated locally and the core logic changes are sound.
The deleteByName consolidation in api-keys.ts was verified against the repository implementation, which uses a RETURNING clause so deleted rows are correctly returned for cache invalidation. The sandbox teardown refactor in eliza-sandbox.ts correctly moves best-effort API-key revocation outside the transaction without any data-loss risk. The new E2E server correctly filters hop-by-hop headers, drains waitUntil promises before writing response headers, and handles late errors by destroying the socket. No correctness issues were found.
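The paragraph above mentions two adapter behaviours: hop-by-hop header filtering and `waitUntil` draining. Both can be sketched as small helpers for a Node-hosted Worker adapter. This is a minimal sketch under stated assumptions, not the code in `cloud-api-e2e-server.mjs`. The names `stripHopByHopHeaders` and `WaitUntilQueue` are hypothetical. The header set is the classic hop-by-hop list (RFC 2616 §13.5.1, RFC 9110 §7.6.1), plus any field names the sender lists in `Connection`.

```typescript
// Hypothetical helpers illustrating the technique; not the PR's actual code.

// Classic hop-by-hop headers that a proxy should not forward.
// "proxy-connection" is non-standard but still seen in the wild.
const HOP_BY_HOP = new Set([
  "connection",
  "keep-alive",
  "proxy-authenticate",
  "proxy-authorization",
  "proxy-connection",
  "te",
  "trailer",
  "transfer-encoding",
  "upgrade",
]);

// Returns a copy of `headers` without hop-by-hop fields, including any extra
// field names the sender listed in its Connection header.
function stripHopByHopHeaders(headers: Headers): Headers {
  const extra = new Set(
    (headers.get("connection") ?? "")
      .split(",")
      .map((name) => name.trim().toLowerCase())
      .filter((name) => name.length > 0),
  );
  const out = new Headers();
  headers.forEach((value, name) => {
    // Headers.forEach yields lowercased names, so a direct Set lookup works.
    if (!HOP_BY_HOP.has(name) && !extra.has(name)) out.append(name, value);
  });
  return out;
}

// Collects promises registered through a Worker-style ctx.waitUntil() and
// awaits them in batches. Promises registered while an earlier batch is still
// settling land in a fresh batch and are awaited before drain() resolves.
class WaitUntilQueue {
  private pending: Promise<unknown>[] = [];

  waitUntil(promise: Promise<unknown>): void {
    this.pending.push(promise);
  }

  async drain(): Promise<PromiseSettledResult<unknown>[]> {
    const results: PromiseSettledResult<unknown>[] = [];
    while (this.pending.length > 0) {
      const batch = this.pending;
      this.pending = [];
      // allSettled: one failed best-effort task does not hide the others.
      results.push(...(await Promise.allSettled(batch)));
    }
    return results;
  }
}
```

The drain runs in a loop rather than as a single `Promise.allSettled` call. That matters because a `waitUntil` task can register more work while the first batch is settling. A single call would miss that late work.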
No files require special attention.
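The teardown reordering noted in the confidence notes can be sketched as follows: best-effort API-key revocation is moved outside the sandbox-delete transaction. This is a minimal illustration under stated assumptions. The `Db`/`Tx` interfaces, `deleteSandboxRecord`, `RevokeKeys` and `deprovisionSandbox` are hypothetical stand-ins, not the `eliza-sandbox.ts` or `api-keys.ts` APIs. What matters is the ordering. The transaction holds only the rows that must change atomically. The slow revocation runs after commit, and a failure there is logged rather than propagated.

```typescript
// Hypothetical interfaces standing in for the real repository layer.
interface Tx {
  deleteSandboxRecord(sandboxId: string): Promise<void>;
}
interface Db {
  transaction<T>(fn: (tx: Tx) => Promise<T>): Promise<T>;
}
type RevokeKeys = (agentName: string) => Promise<number>;

// Deletes the sandbox row inside a short transaction, then revokes the
// agent's API keys after the transaction has completed. Revocation is
// best-effort: a failure is logged and never rolls back the delete.
async function deprovisionSandbox(
  db: Db,
  revokeAgentApiKeys: RevokeKeys,
  sandboxId: string,
  agentName: string,
  log: (msg: string) => void = console.warn,
): Promise<{ deleted: true; revokedKeys: number | null }> {
  await db.transaction(async (tx) => {
    await tx.deleteSandboxRecord(sandboxId);
  });

  let revokedKeys: number | null = null;
  try {
    revokedKeys = await revokeAgentApiKeys(agentName);
  } catch (err) {
    log(`best-effort API-key revocation failed for ${agentName}: ${String(err)}`);
  }
  return { deleted: true, revokedKeys };
}
```

Limiting the transaction to the row delete means a slow or failing revocation can no longer keep it open or undo the delete. The trade-off is that keys may briefly outlive their sandbox when revocation fails.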
Important Files Changed
Sequence Diagram
sequenceDiagram
  participant TR as Test Runner
  participant ST as stack.ts fixture
  participant PG as PGlite TCP Bridge
  participant CP as container-control-plane
  participant API as cloud-api Node adapter
  participant MSP as MemorySandboxProvider
  TR->>ST: startCloudStack()
  ST->>PG: spawn pglite-server.ts
  ST->>ST: spawnSync migrate-with-diagnostics.ts
  ST->>CP: spawn bun run start
  ST->>CP: waitForHttpOk /health
  ST->>API: spawn node --import tsx cloud-api-e2e-server.mjs
  ST->>API: waitForHttpOk /api/health
  ST-->>TR: StackHandle
  TR->>API: POST /api/v1/eliza/agents
  API->>CP: forward request
  CP->>MSP: create()
  MSP-->>CP: SandboxHandle
  CP-->>API: sandbox created
  API-->>TR: 200 OK
  TR->>ST: stop()
  ST->>API: SIGTERM
  ST->>CP: SIGTERM
  ST->>ST: closeDatabaseConnectionsForTests()
  ST->>PG: SIGTERM

Reviews (5): Last reviewed commit: "fix(os): harden USB persistence and onbo..."