test(ui): deterministic launcher drag-reorder e2e — poll-and-settle protocol (was 1/5 flaky) by lalalune · Pull Request #11115 · elizaOS/eliza

lalalune · 2026-07-02T03:14:50Z

Follow-up to #11103 (merged with the flaky assertion): the launcher drag-reorder e2e's "order actually changed" assertion failed ~4/5 locally. Root cause: Reorder.Group axis="y" sits on a multi-column CSS grid, where a whole row shares one y-center, so a fixed-endpoint fling thrashes onReorder while crossing row boundaries and the order at release time depends on animation timing.

Fix: run-launcher-e2e.mjs now uses a deterministic drag protocol. It engages the drag, nudges down in bounded 12px steps (60ms dwell each), and polls the live DOM order after every step. On the first observed swap it holds the pointer still for 300ms and re-confirms stability before mouse.up, then waits 600ms to settle before reading the final order. All original assertions are kept (telemetry, order-change, persistence, no-dup-ids).
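The poll-and-settle loop described above can be sketched independently of any browser automation library. This is a minimal illustration, not the code in run-launcher-e2e.mjs: the `DragDriver` interface, the `pollAndSettleDrag` name, and the `maxSteps: 40` bound are assumptions made for the example; only the 12px step, 60ms dwell, 300ms hold, and 600ms settle values come from the description.

```typescript
// Hypothetical driver surface. In a real runner these methods would wrap the
// browser automation calls (pointer moves, waits, DOM reads).
interface DragDriver {
  moveBy(dy: number): Promise<void>; // nudge the held pointer vertically
  wait(ms: number): Promise<void>; // dwell without moving the pointer
  readOrder(): Promise<string[]>; // live tile ids in DOM order
  release(): Promise<void>; // pointer up
}

interface DragResult {
  swapped: boolean;
  steps: number;
  finalOrder: string[];
}

const sameOrder = (a: string[], b: string[]): boolean =>
  a.length === b.length && a.every((id, i) => id === b[i]);

async function pollAndSettleDrag(
  driver: DragDriver,
  // maxSteps is an assumed bound; the other values are from the PR description.
  opts = { stepPx: 12, dwellMs: 60, holdMs: 300, settleMs: 600, maxSteps: 40 },
): Promise<DragResult> {
  const initial = await driver.readOrder();
  let swapped = false;
  let steps = 0;
  while (steps < opts.maxSteps && !swapped) {
    await driver.moveBy(opts.stepPx);
    await driver.wait(opts.dwellMs);
    steps += 1;
    const observed = await driver.readOrder();
    if (sameOrder(observed, initial)) continue;
    // First observed swap: hold still, then re-confirm it did not round-trip.
    await driver.wait(opts.holdMs);
    const confirmed = await driver.readOrder();
    swapped = sameOrder(confirmed, observed);
  }
  await driver.release();
  await driver.wait(opts.settleMs);
  return { swapped, steps, finalOrder: await driver.readOrder() };
}
```

Releasing only after a re-confirmed swap is what removes the timing dependence: the pointer never travels past the first stable reorder, so the order cannot round-trip back before mouse.up.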

Evidence: 10/10 consecutive green (was 1/5); regenerated checked-in e2e artifacts from the green runs. Both CI-gated runners (test:home-screen-e2e, test:chat-sheet-e2e) pass; ui unit suites 1067 passed; tsgo clean. Real-LLM trajectory: N/A — test-harness-only change. Backend logs: N/A — no server path.

🤖 Generated with Claude Code

The shell fuzz's own header declared converse (VAD/semantic end-of-turn commit→send) a deferred follow-up dimension, which is a residual against the no-residuals standard. Added it:

- the interleaved fuzz now drives converse capture (a complete final routes through the REAL TurnAggregator → synchronous commit → VOICE_DM send) alongside dictation, and asserts lastTurnVoice is cleared after every new-chat (invariant (d));
- a dedicated test proves a complete converse final sends a VOICE_DM (not a plain DM), sets lastTurnVoice, and a new-chat mid-converse clears the flag without orphaning the capture; plus a negative: pure disfluency commits but the respond-gate drops it, so nothing sends.

lastTurnVoice is internal (not on the public controller return), so it's observed through its real consumer boundary (the useShellVoiceOutput arg), not by exposing new public state.

Also corrected the header's mock disclosure: sendChatText is stubbed here and the send-QUEUE race is pinned separately in useChatSend.send-voice-newchat.race. This suite proves the controller lifecycle, not that leaf. 9/9 green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The #10694 deliverable is "undo + redo, bounded, persisted", but the redo stack was in-memory only. Persist it symmetrically with the undo history (same bound + data-URL quota cap) via loadBackgroundRedo/saveBackgroundRedo, so "step forward" survives a reload just like "step back" does.

New test: edit→edit→undo, remount (reload), redo restores the undone config.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
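As a rough illustration of what bounded, quota-capped redo persistence might look like, here is a sketch over an injected key-value store. Everything in it is an assumption: the real loadBackgroundRedo/saveBackgroundRedo signatures are not shown in this PR, and the key name, entry bound, size cap, and the drop-oldest policy are invented for the example.

```typescript
// Minimal storage surface (localStorage satisfies it); injected for testability.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical config shape, key, and limits. None are taken from the repository.
interface BackgroundConfig { kind: string; value: string; }
const REDO_KEY = "example:background-redo";
const MAX_ENTRIES = 20;
const MAX_CHARS = 4096;

function isConfig(x: unknown): x is BackgroundConfig {
  if (typeof x !== "object" || x === null) return false;
  const r = x as Record<string, unknown>;
  return typeof r.kind === "string" && typeof r.value === "string";
}

function saveRedoSketch(store: KeyValueStore, stack: BackgroundConfig[]): void {
  // Keep only the newest MAX_ENTRIES, then drop oldest entries until under the cap.
  let kept = stack.slice(-MAX_ENTRIES);
  let json = JSON.stringify(kept);
  while (kept.length > 0 && json.length > MAX_CHARS) {
    kept = kept.slice(1);
    json = JSON.stringify(kept);
  }
  store.setItem(REDO_KEY, json);
}

function loadRedoSketch(store: KeyValueStore): BackgroundConfig[] {
  const raw = store.getItem(REDO_KEY);
  if (raw === null) return [];
  try {
    const parsed: unknown = JSON.parse(raw);
    if (!Array.isArray(parsed)) return [];
    return parsed.filter(isConfig).slice(-MAX_ENTRIES);
  } catch {
    return []; // corrupted storage degrades to an empty redo stack
  }
}
```

A reload is then just a fresh `loadRedoSketch` call against the same store, which is the situation the new remount test exercises.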

…nes (#10726)

voice-realaudio.spec asserted asr.detail.wer <= 0.34 against a Chromium page.route ASR mock that echoes the expected phrase verbatim. WER is structurally 0 there, so the assertion could never catch a regression (a real-accuracy claim made against a mock standing in for the thing under test). Removed it; the load-bearing proof in this lane stays (a real captured WAV reached ASR + the stage passed).

Documented in voice-selftest that its transcript-content check proves pipeline PROPAGATION, not accuracy. WER accuracy is scored only in the real-recognizer tiers (plugin-local-inference *.real.test.ts + voice:matrix hardware lanes).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
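To see why the removed threshold could never fail, consider a standard word-level WER (edit distance over reference length) scored against an echoing mock. The sketch below is illustrative only; `wordErrorRate` and `echoAsrMock` are hypothetical names, not the spec's code.

```typescript
// Split on whitespace, dropping empty tokens.
function tokenize(text: string): string[] {
  return text.trim().split(/\s+/).filter((w) => w.length > 0);
}

// Word error rate: word-level Levenshtein distance divided by reference length.
function wordErrorRate(reference: string, hypothesis: string): number {
  const ref = tokenize(reference);
  const hyp = tokenize(hypothesis);
  if (ref.length === 0) return hyp.length === 0 ? 0 : 1;
  // prev[j] = distance between the first i-1 reference words and first j hypothesis words
  let prev: number[] = [];
  for (let j = 0; j <= hyp.length; j++) prev.push(j);
  for (let i = 1; i <= ref.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[hyp.length] / ref.length;
}

// Stand-in for a mock that returns the expected phrase verbatim.
const echoAsrMock = (expectedPhrase: string): string => expectedPhrase;
```

With `echoAsrMock`, `wordErrorRate(phrase, echoAsrMock(phrase))` is 0 for every phrase, so any `wer <= 0.34` check passes regardless of what the recognizer does.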

…e2e mirror-larp, #10694)

The background e2e fixture hand-mirrored useDisplayPreferences' set/undo/redo push-pop semantics, so mirror-vs-real drift was invisible (audit larp finding). Extracted the semantics into a pure, persistence-free module (state/background-history.ts: applyBackgroundSet/Undo/Redo + MAX) that BOTH the real store (useDisplayPreferences) and the browser e2e fixture now call: one implementation, no drift, and it stays browser-safe for esbuild (no persistence import graph).

Added a direct reducer unit test (set/undo/redo, no-op identity, redo-cleared-by-edit, empty-stack no-ops, bound). MAX_BACKGROUND_HISTORY now lives in the reducer module (re-exported from persistence for existing sites).

ui typecheck clean; background history/persistence 29/29; background integration e2e green (regenerated screenshots + walkthrough.webm).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
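A pure set/undo/redo reducer of the kind described might look like the following sketch. The state shape, the `Sketch`-suffixed function names, and the bound of 20 are assumptions for illustration; the actual signatures in state/background-history.ts are not shown in this PR.

```typescript
// Hypothetical state shape: the live value plus bounded undo and redo stacks.
interface HistoryState<T> {
  current: T;
  undo: T[];
  redo: T[];
}

const MAX_HISTORY_SKETCH = 20; // assumed bound, not the real MAX_BACKGROUND_HISTORY

function applySetSketch<T>(state: HistoryState<T>, next: T): HistoryState<T> {
  // No-op identity: setting the current value returns the same state object.
  // (Reference equality here; a real store may compare configs structurally.)
  if (state.current === next) return state;
  return {
    current: next,
    undo: [...state.undo, state.current].slice(-MAX_HISTORY_SKETCH),
    redo: [], // a fresh edit invalidates anything that could be redone
  };
}

function applyUndoSketch<T>(state: HistoryState<T>): HistoryState<T> {
  if (state.undo.length === 0) return state; // empty-stack no-op
  return {
    current: state.undo[state.undo.length - 1],
    undo: state.undo.slice(0, -1),
    redo: [...state.redo, state.current].slice(-MAX_HISTORY_SKETCH),
  };
}

function applyRedoSketch<T>(state: HistoryState<T>): HistoryState<T> {
  if (state.redo.length === 0) return state; // empty-stack no-op
  return {
    current: state.redo[state.redo.length - 1],
    undo: [...state.undo, state.current].slice(-MAX_HISTORY_SKETCH),
    redo: state.redo.slice(0, -1),
  };
}
```

Because the functions take and return plain data with no storage access, a store and a browser fixture can both import them without pulling in a persistence layer.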

… + fix a rebase-orphaned swipeRight

- #10706: added a REAL CDP-touch pull-DOWN on home-notification-pull-zone that opens the NotificationCenter sheet (asserts closed→open→closed), and re-settles home before the rail swipe. Previously only jsdom synthetic pointer events covered the pull-down.
- Fixed a rebase artifact: the inner-pager mouse-drag test (develop #11065) called a `swipeRight(locator)` helper that no longer had a definition after the rebase onto develop; only `swipeLeft` survived. Added the mirrored `swipeRight` so the runner (and its CI lane) stops crashing with ReferenceError.

Full home-screen e2e green (7 screenshots).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The backdrop-blur gate bans blur but not a background fill, so a re-added bg-black*/bg-white/10 on the floating transcript bubbles would slip past it (audit gap). Added a computed-style assertion in the chat-sheet e2e: with the populated thread MAXIMIZED, every message bubble's computed backgroundColor must be transparent. The check is implementation-agnostic: it catches a fill re-added by any class, not just a known class name. 12 bubbles asserted; full chat-sheet e2e green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
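The transparency check can be separated into a pure predicate over computed-style strings, which is easy to unit test. This is a sketch under assumptions: `isTransparentColor`, `BubbleStyle`, and `findFilledBubbles` are hypothetical names, and the parser only handles the `transparent` keyword and the legacy `rgba(r, g, b, a)` serialization, treating anything else as a fill.

```typescript
// True when a computed backgroundColor string denotes a fully transparent color.
// Assumption: values arrive as "transparent" or "rgba(r, g, b, a)"; any other
// format (including opaque "rgb(...)") is treated as a fill.
function isTransparentColor(computed: string): boolean {
  const value = computed.trim().toLowerCase();
  if (value === "transparent") return true;
  const match =
    /^rgba\(\s*[\d.]+\s*,\s*[\d.]+\s*,\s*[\d.]+\s*,\s*([\d.]+)\s*\)$/.exec(value);
  return match !== null && Number(match[1]) === 0;
}

// Hypothetical record of one bubble's computed style, as collected in the page.
interface BubbleStyle {
  id: string;
  backgroundColor: string;
}

// Returns the ids of bubbles that carry a fill; an empty result means the gate passes.
function findFilledBubbles(styles: BubbleStyle[]): string[] {
  return styles
    .filter((style) => !isTransparentColor(style.backgroundColor))
    .map((style) => style.id);
}
```

In a browser context the `backgroundColor` values would come from `getComputedStyle(element).backgroundColor`, which reflects the final cascade regardless of which class introduced the fill.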

…the mock gesture tests (#10722)

The audit flagged Launcher.gestures.test.tsx and use-pull-gesture.test.ts as gesture-pipeline larp: they mock motion/react and fabricate PointerEvents, so they cannot catch drag/reorder/pointer-capture breakage yet presented as gesture coverage.

- Real coverage: extended run-launcher-e2e.mjs with a GENUINE pointer drag on a Framer Reorder.Item (in edit mode) and assert it fires `reorder` telemetry, actually changes the tile order, PERSISTS the new order to LAUNCHER_STORAGE_KEY, and drops/duplicates no ids. Verified live: real drag 0→23 reorder events, order views→activity, 25 unique persisted ids.
- Honest labels: Launcher.gestures.test is now explicitly the onReorder/onDragEnd BRIDGE-LOGIC suite (what the Launcher does with a gesture result), and use-pull-gesture.test is explicitly LOGIC-ONLY (pure resolvePull/resolveSwipe + the rAF-coalescing #9141 contract). Both point at the real CDP-touch runners for the actual pointer pipeline. No more overstating.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
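The order-change and no-dropped/no-duplicated-id assertions can be expressed as one pure comparison of the tile ids before and after the drag. The sketch below is an assumption about shape only; `checkReorderSketch` and `ReorderCheck` are hypothetical names, and the runner's actual assertions may be structured differently.

```typescript
interface ReorderCheck {
  changed: boolean; // the order differs from the starting order
  duplicates: string[]; // ids that appear more than once after the drag
  dropped: string[]; // ids present before but missing after
  added: string[]; // ids present after but not before
}

function checkReorderSketch(before: string[], after: string[]): ReorderCheck {
  const seen = new Set<string>();
  const duplicates: string[] = [];
  for (const id of after) {
    if (seen.has(id)) duplicates.push(id);
    else seen.add(id);
  }
  const beforeSet = new Set(before);
  return {
    changed:
      before.length !== after.length || before.some((id, i) => id !== after[i]),
    duplicates,
    dropped: before.filter((id) => !seen.has(id)),
    added: after.filter((id) => !beforeSet.has(id)),
  };
}
```

A reorder is valid when `changed` is true and the other three lists are empty, meaning the same ids survive in a different order.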

…10722), live-e2e chrome path

- #10713: the per-message COPY test asserted only the "Copied" affordance; now it reads navigator.clipboard.readText() back and asserts it equals the assistant text (the context already grants clipboard-read), proving bytes reached the clipboard, not just that a label flipped.
- #10722 WebKit: added an opt-in (PLAYWRIGHT_WEBKIT=1) WebKit/Safari-engine lane to the ui-smoke config, scoped to the keyless, permission-free pointer/focus/text-input specs (chat-overlay-controls-interactions, conversation-management, slash-commands) so iOS/Safari pointer regressions are catchable; gated so a machine without the WebKit browser download never reds the default lane.
- CI: the nightly app-real-e2e ubuntu job set ELIZA_LIVE_TEST=1 but never ELIZA_CHROME_PATH, so the live streaming suite self-skipped forever. Resolve the chromium the job already installs via playwright-core and export ELIZA_CHROME_PATH (with a test -x guard so a mismatch fails loudly).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…over both dialects (#10722)

The static view-capability audit was vacuous: `isReachable` passed if a view had any VIEW_ACTION_MAP entry (every audited view does, so the assertion was unconditionally true), and the DOM-only regex was blind to the 8 spatial views (documents/inbox/goals/health/finances/relationships/todos/focus) that instrument via a spatial `agent=` prop → `data-agent-id`, not `useAgentElement`, so they passed as 0-control "cosmetic" for free (documents actually has 8 registrations the old grep counted as 0).

Replaced it with a proportional DENSITY gate: a control-bearing view must register >= ceil(controls / 4) agent-addressable elements, counting controls + registrations across BOTH dialects (DOM handlers/buttons + spatial agent= props). Cap calibrated against the densest real view (orchestrator ~2.7 controls/reg) for ~1.5x headroom (no false fails) while still failing an under-instrumented view.

Added a teeth/positive-control test (an 8-control/1-registration source FAILS, the exact case the old check let through) and honest describe/header labels stating it proves static registration density, not runtime hittability. Render-based coverage stays with the running-shell crawler (scripts/view-audit); see the agent's rationale: @elizaos/agent must not import 14 leaf view packages (dependency inversion) and a bare jsdom render would false-fail. 20/20 green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
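The gate's arithmetic is small enough to state directly. The sketch below assumes the control and registration counts have already been extracted from a view's source; the function names are hypothetical, and only the `ceil(controls / 4)` rule comes from the commit message.

```typescript
// Ratio from the commit message: one registration required per 4 controls, rounded up.
const CONTROLS_PER_REGISTRATION = 4;

function requiredRegistrations(controls: number): number {
  return Math.ceil(controls / CONTROLS_PER_REGISTRATION);
}

// A view passes when it registers at least the required number of
// agent-addressable elements. A view with zero controls requires zero.
function passesDensityGate(controls: number, registrations: number): boolean {
  return registrations >= requiredRegistrations(controls);
}
```

For the positive-control case in the commit, 8 controls require ceil(8 / 4) = 2 registrations, so a source with 1 registration fails. A view at roughly 2.7 controls per registration, such as 27 controls with 10 registrations, needs only 7 and passes with headroom.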

… deterministic BACKGROUND scenario (#10722, #10694)

Completes the follow-up CI wiring so the new lanes actually run, and lands the deferred deterministic BACKGROUND scenario:

- test.yml: install WebKit and run the opt-in `webkit` project over the keyless chat pointer/focus/composer specs (PLAYWRIGHT_WEBKIT=1). Without this step the WebKit lane never ran anywhere.
- ui-e2e-gate.yml: gate the launcher real-drag reorder e2e (test:launcher-e2e), add the components/pages/** path trigger, and upload its output-launcher artifacts.
- deterministic-background-actions.scenario.ts: the pr-deterministic lane coverage of the REAL plugin-app-control BACKGROUND handler (named-color + hex set, GLSL shader preset (text + explicit `preset`), a live-shader uniform tweak, undo, redo, reset), asserting the exact ordered `background:apply` broadcast ledger. Verified green locally (86ms). README updated.
- background-set-color / background-shader-undo-redo (plugin-app-control, lane:"live-only"): NL→BACKGROUND routing variants for the live lane, matching the existing app-control live-scenario convention (excluded from PR CI; need the designated live model, since gpt-oss-120b under-routes them, same as the sibling app-list live scenario).
- run-chat-sheet-e2e: strengthen the #10698 no-fill gate to walk the WHOLE per-message wrapper chain (not just the immediate parent), so a fill re-added at any wrapper level is caught. Verified: 24 wrapper entries, all transparent.
- Launcher.gestures.test comment: correct the runner filename (run-launcher-e2e.mjs section 2b, gated in ui-e2e-gate.yml).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…g-reorder e2e assertion

The fixed-endpoint fling on the Reorder.Item was nondeterministic: axis="y" over a multi-column grid means a whole row shares one y-centre, so onReorder thrashes while the tile crosses a row boundary and the net order can round-trip back to the original by release (reproduced 4/5 locally). Replaced with a deterministic protocol: nudge down in bounded 12px steps, poll the live DOM order each step, and release only after the swap is re-confirmed stable with the pointer held still. 10/10 green after.

The develop merge needed no manual resolution: #11105's copy of run-home-screen-e2e.mjs is byte-identical to this branch's (one swipeRight, one touchSwipeDown, pull-down assertion intact).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

coderabbitai · 2026-07-02T03:15:05Z

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.



claude · 2026-07-02T11:18:46Z

Claude encountered an error.

I'll analyze this and get back to you.

Shaw and others added 12 commits July 1, 2026 22:47

Merge remote-tracking branch 'origin/develop' into HEAD

c64786e

greptile-apps Bot reviewed Jul 2, 2026


lalalune merged commit e864a52 into develop Jul 2, 2026
34 of 59 checks passed

lalalune deleted the shaw/fervent-knuth-55d14b branch July 2, 2026 03:15

lalalune mentioned this pull request Jul 2, 2026

Interaction QA is shallow/larpy — de-larp touch, mouse, XR & TUI testing with real-input e2e, recordings, per-step validation & fuzz #10722

Closed

github-actions Bot added the ui label Jul 2, 2026

lalalune merged 12 commits into develop from shaw/fervent-knuth-55d14b
