test(ui): deterministic launcher drag-reorder e2e — poll-and-settle protocol (was 1/5 flaky)#11115

Merged
lalalune merged 12 commits into develop from shaw/fervent-knuth-55d14b
Jul 2, 2026
Conversation

@lalalune

@lalalune lalalune commented Jul 2, 2026

Member

Follow-up to #11103 (merged with the flaky assertion): the launcher drag-reorder e2e's "order actually changed" assertion failed ~4/5 locally. Root cause: Reorder.Group axis="y" on a multi-column CSS grid — a whole row shares one y-center, so a fixed-endpoint fling thrashes onReorder crossing row boundaries and the release-time order is animation-timing-dependent.
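
To make the root cause concrete, here is a minimal sketch of why a y-only axis is ambiguous on a grid. The tile count, column count, tile height, and gap are made-up illustration values, not taken from the launcher, and the function only models row-major layout geometry, not the internals of `Reorder.Group`.

```javascript
// Hypothetical layout numbers, for illustration only.
// Returns the vertical centre of each tile in a row-major grid.
function yCenters(tileCount, columns, tileHeight, gap) {
  const centers = [];
  for (let i = 0; i < tileCount; i++) {
    const row = Math.floor(i / columns);
    centers.push(row * (tileHeight + gap) + tileHeight / 2);
  }
  return centers;
}

const centers = yCenters(8, 4, 80, 16);
console.log(centers); // [40, 40, 40, 40, 136, 136, 136, 136]
```

Every tile in a row reports the same y-center, so a comparison on y alone cannot tell row-mates apart, and a dragged tile crosses all of a row's thresholds at the same pointer height.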

Fix: deterministic drag protocol in run-launcher-e2e.mjs — engage drag, nudge down in bounded 12px steps (60ms dwell), poll live DOM order per step, on first observed swap hold pointer still 300ms and re-confirm stability before mouse.up, then 600ms settle before reading final order. All original assertions kept (telemetry, order-change, persistence, no-dup-ids).
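
The steps above can be sketched roughly as follows. This is not the code in run-launcher-e2e.mjs: `dragUntilStableSwap`, the injected `mouse` object (anything with async `move`/`down`/`up`), `readOrder`, and the `maxSteps` bound are all assumptions introduced for illustration. Only the 12px step, 60ms dwell, 300ms hold, and 600ms settle come from the description.

```javascript
// Sketch of a poll-and-settle drag. `mouse` and `readOrder` are injected so the
// shape stays driver-agnostic; neither name is taken from the real runner.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
const sameOrder = (a, b) => a.length === b.length && a.every((id, i) => id === b[i]);

async function dragUntilStableSwap({ mouse, readOrder, start, opts = {} }) {
  // maxSteps is a hypothetical bound; the PR only says the steps are bounded.
  const { stepPx = 12, dwellMs = 60, holdMs = 300, settleMs = 600, maxSteps = 40 } = opts;
  const before = await readOrder();
  await mouse.move(start.x, start.y);
  await mouse.down(); // engage the drag
  let y = start.y;
  let swapped = false;
  try {
    for (let step = 0; step < maxSteps; step++) {
      y += stepPx;
      await mouse.move(start.x, y); // nudge down one bounded step
      await sleep(dwellMs);
      const observed = await readOrder(); // poll the live order
      if (sameOrder(observed, before)) continue;
      await sleep(holdMs); // hold the pointer still
      const confirmed = await readOrder();
      if (sameOrder(confirmed, observed)) {
        swapped = true; // the swap survived the hold, safe to release
        break;
      }
    }
  } finally {
    await mouse.up(); // always release, even if no swap was seen
  }
  await sleep(settleMs); // let animations finish before the final read
  const after = await readOrder();
  return { before, after, swapped };
}
```

Releasing only after a re-confirmed swap removes the dependence on where a fixed endpoint happens to land relative to the animation, which is the failure mode described above.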

Evidence: 10/10 consecutive green (was 1/5); regenerated checked-in e2e artifacts from the green runs. Both CI-gated runners (test:home-screen-e2e, test:chat-sheet-e2e) pass; ui unit suites 1067 passed; tsgo clean. Real-LLM trajectory: N/A — test-harness-only change. Backend logs: N/A — no server path.

🤖 Generated with Claude Code

Shaw and others added 12 commits July 1, 2026 22:47
The shell fuzz's own header declared converse (VAD/semantic end-of-turn
commit→send) a deferred follow-up dimension — a residual against the no-residuals
standard. Added it:
- the interleaved fuzz now drives converse capture (a complete final routes
  through the REAL TurnAggregator → synchronous commit → VOICE_DM send) alongside
  dictation, and asserts lastTurnVoice is cleared after every new-chat (invariant
  (d));
- a dedicated test proves a complete converse final sends a VOICE_DM (not a plain
  DM), sets lastTurnVoice, and a new-chat mid-converse clears the flag without
  orphaning the capture; plus a negative — pure disfluency commits but the
  respond-gate drops it, so nothing sends.
lastTurnVoice is internal (not on the public controller return), so it's observed
through its real consumer boundary (the useShellVoiceOutput arg), not by exposing
new public state. Also corrected the header's mock disclosure: sendChatText is
stubbed here and the send-QUEUE race is pinned separately in
useChatSend.send-voice-newchat.race — this suite proves the controller lifecycle,
not that leaf. 9/9 green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The #10694 deliverable is "undo + redo, bounded, persisted", but the redo stack
was in-memory only. Persist it symmetrically with the undo history (same
bound + data-URL quota cap) via loadBackgroundRedo/saveBackgroundRedo, so "step
forward" survives a reload just like "step back" does. New test: edit→edit→undo,
remount (reload), redo restores the undone config.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nes (#10726)

voice-realaudio.spec asserted asr.detail.wer <= 0.34 against a Chromium page.route
ASR mock that echoes the expected phrase verbatim — WER is structurally 0, so the
assertion could never catch a regression (a real-accuracy claim made against a
mock standing in for the thing under test). Removed it; the load-bearing proof in
this lane stays (a real captured WAV reached ASR + the stage passed). Documented
in voice-selftest that its transcript-content check proves pipeline PROPAGATION,
not accuracy. WER accuracy is scored only in the real-recognizer tiers
(plugin-local-inference *.real.test.ts + voice:matrix hardware lanes).
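
A small sketch of why the removed assertion had no teeth. `wordErrorRate` is a generic word-level edit-distance WER written for this illustration (not the scorer used in the suite), and the phrase and `echoMock` are hypothetical stand-ins for the page.route mock.

```javascript
// Generic WER: word-level Levenshtein distance divided by reference length.
function wordErrorRate(reference, hypothesis) {
  const ref = reference.trim().split(/\s+/).filter(Boolean);
  const hyp = hypothesis.trim().split(/\s+/).filter(Boolean);
  // Convention chosen for this sketch: an empty reference scores 0 or 1.
  if (ref.length === 0) return hyp.length === 0 ? 0 : 1;
  let prev = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const row = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      row[j] = Math.min(prev[j] + 1, row[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = row;
  }
  return prev[hyp.length] / ref.length;
}

const expectedPhrase = 'open the notification center'; // hypothetical phrase
const echoMock = (phrase) => phrase; // echoes the expected phrase verbatim
console.log(wordErrorRate(expectedPhrase, echoMock(expectedPhrase))); // 0
```

Because the mock returns the expected phrase, the score is 0 for any phrase, so a `<= 0.34` threshold passes regardless of recognizer quality.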

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e2e mirror-larp, #10694)

The background e2e fixture hand-mirrored useDisplayPreferences' set/undo/redo
push-pop semantics, so mirror-vs-real drift was invisible (audit larp finding).
Extracted the semantics into a pure, persistence-free module
(state/background-history.ts: applyBackgroundSet/Undo/Redo + MAX) that BOTH the
real store (useDisplayPreferences) and the browser e2e fixture now call — one
implementation, no drift, and it stays browser-safe for esbuild (no persistence
import graph). Added a direct reducer unit test (set/undo/redo, no-op identity,
redo-cleared-by-edit, empty-stack no-ops, bound). MAX_BACKGROUND_HISTORY now
lives in the reducer module (re-exported from persistence for existing sites).
ui typecheck clean; background history/persistence 29/29; background integration
e2e green (regenerated screenshots + walkthrough.webm).
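
For readers unfamiliar with the pattern, a pure history reducer of this kind might look like the sketch below. The names `applySet`/`applyUndo`/`applyRedo`, the `{ current, undo, redo }` state shape, and the bound of 3 are stand-ins chosen for illustration; the real signatures in state/background-history.ts and the real MAX value are not shown in this PR.

```javascript
const MAX_HISTORY = 3; // illustrative bound, not the real MAX_BACKGROUND_HISTORY

function applySet(state, next) {
  if (next === state.current) return state; // no-op keeps object identity
  const undo = [...state.undo, state.current].slice(-MAX_HISTORY);
  return { current: next, undo, redo: [] }; // a new edit clears redo
}

function applyUndo(state) {
  if (state.undo.length === 0) return state; // empty-stack no-op
  const previous = state.undo[state.undo.length - 1];
  const redo = [...state.redo, state.current].slice(-MAX_HISTORY);
  return { current: previous, undo: state.undo.slice(0, -1), redo };
}

function applyRedo(state) {
  if (state.redo.length === 0) return state; // empty-stack no-op
  const next = state.redo[state.redo.length - 1];
  const undo = [...state.undo, state.current].slice(-MAX_HISTORY);
  return { current: next, undo, redo: state.redo.slice(0, -1) };
}

let history = { current: 'black', undo: [], redo: [] };
history = applySet(history, 'red');
history = applySet(history, 'blue');
history = applyUndo(history);
console.log(history.current); // red
history = applyRedo(history);
console.log(history.current); // blue
```

Since the functions touch no storage, a store can wrap them with persistence while a browser fixture calls them directly, which is the single-implementation property the commit is after.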

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… + fix a rebase-orphaned swipeRight

- #10706: added a REAL CDP-touch pull-DOWN on home-notification-pull-zone that
  opens the NotificationCenter sheet (asserts closed→open→closed), and re-settles
  home before the rail swipe. Previously only jsdom synthetic pointer events
  covered the pull-down.
- Fixed a rebase artifact: the inner-pager mouse-drag test (develop #11065) called
  a `swipeRight(locator)` helper that no longer had a definition after the rebase
  onto develop — only `swipeLeft` survived. Added the mirrored `swipeRight` so the
  runner (and its CI lane) stops crashing with ReferenceError. Full home-screen
  e2e green (7 screenshots).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The backdrop-blur gate bans blur but not a background fill, so a re-added
bg-black*/bg-white/10 on the floating transcript bubbles would slip past it
(audit gap). Added a computed-style assertion in the chat-sheet e2e: with the
populated thread MAXIMIZED, every message bubble's computed backgroundColor must
be transparent (implementation-agnostic — catches a fill re-added by any class,
not just a known class name). 12 bubbles asserted; full chat-sheet e2e green.
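
A sketch of the kind of check a computed-style assertion can rest on. `isTransparent` is a hypothetical helper, not the one in the chat-sheet runner; it assumes the browser serializes an unfilled background as `rgba(0, 0, 0, 0)` (as Chromium does) and deliberately treats any format it does not recognize as a fill.

```javascript
// Returns true only for values known to mean "no fill".
// Unrecognized serializations are treated as filled, so the gate fails closed.
function isTransparent(color) {
  const value = color.trim().toLowerCase();
  if (value === 'transparent') return true;
  const match = value.match(
    /^rgba\(\s*[\d.]+\s*,\s*[\d.]+\s*,\s*[\d.]+\s*,\s*([\d.]+)\s*\)$/
  );
  return match !== null && Number(match[1]) === 0;
}

console.log(isTransparent('rgba(0, 0, 0, 0)')); // true
console.log(isTransparent('rgba(255, 255, 255, 0.1)')); // false
```

In a page context the input would come from `getComputedStyle(element).backgroundColor`, which is what makes the check independent of class names.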

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…the mock gesture tests (#10722)

The audit flagged Launcher.gestures.test.tsx and use-pull-gesture.test.ts as
gesture-pipeline larp — they mock motion/react and fabricate PointerEvents, so
they cannot catch drag/reorder/pointer-capture breakage yet presented as gesture
coverage.

- Real coverage: extended run-launcher-e2e.mjs with a GENUINE pointer drag on a
  Framer Reorder.Item (in edit mode) and assert it fires `reorder` telemetry,
  actually changes the tile order, PERSISTS the new order to
  LAUNCHER_STORAGE_KEY, and drops/duplicates no ids. Verified live: real drag
  0→23 reorder events, order views→activity, 25 unique persisted ids.
- Honest labels: Launcher.gestures.test is now explicitly the onReorder/onDragEnd
  BRIDGE-LOGIC suite (what the Launcher does with a gesture result), and
  use-pull-gesture.test is explicitly LOGIC-ONLY (pure resolvePull/resolveSwipe +
  the rAF-coalescing #9141 contract) — both point at the real CDP-touch runners
  for the actual pointer pipeline. No more overstating.
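
The "drops/duplicates no ids" assertion amounts to checking that the persisted order is a permutation of the original. A minimal sketch, with a hypothetical helper name and sample ids that are not taken from the runner:

```javascript
// Compares two id lists and reports anything dropped, duplicated, or added.
function checkNoDroppedOrDuplicatedIds(before, after) {
  const beforeSet = new Set(before);
  const afterSet = new Set(after);
  const duplicates = after.filter((id, i) => after.indexOf(id) !== i);
  const dropped = before.filter((id) => !afterSet.has(id));
  const added = after.filter((id) => !beforeSet.has(id));
  const ok = duplicates.length === 0 && dropped.length === 0 && added.length === 0;
  return { ok, duplicates, dropped, added };
}

const beforeIds = ['views', 'activity', 'settings'];
const afterIds = ['activity', 'views', 'settings']; // reordered, nothing lost
console.log(checkNoDroppedOrDuplicatedIds(beforeIds, afterIds).ok); // true
```

Pairing this with a separate "order actually changed" check keeps the two failure modes distinct: a drag that did nothing versus a drag that corrupted the list.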

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…10722), live-e2e chrome path

- #10713: the per-message COPY test asserted only the "Copied" affordance; now it
  reads navigator.clipboard.readText() back and asserts it equals the assistant
  text (the context already grants clipboard-read) — proving bytes reached the
  clipboard, not just that a label flipped.
- #10722 WebKit: added an opt-in (PLAYWRIGHT_WEBKIT=1) WebKit/Safari-engine lane
  to the ui-smoke config, scoped to the keyless, permission-free pointer/focus/
  text-input specs (chat-overlay-controls-interactions, conversation-management,
  slash-commands) so iOS/Safari pointer regressions are catchable; gated so a
  machine without the WebKit browser download never reds the default lane.
- CI: the nightly app-real-e2e ubuntu job set ELIZA_LIVE_TEST=1 but never
  ELIZA_CHROME_PATH, so the live streaming suite self-skipped forever. Resolve the
  chromium the job already installs via playwright-core and export
  ELIZA_CHROME_PATH (with a test -x guard so a mismatch fails loudly).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…over both dialects (#10722)

The static view-capability audit was vacuous: `isReachable` passed if a view had
any VIEW_ACTION_MAP entry (every audited view does → the assertion was
unconditionally true), and the DOM-only regex was blind to the 8 spatial views
(documents/inbox/goals/health/finances/relationships/todos/focus) that
instrument via a spatial `agent=` prop → `data-agent-id`, not `useAgentElement`,
so they passed as 0-control "cosmetic" for free (documents actually has 8
registrations the old grep counted as 0).

Replaced it with a proportional DENSITY gate: a control-bearing view must
register >= ceil(controls / 4) agent-addressable elements, counting controls +
registrations across BOTH dialects (DOM handlers/buttons + spatial agent= props).
Cap calibrated against the densest real view (orchestrator ~2.7 controls/reg) for
~1.5x headroom (no false fails) while still failing an under-instrumented view.
Added a teeth/positive-control test (an 8-control/1-registration source FAILS —
the exact case the old check let through) and honest describe/header labels
stating it proves static registration density, not runtime hittability. Render-
based coverage stays with the running-shell crawler (scripts/view-audit) — see
the agent's rationale: @elizaos/agent must not import 14 leaf view packages
(dependency inversion) and a bare jsdom render would false-fail. 20/20 green.
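
The gate's arithmetic is small enough to sketch. The function names are hypothetical, and treating a zero-control view as exempt is an assumption drawn from the "control-bearing" wording; the `ceil(controls / 4)` threshold and the 8-control/1-registration failing case come from the description above.

```javascript
const CONTROLS_PER_REGISTRATION = 4; // the cap stated in the commit message

function requiredRegistrations(controls) {
  return Math.ceil(controls / CONTROLS_PER_REGISTRATION);
}

// Assumption: a view with no controls is not control-bearing and is exempt.
function passesDensityGate({ controls, registrations }) {
  if (controls === 0) return true;
  return registrations >= requiredRegistrations(controls);
}

console.log(requiredRegistrations(8)); // 2
console.log(passesDensityGate({ controls: 8, registrations: 1 })); // false
console.log(passesDensityGate({ controls: 8, registrations: 2 })); // true
```

With a cap of 4 and the densest real view at roughly 2.7 controls per registration, the ratio 4 / 2.7 is about 1.5, which matches the stated headroom.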

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… deterministic BACKGROUND scenario (#10722, #10694)

Completes the follow-up CI wiring so the new lanes actually run, and lands the
deferred deterministic BACKGROUND scenario:

- test.yml: install WebKit and run the opt-in `webkit` project over the keyless
  chat pointer/focus/composer specs (PLAYWRIGHT_WEBKIT=1) — without this step the
  WebKit lane never ran anywhere.
- ui-e2e-gate.yml: gate the launcher real-drag reorder e2e (test:launcher-e2e),
  add the components/pages/** path trigger, and upload its output-launcher
  artifacts.
- deterministic-background-actions.scenario.ts: the pr-deterministic lane
  coverage of the REAL plugin-app-control BACKGROUND handler — named-color + hex
  set, GLSL shader preset (text + explicit `preset`), a live-shader uniform
  tweak, undo, redo, reset — asserting the exact ordered `background:apply`
  broadcast ledger. Verified green locally (86ms). README updated.
- background-set-color / background-shader-undo-redo (plugin-app-control,
  lane:"live-only"): NL→BACKGROUND routing variants for the live lane, matching
  the existing app-control live-scenario convention (excluded from PR CI; need
  the designated live model — gpt-oss-120b under-routes them, same as the sibling
  app-list live scenario).
- run-chat-sheet-e2e: strengthen the #10698 no-fill gate to walk the WHOLE
  per-message wrapper chain (not just the immediate parent), so a fill re-added at
  any wrapper level is caught. Verified: 24 wrapper entries, all transparent.
- Launcher.gestures.test comment: correct the runner filename
  (run-launcher-e2e.mjs section 2b, gated in ui-e2e-gate.yml).
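
The wrapper-chain walk in the run-chat-sheet-e2e item can be sketched as below. `findFilledWrappers`, the `stopAt` boundary node, and the injected `getBackground` reader are assumptions for illustration; where the real runner stops walking is not stated here.

```javascript
// Walks from a bubble up through its ancestors until `stopAt` (exclusive)
// and returns every node whose background is not a known "no fill" value.
function findFilledWrappers(bubble, stopAt, getBackground) {
  const filled = [];
  for (let node = bubble; node && node !== stopAt; node = node.parentElement) {
    const background = getBackground(node);
    if (background !== 'rgba(0, 0, 0, 0)' && background !== 'transparent') {
      filled.push(node);
    }
  }
  return filled;
}

// Plain objects stand in for DOM elements so the sketch runs outside a browser.
const thread = { name: 'thread', bg: 'rgb(20, 20, 20)', parentElement: null };
const outer = { name: 'outer', bg: 'rgba(255, 255, 255, 0.1)', parentElement: thread };
const inner = { name: 'inner', bg: 'rgba(0, 0, 0, 0)', parentElement: outer };
const bubbleNode = { name: 'bubble', bg: 'rgba(0, 0, 0, 0)', parentElement: inner };

const offenders = findFilledWrappers(bubbleNode, thread, (node) => node.bg);
console.log(offenders.map((node) => node.name)); // ['outer']
```

In a page, `getBackground` would be `(el) => getComputedStyle(el).backgroundColor`. A parent-only check would have inspected `inner` and missed the fill on `outer`.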

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…g-reorder e2e assertion

The fixed-endpoint fling on the Reorder.Item was nondeterministic: axis="y"
over a multi-column grid means a whole row shares one y-centre, so onReorder
thrashes while the tile crosses a row boundary and the net order can
round-trip back to the original by release (reproduced 4/5 locally).
Replaced with a deterministic protocol: nudge down in bounded 12px steps,
poll the live DOM order each step, and release only after the swap is
re-confirmed stable with the pointer held still. 10/10 green after.

The develop merge needed no manual resolution: #11105's copy of
run-home-screen-e2e.mjs is byte-identical to this branch's (one swipeRight,
one touchSwipeDown, pull-down assertion intact).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jul 2, 2026

Contributor

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 206a19a8-0015-4b67-b8c5-bc61d3536683

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


@greptile-apps greptile-apps Bot left a comment

Contributor

Your trial has ended. Reactivate Greptile to resume code reviews.

@lalalune lalalune merged commit e864a52 into develop Jul 2, 2026
34 of 59 checks passed
@lalalune lalalune deleted the shaw/fervent-knuth-55d14b branch July 2, 2026 03:15
@github-actions github-actions Bot added the ui label Jul 2, 2026
@claude

claude Bot commented Jul 2, 2026

Contributor

Claude encountered an error.


I'll analyze this and get back to you.
