test(ui,app,agent): onboarding/voice/background/interaction QA follow-ups + de-larp (#11083 follow-up)#11103
The shell fuzz's own header declared converse (VAD/semantic end-of-turn commit→send) a deferred follow-up dimension, a residual against the no-residuals standard. Added it:
- the interleaved fuzz now drives converse capture (a complete final routes through the REAL TurnAggregator → synchronous commit → VOICE_DM send) alongside dictation, and asserts lastTurnVoice is cleared after every new-chat (invariant (d));
- a dedicated test proves a complete converse final sends a VOICE_DM (not a plain DM), sets lastTurnVoice, and a new-chat mid-converse clears the flag without orphaning the capture; plus a negative: pure disfluency commits but the respond-gate drops it, so nothing sends.

lastTurnVoice is internal (not on the public controller return), so it's observed through its real consumer boundary (the useShellVoiceOutput arg), not by exposing new public state. Also corrected the header's mock disclosure: sendChatText is stubbed here and the send-QUEUE race is pinned separately in useChatSend.send-voice-newchat.race; this suite proves the controller lifecycle, not that leaf. 9/9 green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The #10694 deliverable is "undo + redo, bounded, persisted", but the redo stack was in-memory only. Persist it symmetrically with the undo history (same bound + data-URL quota cap) via loadBackgroundRedo/saveBackgroundRedo, so "step forward" survives a reload just like "step back" does. New test: edit→edit→undo, remount (reload), redo restores the undone config. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nes (#10726) voice-realaudio.spec asserted asr.detail.wer <= 0.34 against a Chromium page.route ASR mock that echoes the expected phrase verbatim — WER is structurally 0, so the assertion could never catch a regression (a real-accuracy claim made against a mock standing in for the thing under test). Removed it; the load-bearing proof in this lane stays (a real captured WAV reached ASR + the stage passed). Documented in voice-selftest that its transcript-content check proves pipeline PROPAGATION, not accuracy. WER accuracy is scored only in the real-recognizer tiers (plugin-local-inference *.real.test.ts + voice:matrix hardware lanes). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
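To illustrate why the removed assertion could never fail, here is a minimal sketch of a word-level WER computation run against an echoing recognizer. The function names `wordErrorRate` and `echoMock` are hypothetical and are not taken from the repository; the real scoring code may differ.

```typescript
// Word error rate: word-level edit distance divided by the reference length.
// Illustrative only; not the project's actual scorer.
function wordErrorRate(reference: string, hypothesis: string): number {
  const ref = reference.trim().split(/\s+/).filter(Boolean);
  const hyp = hypothesis.trim().split(/\s+/).filter(Boolean);
  if (ref.length === 0) return hyp.length === 0 ? 0 : 1;

  // prev[j] holds the edit distance between ref[0..i-1) and hyp[0..j).
  let prev: number[] = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost, // substitution or match
      );
    }
    prev = curr;
  }
  return prev[hyp.length] / ref.length;
}

// Hypothetical stand-in for an ASR mock that echoes the expected phrase.
const echoMock = (expectedPhrase: string): string => expectedPhrase;

const phrase = "open the settings page";
// Always 0 against an echo, so a `<= 0.34` threshold carries no signal.
const werAgainstMock = wordErrorRate(phrase, echoMock(phrase));
// A real recognizer that drops one of four words would score 0.25.
const werWithDrop = wordErrorRate(phrase, "open settings page");
```

Only a recognizer that can actually get words wrong makes the threshold meaningful, which is why the accuracy check belongs in the real-recognizer tiers.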
…e2e mirror-larp, #10694) The background e2e fixture hand-mirrored useDisplayPreferences' set/undo/redo push-pop semantics, so mirror-vs-real drift was invisible (audit larp finding). Extracted the semantics into a pure, persistence-free module (state/background-history.ts: applyBackgroundSet/Undo/Redo + MAX) that BOTH the real store (useDisplayPreferences) and the browser e2e fixture now call — one implementation, no drift, and it stays browser-safe for esbuild (no persistence import graph). Added a direct reducer unit test (set/undo/redo, no-op identity, redo-cleared-by-edit, empty-stack no-ops, bound). MAX_BACKGROUND_HISTORY now lives in the reducer module (re-exported from persistence for existing sites). ui typecheck clean; background history/persistence 29/29; background integration e2e green (regenerated screenshots + walkthrough.webm). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
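The shared-reducer idea above can be sketched as follows. This is a minimal illustration under assumptions: the function names come from the commit message, but the state shape (`BackgroundHistory`), the generic signature, and the bound value of 3 are hypothetical; the real `state/background-history.ts` may look different.

```typescript
// Assumed state shape; the real module's types are not shown in this PR text.
interface BackgroundHistory<T> {
  current: T;
  undo: T[]; // oldest first, most recent last
  redo: T[]; // most recently undone last
}

// Assumed bound for illustration; the real MAX_BACKGROUND_HISTORY value is not stated here.
const MAX_BACKGROUND_HISTORY = 3;

function applyBackgroundSet<T>(h: BackgroundHistory<T>, next: T): BackgroundHistory<T> {
  if (Object.is(h.current, next)) return h; // no-op keeps identity
  return {
    current: next,
    undo: [...h.undo, h.current].slice(-MAX_BACKGROUND_HISTORY),
    redo: [], // a fresh edit clears the redo stack
  };
}

function applyBackgroundUndo<T>(h: BackgroundHistory<T>): BackgroundHistory<T> {
  if (h.undo.length === 0) return h; // empty stack is a no-op
  return {
    current: h.undo[h.undo.length - 1],
    undo: h.undo.slice(0, -1),
    redo: [...h.redo, h.current].slice(-MAX_BACKGROUND_HISTORY),
  };
}

function applyBackgroundRedo<T>(h: BackgroundHistory<T>): BackgroundHistory<T> {
  if (h.redo.length === 0) return h; // empty stack is a no-op
  return {
    current: h.redo[h.redo.length - 1],
    undo: [...h.undo, h.current].slice(-MAX_BACKGROUND_HISTORY),
    redo: h.redo.slice(0, -1),
  };
}
```

Because the functions are pure and import nothing, both a persistence-backed store and a browser fixture can call them without pulling in a storage dependency graph.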
… + fix a rebase-orphaned swipeRight
- #10706: added a REAL CDP-touch pull-DOWN on home-notification-pull-zone that opens the NotificationCenter sheet (asserts closed→open→closed), and re-settles home before the rail swipe. Previously only jsdom synthetic pointer events covered the pull-down.
- Fixed a rebase artifact: the inner-pager mouse-drag test (develop #11065) called a `swipeRight(locator)` helper that no longer had a definition after the rebase onto develop; only `swipeLeft` survived. Added the mirrored `swipeRight` so the runner (and its CI lane) stops crashing with ReferenceError.

Full home-screen e2e green (7 screenshots).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The backdrop-blur gate bans blur but not a background fill, so a re-added bg-black*/bg-white/10 on the floating transcript bubbles would slip past it (audit gap). Added a computed-style assertion in the chat-sheet e2e: with the populated thread MAXIMIZED, every message bubble's computed backgroundColor must be transparent (implementation-agnostic — catches a fill re-added by any class, not just a known class name). 12 bubbles asserted; full chat-sheet e2e green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
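A class-agnostic fill check of this kind could look roughly like the sketch below. The helper names `isTransparent` and `findFilledIndices` are hypothetical, and the sketch assumes the engine serializes a computed `backgroundColor` either as the keyword `transparent` or as an `rgb()`/`rgba()` string; the actual e2e runner may use a different comparison.

```typescript
// Returns true when a computed background colour has zero alpha.
// Assumes rgb()/rgba() serialization, e.g. "rgba(0, 0, 0, 0)".
function isTransparent(computedColor: string): boolean {
  const value = computedColor.trim().toLowerCase();
  if (value === "transparent") return true;
  const match = value.match(/^rgba?\(([^)]+)\)$/);
  if (!match) return false;
  // Accept both "r, g, b, a" and "r g b / a" separators.
  const parts = match[1].split(/[\s,\/]+/).filter(Boolean);
  // Three components means an opaque rgb() colour; four means explicit alpha.
  return parts.length === 4 && parseFloat(parts[3]) === 0;
}

// Given the computed colours of every bubble (or wrapper), list the offenders.
function findFilledIndices(computedColors: string[]): number[] {
  return computedColors
    .map((color, index) => (isTransparent(color) ? -1 : index))
    .filter((index) => index >= 0);
}
```

In a browser runner the input array would come from reading `getComputedStyle(el).backgroundColor` for each bubble; the gate then passes only when `findFilledIndices` returns an empty array.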
…the mock gesture tests (#10722)

The audit flagged Launcher.gestures.test.tsx and use-pull-gesture.test.ts as gesture-pipeline larp: they mock motion/react and fabricate PointerEvents, so they cannot catch drag/reorder/pointer-capture breakage yet presented as gesture coverage.
- Real coverage: extended run-launcher-e2e.mjs with a GENUINE pointer drag on a Framer Reorder.Item (in edit mode) and assert it fires `reorder` telemetry, actually changes the tile order, PERSISTS the new order to LAUNCHER_STORAGE_KEY, and drops/duplicates no ids. Verified live: real drag 0→23 reorder events, order views→activity, 25 unique persisted ids.
- Honest labels: Launcher.gestures.test is now explicitly the onReorder/onDragEnd BRIDGE-LOGIC suite (what the Launcher does with a gesture result), and use-pull-gesture.test is explicitly LOGIC-ONLY (pure resolvePull/resolveSwipe + the rAF-coalescing #9141 contract); both point at the real CDP-touch runners for the actual pointer pipeline. No more overstating.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
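The post-drag invariants listed above (order changed, nothing dropped, nothing duplicated) can be expressed as a small pure check. This is a sketch only: `checkReorder` and its result shape are hypothetical names, not the assertions in run-launcher-e2e.mjs.

```typescript
interface ReorderCheck {
  orderChanged: boolean;
  dropped: string[]; // ids present before the drag but missing after
  duplicated: string[]; // ids that appear more than once after the drag
}

// Compares the tile order before a drag with the order persisted after it.
function checkReorder(before: string[], after: string[]): ReorderCheck {
  const afterSet = new Set(after);
  const dropped = before.filter((id) => !afterSet.has(id));

  const seen = new Set<string>();
  const duplicated: string[] = [];
  for (const id of after) {
    if (seen.has(id) && !duplicated.includes(id)) duplicated.push(id);
    seen.add(id);
  }

  const orderChanged =
    before.length !== after.length || before.some((id, i) => id !== after[i]);
  return { orderChanged, dropped, duplicated };
}

// Example with made-up tile ids: the first two tiles swap places.
const reorderResult = checkReorder(
  ["views", "activity", "chat"],
  ["activity", "views", "chat"],
);
```

A drag is accepted only when `orderChanged` is true and both `dropped` and `duplicated` are empty, which distinguishes a genuine reorder from a no-op or a corrupting write.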
…10722), live-e2e chrome path
- #10713: the per-message COPY test asserted only the "Copied" affordance; now it reads navigator.clipboard.readText() back and asserts it equals the assistant text (the context already grants clipboard-read), proving bytes reached the clipboard, not just that a label flipped.
- #10722 WebKit: added an opt-in (PLAYWRIGHT_WEBKIT=1) WebKit/Safari-engine lane to the ui-smoke config, scoped to the keyless, permission-free pointer/focus/text-input specs (chat-overlay-controls-interactions, conversation-management, slash-commands) so iOS/Safari pointer regressions are catchable; gated so a machine without the WebKit browser download never reds the default lane.
- CI: the nightly app-real-e2e ubuntu job set ELIZA_LIVE_TEST=1 but never ELIZA_CHROME_PATH, so the live streaming suite self-skipped forever. Resolve the chromium the job already installs via playwright-core and export ELIZA_CHROME_PATH (with a test -x guard so a mismatch fails loudly).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…over both dialects (#10722) The static view-capability audit was vacuous: `isReachable` passed if a view had any VIEW_ACTION_MAP entry (every audited view does → the assertion was unconditionally true), and the DOM-only regex was blind to the 8 spatial views (documents/inbox/goals/health/finances/relationships/todos/focus) that instrument via a spatial `agent=` prop → `data-agent-id`, not `useAgentElement`, so they passed as 0-control "cosmetic" for free (documents actually has 8 registrations the old grep counted as 0). Replaced it with a proportional DENSITY gate: a control-bearing view must register >= ceil(controls / 4) agent-addressable elements, counting controls + registrations across BOTH dialects (DOM handlers/buttons + spatial agent= props). Cap calibrated against the densest real view (orchestrator ~2.7 controls/reg) for ~1.5x headroom (no false fails) while still failing an under-instrumented view. Added a teeth/positive-control test (an 8-control/1-registration source FAILS — the exact case the old check let through) and honest describe/header labels stating it proves static registration density, not runtime hittability. Render- based coverage stays with the running-shell crawler (scripts/view-audit) — see the agent's rationale: @elizaos/agent must not import 14 leaf view packages (dependency inversion) and a bare jsdom render would false-fail. 20/20 green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
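The density rule and its teeth test reduce to a few lines of arithmetic. The sketch below assumes hypothetical helper names (`requiredRegistrations`, `passesDensityGate`) and assumes that a view with zero controls is exempt, since the gate is described as applying to control-bearing views; the real audit derives the counts by scanning source for both dialects.

```typescript
// One registration is required per 4 controls (rounded up), per the rule above.
const CONTROLS_PER_REGISTRATION = 4;

function requiredRegistrations(controls: number): number {
  return Math.ceil(controls / CONTROLS_PER_REGISTRATION);
}

// `controls` and `registrations` are totals across both dialects
// (DOM handlers/buttons plus spatial agent= props).
function passesDensityGate(controls: number, registrations: number): boolean {
  if (controls === 0) return true; // assumption: purely cosmetic views are exempt
  return registrations >= requiredRegistrations(controls);
}

// Teeth case from the commit message: 8 controls with 1 registration must fail,
// because ceil(8 / 4) = 2 registrations are required.
const teethCasePasses = passesDensityGate(8, 1);
```

A view near the densest real ratio quoted above (about 2.7 controls per registration) clears the bar comfortably, while the 8-control/1-registration source does not.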
… deterministic BACKGROUND scenario (#10722, #10694)

Completes the follow-up CI wiring so the new lanes actually run, and lands the deferred deterministic BACKGROUND scenario:
- test.yml: install WebKit and run the opt-in `webkit` project over the keyless chat pointer/focus/composer specs (PLAYWRIGHT_WEBKIT=1); without this step the WebKit lane never ran anywhere.
- ui-e2e-gate.yml: gate the launcher real-drag reorder e2e (test:launcher-e2e), add the components/pages/** path trigger, and upload its output-launcher artifacts.
- deterministic-background-actions.scenario.ts: the pr-deterministic lane coverage of the REAL plugin-app-control BACKGROUND handler (named-color + hex set, GLSL shader preset (text + explicit `preset`), a live-shader uniform tweak, undo, redo, reset), asserting the exact ordered `background:apply` broadcast ledger. Verified green locally (86ms). README updated.
- background-set-color / background-shader-undo-redo (plugin-app-control, lane:"live-only"): NL→BACKGROUND routing variants for the live lane, matching the existing app-control live-scenario convention (excluded from PR CI; need the designated live model, since gpt-oss-120b under-routes them, same as the sibling app-list live scenario).
- run-chat-sheet-e2e: strengthen the #10698 no-fill gate to walk the WHOLE per-message wrapper chain (not just the immediate parent), so a fill re-added at any wrapper level is caught. Verified: 24 wrapper entries, all transparent.
- Launcher.gestures.test comment: correct the runner filename (run-launcher-e2e.mjs section 2b, gated in ui-e2e-gate.yml).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
8463e24 to
c32e433
Compare
…ion (#11112 WebKit lane → 9/9) The 'transcript text is selectable' spec had a SECOND toHaveCSS('user-select', 'text') that #11103 missed when it fixed the first: WebKit's getComputedStyle reports only the prefixed -webkit-user-select and returns '' for the unprefixed property, so the assert failed on WebKit even though the app correctly emits BOTH (base.css select-text). Probe the prefixed property with an unprefixed fallback; the behavioral range-selection assert below is the real proof. Full WebKit pointer/focus lane now 9/9 (was 3/9), Chromium unaffected. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
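The "prefixed property with an unprefixed fallback" probe can be sketched as a small helper. The name `effectiveUserSelect` and the `ComputedStyleLike` interface are hypothetical and exist only so the example runs outside a browser; in a page the argument would be the result of `getComputedStyle(element)`, whose `getPropertyValue` method has this same signature.

```typescript
// Minimal structural stand-in for CSSStyleDeclaration, so the sketch runs anywhere.
interface ComputedStyleLike {
  getPropertyValue(name: string): string;
}

// Prefer the prefixed property; fall back to the unprefixed one when it is empty.
function effectiveUserSelect(style: ComputedStyleLike): string {
  return (
    style.getPropertyValue("-webkit-user-select") ||
    style.getPropertyValue("user-select")
  );
}

// Fake styles modelling the two behaviours described in the commit message.
const fakeStyle = (values: Record<string, string>): ComputedStyleLike => ({
  getPropertyValue: (name: string): string => values[name] ?? "",
});

// Engine that reports only the prefixed property (the WebKit case described above).
const prefixedOnly = fakeStyle({ "-webkit-user-select": "text" });
// Engine that reports only the unprefixed property (hypothetical counterpart).
const unprefixedOnly = fakeStyle({ "user-select": "text" });
```

Both fakes resolve to `"text"` through the helper, so a single assertion covers either engine; the behavioral range-selection check remains the stronger proof.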
…t slash-menu/reload (lane 3/9 → 9/9) (#11225) * fix(ui/chat): focusing the composer opens the overlay again — boot-race in expand()'s reveal gate (#11112) [MAJOR, live regression on develop, both engines] Focusing the chat composer textarea no longer flipped the overlay to data-open="true". Root cause (not a stale suppress-ref, not an element divergence — aria-label="message" and data-testid="chat-composer-textarea" are the same textarea): expand() early- returns when hasRevealableThread is false (visibleMessages empty && not loading). On /chat the overlay becomes focusable BEFORE the restored conversation's messages arrive, so a focus→expand() no-op'd — and focus is a one-shot event, so the sheet never opened even after the 34 messages loaded. Playwright trace confirmed: locator.focus fired while /api/conversations and .../messages were still in flight. The jsdom test passed because it renders the controller with messages already present, so the gate never tripped. Fix: park the open-intent (pendingExpandOnRevealRef) when there's nothing to reveal yet; a reveal-edge effect consumes it (one-shot) when the thread becomes showable — but only if the composer is STILL focused, so an abandoned focus can't pop the sheet open later. The suppressExpandOnFocusRef contract is untouched (a pill-open keyboard-raise consumes the suppress flag before expand runs, so it never parks an intent). Focusing a genuinely empty new chat still doesn't open an empty sheet. Reproduced on real Chromium (chat-overlay-controls-interactions 'long transcript scrolls': 16.5s timeout-fail → 2.5s pass, 4/4). +3 jsdom regression tests; overlay 126/126, fuzz 119/119; run-chat-sheet-e2e PASSED. 
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * fix(ui/chat): surface slash-catalog fetch failures instead of swallowing them (#11112 diagnosis) The slash-command controller degraded a failed catalog / custom-actions fetch to [] with a silent .catch(() => []), making a fetch failure indistinguishable from a genuinely empty catalog — the menu just never mounts. That is exactly what made #11112's WebKit slash-menu failure hard to diagnose (the real cause: the service worker wasn't bypassed for Playwright routes on WebKit, so /api/* hit the real stub serving commands:[] — fixed in the ui-smoke config alongside the reload-persistence bug). Now both catches console.error a [useSlashCommandController]-prefixed message + the error before degrading; the composer still works catalog-less. filterCommandsForSurface authorization gating untouched. +5 controller unit tests (engine-agnostic: commands resolve whenever the catalog resolves incl. requiresAuth/requiresElevated under trusted defaults; unauthorized senders still lose gated commands; empty resolves silently; failed fetch degrades AND surfaces). 36/36. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * test(app): block the service worker on the WebKit ui-smoke lane (#11112 findings 1 & 2) The WebKit pointer/focus lane's slash-menu (finding 1) and conversation- reload-persistence (finding 2) failures share ONE root cause: the ui-smoke stack serves the PROD renderer, which registers /sw.js (skipWaiting + clients.claim). WebKit — unlike Chromium — does NOT bypass a controlling service worker when page.route interception is active, so once the SW claims the page every /api/* fetch goes AROUND the per-spec route fixtures to the real stub server (verified via an in-page probe: a route-fulfilled /api/conversations returned the stub server's conversations, not the fixture's). So slash listCommands resolved the stub's empty catalog (menu never mounted) and the reload rehydrated a foreign thread (timeout). 
Added serviceWorkers: 'block' to the webkit project — parity with the existing desktop-webkit lane that already documents this exact hazard. Config-only; both specs stay pristine. WebKit: slash 4/4 + conversation 3/3 (previously 0/4, 0/3); Chromium unaffected. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * test(app): probe -webkit-user-select on the sibling selectable assertion (#11112 WebKit lane → 9/9) The 'transcript text is selectable' spec had a SECOND toHaveCSS('user-select', 'text') that #11103 missed when it fixed the first: WebKit's getComputedStyle reports only the prefixed -webkit-user-select and returns '' for the unprefixed property, so the assert failed on WebKit even though the app correctly emits BOTH (base.css select-text). Probe the prefixed property with an unprefixed fallback; the behavioral range-selection assert below is the real proof. Full WebKit pointer/focus lane now 9/9 (was 3/9), Chromium unaffected. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> --------- Co-authored-by: moon <stupidlybadadvice@gmail.com> Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
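The park-and-consume fix described in the first commit of this squash can be modelled without React. The sketch below is a simplified state machine under assumptions: the class `OverlayModel` and its method names are hypothetical, and only the field `pendingExpandOnReveal` echoes the `pendingExpandOnRevealRef` named in the commit message. The real controller uses refs and effects and also handles the suppress-on-focus contract, which is omitted here.

```typescript
// Simplified model of: focus before messages load, then open once they arrive.
class OverlayModel {
  open = false;
  hasRevealableThread = false;
  composerFocused = false;
  pendingExpandOnReveal = false;

  focusComposer(): void {
    this.composerFocused = true;
    this.expand();
  }

  blurComposer(): void {
    this.composerFocused = false;
  }

  expand(): void {
    if (!this.hasRevealableThread) {
      // Nothing to reveal yet: park the intent instead of silently dropping it.
      this.pendingExpandOnReveal = true;
      return;
    }
    this.open = true;
  }

  // Reveal edge: called when the thread first becomes showable.
  threadBecameRevealable(): void {
    this.hasRevealableThread = true;
    if (!this.pendingExpandOnReveal) return;
    this.pendingExpandOnReveal = false; // one-shot consumption
    // Only honour the parked intent if the composer is still focused.
    if (this.composerFocused) this.open = true;
  }
}

// Boot race: focus fires while messages are still in flight.
const raced = new OverlayModel();
raced.focusComposer();
const openBeforeMessages = raced.open;
raced.threadBecameRevealable();
const openAfterMessages = raced.open;
```

The one-shot flag matters because focus is a single event: without parking, a focus that arrives early is lost, and with parking but no focus check, an abandoned focus would pop the sheet open later.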
Summary
Follow-up hardening for the onboarding/chat/voice/gesture/background epic
(PR #11083, merged as a9be4f48c70a). A 6-agent audit of that epic surfaced
a set of residual gaps + test-honesty issues; this PR closes the
locally-achievable, verifiable ones. Branched fresh off `develop` (9 commits,
0 behind), so it layers cleanly on top of the merged epic and develop's
subsequent shader refinement (#11088/#11102).
What's in it
Voice / #10700: converse-mode fuzz dimension. The shell send/voice/new-chat
fuzz declared converse (VAD/semantic end-of-turn commit→send) a deferred
dimension. Added it: the fuzz now drives converse capture through the real
`TurnAggregator` (a complete final commits synchronously → VOICE_DM send) and
asserts `lastTurnVoice` clears on every new-chat; a dedicated test proves a
complete converse final sends a VOICE_DM (not a plain DM) + a negative (pure
disfluency commits but the respond-gate drops it). Also corrected the header's
mock disclosure (sendChatText is the separately-pinned send-queue leaf).
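Background / #10694 residuals.
- The redo stack is now persisted, symmetrically with the undo history (the deliverable is "undo + redo, bounded, persisted", but redo was in-memory only).
- The history semantics now live in a pure reducer module (`state/background-history.ts`, `applyBackgroundSet/Undo/Redo`) used by BOTH `useDisplayPreferences` and the e2e fixture. The fixture no longer hand-mirrors the history semantics, so mirror-vs-real drift cannot occur. Added a direct reducer unit test.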
Voice-test honesty / #10726. Retired the tautological WER assertion in the
Chromium voice lanes (the mock ASR echoes the expected phrase → WER structurally
0, can never regress). The load-bearing "a real WAV reached ASR" assert stays;
WER accuracy is scored only in the real-recognizer tiers.
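Chat UI regression gates.
- #10698: a computed-style gate asserts the floating transcript bubbles carry no background fill (the backdrop-blur gate bans blur, not a fill); 12 bubbles asserted transparent.
- #10706: a real CDP-touch pull-down on the home notification pull zone (was jsdom-synthetic only). Also fixed a real bug: `run-home-screen-e2e` called an orphaned `swipeRight()` helper (only `swipeLeft` survived a develop rebase), crashing the whole home CI lane with a ReferenceError.
- #10713: the per-message copy test reads `navigator.clipboard.readText()` back and asserts the bytes, not just the "Copied" label.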
Interaction de-larp / #10722.
- Launcher: `run-launcher-e2e` now performs a genuine Framer `Reorder.Item` pointer drag that fires reorder telemetry, changes the tile order, persists to `LAUNCHER_STORAGE_KEY`, and drops/duplicates no ids (verified live: 0→23 reorder events, 25 unique ids). The mock-based `Launcher.gestures.test` and `use-pull-gesture.test` are relabelled as explicitly logic-only (they no longer overstate gesture-pipeline coverage).
- View-capability audit: the old static check was vacuous (it passed on any `VIEW_ACTION_MAP` entry, and every view has one) and blind to the 8 spatial views (which instrument via `agent=` → `data-agent-id`, not `useAgentElement`, so they passed as 0-control for free; `documents` actually has 8 registrations counted as 0). Replaced with a proportional density gate over both DOM + spatial dialects (`>= ceil(controls / 4)` registrations), calibrated with headroom over the densest real view, plus a teeth/positive-control test that FAILS an 8-control/1-registration source. Render-based coverage stays with the running-shell crawler (`@elizaos/agent` must not import 14 leaf view packages).
- WebKit: an opt-in (`PLAYWRIGHT_WEBKIT=1`) Safari-engine project in the ui-smoke config, scoped to the permission-free pointer/focus/text-input specs.
CI. `app-real-e2e.yml` now exports `ELIZA_CHROME_PATH` (resolved from the
chromium it already installs) so the nightly live-streaming lane stops
self-skipping forever.
Evidence (verified against this develop base)
- `packages/ui` touched suites 38/38 (background-history reducer, redo-persist round-trip, converse fuzz) + `packages/agent` view-capability 20/20. ui + app typecheck clean (one pre-existing develop-wide `plugin-local-inference` dist-staleness error, untouched here).
- Real e2e runs: launcher reorder (`run-launcher-e2e`: real Framer drag → telemetry + persistence + no dup ids), home-screen pull-down (`run-home-screen-e2e`), chat-sheet no-fill (`run-chat-sheet-e2e`). Committed screenshots + walkthrough webms.
Honest deferrals (N/A with reason, not larped)
`ELIZA_AUDIT_APP_STRICT` flip: a separate redesign epic needing the
`audit:app` 5-loop on a fully-built app. The "card chrome" in
AutomationsFeed/CameraPageView/Launcher is functional component
shape (launcher tiles, camera shutter, badges), not gratuitous card wrappers,
so a blind strip would regress the design.
actions (need a completed device run / a clean full-audit run to seed).
locally (pre-existing `@elizaos/plugin-discord/user-account-scraper` +
`plugin-local-inference` dist staleness; CI builds first). The BACKGROUND NL→plan→payload path is covered by deterministic tests over the real
`inferBackgroundPlan`.
🤖 Generated with Claude Code