Close the public scan funnel friction (analytics + UX + server telemetry)#4
Conversation
trackAnonymousScanError was only firing after retries exhausted in the catch path. The 4xx and 502 early-return branches exited silently, so PostHog never saw events for rate_limited, clone_failed, invalid_url, or repo_too_large. R9 weekly digest 2026-05-04 surfaced this as anonymous_scan_started=15 / completed=5 / error=3 — 7 unaccounted-for scans. Now fires error events on all non-success exits so the funnel reconciles: started == completed + error[*]. error_code property distinguishes the failure mode for downstream analysis. No changes to user-visible behavior. Pure observability fix.
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
Yesterday's funnel showed 9 anonymous_scan_started events from one user
with only 1 anonymous_scan_completed. The most likely silhouette is a
user retrying after closing a tab mid-scan or hitting a flake they
couldn't see. These changes harden the UX so future drop-offs surface
visibly instead of silently.
Changes:
- localStorage scan recovery: scan state persists across tab close /
page refresh. Returning users see a "Resume / Start fresh" card
instead of starting a new scan blindly. TTL 5 min.
- Duplicate-prevention guard with explicit force-bypass for legitimate
re-scan entry points (Resume button, ?url= deeplink). Plain Scan
button click respects the guard.
- Specific error messages per error code (clone_failed, invalid_url,
rate_limited, scan_failed, repo_too_large, timeout, network_error).
Generic "Scan failed" is gone.
- Cancel button on the progress UI with a cancelledRef that exits the
retry loop even during the 3s/6s backoff sleeps.
- Elapsed-time display + 60s reassurance message ("Still working — large
repos can take up to 3 minutes").
- Cached-response badge ("Already scanned recently — opening cached
report") when the server returns cached: true.
- Success redirect now cancellable via successRedirectRef so a late
Cancel click during the 600/1200ms hand-off doesn't ship the user to
a stale report.
- Auto-scan effect guarded against !isLoaded || isSignedIn so signed-in
users (about to redirect to /dashboard/scan) don't double-fire a
public scan first.
- Client timeout reduced 200s → 180s (cleaner alignment with the
3-minute "still working" message).
Builds on 8e97586 (analytics tracking for 4xx + 502 paths). With both
landed, the funnel reconciles: started == completed + error[*] for
every flow the user can take.
Doesn't change: the API contract, scan-public-types, server-side
behaviour. Pure client UX hardening on top of the prior observability fix.
|
Update 2026-05-12: pushed a follow-on commit (c357c25) that extends the original observability fix with the full UX hardening pass. The original commit (8e97586) closed the analytics-tracking gap so we could see failures. This new commit closes the user-visible silent-failure paths so they stop happening:
Trigger for the work: 2026-05-11 funnel showed 1 user / 9 starts / 1 complete. The signature looks like a user hitting a flake and retrying blindly — exactly the failure mode this PR fixes. Detailed test plan in the PR description. Recommend running Not in scope (separate PR coming): server-side scan telemetry (posthog-node in the API route) — durable fix for the data-quality gap that lets these incidents go silent in the first place. |
There was a problem hiding this comment.
1 issue found across 1 file (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/app/scan/page.tsx">
<violation number="1" location="src/app/scan/page.tsx:379">
P2: The pending-scan recovery panel is hidden whenever `error` is set, but the duplicate-scan path sets an error telling users to use Resume/Start fresh. This makes those actions inaccessible in that state.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
Six issues surfaced by the remote multi-agent review (4 normal, 2 nit): bug_001 — Recovery banner hidden when duplicate-scan error was set Banner was gated on `pendingScan && !error`, but the duplicate-detection branch set both. Result: red text instructing the user to click buttons that weren't rendered. Fix: don't set an error in the duplicate branch (banner alone communicates state); banner gate now allows any non- too_large error so the recovery buttons still render alongside real errors. bug_006 — User cancel was a new silent funnel gap Cancel button aborted the fetch and short-circuited the catch, so the PostHog funnel saw a `scan_started` with no matching terminal event — identical to a tab-close. Fix: fire `trackAnonymousScanError` with `error_code: "user_cancelled"` from handleCancel. Adds `startTimeRef` so handleCancel can report duration_ms. bug_010 — 502 + scan_failed misrepresented as "too large" Backend returns 502 with two codes: `repo_too_large` (upstream timeout) or `scan_failed` (worker crash, deploy roll). The previous handler unconditionally rendered the "too_large" UI for both, steering users toward CLI signup on transient failures. Fix: branch on `err.code`. merged_bug_002 — Cancel state machine had three unhandled edges 1. Success-path race: `abortRef` was nulled before `await res.json()` resolved, so a cancel during that yield still ran through to `router.push`. Fix: check `cancelledRef` and the new generation token before entering the success branch. 2. `successRedirectRef` timer wasn't cleared on unmount, so navigating away during the 600–1200ms hand-off could yank the user back. Fix: useEffect cleanup. 3. Cross-scan race: a stale retry loop from scan A could resume after user started scan B (because handleScan resets cancelledRef to false). Fix: `scanGenRef` monotonic counter captured locally per scan; stale loops see myGen !== scanGenRef.current and abort silently. bug_011 (nit) — Cached subtitle contradicted header "Already scanned recently — opening cached report" header paired with "Core analysis in progress" subtitle for the 600ms redirect window. Fix: subtitle now branches on `showCached` first. bug_012 (nit) — "Resume" label promised what the code didn't deliver The button fired a brand-new scan via `force: true`; no scan-id, no poll, no actual attach to the in-flight request. Fix: rename to "Try again" so the label matches behaviour. True scan-id resume is a separate backend change (out of scope for this PR). All fixes verified: tsc --noEmit passes, dev server bundle contains the new symbols (user_cancelled, scanGenRef, startTimeRef, Try again, Loading your previous report).
Why this exists: the browser-side analytics in analytics-public.ts only fires when the React component is still alive. Tab close, network drop, or any silent disconnect mid-scan means we lose the terminal event. The 2026-05-11 incident (1 user, 9 starts, 1 complete) is the exact shape of this gap — even with the prior client-side fixes (8e97586, c357c25, 93a8c91), a tab close at the wrong moment still leaves a started event with no matching complete/error. Server-side events from the /api/scan-public route give us ground truth: every scan that begins on the server fires either server_scan_completed or server_scan_failed regardless of what happens client-side. New events: - server_scan_started — fired right before executeScanPipeline - server_scan_completed — fired after a successful pipeline run - server_scan_failed — fired on every non-success exit (invalid_url, rate_limited, pipeline error 4xx/502, internal 500) distinct_id = `ip:<client_ip>` when available, else "server-anon". Same-IP browser sessions stitch together client + server events in PostHog automatically. No PII beyond what the existing console.log already records. flushAnalytics() is awaited before every response to ensure events reach PostHog before the Vercel serverless function terminates. Best-effort try/catch on every capture so analytics failures never surface into the request path. No change to the public API contract or the existing client-side analytics — these are net-new events, complementary to anonymous_* events from the browser. Useful funnel queries after this lands: -- Reconcile client + server start counts (should match closely) SELECT count(*) FILTER (WHERE event = 'anonymous_scan_started') AS client_starts, count(*) FILTER (WHERE event = 'server_scan_started') AS server_starts FROM events WHERE timestamp >= now() - INTERVAL 1 DAY; -- Find scans the server completed but the client never saw SELECT s.distinct_id, s.timestamp, s.properties.repo_url FROM events s WHERE s.event = 'server_scan_completed' AND NOT EXISTS ( SELECT 1 FROM events c WHERE c.event = 'anonymous_scan_completed' AND c.distinct_id = s.distinct_id AND abs(extract(epoch from c.timestamp - s.timestamp)) < 60 ); Sentry note: already configured (sentry.server.config.ts present, withSentryConfig wired in next.config.mjs). Just needs the NEXT_PUBLIC_SENTRY_DSN env var to be set in Vercel to enable. Adds posthog-node@5.33.7 to dependencies.
Summary
Closes the silent-failure gap in the public scan flow, identified by the 2026-05-11 incident (1 user, 9 anonymous_scan_started events, 1 completed). Three layers of fix, all on this branch:
Layer 1 — analytics tracking (8e97586, May 5)
trackAnonymousScanError was only firing in the catch path. 4xx + 502 early-returns exited silently, so rate_limited / clone_failed / invalid_url / repo_too_large were invisible in PostHog. Now fires on every non-success exit. Funnel reconciles: started == completed + error[*].
Layer 2 — client-side UX hardening (c357c25)
Layer 3 — /ultrareview fixes (93a8c91)
Six issues surfaced by the cloud multi-agent review:
Layer 4 — server-side telemetry (9911076)
The durable fix. New src/lib/analytics-server.ts wires posthog-node into /api/scan-public so we have ground truth even when the browser disconnects:
Adds posthog-node@5.33.7.
What this does NOT change
Test plan