Close the public scan funnel friction (analytics + UX + server telemetry) #4

Merged
cloakmaster merged 4 commits into main from claude/funnel-tracking-fix
May 12, 2026

Conversation

@cloakmaster cloakmaster commented May 5, 2026

Member

Summary

Closes the silent-failure gap in the public scan flow, exposed by the 2026-05-11 incident (1 user, 9 anonymous_scan_started events, 1 completed). Four layers of fix, all on this branch:

Layer 1 — analytics tracking (8e97586, May 5)

trackAnonymousScanError was only firing in the catch path. 4xx + 502 early-returns exited silently, so rate_limited / clone_failed / invalid_url / repo_too_large were invisible in PostHog. Now fires on every non-success exit. Funnel reconciles: started == completed + error[*].
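
A minimal sketch of the "fire on every non-success exit" pattern described above. The `ScanExit` shape, the `reportExit` helper, and the `track` callback (standing in for `trackAnonymousScanError`) are hypothetical names for illustration; the real handler in `src/app/scan/page.tsx` may be structured differently.

```typescript
// Hypothetical exit shapes; the real PublicScanError type may differ.
type ScanExit =
  | { kind: "success" }
  | { kind: "http_error"; status: number; code: string }
  | { kind: "exception"; message: string };

// Stand-in for trackAnonymousScanError (signature is an assumption).
type TrackError = (props: { error_code: string; status?: number }) => void;

// Report every non-success exit, not only thrown exceptions, so that
// started == completed + error[*] holds for the funnel.
function reportExit(exit: ScanExit, track: TrackError): void {
  switch (exit.kind) {
    case "success":
      return; // the completed event is tracked elsewhere
    case "http_error":
      // 4xx and 502 early returns used to exit here without an event.
      track({ error_code: exit.code, status: exit.status });
      return;
    case "exception":
      track({ error_code: "network_error" });
      return;
  }
}
```

Routing all exits through one function makes it harder to add a new early return that silently skips tracking.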

Layer 2 — client-side UX hardening (c357c25)

  • localStorage scan recovery — scan state persists across tab close / refresh, 5-minute TTL
  • Duplicate-prevention guard with explicit force-bypass for legitimate re-scan paths (Try again button, ?url= deeplink)
  • Specific error messages per error code — generic "Scan failed" gone
  • Cancel button on progress UI with cancelledRef + AbortController
  • Elapsed-time display + 60s reassurance ("Still working — large repos can take up to 3 minutes")
  • Cached-response badge ("Already scanned recently — opening cached report")
  • Success redirect cancellable via successRedirectRef
  • Auto-scan effect guarded against !isLoaded || isSignedIn
  • Client timeout 200s → 180s
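
The localStorage recovery with a 5-minute TTL could look roughly like the sketch below. It assumes `inkog_pending_scan` (a symbol named in the test plan) is the storage key and that the persisted state is a URL plus a start timestamp; both are assumptions, as are the `KeyValueStore`, `savePendingScan`, and `loadPendingScan` names.

```typescript
// Minimal storage surface; window.localStorage satisfies it in the browser.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

const PENDING_KEY = "inkog_pending_scan"; // assumed to be the storage key
const TTL_MS = 5 * 60 * 1000; // 5-minute TTL from the PR description

// Assumed shape of the persisted state.
interface PendingScan {
  url: string;
  startedAt: number;
}

function savePendingScan(store: KeyValueStore, url: string, now: number = Date.now()): void {
  store.setItem(PENDING_KEY, JSON.stringify({ url, startedAt: now }));
}

// Returns the pending scan if it is still within the TTL; otherwise clears it.
function loadPendingScan(store: KeyValueStore, now: number = Date.now()): PendingScan | null {
  const raw = store.getItem(PENDING_KEY);
  if (raw === null) return null;
  try {
    const parsed = JSON.parse(raw) as Partial<PendingScan>;
    if (
      typeof parsed.url === "string" &&
      typeof parsed.startedAt === "number" &&
      now - parsed.startedAt < TTL_MS
    ) {
      return { url: parsed.url, startedAt: parsed.startedAt };
    }
  } catch {
    // Corrupted entry: fall through and remove it.
  }
  store.removeItem(PENDING_KEY);
  return null;
}
```

Passing the store and the clock in as parameters keeps the helper testable outside the browser.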

Layer 3 — /ultrareview fixes (93a8c91)

Six issues surfaced by the cloud multi-agent review:

  • bug_001 recovery buttons hidden when duplicate-scan error set — banner now renders alongside non-too_large errors
  • bug_006 user cancel was a new silent funnel gap — handleCancel now fires anonymous_scan_error with error_code "user_cancelled"
  • bug_010 502 + scan_failed misrepresented as "too large" — 502 handler branches on err.code
  • merged_bug_002 cancel state machine: success-path race + unmount cleanup + cross-scan race via scanGenRef token
  • bug_011 cached subtitle contradicted header — subtitle branches on showCached
  • bug_012 "Resume" promised what code didn't deliver — renamed "Try again"

Layer 4 — server-side telemetry (9911076)

The durable fix. New src/lib/analytics-server.ts wires posthog-node into /api/scan-public so we have ground truth even when the browser disconnects:

  • server_scan_started fires before pipeline execution
  • server_scan_completed fires on successful return
  • server_scan_failed fires on every non-success exit (invalid_url, rate_limited, pipeline 4xx/502, internal 500)
  • distinct_id = ip:<client_ip> (bridges to browser anonymous session)
  • flushAnalytics() awaited before every response so events reach PostHog before Vercel kills the serverless function
  • All captures try/catch'd — analytics failures never surface into the request path
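
As a rough illustration of the capture-then-flush pattern listed above, the sketch below uses a small `AnalyticsClient` interface instead of importing `posthog-node` directly; to my understanding the posthog-node client exposes a compatible `capture({ distinctId, event, properties })` and `flush()`, but treat that as an assumption. `safeCapture`, `handleScanRequest`, and the `internal_error` code are hypothetical names, not the contents of `src/lib/analytics-server.ts`.

```typescript
// Structural subset of an analytics client (assumed to match posthog-node).
interface AnalyticsClient {
  capture(msg: { distinctId: string; event: string; properties?: Record<string, unknown> }): void;
  flush(): Promise<void>;
}

// Best-effort capture: analytics failures never reach the request path.
function safeCapture(
  client: AnalyticsClient | null,
  distinctId: string,
  event: string,
  properties: Record<string, unknown> = {},
): void {
  if (!client) return;
  try {
    client.capture({ distinctId, event, properties });
  } catch {
    // Swallow: telemetry must not break the scan.
  }
}

// Awaited before every response so queued events are sent before the
// serverless function is torn down.
async function flushAnalytics(client: AnalyticsClient | null): Promise<void> {
  if (!client) return;
  try {
    await client.flush();
  } catch {
    // Swallow flush errors as well.
  }
}

// Hypothetical route skeleton: started, then exactly one terminal event.
async function handleScanRequest(
  client: AnalyticsClient | null,
  distinctId: string,
  runPipeline: () => Promise<unknown>,
): Promise<{ status: number }> {
  safeCapture(client, distinctId, "server_scan_started");
  try {
    await runPipeline();
    safeCapture(client, distinctId, "server_scan_completed");
    return { status: 200 };
  } catch {
    safeCapture(client, distinctId, "server_scan_failed", { error_code: "internal_error" });
    return { status: 500 };
  } finally {
    await flushAnalytics(client);
  }
}
```

Putting the flush in `finally` means every return path waits for it, including ones added later.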

Adds posthog-node@5.33.7.

What this does NOT change

  • The /api/scan-public response contract
  • The PublicScanError type
  • Browser-side analytics in src/lib/analytics-public.ts (still fires its own events; server events are complementary)
  • Sentry wiring (separate — DSN was just added to Vercel env, will activate on next deploy)

Test plan

  • tsc --noEmit clean on src/app/scan/page.tsx and src/lib/analytics-server.ts and src/app/api/scan-public/route.ts
  • npm run build succeeds (17 static pages)
  • Dev server bundle contains all new symbols (inkog_pending_scan, user_cancelled, scanGenRef, Try again, Loading your previous report)
  • cubic AI reviewer: no blocking issues
  • Vercel preview deploys
  • Manual browser walkthrough on production after merge — scan a small public repo, close tab mid-scan to test recovery, click Cancel, click Try again from the recovery banner
  • PostHog: confirm server_scan_started events appear within 24h of merge (the data-quality reconciliation that closes the loop on the May 11 incident)

trackAnonymousScanError was only firing after retries exhausted in
the catch path. The 4xx and 502 early-return branches exited
silently, so PostHog never saw events for rate_limited, clone_failed,
invalid_url, or repo_too_large. R9 weekly digest 2026-05-04 surfaced
this as anonymous_scan_started=15 / completed=5 / error=3 — 7
unaccounted-for scans.

Now fires error events on all non-success exits so the funnel
reconciles: started == completed + error[*]. error_code property
distinguishes the failure mode for downstream analysis.
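
The reconciliation identity can be checked with simple arithmetic. The helper below is a hypothetical illustration (`FunnelCounts` and `unaccountedScans` are not from the codebase) applied to the digest numbers quoted above.

```typescript
interface FunnelCounts {
  started: number;
  completed: number;
  errors: number; // total across all error_code values
}

// A reconciled funnel has started == completed + errors, i.e. a gap of 0.
function unaccountedScans(counts: FunnelCounts): number {
  return counts.started - counts.completed - counts.errors;
}

// Numbers from the 2026-05-04 weekly digest.
const digest: FunnelCounts = { started: 15, completed: 5, errors: 3 };
console.log(unaccountedScans(digest)); // 7
```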

No changes to user-visible behavior. Pure observability fix.
@vercel

vercel Bot commented May 5, 2026


The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| inkog-dashboard | Ready | Preview, Comment | May 12, 2026 1:57pm |

@cubic-dev-ai cubic-dev-ai Bot left a comment

No issues found across 1 file

Yesterday's funnel showed 9 anonymous_scan_started events from one user
with only 1 anonymous_scan_completed. The most likely explanation is a
user retrying after closing a tab mid-scan or hitting a flake they
couldn't see. These changes harden the UX so future drop-offs surface
visibly instead of silently.

Changes:

- localStorage scan recovery: scan state persists across tab close /
  page refresh. Returning users see a "Resume / Start fresh" card
  instead of starting a new scan blindly. TTL 5 min.
- Duplicate-prevention guard with explicit force-bypass for legitimate
  re-scan entry points (Resume button, ?url= deeplink). Plain Scan
  button click respects the guard.
- Specific error messages per error code (clone_failed, invalid_url,
  rate_limited, scan_failed, repo_too_large, timeout, network_error).
  Generic "Scan failed" is gone.
- Cancel button on the progress UI with a cancelledRef that exits the
  retry loop even during the 3s/6s backoff sleeps.
- Elapsed-time display + 60s reassurance message ("Still working — large
  repos can take up to 3 minutes").
- Cached-response badge ("Already scanned recently — opening cached
  report") when the server returns cached: true.
- Success redirect now cancellable via successRedirectRef so a late
  Cancel click during the 600/1200ms hand-off doesn't ship the user to
  a stale report.
- Auto-scan effect guarded against !isLoaded || isSignedIn so signed-in
  users (about to redirect to /dashboard/scan) don't double-fire a
  public scan first.
- Client timeout reduced 200s → 180s (cleaner alignment with the
  3-minute "still working" message).
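
One way to make the backoff sleeps cancellable is sketched below. It uses an `AbortSignal` alone rather than the `cancelledRef` plus `AbortController` pair the commit describes, so it is a simplified stand-in; `abortableSleep` and `retryWithBackoff` are hypothetical names, and the default delays reflect the 3s/6s backoff mentioned above.

```typescript
// Resolves "elapsed" after ms, or "aborted" as soon as the signal fires.
function abortableSleep(ms: number, signal: AbortSignal): Promise<"elapsed" | "aborted"> {
  return new Promise((resolve) => {
    if (signal.aborted) {
      resolve("aborted");
      return;
    }
    const onAbort = () => {
      clearTimeout(timer);
      resolve("aborted");
    };
    const timer = setTimeout(() => {
      signal.removeEventListener("abort", onAbort);
      resolve("elapsed");
    }, ms);
    signal.addEventListener("abort", onAbort, { once: true });
  });
}

const BACKOFF_MS = [3000, 6000]; // 3s/6s backoff from the commit message

// Retries attempt() with backoff; a cancel exits even in the middle of a sleep.
async function retryWithBackoff<T>(
  attempt: () => Promise<T>,
  signal: AbortSignal,
  delays: number[] = BACKOFF_MS,
): Promise<T | "cancelled"> {
  for (let i = 0; ; i++) {
    if (signal.aborted) return "cancelled";
    try {
      return await attempt();
    } catch (err) {
      if (i >= delays.length) throw err; // retries exhausted
      if ((await abortableSleep(delays[i], signal)) === "aborted") return "cancelled";
    }
  }
}
```

A Cancel button handler would call `controller.abort()` on the `AbortController` whose `signal` was passed in.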

Builds on 8e97586 (analytics tracking for 4xx + 502 paths). With both
landed, the funnel reconciles: started == completed + error[*] for
every flow the user can take.

Doesn't change: the API contract, scan-public-types, server-side
behaviour. Pure client UX hardening on top of the prior observability fix.
@cloakmaster

Member Author

Update 2026-05-12: pushed a follow-on commit (c357c25) that extends the original observability fix with the full UX hardening pass.

The original commit (8e97586) closed the analytics-tracking gap so we could see failures. This new commit closes the user-visible silent-failure paths so they stop happening:

  • localStorage scan recovery (tab close / refresh / retry doesn't lose state)
  • Cancel button with abortable retry loop
  • Specific error messages per error code (no more generic "Scan failed")
  • Cached-response badge
  • Success-redirect cancellable
  • Auto-scan guard against signed-in users
  • Duplicate-prevention with explicit force-bypass for Resume / deeplinks

Trigger for the work: 2026-05-11 funnel showed 1 user / 9 starts / 1 complete. The signature looks like a user hitting a flake and retrying blindly — exactly the failure mode this PR fixes.

Detailed test plan in the PR description. Recommend running /ultrareview 4 before merge.

Not in scope (separate PR coming): server-side scan telemetry (posthog-node in the API route) — durable fix for the data-quality gap that lets these incidents go silent in the first place.

@cubic-dev-ai cubic-dev-ai Bot left a comment

1 issue found across 1 file (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/app/scan/page.tsx">

<violation number="1" location="src/app/scan/page.tsx:379">
P2: The pending-scan recovery panel is hidden whenever `error` is set, but the duplicate-scan path sets an error telling users to use Resume/Start fresh. This makes those actions inaccessible in that state.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Six issues surfaced by the remote multi-agent review (4 normal, 2 nit):

bug_001 — Recovery banner hidden when duplicate-scan error was set
Banner was gated on `pendingScan && !error`, but the duplicate-detection
branch set both. Result: red text instructing the user to click buttons
that weren't rendered. Fix: don't set an error in the duplicate branch
(banner alone communicates state); banner gate now allows any non-
too_large error so the recovery buttons still render alongside real errors.

bug_006 — User cancel was a new silent funnel gap
Cancel button aborted the fetch and short-circuited the catch, so the
PostHog funnel saw a `scan_started` with no matching terminal event —
identical to a tab-close. Fix: fire `trackAnonymousScanError` with
`error_code: "user_cancelled"` from handleCancel. Adds `startTimeRef`
so handleCancel can report duration_ms.

bug_010 — 502 + scan_failed misrepresented as "too large"
Backend returns 502 with two codes: `repo_too_large` (upstream timeout)
or `scan_failed` (worker crash, deploy roll). The previous handler
unconditionally rendered the "too_large" UI for both, steering users
toward CLI signup on transient failures. Fix: branch on `err.code`.
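
A sketch of the branch on `err.code` for 502 responses, assuming a hypothetical `uiFor502` helper and `ErrorUi` shape; the real component state in `src/app/scan/page.tsx` is likely different.

```typescript
// Hypothetical UI descriptor for a 502 response.
interface ErrorUi {
  variant: "too_large" | "transient";
  showCliSignup: boolean;
}

// Only repo_too_large should steer the user toward the CLI; scan_failed
// (worker crash, deploy roll) is transient and should invite a retry.
function uiFor502(code: string): ErrorUi {
  if (code === "repo_too_large") {
    return { variant: "too_large", showCliSignup: true };
  }
  return { variant: "transient", showCliSignup: false };
}
```

Defaulting unknown codes to the transient variant avoids repeating the original bug if the backend adds another 502 code later.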

merged_bug_002 — Cancel state machine had three unhandled edges
1. Success-path race: `abortRef` was nulled before `await res.json()`
   resolved, so a cancel during that yield still ran through to
   `router.push`. Fix: check `cancelledRef` and the new generation token
   before entering the success branch.
2. `successRedirectRef` timer wasn't cleared on unmount, so navigating
   away during the 600–1200ms hand-off could yank the user back. Fix:
   useEffect cleanup.
3. Cross-scan race: a stale retry loop from scan A could resume after
   user started scan B (because handleScan resets cancelledRef to false).
   Fix: `scanGenRef` monotonic counter captured locally per scan;
   stale loops see myGen !== scanGenRef.current and abort silently.
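
The generation-token idea in item 3 can be sketched without React as follows. `ScanGeneration` is a hypothetical plain-class stand-in for the `scanGenRef` counter, and `runScan` is an illustrative wrapper rather than the actual `handleScan`.

```typescript
// Monotonic counter: each new scan invalidates all earlier ones.
class ScanGeneration {
  private current = 0;

  // Start a new scan and return its generation token.
  begin(): number {
    this.current += 1;
    return this.current;
  }

  // True only for the most recently started scan.
  isCurrent(gen: number): boolean {
    return gen === this.current;
  }
}

// Illustrative scan wrapper: results from a superseded scan are dropped.
async function runScan(
  gen: ScanGeneration,
  doFetch: () => Promise<string>,
  onDone: (result: string) => void,
): Promise<void> {
  const myGen = gen.begin(); // captured locally per scan
  const result = await doFetch();
  if (!gen.isCurrent(myGen)) return; // scan B started meanwhile; abort silently
  onDone(result);
}
```

Unlike a boolean cancelled flag, the token cannot be reset by a later scan, which is what closes the cross-scan race.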

bug_011 (nit) — Cached subtitle contradicted header
"Already scanned recently — opening cached report" header paired with
"Core analysis in progress" subtitle for the 600ms redirect window.
Fix: subtitle now branches on `showCached` first.

bug_012 (nit) — "Resume" label promised what the code didn't deliver
The button fired a brand-new scan via `force: true`; no scan-id, no
poll, no actual attach to the in-flight request. Fix: rename to
"Try again" so the label matches behaviour. True scan-id resume is a
separate backend change (out of scope for this PR).

All fixes verified: tsc --noEmit passes, dev server bundle contains
the new symbols (user_cancelled, scanGenRef, startTimeRef, Try again,
Loading your previous report).

Why this exists: the browser-side analytics in analytics-public.ts only
fires when the React component is still alive. Tab close, network drop,
or any silent disconnect mid-scan means we lose the terminal event. The
2026-05-11 incident (1 user, 9 starts, 1 complete) is the exact shape
of this gap — even with the prior client-side fixes (8e97586, c357c25,
93a8c91), a tab close at the wrong moment still leaves a started event
with no matching complete/error.

Server-side events from the /api/scan-public route give us ground truth:
every scan that begins on the server fires either server_scan_completed
or server_scan_failed regardless of what happens client-side.

New events:
- server_scan_started — fired right before executeScanPipeline
- server_scan_completed — fired after a successful pipeline run
- server_scan_failed — fired on every non-success exit (invalid_url,
  rate_limited, pipeline error 4xx/502, internal 500)

distinct_id = `ip:<client_ip>` when available, else "server-anon".
Same-IP browser sessions stitch together client + server events in
PostHog automatically. No PII beyond what the existing console.log
already records.
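
Deriving the `distinct_id` might look like the sketch below. Reading the client IP from the first entry of an `x-forwarded-for` header is an assumption about how the route obtains it, and `distinctIdFor` is a hypothetical name.

```typescript
// Anything with a Headers-like get() works, including the Fetch API Headers.
interface HeaderReader {
  get(name: string): string | null;
}

// `ip:<client_ip>` when an IP is available, else "server-anon".
function distinctIdFor(headers: HeaderReader): string {
  const forwarded = headers.get("x-forwarded-for"); // assumed source of the IP
  const ip = forwarded?.split(",")[0]?.trim();
  return ip ? `ip:${ip}` : "server-anon";
}
```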

flushAnalytics() is awaited before every response to ensure events
reach PostHog before the Vercel serverless function terminates.
Best-effort try/catch on every capture so analytics failures never
surface into the request path.

No change to the public API contract or the existing client-side
analytics — these are net-new events, complementary to anonymous_*
events from the browser. Useful funnel queries after this lands:

  -- Reconcile client + server start counts (should match closely)
  SELECT count(*) FILTER (WHERE event = 'anonymous_scan_started') AS client_starts,
         count(*) FILTER (WHERE event = 'server_scan_started') AS server_starts
  FROM events WHERE timestamp >= now() - INTERVAL 1 DAY;

  -- Find scans the server completed but the client never saw
  SELECT s.distinct_id, s.timestamp, s.properties.repo_url
  FROM events s
  WHERE s.event = 'server_scan_completed'
    AND NOT EXISTS (
      SELECT 1 FROM events c
      WHERE c.event = 'anonymous_scan_completed'
        AND c.distinct_id = s.distinct_id
        AND abs(extract(epoch from c.timestamp - s.timestamp)) < 60
    );

Sentry note: already configured (sentry.server.config.ts present,
withSentryConfig wired in next.config.mjs). Just needs the
NEXT_PUBLIC_SENTRY_DSN env var to be set in Vercel to enable.

Adds posthog-node@5.33.7 to dependencies.
@cloakmaster cloakmaster changed the title from "fix(analytics): close anonymous scan funnel tracking gap" to "Close the public scan funnel friction (analytics + UX + server telemetry)" May 12, 2026
@cloakmaster cloakmaster merged commit 3dbe03a into main May 12, 2026
3 checks passed
@cloakmaster cloakmaster deleted the claude/funnel-tracking-fix branch May 12, 2026 14:56