Performance Diagnosis Handoff: Panorama/Profile lag root causes and validated boundaries #609

@wilhel1812

Performance diagnosis handoff (panorama/profile lag)

This issue captures the completed diagnosis work so implementation can continue without re-running all profiling from scratch.

Current context

  • Staging includes recent performance instrumentation and mitigation work.
  • Workerized coverage.compute path has been added with fallback.
  • Pan interaction guards + overscan/fast-full staging were implemented; a pan-trigger regression was found and fixed.

What was diagnosed

1) Lag is not primarily in panorama input handlers

From perf debug logs (debug-logs/Console.txt, panning*.txt, sliders*.txt):

  • panorama.handler.* and mapview.panoramaInteraction.* are typically very small (mostly sub-ms/low-ms).
  • panorama.effect.interactionDispatch is also small relative to total stalls.

Conclusion: pointer/hover handlers are not the core bottleneck.

2) Main cost is simulation + overlay pipeline

Repeatedly observed high costs in:

  • coverage.compute (simulation grid compute)
  • overlay.coverage.build.* / overlay.coverage.total.* (raster build/encode)

Examples seen across runs:

  • Panning captures showed coverageComputeMs spikes into the multi-second range (e.g. ~4-5s in the worst captures).
  • Overlay build totals for passfail/relay were also very large in some captures.

Conclusion: expensive compute/raster work dominates perceived lag.

3) Some captures were polluted by startup/background triggers

Even after attempting settle-first tests, many traces still include non-interaction triggers:

  • coverage.trigger.preset
  • coverage.trigger.terrain
  • coverage.trigger.selection
  • coverage.trigger.library-sync

Conclusion: not all collected traces were pure profile-only sessions; some include ongoing background/global recomputes.
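
One way to screen a capture before trusting it is to count the background triggers listed above. The following is a minimal sketch, not the project's tooling: the `PerfEntry` shape and both function names are assumptions made for illustration, and the real log format in debug-logs/ may differ.

```typescript
// Hypothetical shape of one parsed telemetry entry (assumed, not from the codebase).
interface PerfEntry {
  name: string; // e.g. "coverage.trigger.terrain"
}

// The non-interaction triggers named in this issue.
const BACKGROUND_TRIGGERS: string[] = [
  "coverage.trigger.preset",
  "coverage.trigger.terrain",
  "coverage.trigger.selection",
  "coverage.trigger.library-sync",
];

// Count how often each background trigger appears in a capture.
function countBackgroundTriggers(entries: PerfEntry[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const entry of entries) {
    if (BACKGROUND_TRIGGERS.indexOf(entry.name) !== -1) {
      counts[entry.name] = (counts[entry.name] || 0) + 1;
    }
  }
  return counts;
}

// A capture counts as a clean profile-only session when no background trigger fired.
function isCleanProfileCapture(entries: PerfEntry[]): boolean {
  return Object.keys(countBackgroundTriggers(entries)).length === 0;
}
```

A capture that fails this check can still be useful, but its stall timings should not be attributed to profile interaction alone.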

4) Trigger attribution quality is now good

  • coverage.trigger.unknown was reduced to 0 in clean-tag captures.
  • This means trigger provenance is now actionable.

5) Regression that was introduced and fixed

A pan-safezone implementation briefly caused recompute storms:

  • isMapInteracting was set falsely by non-user move events
  • pan-fast triggers fired excessively
  • the sidebar stayed stuck in the "Preparing simulation bounds..." state when a run was skipped as unchanged

Fixes applied:

  • only user-driven moves set map-interacting
  • throttled pan-fast triggering
  • pan-settle only after real interaction
  • coverage store clears pending UI state on skip-same-signature path
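
The four fixes above can be sketched as follows. This is an illustration under stated assumptions, not the code that was merged: `MoveEvent`, `PanTriggerGuard`, `CoverageUiState`, and `shouldStartRun` are hypothetical names, and the clock is passed in explicitly so the logic can be exercised without timers.

```typescript
// Hypothetical move event; userDriven is false for programmatic moves.
interface MoveEvent {
  userDriven: boolean;
}

class PanTriggerGuard {
  private minIntervalMs: number;
  private lastFiredAt = -Infinity;
  private interacted = false;

  constructor(minIntervalMs: number) {
    this.minIntervalMs = minIntervalMs;
  }

  // Returns true when a pan-fast trigger should fire for this move.
  onMove(ev: MoveEvent, nowMs: number): boolean {
    if (!ev.userDriven) return false; // non-user moves never mark interaction
    this.interacted = true;
    if (nowMs - this.lastFiredAt < this.minIntervalMs) return false; // throttled
    this.lastFiredAt = nowMs;
    return true;
  }

  // Returns true when pan-settle should fire: only after a real interaction.
  onMoveEnd(): boolean {
    const shouldSettle = this.interacted;
    this.interacted = false;
    return shouldSettle;
  }
}

// Hypothetical slice of coverage store state.
interface CoverageUiState {
  pendingMessage: string | null;
  lastSignature: string | null;
}

// Returns true when a run should start. On the skip-same-signature path the
// pending UI message is cleared so the sidebar cannot stay stuck.
function shouldStartRun(state: CoverageUiState, signature: string): boolean {
  if (state.lastSignature === signature) {
    state.pendingMessage = null;
    return false;
  }
  state.lastSignature = signature;
  return true;
}
```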

Where problems are

  • Global simulation path (coverage.compute) still contributes significant latency under interaction.
  • Coverage overlay raster builds (overlay.coverage.build/total) are still heavy in passfail/relay views.
  • Profile smoothness is degraded when global recompute work overlaps profile interactions.

Where problems are not

  • Not primarily in profile pointermove/hover handlers.
  • Not primarily in panorama event dispatch plumbing.

Implemented groundwork already in place

  • Perf telemetry buckets for triggers/stages/drops.
  • Overscan/safezone and fast/full stage infrastructure in map overlay path.
  • Worker infrastructure for coverage compute (coverageWorker + client integration + fallback).
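
To make it visible which path a worker-with-fallback setup actually took, each compute call could bump a counter. The sketch below assumes a promise-returning worker wrapper and a synchronous main-thread fallback; `PathCounters` and `computeWithFallback` are hypothetical names and do not reflect the actual coverageWorker client API.

```typescript
// Hypothetical counters; in practice these would feed the perf telemetry buckets.
interface PathCounters {
  worker: number;
  fallback: number;
  workerErrors: number;
}

// Run the compute via the worker wrapper when one is available. If the worker
// is missing or rejects, run the main-thread fallback. Every call records
// which path produced the result.
async function computeWithFallback<I, O>(
  input: I,
  viaWorker: ((input: I) => Promise<O>) | null,
  onMainThread: (input: I) => O,
  counters: PathCounters,
): Promise<O> {
  if (viaWorker) {
    try {
      const out = await viaWorker(input);
      counters.worker += 1;
      return out;
    } catch {
      counters.workerErrors += 1; // fall through to the main-thread path
    }
  }
  counters.fallback += 1;
  return onMainThread(input);
}
```

A non-zero `fallback` count in staging under a real user flow would indicate that coverage.compute is still running on the main thread for at least some runs.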

Recommended next implementation steps (in order)

  1. Enforce strict profile-only interaction boundary:

    • while profile dragging/panning is active, suppress/defer non-essential global coverage recomputes.
    • trigger a single authoritative recompute on settle, once interaction stops.
  2. Validate worker path is active in staging under real user flow:

    • verify coverage.compute no longer blocks main thread responsiveness.
    • add explicit telemetry flag/counter for worker vs fallback execution.
  3. Reduce overlay build pressure in interaction mode:

    • maintain fast-stage rasterization during active interaction.
    • ensure passfail/relay heavy paths are not recomputed more often than needed.
  4. Re-profile with strict protocol:

    • wait fully idle
    • clear log
    • perform profile-only interaction
    • avoid map/selection/env changes during capture
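
The profile-only boundary in step 1 could take the form of a small gate in front of the global recompute. The following is a minimal sketch under assumptions: `RecomputeGate` and its method names are hypothetical, and it treats every request made during interaction as deferrable, whereas the real code would need to decide which recomputes are non-essential.

```typescript
// Defers recompute requests while a profile interaction is active and runs
// one coalesced recompute when the interaction ends.
class RecomputeGate {
  private run: (reason: string) => void;
  private interacting = false;
  private pendingReason: string | null = null;

  constructor(run: (reason: string) => void) {
    this.run = run;
  }

  beginInteraction(): void {
    this.interacting = true;
  }

  // Called by any trigger (preset, terrain, selection, library-sync, ...).
  request(reason: string): void {
    if (this.interacting) {
      this.pendingReason = reason; // keep only the latest deferred request
      return;
    }
    this.run(reason);
  }

  // On settle, run a single recompute if anything was deferred.
  endInteraction(): void {
    this.interacting = false;
    if (this.pendingReason !== null) {
      const reason = this.pendingReason;
      this.pendingReason = null;
      this.run(reason);
    }
  }
}
```

Coalescing to one run on settle is the design choice that matters here: several background triggers arriving during a drag cost one recompute instead of one each.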

Acceptance target for this issue

  • Profile panning/slider interactions feel smooth after initial load settles.
  • No repeated storm-style recompute triggering during idle/non-map interaction.
  • Telemetry clearly shows reduced overlap between profile interaction and heavy global recompute work.
