@trieloff commented Dec 10, 2025

Summary

Offload RUM analysis to a Web Worker with progressive phases and streaming ingestion, delivering converging estimates in real-time while data loads.

Key Features

  • Progressive Analysis: Configurable phase thresholds (e.g., 12% → 25% → 50% → 100%) with O(Δ) incremental processing
  • Streaming Ingestion: Out-of-order slice arrival with coverage tracking; auto-progress keeps compute ≤50% behind ingestion
  • Converging Estimates: Scaled totals and approximate quantiles during loading; exact values at completion
  • Simplified Facet API: Unified { value, count, weight } shape—scaled estimates when incomplete, exact at 100%
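
The O(Δ) claim for progressive phases can be illustrated with a small sketch. This is not the PR's ProgressiveRun: the class name `ProgressiveSum` and the `membership` and `value` fields are hypothetical. It assumes each item already carries a membership value in [0, 1). Items are sorted once, so every phase is a prefix of the same ordering, and advancing to a new phase only touches the items between the previous cursor and the new threshold.

```javascript
// Hypothetical sketch: each item carries a precomputed membership in [0, 1).
class ProgressiveSum {
  constructor(items) {
    // Sort once so every phase is a prefix of the same ordering.
    this.items = [...items].sort((a, b) => a.membership - b.membership);
    this.cursor = 0; // index of the first unprocessed item
    this.count = 0;
    this.sum = 0;
  }

  // Process only the items that newly fall under `phase` (the delta).
  advanceTo(phase) {
    let processed = 0;
    while (this.cursor < this.items.length
      && this.items[this.cursor].membership < phase) {
      this.sum += this.items[this.cursor].value;
      this.count += 1;
      this.cursor += 1;
      processed += 1;
    }
    return processed;
  }
}

const run = new ProgressiveSum([
  { membership: 0.05, value: 10 },
  { membership: 0.40, value: 20 },
  { membership: 0.20, value: 30 },
  { membership: 0.90, value: 40 },
]);
const first = run.advanceTo(0.25); // touches the 0.05 and 0.20 items
const second = run.advanceTo(1);   // touches only the remaining two
```

Because the cursor never moves backwards, the total work across all phases is one pass over the sorted items, regardless of how many thresholds are configured.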

User-Facing Changes

New API: StreamingDataChunks

import { createStreamingDataChunks } from '@adobe/rum-distiller/worker/streaming';

const dc = createStreamingDataChunks(
  new URL('@adobe/rum-distiller/worker', import.meta.url)
);

// Configure
dc.addDistillerSeries('pageViews');
dc.addDistillerSeries('lcp');
dc.addDistillerFacet('plainURL');
dc.setThresholds(20);           // 20 phases from 12% to 100%
dc.prepareQuantiles(0.5, 0.9, 0.99);
dc.defaultTopK = 50;

// Stream results
dc.expectChunks = 60;
dc.onSnap((snap) => {
  // snap.progress: 0..1 (phase × coverage)
  // snap.totals, snap.quantiles, snap.facets
  render(snap);
});
dc.onDone((snap) => console.log('Complete'));

// Ingest slices as they arrive
dc.load(slice1);
dc.load(slice2);
dc.load();  // finalize

Snapshot Shape

| Field | Description |
|---|---|
| phase | Current sampling phase (0..1] |
| progress | Overall progress = phase × coverage |
| totals[series] | { count, sum, min, max, mean }; scaled when incomplete |
| sampleTotals | Raw (unscaled) partials when progress < 1 |
| quantiles[series] | P² approximations early; exact at completion |
| facets[name] | Top-K { value, count, weight }; scaled when incomplete |
| ingestion | { received, expected, coverage } |
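
As a rough illustration of how a consumer might read this shape, the sketch below formats one series for display and marks the value as approximate while progress is below 1. The helper name `describeSeries` and the snapshot values are made up for the example; only the field names come from the table above.

```javascript
// Hypothetical helper: summarize one series from a snapshot for display.
function describeSeries(snap, series) {
  const totals = snap.totals[series];
  const percent = Math.round(snap.progress * 100);
  // Below 100% progress the totals are scaled estimates, so mark them.
  const prefix = snap.progress < 1 ? '~' : '';
  return `${series}: ${prefix}${Math.round(totals.count)} (${percent}% analyzed)`;
}

// Made-up snapshot values, for illustration only.
const exampleSnap = {
  phase: 0.5,
  progress: 0.25,
  totals: { pageViews: { count: 4000, sum: 4000, min: 1, max: 1, mean: 1 } },
  ingestion: { received: 30, expected: 60, coverage: 0.5 },
};
const label = describeSeries(exampleSnap, 'pageViews');
```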

Custom Facets/Series (ESM Modules)

dc.addModuleFacet('myFacet', new URL('./my-facets.js', import.meta.url));
dc.addModuleSeries('mySeries', new URL('./my-series.js', import.meta.url));

Architecture

┌─────────────────┐     postMessage      ┌──────────────────────┐
│   Main Thread   │ ◄──────────────────► │   analysis.worker.js │
│                 │                      │                      │
│ createStreaming │                      │  ┌────────────────┐  │
│ DataChunks()    │                      │  │ StreamingRun   │  │
│                 │                      │  │ ProgressiveRun │  │
│                 │                      │  └────────────────┘  │
└─────────────────┘                      └──────────────────────┘

Engine Classes

| Class | File | Purpose |
|---|---|---|
| ProgressiveRun | worker/progressive.js | Pre-filters once, sorts by membership hash, O(Δ) per advanceTo() |
| StreamingRun | worker/streaming.js | Bins bundles by membership, processes incrementally as phases advance |
| P2Quantile | src/quantiles/p2.js | Online P² quantile estimator (5 markers, constant memory) |
| SpaceSaving | src/topk/space_saving.js | Streaming heavy-hitters for Top-K |
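
For readers unfamiliar with P², the following is a minimal sketch of the textbook algorithm (Jain and Chlamtac) for a single quantile. It is not the code in src/quantiles/p2.js; the class name `P2Sketch` and its methods are hypothetical. It keeps five marker heights and positions, and after each observation nudges the three inner markers toward their desired positions using a parabolic step, falling back to a linear step when the parabolic one would break ordering.

```javascript
// Sketch of the P² estimator for a single quantile p (hypothetical names).
class P2Sketch {
  constructor(p) {
    this.p = p;
    this.q = [];                                          // marker heights
    this.n = [1, 2, 3, 4, 5];                             // actual positions
    this.want = [1, 1 + 2 * p, 1 + 4 * p, 3 + 2 * p, 5];  // desired positions
    this.step = [0, p / 2, p, (1 + p) / 2, 1];            // desired increments
  }

  add(x) {
    const { q, n, want, step } = this;
    if (q.length < 5) { // bootstrap: keep the first five values sorted
      q.push(x);
      q.sort((a, b) => a - b);
      return;
    }
    // Find the cell k that x falls into, extending the extremes if needed.
    let k;
    if (x < q[0]) { q[0] = x; k = 0; } else if (x >= q[4]) { q[4] = x; k = 3; } else {
      k = 0;
      while (x >= q[k + 1]) k += 1;
    }
    for (let i = k + 1; i < 5; i += 1) n[i] += 1;
    for (let i = 0; i < 5; i += 1) want[i] += step[i];
    // Nudge the three inner markers toward their desired positions.
    for (let i = 1; i <= 3; i += 1) {
      const d = want[i] - n[i];
      if ((d >= 1 && n[i + 1] - n[i] > 1) || (d <= -1 && n[i - 1] - n[i] < -1)) {
        const s = Math.sign(d);
        const parabolic = q[i] + (s / (n[i + 1] - n[i - 1])) * (
          ((n[i] - n[i - 1] + s) * (q[i + 1] - q[i])) / (n[i + 1] - n[i])
          + ((n[i + 1] - n[i] - s) * (q[i] - q[i - 1])) / (n[i] - n[i - 1]));
        q[i] = (q[i - 1] < parabolic && parabolic < q[i + 1])
          ? parabolic
          : q[i] + (s * (q[i + s] - q[i])) / (n[i + s] - n[i]); // linear fallback
        n[i] += s;
      }
    }
  }

  value() {
    if (this.q.length === 0) return NaN;
    if (this.q.length < 5) {
      // Fewer than five values: nearest-rank on the sorted buffer.
      return this.q[Math.min(this.q.length - 1, Math.floor(this.p * this.q.length))];
    }
    return this.q[2]; // the middle marker tracks the target quantile
  }
}

// Feed a deterministic pseudo-random stream in [0, 1) and estimate the median.
const median = new P2Sketch(0.5);
let seed = 12345;
for (let i = 0; i < 10000; i += 1) {
  seed = (seed * 1664525 + 1013904223) % 4294967296; // LCG, illustration only
  median.add(seed / 4294967296);
}
const estimate = median.value(); // should land near 0.5 for uniform input
```

Memory stays constant at five markers no matter how many values are added, which is the property the table attributes to P2Quantile.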

Estimation Model

  • Progressive: completeness f = phase
  • Streaming: completeness f = phase × coverage (coverage = received/expected)
  • Scaling: totals.count and totals.sum scaled by 1/f when incomplete
  • Quantiles: P² approximation until f = 1, then exact sort-based calculation
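
The scaling rule can be made concrete with a short sketch. The function name `estimateTotals` and its argument order are assumptions for illustration, not the PR's API; the formulas are the ones listed above (f = phase × coverage, count and sum scaled by 1/f).

```javascript
// Hypothetical helper: scale raw sample totals by 1/f, where
// f = phase × coverage and coverage = received / expected.
function estimateTotals(sample, phase, received, expected) {
  const coverage = expected > 0 ? received / expected : 0;
  const f = phase * coverage;
  if (f <= 0) return { count: 0, sum: 0, mean: NaN, completeness: 0 };
  return {
    count: sample.count / f,
    sum: sample.sum / f,
    // The mean is a ratio, so the 1/f factors cancel and it needs no scaling.
    mean: sample.count > 0 ? sample.sum / sample.count : NaN,
    completeness: f,
  };
}

// 30 of 60 slices received at phase 0.5 gives f = 0.25.
const est = estimateTotals({ count: 250, sum: 1000 }, 0.5, 30, 60);
// est.count is 1000, est.sum is 4000, est.mean is 4
```

At f = 1 the division is a no-op, so the same code path returns the exact totals at completion.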

Files Changed

| File | Change |
|---|---|
| worker/streaming.js | New StreamingRun class + createStreamingDataChunks() wrapper |
| worker/progressive.js | New ProgressiveRun class for O(Δ) phase advancement |
| worker/engine.js | Shared utilities: membership(), sampleChunksAt(), exactQuantilesFromValues() |
| worker/analysis.worker.js | Worker message handler with stream:* commands |
| src/hash.js | FNV-1a 32-bit hash for stable sampling |
| src/quantiles/p2.js | P² online quantile estimator |
| src/topk/space_saving.js | Space-Saving heavy-hitters algorithm |
| README.md | Updated with StreamingDataChunks API and examples |
| package.json | New exports, browser test deps, npm run demo script |
| test/browser/demo.html | Interactive demo with phase/coverage progress bars |
| test/browser/worker.progressive.test.js | Browser integration tests |
| test/engine.progressive.test.js | Node unit tests for engine |
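
To show why a hash gives stable sampling, here is a minimal sketch of FNV-1a 32-bit and a membership test built on it. The function names `fnv1a32`, `membershipOf`, and `inPhase` are hypothetical, and the real src/hash.js and membership() may differ (for instance in how the bundle key is derived). This variant hashes UTF-16 code units, which matches the byte-oriented definition for ASCII keys.

```javascript
// FNV-1a 32-bit over the UTF-16 code units of a string key.
function fnv1a32(key) {
  let hash = 0x811c9dc5;                 // FNV offset basis
  for (let i = 0; i < key.length; i += 1) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);  // FNV prime, 32-bit multiply
  }
  return hash >>> 0;                     // force unsigned
}

// Hypothetical membership: map the hash to [0, 1).
function membershipOf(key) {
  return fnv1a32(key) / 4294967296;
}

// A key is in the sample for a phase when its membership is below the phase.
function inPhase(key, phase) {
  return membershipOf(key) < phase;
}
```

Because membership depends only on the key, the sample at 12% is always a subset of the sample at 25%, which is what lets later phases reuse the work of earlier ones.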

Testing

# Node tests
npm test

# Browser tests (Puppeteer/Chromium)
npm run test:browser

# Interactive demo
npm run demo
# → Opens http://localhost:5169/test/browser/demo.html

Demo Features

  • Dual progress bars: Phase (compute) and Ingestion (I/O)
  • Jittered ingestion (30–100ms) to simulate network
  • Phase speed selector: Fast (4) / Medium (20) / Slow (100)
  • Filter dropdown with automatic threshold restart
  • Custom facet/series loaded from demo-mods.js

How to Review

  1. Run the demo: npm run demo and observe phase/coverage interaction
  2. Verify convergence: Watch numbers stabilize as progress approaches 100%
  3. Check quantiles: Approximate early, exact at completion
  4. Test filters: Change filters and verify threshold restart + slice replay

Related

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Copilot AI left a comment

Pull request overview

This PR introduces incremental progressive analysis for RUM data processing via a Web Worker implementation with streaming results at configurable sampling phases (e.g., 12% → 25% → 50% → 100%). The system uses stable hashing for consistent sampling tiers, online quantile estimation (P²), and space-saving algorithms for Top-K facets. Early phases provide approximate analytics that converge to exact results at 100%, enabling responsive UI updates during large dataset analysis.

Key changes:

  • Worker-based architecture: New worker/ directory with session client, worker entry point, and progressive/engine implementations
  • Streaming algorithms: P² quantile estimator and Space-Saving Top-K tracker in src/quantiles/ and src/topk/
  • Enhanced demo and testing: Browser-based demo with progress tracking, Node and browser test suites using web-test-runner
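
For context on the Space-Saving Top-K tracker mentioned in the list above, here is a minimal sketch of the general algorithm. It is not the implementation in src/topk/space_saving.js; the class name `SpaceSavingSketch` and the `error` field are assumptions, and the eviction step uses a simple linear scan rather than an optimized structure. It assumes a capacity of at least 1.

```javascript
// Sketch of Space-Saving with at most `capacity` tracked values.
class SpaceSavingSketch {
  constructor(capacity) {
    this.capacity = capacity;
    this.counts = new Map(); // value -> { count, error }
  }

  add(value, weight = 1) {
    const entry = this.counts.get(value);
    if (entry) {
      entry.count += weight;
    } else if (this.counts.size < this.capacity) {
      this.counts.set(value, { count: weight, error: 0 });
    } else {
      // Evict the current minimum; the newcomer inherits its count as error.
      let minKey;
      let min = Infinity;
      for (const [key, e] of this.counts) {
        if (e.count < min) { min = e.count; minKey = key; }
      }
      this.counts.delete(minKey);
      this.counts.set(value, { count: min + weight, error: min });
    }
  }

  // Return the k largest tracked values, highest count first.
  top(k) {
    return [...this.counts]
      .map(([value, e]) => ({ value, count: e.count, error: e.error }))
      .sort((a, b) => b.count - a.count)
      .slice(0, k);
  }
}

const hitters = new SpaceSavingSketch(2);
['/a', '/b', '/a', '/c', '/a'].forEach((url) => hitters.add(url));
const [leader] = hitters.top(1);
```

Counts reported this way can overestimate by at most the recorded `error`, which is why a consumer would treat early Top-K results as approximate.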

Reviewed changes

Copilot reviewed 15 out of 16 changed files in this pull request and generated 8 comments.

| File | Description |
|---|---|
| worker/session.js | Browser client for communicating with analysis worker via message passing |
| worker/analysis.worker.js | Web Worker entry point handling init, load, compute, and cancel commands |
| worker/progressive.js | Incremental progressive analysis with O(delta) per-phase complexity |
| worker/engine.js | Progressive analysis engine for sampling, aggregation, and facet computation |
| src/hash.js | FNV-1a hash implementation for stable bundle sampling |
| src/quantiles/p2.js | P² online quantile estimator for approximate percentiles |
| src/topk/space_saving.js | Space-Saving algorithm for weighted Top-K heavy hitters |
| tools/serve-demo.mjs | Static HTTP server for demo with path traversal protection |
| tools/open-demo.mjs | CLI tool to open demo in browser |
| web-test-runner.config.mjs | Browser test configuration using Puppeteer |
| test/engine.progressive.test.js | Node unit tests for progressive engine |
| test-browser/worker.progressive.test.js | Browser integration tests for worker streaming |
| test-browser/demo.html | Interactive demo with Zipf-distributed synthetic data |
| package.json | New dependencies (@web/test-runner, puppeteer) and demo/test scripts |
| README.md | Documentation for progressive worker API and totals vs estimates |

Comments suppressed due to low confidence (2)

worker/analysis.worker.js:24

  • Unused function yieldToEventLoop.
function yieldToEventLoop() { return new Promise((r) => setTimeout(r, 0)); }

worker/progressive.js:29

      this.items[i] = { b, h: membership(b, keyForBundle(b)) };

@trieloff changed the title from "feat(worker): incremental progressive streaming + converging estimates; demo and tests" to "feat(worker): progressive + streaming ingestion; converging estimates; simplified facets; demo and docs" Dec 11, 2025
@trieloff force-pushed the feat/progressive-incremental-worker branch from 9c825e7 to de9cecb December 11, 2025 13:02
…ngDataChunks); demo; docs; exports

Signed-off-by: Lars Trieloff <[email protected]>
@trieloff force-pushed the feat/progressive-incremental-worker branch from de9cecb to 372d86f December 11, 2025 13:03
…error handling; keep API compatible

Signed-off-by: Lars Trieloff <[email protected]>
…on + client shim; update demo/tests/docs accordingly

Signed-off-by: Lars Trieloff <[email protected]>
…repo config unchanged; limit lint scope to PR files

Signed-off-by: Lars Trieloff <[email protected]>
@trieloff changed the title from "feat(worker): progressive + streaming ingestion; converging estimates; simplified facets; demo and docs" to "feat(worker): progressive streaming analysis with converging estimates" Dec 11, 2025
…ERVAL_MS from engine.js and replace inline epsilon usage

Signed-off-by: Lars Trieloff <[email protected]>
…js and remove inline 1e-9

Signed-off-by: Lars Trieloff <[email protected]>
…VANCE; remove magic numbers; reuse DataChunks; make auto-advance interval configurable; add onError; bound loadedSlices via maxSlices; add restart mutex/seq

Signed-off-by: Lars Trieloff <[email protected]>
…dd fake-worker browser tests for onError and maxSlices replay bound

Signed-off-by: Lars Trieloff <[email protected]>
…n-import to 2.32.0; regenerate lock for npm ci

Signed-off-by: Lars Trieloff <[email protected]>
… worker to merge shards in-worker; wrapper supports shards with in-worker merging and prefers mergeable quantiles

Signed-off-by: Lars Trieloff <[email protected]>
…s orchestrator and displays shard count in UI

Signed-off-by: Lars Trieloff <[email protected]>
…s it to stream:add/phase/end to avoid 'no streaming run' errors; resolves demo parallel errors

Signed-off-by: Lars Trieloff <[email protected]>
… close previous run on restart to avoid orphaned workers

Signed-off-by: Lars Trieloff <[email protected]>
…to prevent duplicate stream:init and runaway worker spawning; eliminates phantom second run with 0 ingestion

Signed-off-by: Lars Trieloff <[email protected]>
…s pending timers on Done to prevent late finalize from spawning a new run

Signed-off-by: Lars Trieloff <[email protected]>
…to avoid flakiness on earliest frames; CI-friendly

Signed-off-by: Lars Trieloff <[email protected]>
…h COOP/COEP devServer config

Signed-off-by: Lars Trieloff <[email protected]>
…ium can spawn nested workers

Signed-off-by: Lars Trieloff <[email protected]>