feat(worker): progressive streaming analysis with converging estimates #106
base: prerelease
💡 Codex Review
Here are some automated review suggestions for this pull request.
Pull request overview
This PR introduces incremental progressive analysis for RUM data processing via a Web Worker implementation with streaming results at configurable sampling phases (e.g., 12% → 25% → 50% → 100%). The system uses stable hashing for consistent sampling tiers, online quantile estimation (P²), and space-saving algorithms for Top-K facets. Early phases provide approximate analytics that converge to exact results at 100%, enabling responsive UI updates during large dataset analysis.
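The stable-hash sampling described above can be illustrated with a short sketch. This is not the code in `src/hash.js`; the names `fnv1a32`, `position`, and `inPhase` are hypothetical, and hashing UTF-16 code units instead of UTF-8 bytes is a simplifying assumption. It shows why tiers nest: a bundle that is sampled at 12% is also sampled at 25%, 50%, and 100%, so each phase only adds data.

```javascript
// 32-bit FNV-1a over the UTF-16 code units of a string key.
// (Assumption: for ASCII keys this matches the byte-wise algorithm.)
function fnv1a32(key) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < key.length; i += 1) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit multiply
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// Map a key to a stable position in [0, 1).
function position(key) {
  return fnv1a32(key) / 0x100000000;
}

// A bundle belongs to a phase when its position is below the phase fraction.
function inPhase(key, phase) {
  return position(key) < phase;
}

// Hypothetical bundle keys, used only for this demonstration.
const keys = Array.from({ length: 1000 }, (_, i) => `bundle-${i}`);
for (const phase of [0.12, 0.25, 0.5, 1]) {
  const n = keys.filter((k) => inPhase(k, phase)).length;
  console.log(`phase ${phase}: ${n} of ${keys.length} bundles`);
}
```

Because membership depends only on the key, re-running an analysis selects the same bundles at each tier.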
Key changes:
- Worker-based architecture: New `worker/` directory with session client, worker entry point, and progressive/engine implementations
- Streaming algorithms: P² quantile estimator and Space-Saving Top-K tracker in `src/quantiles/` and `src/topk/`
- Enhanced demo and testing: Browser-based demo with progress tracking, Node and browser test suites using web-test-runner
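For readers unfamiliar with P², here is a minimal sketch of the textbook algorithm (Jain and Chlamtac) for a single quantile. It is not the implementation in `src/quantiles/p2.js`; the class name `P2Sketch` and its `push`/`value` methods are assumptions made for illustration. The estimator keeps five markers and adjusts them per observation, so memory stays constant regardless of input size.

```javascript
// Minimal P² estimator for one quantile p in (0, 1). Illustrative only.
class P2Sketch {
  constructor(p) {
    this.p = p;
    this.count = 0;
    this.q = []; // marker heights
    this.n = [1, 2, 3, 4, 5]; // actual marker positions
    this.np = [1, 1 + 2 * p, 1 + 4 * p, 3 + 2 * p, 5]; // desired positions
    this.dn = [0, p / 2, p, (1 + p) / 2, 1]; // desired-position increments
  }

  push(x) {
    this.count += 1;
    const { q, n, np, dn } = this;
    if (q.length < 5) { // the first five observations seed the markers
      q.push(x);
      q.sort((a, b) => a - b);
      return;
    }
    // Find the cell k that x falls into, extending the extremes if needed.
    let k = 0;
    if (x < q[0]) {
      q[0] = x;
    } else if (x >= q[4]) {
      q[4] = x;
      k = 3;
    } else {
      while (x >= q[k + 1]) k += 1;
    }
    for (let i = k + 1; i < 5; i += 1) n[i] += 1;
    for (let i = 0; i < 5; i += 1) np[i] += dn[i];
    // Nudge the three interior markers toward their desired positions.
    for (let i = 1; i <= 3; i += 1) {
      const d = np[i] - n[i];
      if ((d >= 1 && n[i + 1] - n[i] > 1) || (d <= -1 && n[i - 1] - n[i] < -1)) {
        const s = Math.sign(d);
        const parabolic = q[i] + (s / (n[i + 1] - n[i - 1])) * (
          ((n[i] - n[i - 1] + s) * (q[i + 1] - q[i])) / (n[i + 1] - n[i])
          + ((n[i + 1] - n[i] - s) * (q[i] - q[i - 1])) / (n[i] - n[i - 1])
        );
        // Fall back to linear interpolation if the parabola leaves the bracket.
        q[i] = (q[i - 1] < parabolic && parabolic < q[i + 1])
          ? parabolic
          : q[i] + (s * (q[i + s] - q[i])) / (n[i + s] - n[i]);
        n[i] += s;
      }
    }
  }

  value() {
    if (this.count === 0) return NaN;
    if (this.q.length < 5) { // too few samples: read from the sorted buffer
      const idx = Math.min(this.q.length - 1, Math.floor(this.p * this.q.length));
      return this.q[idx];
    }
    return this.q[2]; // the middle marker tracks the requested quantile
  }
}

const median = new P2Sketch(0.5);
for (let i = 1; i <= 1000; i += 1) median.push((i * 37) % 1000);
console.log('approximate median:', median.value());
```

The result is approximate by construction, which is consistent with the PR's choice to switch to an exact calculation once all data is available.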
Reviewed changes
Copilot reviewed 15 out of 16 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| `worker/session.js` | Browser client for communicating with analysis worker via message passing |
| `worker/analysis.worker.js` | Web Worker entry point handling init, load, compute, and cancel commands |
| `worker/progressive.js` | Incremental progressive analysis with O(delta) per-phase complexity |
| `worker/engine.js` | Progressive analysis engine for sampling, aggregation, and facet computation |
| `src/hash.js` | FNV-1a hash implementation for stable bundle sampling |
| `src/quantiles/p2.js` | P² online quantile estimator for approximate percentiles |
| `src/topk/space_saving.js` | Space-Saving algorithm for weighted Top-K heavy hitters |
| `tools/serve-demo.mjs` | Static HTTP server for demo with path traversal protection |
| `tools/open-demo.mjs` | CLI tool to open demo in browser |
| `web-test-runner.config.mjs` | Browser test configuration using Puppeteer |
| `test/engine.progressive.test.js` | Node unit tests for progressive engine |
| `test-browser/worker.progressive.test.js` | Browser integration tests for worker streaming |
| `test-browser/demo.html` | Interactive demo with Zipf-distributed synthetic data |
| `package.json` | New dependencies (@web/test-runner, puppeteer) and demo/test scripts |
| `README.md` | Documentation for progressive worker API and totals vs estimates |
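As background for the Space-Saving entry in the table, the following is a minimal sketch of the weighted variant of that algorithm. It is not the code in `src/topk/space_saving.js`; the class name `SpaceSavingSketch`, its methods, and the `{ value, weight, error }` result shape are assumptions for illustration. The linear scan for the minimum keeps the example short; a real implementation would likely use a heap or a bucket list.

```javascript
// Minimal weighted Space-Saving tracker with at most `capacity` counters.
class SpaceSavingSketch {
  constructor(capacity) {
    this.capacity = capacity;
    this.counters = new Map(); // key -> { weight, error }
  }

  add(key, weight = 1) {
    const existing = this.counters.get(key);
    if (existing) {
      existing.weight += weight;
      return;
    }
    if (this.counters.size < this.capacity) {
      this.counters.set(key, { weight, error: 0 });
      return;
    }
    // Evict the smallest counter; the newcomer inherits its weight as error.
    let minKey;
    let minWeight = Infinity;
    for (const [k, c] of this.counters) {
      if (c.weight < minWeight) {
        minWeight = c.weight;
        minKey = k;
      }
    }
    this.counters.delete(minKey);
    this.counters.set(key, { weight: minWeight + weight, error: minWeight });
  }

  // Return the k heaviest tracked keys, heaviest first.
  top(k) {
    return [...this.counters.entries()]
      .map(([value, c]) => ({ value, weight: c.weight, error: c.error }))
      .sort((a, b) => b.weight - a.weight)
      .slice(0, k);
  }
}

const tracker = new SpaceSavingSketch(2);
tracker.add('/home', 5);
tracker.add('/pricing', 3);
tracker.add('/blog', 1); // evicts '/pricing' and inherits its weight as error
console.log(tracker.top(2));
```

Each reported weight overestimates the true weight by at most its `error`, which is why heavy hitters stay visible even when the number of distinct values far exceeds the capacity.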
Comments suppressed due to low confidence (2)
worker/analysis.worker.js:24
- Unused function yieldToEventLoop.
function yieldToEventLoop() { return new Promise((r) => setTimeout(r, 0)); }
worker/progressive.js:29
- Superfluous argument passed to function membership.
this.items[i] = { b, h: membership(b, keyForBundle(b)) };
…ngDataChunks); demo; docs; exports Signed-off-by: Lars Trieloff <[email protected]>
…error handling; keep API compatible Signed-off-by: Lars Trieloff <[email protected]>
… errors to inflight Signed-off-by: Lars Trieloff <[email protected]>
…on + client shim; update demo/tests/docs accordingly Signed-off-by: Lars Trieloff <[email protected]>
…repo config unchanged; limit lint scope to PR files Signed-off-by: Lars Trieloff <[email protected]>
… issues post-merge Signed-off-by: Lars Trieloff <[email protected]>
…ERVAL_MS from engine.js and replace inline epsilon usage Signed-off-by: Lars Trieloff <[email protected]>
…js and remove inline 1e-9 Signed-off-by: Lars Trieloff <[email protected]>
…VANCE; remove magic numbers; reuse DataChunks; make auto-advance interval configurable; add onError; bound loadedSlices via maxSlices; add restart mutex/seq Signed-off-by: Lars Trieloff <[email protected]>
…timates semantics Signed-off-by: Lars Trieloff <[email protected]>
…dd fake-worker browser tests for onError and maxSlices replay bound Signed-off-by: Lars Trieloff <[email protected]>
…n-import to 2.32.0; regenerate lock for npm ci Signed-off-by: Lars Trieloff <[email protected]>
…lt shape Signed-off-by: Lars Trieloff <[email protected]>
…ops) Signed-off-by: Lars Trieloff <[email protected]>
… worker to merge shards in-worker; wrapper supports shards with in-worker merging and prefers mergeable quantiles Signed-off-by: Lars Trieloff <[email protected]>
…ions Signed-off-by: Lars Trieloff <[email protected]>
…s orchestrator and displays shard count in UI Signed-off-by: Lars Trieloff <[email protected]>
…s it to stream:add/phase/end to avoid 'no streaming run' errors; resolves demo parallel errors Signed-off-by: Lars Trieloff <[email protected]>
…arallelism Signed-off-by: Lars Trieloff <[email protected]>
… close previous run on restart to avoid orphaned workers Signed-off-by: Lars Trieloff <[email protected]>
…to prevent duplicate stream:init and runaway worker spawning; eliminates phantom second run with 0 ingestion Signed-off-by: Lars Trieloff <[email protected]>
…s pending timers on Done to prevent late finalize from spawning a new run Signed-off-by: Lars Trieloff <[email protected]>
…to avoid flakiness on earliest frames; CI-friendly Signed-off-by: Lars Trieloff <[email protected]>
…h COOP/COEP devServer config Signed-off-by: Lars Trieloff <[email protected]>
…ium can spawn nested workers Signed-off-by: Lars Trieloff <[email protected]>
Summary
Offload RUM analysis to a Web Worker with progressive phases and streaming ingestion, delivering converging estimates in real time while data loads.
Key Features
`{ value, count, weight }` shape: scaled estimates when incomplete, exact at 100%
User-Facing Changes
New API:
`StreamingDataChunks`
Snapshot Shape
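A consumer of the snapshot fields listed in this section might format them as in the sketch below. The helper `describeSnapshot` is hypothetical, and the example assumes that `progress` is a number between 0 and 1 and that `facets[name]` is an array of `{ value, count, weight }` entries ordered heaviest first; neither detail is confirmed by this description.

```javascript
// Hypothetical helper: turn one snapshot into a one-line status string.
// Field names come from the PR description; value ranges are assumptions.
function describeSnapshot(snapshot, series, facet) {
  const { phase, progress, totals, facets, ingestion } = snapshot;
  const t = totals[series];
  const top = (facets[facet] || [])[0];
  const label = progress < 1 ? 'estimate' : 'exact';
  return [
    `phase=${phase}`,
    `coverage=${ingestion.coverage}`,
    `${series}.count=${Math.round(t.count)} (${label})`,
    top ? `top ${facet}=${top.value}` : `top ${facet}=n/a`,
  ].join(' ');
}

// Made-up snapshot used only to exercise the helper.
const exampleSnapshot = {
  phase: 0.25,
  progress: 0.25,
  totals: { lcp: { count: 399.6, sum: 812000, min: 120, max: 9000, mean: 2032 } },
  facets: { url: [{ value: '/home', count: 120, weight: 12000 }] },
  ingestion: { received: 5, expected: 10, coverage: 0.5 },
};
console.log(describeSnapshot(exampleSnapshot, 'lcp', 'url'));
```

Labeling values as estimates while `progress < 1` keeps a UI honest about which numbers may still move.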
- `phase`
- `progress`
- `totals[series]`: `{ count, sum, min, max, mean }`, scaled when incomplete
- `sampleTotals` (`progress < 1`)
- `quantiles[series]`
- `facets[name]`: `{ value, count, weight }`, scaled when incomplete
- `ingestion`: `{ received, expected, coverage }`
Custom Facets/Series (ESM Modules)
Architecture
Engine Classes
- `ProgressiveRun` (`worker/progressive.js`): `advanceTo()`
- `StreamingRun` (`worker/streaming.js`)
- `P2Quantile` (`src/quantiles/p2.js`)
- `SpaceSaving` (`src/topk/space_saving.js`)
Estimation Model
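The scaling rule in this section can be sketched as follows. The function names `sampledFraction` and `scaleTotals` are hypothetical, and the example assumes that `f = phase` applies when there is no ingestion information while `f = phase × coverage` applies during streaming; the description does not spell out which mode uses which formula.

```javascript
// Effective sampled fraction f.
// Assumption: without ingestion info f is the phase alone; while streaming,
// f is phase × coverage with coverage = received / expected.
function sampledFraction(phase, ingestion) {
  if (!ingestion) return phase;
  const coverage = ingestion.received / ingestion.expected;
  return phase * coverage;
}

// Scale the additive totals (count, sum) by 1/f when incomplete.
// min, max, and mean are left untouched because they do not grow with sample size.
function scaleTotals(sample, f) {
  if (f >= 1) return { ...sample }; // complete: totals are already exact
  if (f <= 0) throw new RangeError('sampled fraction must be positive');
  return { ...sample, count: sample.count / f, sum: sample.sum / f };
}

const f = sampledFraction(0.5, { received: 50, expected: 100 });
console.log(f, scaleTotals({ count: 10, sum: 200, min: 1, max: 50, mean: 20 }, f));
```

Dividing by `f` gives an estimate of the full-data totals that converges to the exact values as `f` approaches 1.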
- `f = phase`
- `f = phase × coverage` (`coverage = received/expected`)
- `totals.count` and `totals.sum` scaled by `1/f` when incomplete
- `f = 1`, then exact sort-based calculation
Files Changed
- `worker/streaming.js`: `StreamingRun` class + `createStreamingDataChunks()` wrapper
- `worker/progressive.js`: `ProgressiveRun` class for O(Δ) phase advancement
- `worker/engine.js`: `membership()`, `sampleChunksAt()`, `exactQuantilesFromValues()`
- `worker/analysis.worker.js`: `stream:*` commands
- `src/hash.js`
- `src/quantiles/p2.js`
- `src/topk/space_saving.js`
- `README.md`
- `package.json`: `npm run demo` script
- `test/browser/demo.html`
- `test/browser/worker.progressive.test.js`
- `test/engine.progressive.test.js`
Testing
Demo Features
`demo-mods.js`
How to Review
`npm run demo` and observe phase/coverage interaction
Related