[POC] xy chart decimation #2805
Draft
lukeelmers wants to merge 3 commits into
Summary
This POC set out to test whether decimation on xy charts could significantly improve rendering performance for large time series data. The short answer: decimation works as expected, but the client-side data pipeline is a bigger bottleneck. The most valuable outcome of this work is an instrumented breakdown that quantifies where time is spent and suggests that we explore some of the data pipeline restructuring proposed in #2561.
What was observed
The following measurements were taken with an instrumented benchmark that isolates each stage of the geometry computation pipeline, using 1M points on a 1920×1080 panel:
[Timing table not preserved in this copy; the only surviving fragment references computeSeriesDomainsSelector.]
The data pipeline (series splitting, gap filling, sorting, stacking, domain computation) dominates at every scale tested. Decimation reduces the rendering step to a constant cost (bounded by pixel width). Because the pipeline dwarfs both the rendering step and the decimation scan, the overall improvement from decimation alone is marginal. It will become a more significant factor if the pipeline cost can be reduced.
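As a rough illustration of what "isolating each stage" can look like, here is a minimal timing sketch. It is not the benchmark from this PR: timeStages, the stage names, and the injectable clock are all hypothetical, and the stage bodies are empty placeholders.

```typescript
// Hypothetical helper (not from the PR): runs each named stage once, in order,
// and records how long it took according to the supplied clock.
function timeStages(
  stages: ReadonlyArray<readonly [string, () => void]>,
  now: () => number = () => Date.now(), // injectable so the helper can be tested
): Record<string, number> {
  const timings: Record<string, number> = {};
  for (const [name, run] of stages) {
    const start = now();
    run();
    timings[name] = now() - start;
  }
  return timings;
}

// Usage with placeholder stages standing in for the real pipeline and render steps.
const exampleTimings = timeStages([
  ["dataPipeline", () => { /* e.g. series splitting, sorting, domain computation */ }],
  ["renderGeometries", () => { /* e.g. building line/area geometry */ }],
]);
console.log(exampleTimings);
```

Timing each stage separately, rather than the whole render, is what makes it possible to see that one stage dominates the others.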
What's included
Decimation implementation: An M4 decimation algorithm that divides ordered series data into pixel-width buckets, retaining up to 4 values per bucket: the first, minimum y1, maximum y1, and last. Compared to pure min/max bucketing, this also preserves the entry and exit points of each pixel column, producing a more accurate line shape. The algorithm is a single O(n) pass that activates when the point count exceeds DECIMATION_THRESHOLD_FACTOR * panelWidth (currently 4, configurable as a named constant in decimation.ts). It's injected into renderGeometries for line and area series, and because the decimated data flows through to renderPoints, the spatial index is also built from the reduced dataset.
Benchmark story: A Storybook story at Test Cases > Decimation Benchmark that generates time series data at configurable sizes (100K–10M) and displays render time.
Pipeline breakdown benchmark: A Jest-based benchmark that separately measures the data pipeline, geometry rendering, and decimation scan to show where time is actually spent.
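For readers unfamiliar with M4, the bucketing described above can be sketched roughly as follows. This is an illustrative reimplementation, not the code in decimation.ts: the Point shape, the decimateM4 name, and the assumption that points arrive sorted by x are all mine. Only DECIMATION_THRESHOLD_FACTOR and its value of 4 come from the description.

```typescript
// Hypothetical point shape; the library's real datum type is not shown in this PR text.
interface Point {
  x: number;
  y1: number;
}

// Name and value taken from the PR description.
const DECIMATION_THRESHOLD_FACTOR = 4;

// M4-style sketch: keep the first, min-y1, max-y1 and last point of each pixel-wide bucket.
// Assumes `points` is already sorted by x.
function decimateM4(points: Point[], panelWidth: number): Point[] {
  const buckets = Math.max(1, Math.floor(panelWidth));
  // Skip decimation unless the series is dense relative to the available pixels.
  if (points.length <= DECIMATION_THRESHOLD_FACTOR * buckets) return points;

  const xMin = points[0].x;
  const span = points[points.length - 1].x - xMin;
  if (span <= 0) return points;

  const out: Point[] = [];
  let current = -1; // bucket currently being accumulated
  let first = 0, min = 0, max = 0, last = 0; // indices into `points`

  const flush = (): void => {
    if (current < 0) return;
    // Emit the retained indices in x order, dropping duplicates (up to 4 per bucket).
    const kept = Array.from(new Set([first, min, max, last])).sort((a, b) => a - b);
    for (const i of kept) out.push(points[i]);
  };

  for (let i = 0; i < points.length; i++) {
    const raw = Math.floor(((points[i].x - xMin) / span) * buckets);
    const bucket = Math.min(buckets - 1, raw);
    if (bucket !== current) {
      flush();
      current = bucket;
      first = min = max = last = i;
    } else {
      if (points[i].y1 < points[min].y1) min = i;
      if (points[i].y1 > points[max].y1) max = i;
      last = i;
    }
  }
  flush();
  return out;
}
```

The output never exceeds 4 points per bucket, so downstream rendering cost is bounded by panel width rather than by input size.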
How to test
Start Storybook and navigate to Test Cases > Decimation Benchmark:
    git checkout poc/decimation
    yarn install
    yarn start
    # Open http://localhost:9001
To run the instrumented pipeline breakdown:
Learnings
Server-side downsampling is our largest opportunity. Regardless of the client-side optimizations explored here, the single biggest performance improvement we can make is reducing the data sent over the wire to something more manageable and based on the available pixel space. This should be our top priority.
Data pipeline restructuring (#2561) is our largest client-side opportunity. Reducing redundant scans, simplifying series grouping, and optimizing for common chart types (e.g., single time series without stacking) would have the largest client-side impact on rendering times, but the most dramatic end-to-end gains will still come from server-side downsampling.
Decimation can be easily layered in alongside or after pipeline work. It's a small change (~40 lines) that caps the post-decimation rendering cost at any scale. Once the pipeline is faster, decimation prevents rendering from becoming the next bottleneck — especially at very large data volumes (5M+) where even browser-accelerated Canvas would slow down.
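The first learning above suggests sizing the server response to the available pixel space. One way a client might derive that request is sketched below. It assumes a backend that can aggregate into fixed-width time buckets, which this PR does not specify. The bucketIntervalMs name and the pointsPerPixel parameter are hypothetical.

```typescript
// Hypothetical helper: smallest bucket width (in ms) such that the number of
// buckets covering [fromMs, toMs] does not exceed panelWidthPx * pointsPerPixel.
function bucketIntervalMs(
  fromMs: number,
  toMs: number,
  panelWidthPx: number,
  pointsPerPixel = 1, // assumption: one bucket per pixel column by default
): number {
  const maxBuckets = Math.max(1, Math.floor(panelWidthPx * pointsPerPixel));
  return Math.max(1, Math.ceil((toMs - fromMs) / maxBuckets));
}

// Example: a 24-hour range on a 1920 px wide panel.
const intervalMs = bucketIntervalMs(0, 24 * 60 * 60 * 1000, 1920);
console.log(intervalMs); // 45000, i.e. 45-second buckets
```

With an interval like this, the payload size depends on panel width rather than on the raw number of data points in the time range.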