[Explore] Improve performance with large query result #11390
Signed-off-by: Joey Liu <jiyili@amazon.com>
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #11390      +/-   ##
==========================================
- Coverage   60.30%   60.30%   -0.01%
==========================================
  Files        4664     4664
  Lines      130372   130381       +9
  Branches    22233    22238       +5
==========================================
+ Hits        78618    78621       +3
  Misses      46147    46147
- Partials     5607     5613       +6
Flags with carried forward coverage won't be shown.
…ject#11390)
* [Explore] Improve performance with large query result
  Signed-off-by: Joey Liu <jiyili@amazon.com>
* Changeset file for PR opensearch-project#11390 created/updated
* Fix tests
  Signed-off-by: Joey Liu <jiyili@amazon.com>
---------
Signed-off-by: Joey Liu <jiyili@amazon.com>
Co-authored-by: opensearch-changeset-bot[bot] <154024398+opensearch-changeset-bot[bot]@users.noreply.github.com>
Signed-off-by: Mark Boyd <mark.boyd@gsa.gov>
Description
Querying indexes with many fields (e.g. 500+ fields) and large result sets (e.g. 10,000 rows) caused severe performance degradation: more than 20 seconds of blocking JavaScript execution after the network response was received. Profiling identified three independent bottlenecks:
Root causes and fixes
1. dompurify.sanitize called on every field on every render

The source column cell called dataset.formatHit(row) (HTML mode) and then dompurify.sanitize() on every field value on every render. The HTML formatter wraps all values in tags unconditionally, so sanitization was always triggered, even for plain integers and strings that contain no HTML.

Fix:
- Use dataset.formatHit(row, 'text'), which returns plain strings via textConvert and bypasses HTML generation entirely.
- Render the values as plain text instead of through dangerouslySetInnerHTML.
- Wrap the cell in React.memo to skip re-renders when props are unchanged.

2. Full row scan to decide whether results can be visualized

detectAndSetOptimalTab called normalizeResultRows(hits, fieldSchema) to determine whether the results can be visualized. This function iterated over all rows for every field to compute validValuesCount and uniqueValuesCount, creating ~27M operations for 10,000 rows × 676 fields, plus significant GC pressure from allocating thousands of intermediate objects and Sets.

Fix: canResultsBeVisualized only needs to know the column types, which are already available from fieldSchema alone; no row data is needed. It is rewritten to classify columns from fieldSchema in a single O(fields) pass, using rowCount as a conservative upper bound for uniqueValuesCount.

3. Redux development middleware traversed the stored results on every dispatch

ImmutableStateInvariantMiddleware and SerializableStateInvariantMiddleware (enabled by default in development) recursively walk the entire Redux state tree on every dispatch: the immutability check walks it before and after the reducer runs to detect mutations, and the serializability check walks it afterwards to detect non-serializable values. With large query results stored in state.results, every user interaction that dispatches an action (clicking a button, toggling a setting, expanding a row) paid the full traversal cost over the stored hits.

Fix: The results slice is always fully replaced via setResults and is never partially mutated, so these checks provide no safety value there. results is excluded from both middleware checks via ignoredPaths.

Notes
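For the first fix (plain-text formatting plus React.memo), here is a dependency-free sketch of the two ideas, under stated assumptions: field values are converted to plain strings, so no HTML is produced and no sanitization pass is needed, and a wrapper skips re-rendering when props are shallowly equal, which is the default comparison React.memo applies. `formatHitAsText`, `shallowEqual`, and `memoRender` are hypothetical names for illustration; this is neither the Explore cell code nor the real `dataset.formatHit` API.

```typescript
// Hypothetical row shape; the real Explore types differ.
type Hit = Record<string, unknown>;

// Turn every field value into a plain string. No HTML is produced, so there
// is nothing for a sanitizer (such as DOMPurify) to clean afterwards.
function formatHitAsText(hit: Hit): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [field, value] of Object.entries(hit)) {
    if (value === null || value === undefined) {
      out[field] = '';
    } else if (typeof value === 'object') {
      out[field] = JSON.stringify(value); // arrays and nested objects
    } else {
      out[field] = String(value); // numbers, strings, booleans
    }
  }
  return out;
}

// Shallow comparison using Object.is, the default check React.memo applies to props.
function shallowEqual(a: Record<string, unknown>, b: Record<string, unknown>): boolean {
  const aKeys = Object.keys(a);
  if (aKeys.length !== Object.keys(b).length) return false;
  return aKeys.every(
    (key) => Object.prototype.hasOwnProperty.call(b, key) && Object.is(a[key], b[key])
  );
}

// Minimal stand-in for React.memo: call `render` again only when props change.
function memoRender<P extends Record<string, unknown>, R>(render: (props: P) => R) {
  let lastProps: P | undefined;
  let lastResult: R | undefined;
  let renders = 0;
  const wrapped = (props: P): R => {
    if (lastProps !== undefined && shallowEqual(lastProps, props)) {
      return lastResult as R; // props unchanged: reuse previous output
    }
    renders += 1;
    const result = render(props);
    lastProps = props;
    lastResult = result;
    return result;
  };
  return { wrapped, renderCount: () => renders };
}
```

Because the formatted values are plain strings, React renders them as escaped text nodes: a value such as `<b>x</b>` is displayed literally rather than interpreted as markup. Passing the same row object twice renders once; passing a new object with equal contents renders again, since the comparison is by reference per prop.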
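The second fix and the rowCount approximation can be illustrated with a minimal sketch that assumes a simplified field schema. `FieldSchema`, `ColumnInfo`, `columnsFromSchema`, `canVisualizeFromSchema`, the type-to-kind mapping, and the "one numerical column plus one categorical or date column with at least `minUnique` distinct values" rule are all hypothetical; the real visualization rules live in Explore. What the sketch does show is the shape of the computation: one pass over the fields, never touching rows, with `rowCount` standing in for `uniqueValuesCount`.

```typescript
// Hypothetical schema types, for illustration only.
type FieldType = 'number' | 'string' | 'boolean' | 'date' | 'object' | 'unknown';

interface FieldSchema {
  name: string;
  type: FieldType;
}

type ColumnKind = 'numerical' | 'categorical' | 'date' | 'unknown';

interface ColumnInfo {
  name: string;
  kind: ColumnKind;
  // Upper bound, not a measured count: no row data is read.
  uniqueValuesCount: number;
}

// Hypothetical mapping from field type to column kind.
function classify(type: FieldType): ColumnKind {
  switch (type) {
    case 'number':
      return 'numerical';
    case 'string':
    case 'boolean':
      return 'categorical';
    case 'date':
      return 'date';
    default:
      return 'unknown';
  }
}

// Single O(fields) pass: the cost does not depend on the number of rows.
function columnsFromSchema(fieldSchema: FieldSchema[], rowCount: number): ColumnInfo[] {
  return fieldSchema.map((field) => ({
    name: field.name,
    kind: classify(field.type),
    uniqueValuesCount: rowCount, // a column cannot have more distinct values than rows
  }));
}

// Hypothetical rule: one numerical column plus one categorical or date column
// that could have at least `minUnique` distinct values.
function canVisualizeFromSchema(
  fieldSchema: FieldSchema[],
  rowCount: number,
  minUnique = 2
): boolean {
  const columns = columnsFromSchema(fieldSchema, rowCount);
  const hasNumeric = columns.some((c) => c.kind === 'numerical');
  const hasGroup = columns.some(
    (c) => (c.kind === 'categorical' || c.kind === 'date') && c.uniqueValuesCount >= minUnique
  );
  return hasNumeric && hasGroup;
}
```

With a `number` field and a `string` field, the check passes for 10,000 rows but fails for a single row, because the grouping column cannot reach `minUnique = 2`. With many rows it passes optimistically, which matches the behaviour intended for tab selection.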
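Why the third fix helps can be seen in a simplified model. The function below walks a state tree the way a development-only invariant check does and counts the nodes it visits, skipping any dot-separated path listed in `ignoredPaths` together with its subtree. It is not Redux Toolkit's implementation: `countVisited` is a hypothetical helper, and it assumes an acyclic, plain-object state. The comment at the top shows a plausible Redux Toolkit configuration for the same exclusion, assuming `results` is a top-level key of the store state.

```typescript
// Hypothetical helper: a simplified model of a development-only invariant check.
// It is not Redux Toolkit's code. It visits every node of an acyclic, plain-object
// state tree and skips any dot-separated path in `ignoredPaths`, including its subtree.
//
// A plausible Redux Toolkit setup for the same exclusion (assuming `results`
// is a top-level key of the store state):
//
//   configureStore({
//     reducer,
//     middleware: (getDefaultMiddleware) =>
//       getDefaultMiddleware({
//         immutableCheck: { ignoredPaths: ['results'] },
//         serializableCheck: { ignoredPaths: ['results'] },
//       }),
//   });
function countVisited(value: unknown, ignoredPaths: string[] = [], path = ''): number {
  let visited = 1; // this node
  if (value !== null && typeof value === 'object') {
    for (const [key, child] of Object.entries(value as Record<string, unknown>)) {
      const childPath = path ? `${path}.${key}` : key;
      if (ignoredPaths.includes(childPath)) continue; // skip the whole subtree
      visited += countVisited(child, ignoredPaths, childPath);
    }
  }
  return visited;
}
```

For a state such as `{ ui: { tab: 'logs' }, results: { hits: [{ a: 1 }, { a: 2 }] } }`, the walk visits 9 nodes, and only 3 once `results` is ignored. With 10,000 hits of 676 fields each, the skipped subtree holds millions of nodes per traversal.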
- The canResultsBeVisualized fix uses rowCount as an upper bound for uniqueValuesCount. This is a safe approximation: a rule requiring uniqueValuesCount >= N passes whenever rowCount >= N, which is the intended optimistic behaviour for tab selection. The actual visualization rendering still receives accurate stats from normalizeResultRows when it builds the chart.
- The ignoredPaths fix only matters in development mode; these middleware checks are stripped from production builds.

Issues Resolved
Screenshot
Testing the changes
Changelog
Check List
yarn test:jest
yarn test:jest_integration