[Explore] Improve performance with large query result#11390

Merged: Maosaic merged 3 commits into opensearch-project:main from Maosaic:performance on Feb 27, 2026
@Maosaic Maosaic commented Feb 25, 2026

Collaborator

Description

Querying indexes with many fields (e.g. 500+ fields, 10,000 rows) caused severe performance degradation — 20+ seconds of blocking JavaScript after the network response was received. Profiling identified three independent bottlenecks:

Root causes and fixes

  1. SourceFieldTableCell: dompurify.sanitize called on every field on every render

The source column cell called dataset.formatHit(row) (HTML mode) and then dompurify.sanitize() on every field value on every render. The HTML formatter wraps all values in tags unconditionally, so sanitization was always triggered — including for plain integers and strings that contain no HTML.

  • Switch to dataset.formatHit(row, 'text') which returns plain strings via textConvert, bypassing HTML generation entirely
  • Render values as React text children instead of dangerouslySetInnerHTML
  • Remove the dompurify dependency from this component
  • Wrap component in React.memo to skip re-renders when props are unchanged
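
The two ideas in this fix can be sketched without React, as a rough illustration only. `formatHitAsText` is a hypothetical stand-in for `dataset.formatHit(row, 'text')` (the real formatter's per-type behaviour is not shown in this PR), and `shallowEqualProps` mirrors the shallow, `Object.is`-based prop comparison that `React.memo` applies by default.

```typescript
type Row = Record<string, unknown>;

// Hypothetical stand-in for dataset.formatHit(row, 'text'):
// every value becomes a plain string, with no HTML wrapping at all.
function formatHitAsText(row: Row): Record<string, string> {
  const formatted: Record<string, string> = {};
  for (const [field, value] of Object.entries(row)) {
    if (value === null || value === undefined) {
      formatted[field] = '-'; // assumption: placeholder for missing values
    } else if (typeof value === 'object') {
      formatted[field] = JSON.stringify(value);
    } else {
      formatted[field] = String(value);
    }
  }
  return formatted;
}

// Shallow comparison of props, the same kind of check React.memo
// performs by default to decide whether a re-render can be skipped.
function shallowEqualProps(a: Row, b: Row): boolean {
  const aKeys = Object.keys(a);
  const bKeys = Object.keys(b);
  if (aKeys.length !== bKeys.length) return false;
  return aKeys.every(
    (key) => Object.prototype.hasOwnProperty.call(b, key) && Object.is(a[key], b[key])
  );
}

const row: Row = { bytes: 42, tag: '<b>bold</b>' };
const text = formatHitAsText(row);
// The markup-looking string is kept verbatim; when rendered as a React
// text child it is escaped by React, so no sanitizer pass is needed.
console.log(text.tag);
console.log(shallowEqualProps({ row, isShortDots: false }, { row, isShortDots: false }));
```

Because the strings are rendered as text children rather than injected as HTML, the cost per cell is a string conversion instead of HTML generation plus sanitization.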

  2. canResultsBeVisualized: O(rows × fields) computation on every query

detectAndSetOptimalTab called normalizeResultRows(hits, fieldSchema) to determine if results can be visualized. This function iterated over all rows for every field to compute validValuesCount and uniqueValuesCount — creating ~27M operations for 10,000 rows × 676 fields, plus significant GC pressure from allocating thousands of intermediate objects and Sets.

canResultsBeVisualized only needs to know column types, which are already available from fieldSchema alone — no row data is needed. Rewrite to classify columns from fieldSchema in a single O(fields) pass, using rowCount as a conservative upper bound for uniqueValuesCount.
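
A minimal, self-contained sketch of the single-pass classification described above. The `VisFieldType` values, the contents of `FIELD_TYPE_MAP`, and the `classifyColumns` name are assumptions made for illustration; only the overall shape (one loop over `fieldSchema`, with `rowCount` used for both counts) follows the PR.

```typescript
// Assumed field categories; the real VisFieldType lives in the Explore plugin.
const VisFieldType = {
  Numerical: 'numerical',
  Categorical: 'categorical',
  Date: 'date',
  Unknown: 'unknown',
} as const;
type VisFieldType = (typeof VisFieldType)[keyof typeof VisFieldType];

interface VisColumn {
  id: number;
  schema: VisFieldType;
  name: string;
  column: string;
  validValuesCount: number;
  uniqueValuesCount: number;
}

// Hypothetical mapping from OpenSearch field types to visualization types.
const FIELD_TYPE_MAP: Record<string, VisFieldType> = {
  long: VisFieldType.Numerical,
  double: VisFieldType.Numerical,
  keyword: VisFieldType.Categorical,
  date: VisFieldType.Date,
};

function classifyColumns(
  fieldSchema: Array<{ type?: string; name?: string }>,
  rowCount: number
) {
  const numericalColumns: VisColumn[] = [];
  const categoricalColumns: VisColumn[] = [];
  const dateColumns: VisColumn[] = [];

  // One pass over the schema: O(fields), independent of the number of rows.
  fieldSchema.forEach((field, index) => {
    const schema = FIELD_TYPE_MAP[field.type || ''] || VisFieldType.Unknown;
    const column: VisColumn = {
      id: index,
      schema,
      name: field.name || '',
      column: `field-${index}`,
      validValuesCount: rowCount,
      uniqueValuesCount: rowCount, // upper bound, not the true cardinality
    };
    if (schema === VisFieldType.Numerical) numericalColumns.push(column);
    else if (schema === VisFieldType.Categorical) categoricalColumns.push(column);
    else if (schema === VisFieldType.Date) dateColumns.push(column);
  });

  return { numericalColumns, categoricalColumns, dateColumns };
}
```

With this shape, a threshold rule of the form `uniqueValuesCount >= N` passes whenever `rowCount >= N`, which is the optimistic behaviour the tab selection relies on.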

  3. Redux dev-mode middleware traversing the entire results state on every dispatch

ImmutableStateInvariantMiddleware and SerializableStateInvariantMiddleware (enabled by default in development) recursively walk the entire Redux state tree before and after every dispatch to detect mutations. With large query results stored in state.results, every user interaction that dispatches an action (clicking a button, toggling a setting, expanding a row) paid the full traversal cost over the stored hits.

The results slice is always fully replaced via setResults — it is never partially mutated — so these checks provide no safety value there. Exclude results from both middleware checks via ignoredPaths.
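
To make the cost concrete, here is a simplified, dependency-free walker that counts how many nodes a recursive state check would visit, with and without an ignored path. It is an illustration of the idea only, not Redux Toolkit's actual middleware code; `countVisitedNodes` and the sample state shape are hypothetical.

```typescript
// Recursively count the nodes a deep state check would touch.
// Paths use dot notation, e.g. 'results' or 'results.q1.hits'.
function countVisitedNodes(value: unknown, ignoredPaths: string[] = [], path = ''): number {
  if (path !== '' && ignoredPaths.includes(path)) return 0; // skip the whole subtree
  if (value === null || typeof value !== 'object') return 1; // primitive leaf
  let visited = 1; // the object or array itself
  for (const [key, child] of Object.entries(value as Record<string, unknown>)) {
    const childPath = path ? `${path}.${key}` : key;
    visited += countVisitedNodes(child, ignoredPaths, childPath);
  }
  return visited;
}

// Hypothetical state: a small UI slice plus 1,000 stored hits of 2 fields each.
const state = {
  ui: { activeTab: 'logs' },
  results: {
    q1: { hits: Array.from({ length: 1000 }, (_, i) => ({ a: i, b: String(i) })) },
  },
};

console.log(countVisitedNodes(state)); // walks every stored hit
console.log(countVisitedNodes(state, ['results'])); // only the small slices
```

The full walk grows with rows × fields, while the walk that ignores `results` stays proportional to the rest of the state, which is why excluding the slice removes the per-dispatch cost in development.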

Notes

  • The canResultsBeVisualized fix uses rowCount as an upper bound for uniqueValuesCount. This is a safe approximation: rules requiring uniqueValuesCount >= N will pass when there are enough rows, which is the correct optimistic behaviour for tab selection. The actual visualization rendering still receives accurate stats via normalizeResultRows when it builds the chart.
  • The ignoredPaths fix applies in development mode only; these middleware checks are stripped in production builds.
  • The underlying architectural issue — storing raw OpenSearch hits in Redux — remains and should be addressed separately by moving result data to a module-level cache, keeping only lightweight metadata (hit count, status, elapsed time) in the Redux store.
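
The follow-up proposed in the last note could look roughly like the sketch below. All names (`resultsCache`, `storeResults`, `getCachedHits`, `clearResults`, `ResultMeta`) are hypothetical and not part of this PR; the point is only that raw hits live in a module-level `Map` while the Redux store would hold the small metadata object.

```typescript
// Lightweight metadata that would be kept in the Redux store.
interface ResultMeta {
  hitCount: number;
  status: 'loaded';
  elapsedMs: number;
}

// Module-level cache holding the raw hits, outside of Redux.
const resultsCache = new Map<string, unknown[]>();

// Put the hits in the cache and return only the metadata to dispatch.
function storeResults(cacheKey: string, hits: unknown[], elapsedMs: number): ResultMeta {
  resultsCache.set(cacheKey, hits);
  return { hitCount: hits.length, status: 'loaded', elapsedMs };
}

// Components read the hits directly from the cache by key.
function getCachedHits(cacheKey: string): unknown[] {
  return resultsCache.get(cacheKey) ?? [];
}

// Drop cached hits when a query is cleared or replaced.
function clearResults(cacheKey: string): void {
  resultsCache.delete(cacheKey);
}

const meta = storeResults('query-1', [{ a: 1 }, { a: 2 }], 125);
console.log(meta.hitCount, getCachedHits('query-1').length);
```

One design consequence of this approach is that components no longer re-render automatically when the hits change; they would need to key off the metadata (for example a changing cache key or status) to know when to re-read the cache.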

Issues Resolved

Screenshot

Testing the changes

Changelog

  • chore: Improve performance with large query result

Check List

  • All tests pass
    • yarn test:jest
    • yarn test:jest_integration
  • New functionality includes testing.
  • New functionality has been documented.
  • Update CHANGELOG.md
  • Commits are signed per the DCO using --signoff

Signed-off-by: Joey Liu <jiyili@amazon.com>
codecov Bot commented Feb 25, 2026

Codecov Report

❌ Patch coverage is 77.77778% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 60.30%. Comparing base (d7f6b04) to head (bb39a84).
⚠️ Report is 5 commits behind head on main.

Files with missing lines | Patch % | Lines
...t/actions/detect_optimal_tab/detect_optimal_tab.ts | 69.23% | 0 Missing and 4 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #11390      +/-   ##
==========================================
- Coverage   60.30%   60.30%   -0.01%     
==========================================
  Files        4664     4664              
  Lines      130372   130381       +9     
  Branches    22233    22238       +5     
==========================================
+ Hits        78618    78621       +3     
  Misses      46147    46147              
- Partials     5607     5613       +6     
Flag | Coverage Δ
Linux_1 | 24.80% <0.00%> (-0.01%) ⬇️
Linux_2 | 38.36% <ø> (ø)
Linux_3 | 40.20% <ø> (-0.01%) ⬇️
Linux_4 | 33.61% <77.77%> (+<0.01%) ⬆️
Windows_1 | 24.82% <0.00%> (-0.01%) ⬇️
Windows_2 | 38.34% <ø> (ø)
Windows_3 | 40.21% <ø> (+<0.01%) ⬆️
Windows_4 | 33.62% <77.77%> (+<0.01%) ⬆️

Signed-off-by: Joey Liu <jiyili@amazon.com>
@github-actions

Contributor

PR Code Suggestions ✨

Explore these optional code suggestions:

Category | Suggestion | Impact
General
Maintain some immutability protection

Disabling immutability checks on the results slice removes important safeguards
against accidental mutations. While the comment states results are "always fully
replaced," this creates a maintenance risk if future code inadvertently mutates
results. Consider using Immer's enableMapSet() or restructuring to maintain checks.

src/plugins/explore/public/application/utils/state_management/store.ts [74-77]

 const middlewareOptions = {
-  immutableCheck: { ignoredPaths: ['results'] },
-  serializableCheck: { ignoredPaths: ['results'] },
+  immutableCheck: { 
+    ignoredPaths: ['results'],
+    warnAfter: 128  // Still warn on very slow checks
+  },
+  serializableCheck: { 
+    ignoredPaths: ['results'],
+    warnAfter: 128
+  },
 };
Suggestion importance[1-10]: 4

Why: Adding warnAfter thresholds is a reasonable middle ground between performance and safety. However, the suggestion doesn't fundamentally change the approach—it still ignores the results path. The improvement is marginal since the PR's detailed comment already explains why these checks are disabled for results.

Low
Possible issue
Incorrect uniqueness assumption for columns

Setting uniqueValuesCount equal to rowCount assumes all values are unique, which may
cause incorrect visualization selection when rules check for low cardinality (e.g.,
categorical fields with few distinct values). Consider using a more conservative
estimate or sampling actual uniqueness from a subset of rows to improve accuracy.

src/plugins/explore/public/application/utils/state_management/actions/detect_optimal_tab/detect_optimal_tab.ts [32-47]

 results.fieldSchema.forEach((field: { type?: string; name?: string }, index: number) => {
   const schema = FIELD_TYPE_MAP[field.type || ''] || VisFieldType.Unknown;
   const column: VisColumn = {
     id: index,
     schema,
     name: field.name || '',
     column: `field-${index}`,
     validValuesCount: rowCount,
-    uniqueValuesCount: rowCount,
+    // Use a conservative estimate: assume at least 2 unique values for non-empty results
+    uniqueValuesCount: Math.min(rowCount, 2),
   };
   if (schema === VisFieldType.Numerical) numericalColumns.push(column);
   else if (schema === VisFieldType.Categorical) categoricalColumns.push(column);
   else if (schema === VisFieldType.Date) dateColumns.push(column);
 });
Suggestion importance[1-10]: 3

Why: While the concern about uniqueValuesCount accuracy is valid, the suggested fix of hardcoding Math.min(rowCount, 2) is arbitrary and likely incorrect. The PR comment explicitly states this is a "conservative upper bound" for threshold checks (e.g., >= 7), which will work correctly when sufficient rows exist. The suggestion doesn't provide a meaningfully better solution.

Low

@Maosaic Maosaic merged commit 9f581bf into opensearch-project:main Feb 27, 2026
151 of 157 checks passed
markdboyd pushed a commit to cloud-gov/OpenSearch-Dashboards that referenced this pull request Mar 9, 2026
…ject#11390)

* [Explore] Improve performance with large query result

Signed-off-by: Joey Liu <jiyili@amazon.com>

* Changeset file for PR opensearch-project#11390 created/updated

* Fix tests

Signed-off-by: Joey Liu <jiyili@amazon.com>

---------

Signed-off-by: Joey Liu <jiyili@amazon.com>
Co-authored-by: opensearch-changeset-bot[bot] <154024398+opensearch-changeset-bot[bot]@users.noreply.github.com>
Signed-off-by: Mark Boyd <mark.boyd@gsa.gov>