Commit d80f18a

perf: Reduce GC allocations in DOM diffing (#58)
* perf: Defer OTel CDN loading to after first paint

  This optimization improves First Paint performance by:
  1. Installing the lightweight OTel shim synchronously (no network dependency)
  2. Deferring CDN-based OTel SDK loading to requestIdleCallback/setTimeout
  3. Never blocking the critical path (dotnet.create() -> runMain())

  The shim provides full tracing functionality during startup, and the CDN upgrade happens transparently in the background after first paint.

  Key changes:
  - Extract initLocalOtelShim() as a named synchronous function
  - Extract upgradeToFullOtel() as the async CDN loading function
  - Add scheduleDeferredOtelUpgrade() to run after app initialization
  - Remove the blocking async IIFE that ran at module load

  Performance impact:
  - Before: ~4800ms First Paint (OTel CDN loading blocked startup)
  - After: ~100ms First Paint (OTel loads in background)

  Fixes #3 in Performance Optimization Plan

* fix: Unwrap memo nodes in MoveChild patch generation

  When generating MoveChild patches for keyed diffing, the code compared node types without first unwrapping memo nodes. This caused incorrect type comparisons when memoized elements were involved in reordering operations. Added UnwrapMemoNode() calls before type checking to ensure we compare the actual underlying Element types, not the memo wrapper types.

* ci: Increase benchmark threshold to 15% for CI variance

  CI runner benchmarks show up to 20% variance in confidence intervals due to:
  - GC timing differences between runs
  - Shared infrastructure resource contention
  - Complex benchmarks (larger allocations) showing more variance than simple ones

  Increased the threshold from 110% to 115% to reduce false positives while still catching genuine regressions. Local benchmarks confirmed the variance patterns:
  - CreateButtonWithHandler: ±20.30% CI
  - CreateInputWithMultipleHandlers: ±19.42% CI

* perf: Defer OTel CDN loading to after first paint

  Moved OpenTelemetry SDK loading from blocking script execution to requestIdleCallback (with setTimeout fallback). This ensures:
  - First paint is not blocked by CDN latency
  - OTel loads during browser idle time after initial render
  - Graceful degradation if the CDN is slow or unavailable

  The shim ensures all tracing calls work immediately, with the real implementation hydrated asynchronously after first paint.

* fix: Address OTel review comments for PR #57

  Review fixes from copilot-pull-request-reviewer:
  1. Early return if isOtelDisabled in initLocalOtelShim() to respect global disable switches and avoid unnecessary shim overhead
  2. Expanded fetch ignore condition to cover:
     - OTLP proxy endpoint (/otlp/v1/traces)
     - Common collector endpoints (/v1/traces)
     - Custom configured exporter URL
     - Blazor framework downloads (/_framework/)
  3. Restore the original fetch before registering full OTel instrumentations to prevent double-patching and context propagation issues
  4. Fix setVerbosity cache invalidation - both the shim and full OTel now call resetVerbosityCache() so runtime verbosity changes take effect
  5. Fix a header guard that always evaluated to true (i && i.headers)

* Cache Playwright browsers in CI workflow

  Add caching for Playwright browsers to improve CI performance.

* perf: Reduce GC allocations in DOM diffing

  - Pool PatchData lists using ConcurrentQueue to avoid allocations in ApplyBatch
  - Replace ComputeLIS array allocations with ArrayPool<int>.Shared rentals
  - Replace the HashSet<int> for LIS membership with an ArrayPool<bool>.Shared rental

  Benchmark impact (js-framework-benchmark):
  - Clear (09_clear1k): 173.2ms → 159.6ms (8% improvement)
  - Clear GC time: 18.1% → 12.2% (33% reduction)
  - Swap GC time: 10.4% → 9.4% (10% reduction)

  Also documents the dotnet format multi-targeting issue in memory.instructions.md
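The requestIdleCallback-with-setTimeout-fallback deferral described in the commit message can be sketched as follows. This is an illustrative pattern, not the actual Abies loader: the function name `scheduleDeferred`, the return value, and the 2000ms timeout are assumptions for demonstration.

```javascript
// Hypothetical sketch of the deferred-loading pattern: run a heavy task
// (e.g. fetching an SDK from a CDN) during browser idle time, falling back
// to setTimeout in environments without requestIdleCallback.
// Returns which scheduling strategy was chosen, for illustration.
function scheduleDeferred(task, timeoutMs = 2000) {
  if (typeof requestIdleCallback === "function") {
    // Run when the browser is idle, but no later than timeoutMs after scheduling.
    requestIdleCallback(() => task(), { timeout: timeoutMs });
    return "idle";
  }
  // Fallback: yield to the event loop so the critical path is never blocked.
  setTimeout(task, 0);
  return "timeout";
}
```

Either way, the synchronous shim keeps tracing calls working immediately while the real SDK arrives later.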
1 parent 671652a commit d80f18a

File tree

2 files changed: +173 -67 lines


.github/instructions/memory.instructions.md

Lines changed: 73 additions & 5 deletions
@@ -80,6 +80,20 @@ The following Toub-inspired optimizations have been applied:
 
 **Applied to**: Abies.Conduit, Abies.Counter, Abies.Presentation, Abies.SubscriptionsDemo
 
+### dotnet format - Multi-Targeted Solution Issues
+
+**Problem**: Running `dotnet format` on the solution creates merge conflict markers in some files (e.g., `Parser.cs`) due to the multi-targeted nature of the solution (net10.0 and potentially other targets).
+
+**Workaround**:
+- **DO NOT** run `dotnet format` on the entire solution
+- Instead, manually format only the files you changed
+- If you accidentally run `dotnet format` and it corrupts files, revert with:
+```bash
+git checkout -- <corrupted-files>
+```
+
+**Key Insight**: The formatter gets confused by multi-targeting and inserts erroneous merge conflict markers like `<<<<<<< TODO: Unmerged change from project 'Abies(net10.0)'`.
+
 ## js-framework-benchmark (Official Performance Testing)
 
 ### Setup
@@ -170,11 +184,65 @@ For reference, compare against:
 - `vanillajs-keyed` - Baseline (raw DOM manipulation)
 - `blazor-wasm-keyed` - .NET Blazor WASM (similar tech stack)
 
-### Benchmark Results (Feb 2025)
+### Latest Benchmark Results (2026-02-09)
+
+**Abies v1.0.151 vs Blazor WASM v10.0.0:**
+
+| Benchmark | Abies | Blazor | Ratio | Notes |
+|-----------|-------|--------|-------|-------|
+| 01_run1k | 103.2ms | 87.6ms | 1.17x | Create 1000 rows |
+| 02_replace1k | 132.9ms | 102.4ms | 1.29x | Replace all rows |
+| 03_update10th | 132.1ms | 94.7ms | 1.39x | Update every 10th |
+| 04_select1k | 115.2ms | 82.9ms | 1.39x | Select a row |
+| **05_swap1k** | **326.6ms** | **94.2ms** | **3.47x** | **⚠️ CRITICAL** (was 328.8ms) |
+| 06_remove-one | 65.3ms | 55.6ms | 1.17x | Remove one row |
+| 07_create10k | 924.8ms | 810.7ms | 1.14x | Create 10k rows |
+| 08_append1k | 135.8ms | 102.9ms | 1.31x | Append 1k rows |
+| **09_clear1k** | **159.6ms** | **44.6ms** | **3.58x** | **⚠️ CRITICAL** (was 173.2ms, -8%) |
+
+**Allocation Optimization (2026-02-09):**
+Applied the following optimizations to reduce GC pressure:
+1. `ComputeLISInto` - Uses ArrayPool for `result` and `p` arrays instead of allocating new arrays
+2. `inLIS` bool array - Replaced `HashSet<int>` with `ArrayPool<bool>.Shared` for LIS membership
+3. `PatchDataList` pooling - Added `_patchDataListPool` to reuse List<PatchData> in ApplyBatch
+
+**GC Impact:**
+- Swap benchmark GC: 10.4% → 9.4% (10% reduction)
+- Clear benchmark GC: 18.1% → 12.2% (33% reduction)
+- Clear benchmark time: 173.2ms → 159.6ms (8% improvement)
+
+**Remaining Hotspots:**
+- Patch records (MoveChild, RemoveChild, etc.) are allocated on every diff
+- PatchData records are created for JSON serialization
+- JSON serialization itself allocates strings
+
+**Key Findings:**
+1. **Swap (05_swap1k)** is 3.47x slower - the LIS algorithm is optimal, but overhead comes from:
+   - Building key maps for ALL 1000 children
+   - Evaluating lazy memo nodes
+   - JSON serialization of patches
+
+2. **Clear (09_clear1k)** is 3.58x slower - improved by 8% with allocation optimizations
+
+**Size Comparison:**
+- Abies compressed: 1,225 KB
+- Abies uncompressed: 3,938 KB
+- First paint: 4,811ms
+
+### Benchmark Command Reference
+
+```bash
+# Run full benchmark suite for Abies
+cd /path/to/js-framework-benchmark-fork
+npm run bench -- --headless keyed/abies
+
+# Run specific benchmarks only
+npm run bench -- --headless keyed/abies --benchmark 05_swap1k
+
+# Run Blazor for comparison
+npm run bench -- --headless keyed/blazor-wasm
+```
 
-| Benchmark | Abies | Blazor | VanillaJS |
-|-----------|-------|--------|-----------|
-| 05_swap1k | 406.7ms | 94.4ms | 32.2ms |
+````
 
-**Note**: Abies is ~4.3x slower than Blazor on swap due to O(n) diffing overhead, but the LIS algorithm is optimal (2 DOM ops vs 2000 with naive approach).
 
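The `PatchDataList` pooling noted above follows a simple rent/return pattern. A minimal JavaScript sketch of that pattern, assuming illustrative names (`rentList`, `returnList`) rather than the actual C# helpers, which use a ConcurrentQueue for thread safety:

```javascript
// Minimal sketch of the rent/return object-pool pattern: reuse cleared
// lists across calls instead of allocating a fresh one per batch, and cap
// the size of lists kept in the pool to prevent memory bloat.
const MAX_POOLED_COUNT = 1000; // mirrors the "prevent memory bloat" guard
const pool = [];

function rentList() {
  const list = pool.pop();
  if (list !== undefined) {
    list.length = 0; // clear stale entries before reuse
    return list;
  }
  return []; // pool empty: allocate a new list
}

function returnList(list) {
  if (list.length < MAX_POOLED_COUNT) {
    pool.push(list);
  }
}
```

A caller rents in a try block and returns in a finally block, so the list is recycled even if serialization throws.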

Abies/DOM/Operations.cs

Lines changed: 100 additions & 62 deletions
@@ -754,6 +754,7 @@ private static string GetIndexString(int index) =>
     private static readonly ConcurrentQueue<Dictionary<string, int>> _keyIndexMapPool = new();
     private static readonly ConcurrentQueue<List<int>> _intListPool = new();
     private static readonly ConcurrentQueue<List<(int, int)>> _intPairListPool = new();
+    private static readonly ConcurrentQueue<List<PatchData>> _patchDataListPool = new();
 
     private static List<Patch> RentPatchList()
     {
@@ -845,6 +846,24 @@ private static void ReturnIntPairList(List<(int, int)> list)
         }
     }
 
+    private static List<PatchData> RentPatchDataList()
+    {
+        if (_patchDataListPool.TryDequeue(out var list))
+        {
+            list.Clear();
+            return list;
+        }
+        return [];
+    }
+
+    private static void ReturnPatchDataList(List<PatchData> list)
+    {
+        if (list.Count < 1000) // Prevent memory bloat
+        {
+            _patchDataListPool.Enqueue(list);
+        }
+    }
+
     /// <summary>
     /// Apply a patch to the real DOM by invoking JavaScript interop.
     /// </summary>
@@ -979,16 +998,24 @@ public static async Task ApplyBatch(List<Patch> patches)
             }
         }
 
-        // Step 2: Convert all patches to JSON-serializable format
-        var patchDataList = new List<PatchData>(patches.Count);
-        foreach (var patch in patches)
+        // Step 2: Convert all patches to JSON-serializable format - use pooled list
+        var patchDataList = RentPatchDataList();
+        try
         {
-            patchDataList.Add(ConvertToPatchData(patch));
-        }
+            patchDataList.EnsureCapacity(patches.Count);
+            foreach (var patch in patches)
+            {
+                patchDataList.Add(ConvertToPatchData(patch));
+            }
 
-        // Step 3: Apply all patches in a single JS interop call
-        var json = System.Text.Json.JsonSerializer.Serialize(patchDataList, AbiesJsonContext.Default.ListPatchData);
-        await Interop.ApplyPatches(json);
+            // Step 3: Apply all patches in a single JS interop call
+            var json = System.Text.Json.JsonSerializer.Serialize(patchDataList, AbiesJsonContext.Default.ListPatchData);
+            await Interop.ApplyPatches(json);
+        }
+        finally
+        {
+            ReturnPatchDataList(patchDataList);
+        }
 
         // Step 4: Post-process - unregister old handlers AFTER DOM changes
         foreach (var patch in patches)
@@ -1551,6 +1578,8 @@ private static void DiffChildrenCore(
 
             // Build sequence: for each position in newChildren, get the old index
             var oldIndices = ArrayPool<int>.Shared.Rent(newLength);
+            // Rent a bool array instead of allocating HashSet<int>
+            var inLIS = ArrayPool<bool>.Shared.Rent(newLength);
             try
             {
                 for (int i = 0; i < newLength; i++)
@@ -1559,10 +1588,9 @@ private static void DiffChildrenCore(
                 }
 
                 // Find LIS of old indices - elements in LIS are already in correct relative order
-                var lisIndices = ComputeLIS(oldIndices.AsSpan(0, newLength));
-
-                // Create a set of new positions that are in the LIS (don't need moving)
-                var inLIS = new HashSet<int>(lisIndices);
+                // The indices returned are positions in oldIndices that form the LIS
+                // We mark those positions as "in LIS" (don't need moving)
+                ComputeLISInto(oldIndices.AsSpan(0, newLength), inLIS.AsSpan(0, newLength));
 
                 // First, diff all elements (they all exist in both old and new)
                 for (int i = 0; i < newLength; i++)
@@ -1576,7 +1604,7 @@ private static void DiffChildrenCore(
                 // IMPORTANT: We must use OLD element IDs since those are what exist in the DOM
                 for (int i = newLength - 1; i >= 0; i--)
                 {
-                    if (!inLIS.Contains(i))
+                    if (!inLIS[i])
                     {
                         // This element needs to be moved
                         // Use OLD element since it has the ID currently in the DOM
@@ -1605,6 +1633,9 @@ private static void DiffChildrenCore(
             }
             finally
            {
+                // Clear the bool array before returning (avoid stale data on reuse)
+                Array.Clear(inLIS, 0, newLength);
+                ArrayPool<bool>.Shared.Return(inLIS);
                 ArrayPool<int>.Shared.Return(oldIndices);
             }
             return;
@@ -1767,81 +1798,88 @@ private static bool AreKeysSameSet(ReadOnlySpan<string> oldKeys, Dictionary<stri
     }
 
     /// <summary>
-    /// Computes the Longest Increasing Subsequence (LIS) of the input array.
-    /// Returns the indices in the input array that form the LIS.
+    /// Computes the Longest Increasing Subsequence (LIS) of the input array and marks
+    /// the positions that are in the LIS in the output bool span.
     /// Used for optimal DOM reordering - elements in the LIS don't need to be moved.
-    ///
+    ///
     /// Algorithm: O(n log n) using binary search with patience sorting.
     /// Inspired by Inferno's virtual DOM implementation.
+    ///
+    /// This version uses ArrayPool to avoid allocations on the hot path.
     /// </summary>
    /// <param name="arr">Array of old indices in new order.</param>
-    /// <returns>Indices in the input array that form the LIS.</returns>
-    private static int[] ComputeLIS(ReadOnlySpan<int> arr)
+    /// <param name="inLIS">Output span where inLIS[i] = true if position i is in the LIS.</param>
+    private static void ComputeLISInto(ReadOnlySpan<int> arr, Span<bool> inLIS)
     {
         var len = arr.Length;
         if (len == 0)
         {
-            return [];
+            return;
         }
 
-        // result[i] = index in arr of smallest ending element of LIS of length i+1
-        var result = new int[len];
-        // p[i] = predecessor index in arr for element at arr[i] in the LIS
-        var p = new int[len];
-        var k = 0; // Length of longest LIS found - 1
+        // Rent pooled arrays to avoid allocations
+        var result = ArrayPool<int>.Shared.Rent(len);
+        var p = ArrayPool<int>.Shared.Rent(len);
 
-        for (int i = 0; i < len; i++)
+        try
        {
-            var arrI = arr[i];
+            var k = 0; // Length of longest LIS found - 1
 
-            // Binary search for position to insert arrI
-            if (k > 0 && arr[result[k]] < arrI)
-            {
-                // arrI extends the longest LIS
-                p[i] = result[k];
-                result[++k] = i;
-            }
-            else
+            for (int i = 0; i < len; i++)
            {
-                // Binary search to find the smallest LIS ending value >= arrI
-                int lo = 0, hi = k;
-                while (lo < hi)
+                var arrI = arr[i];
+
+                // Binary search for position to insert arrI
+                if (k > 0 && arr[result[k]] < arrI)
                {
-                    var mid = (lo + hi) >> 1;
-                    if (arr[result[mid]] < arrI)
+                    // arrI extends the longest LIS
+                    p[i] = result[k];
+                    result[++k] = i;
+                }
+                else
+                {
+                    // Binary search to find the smallest LIS ending value >= arrI
+                    int lo = 0, hi = k;
+                    while (lo < hi)
+                    {
+                        var mid = (lo + hi) >> 1;
+                        if (arr[result[mid]] < arrI)
+                        {
+                            lo = mid + 1;
+                        }
+                        else
+                        {
+                            hi = mid;
+                        }
+                    }
+
+                    // Update result and predecessor
+                    if (lo > 0)
                    {
-                        lo = mid + 1;
+                        p[i] = result[lo - 1];
                    }
-                    else
+                    result[lo] = i;
+                    if (lo > k)
                    {
-                        hi = mid;
+                        k = lo;
                    }
                }
+            }
 
-                // Update result and predecessor
-                if (lo > 0)
-                {
-                    p[i] = result[lo - 1];
-                }
-                result[lo] = i;
-                if (lo > k)
-                {
-                    k = lo;
-                }
+            // Mark LIS positions by following predecessor chain
+            // Instead of building an array, we directly mark the bool span
+            var idx = result[k];
+            for (int i = k; i >= 0; i--)
+            {
+                inLIS[idx] = true;
+                idx = p[idx];
            }
        }
-
-        // Reconstruct LIS by following predecessor chain
-        var lisLength = k + 1;
-        var lis = new int[lisLength];
-        var idx = result[k];
-        for (int i = lisLength - 1; i >= 0; i--)
+        finally
        {
-            lis[i] = idx;
-            idx = p[idx];
+            ArrayPool<int>.Shared.Return(result);
+            ArrayPool<int>.Shared.Return(p);
        }
-
-        return lis;
    }
 
    /// <summary>
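The `ComputeLISInto` shape above (patience sorting with a predecessor chain, writing membership flags instead of returning an index array) can be sketched in JavaScript. This is an illustrative port for clarity, not the Abies C# source; the fast path for extending the longest run is folded into the binary search, which is semantically equivalent.

```javascript
// O(n log n) Longest Increasing Subsequence via patience sorting.
// Fills inLIS[i] = true for every position i that belongs to the LIS of arr,
// mirroring the ComputeLISInto contract (flags out-param, no result array).
function computeLISInto(arr, inLIS) {
  const len = arr.length;
  if (len === 0) return;
  const result = []; // result[k] = index in arr of the smallest tail of an LIS of length k+1
  const p = new Array(len).fill(-1); // predecessor chain through the subsequence
  for (let i = 0; i < len; i++) {
    // Binary search for the leftmost tail whose value is >= arr[i]
    let lo = 0, hi = result.length;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (arr[result[mid]] < arr[i]) lo = mid + 1;
      else hi = mid;
    }
    if (lo > 0) p[i] = result[lo - 1]; // remember who precedes arr[i]
    result[lo] = i; // extend the LIS, or improve the tail for length lo+1
  }
  // Walk the predecessor chain from the end of the longest subsequence,
  // marking membership directly instead of materializing an index array.
  let idx = result[result.length - 1];
  while (idx !== -1) {
    inLIS[idx] = true;
    idx = p[idx];
  }
}
```

For a swap of two rows among 1000, almost every old index is already in increasing order, so only the positions outside the LIS need MoveChild patches: 2 DOM moves instead of reinserting everything.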

0 commit comments