additional optimizations #619

quantizor · 2025-10-29T18:53:02Z

This PR builds on top of #547 with a few additional commits. Each change was verified with best-of-3 benchmark runs, and the rationale for each is in its commit body.

Results (all best-of-3; hz = operations per second, timings in ms)

Main branch baseline

Memory Usage Summary:
  init: 1.57 MB heap
  simple: 1016.39 KB heap
  heavy: 1.01 MB heap
  collection with cache: 2.52 MB heap
    Total footprint: 234.75 MB
    Operations: 1322
  collection without cache: 16.99 MB heap
    Total footprint: 235.13 MB
    Operations: 1322


 ✓ tests/tw-merge.benchmark.ts > twMerge 3136ms
     name                            hz     min      max    mean     p75      p99     p995     p999     rme  samples
   · init                      4,065.35  0.2175   1.8490  0.2460  0.2450   0.4969   0.5517   0.8136  ±1.12%     2033
   · simple                    4,062.11  0.2270   0.7383  0.2462  0.2448   0.4628   0.4756   0.5714  ±0.70%     2032
   · heavy                     3,837.87  0.2374   2.3023  0.2606  0.2597   0.4791   0.5047   0.8659  ±1.06%     1919
   · collection with cache       688.97  1.3773   2.2432  1.4514  1.4654   1.7979   1.9276   2.2432  ±0.68%      345
   · collection without cache    109.09  8.8340  10.2709  9.1664  9.2940  10.2709  10.2709  10.2709  ±0.73%       55

Perf branch

Memory Usage Summary:
  init: 1.22 MB heap
  simple: 838.82 KB heap
  heavy: 647.81 KB heap
  collection with cache: 1.79 MB heap
    Total footprint: 171.70 MB
    Operations: 1322
  collection without cache: 13.72 MB heap
    Total footprint: 238.06 MB
    Operations: 1322


 ✓ tests/tw-merge.benchmark.ts > twMerge 3130ms
     name                            hz     min      max    mean     p75      p99     p995     p999     rme  samples
   · init                      4,581.74  0.1975   1.3568  0.2183  0.2220   0.3657   0.4523   0.6930  ±0.89%     2291
   · simple                    4,396.86  0.2133   0.5638  0.2274  0.2274   0.4307   0.4576   0.5137  ±0.55%     2199
   · heavy                     4,182.74  0.2232   0.5723  0.2391  0.2393   0.4502   0.4675   0.5242  ±0.58%     2092
   · collection with cache       724.98  1.3007   3.3782  1.3793  1.4092   1.6660   2.2602   3.3782  ±1.02%      363
   · collection without cache    110.72  8.6614  10.1187  9.0318  9.2184  10.1187  10.1187  10.1187  ±1.10%       56

Further optimizations branch

Memory Usage Summary:
  init: 1.29 MB heap
  simple: 684.23 KB heap
  heavy: 651.05 KB heap
  collection with cache: 1.87 MB heap
    Total footprint: 177.91 MB
    Operations: 1322
  collection without cache: 11.69 MB heap
    Total footprint: 237.63 MB
    Operations: 1322


 ✓ tests/tw-merge.benchmark.ts > twMerge 3117ms
     name                            hz     min     max    mean     p75     p99    p995    p999     rme  samples
   · init                      4,676.21  0.1966  0.9308  0.2138  0.2122  0.4457  0.4768  0.6357  ±0.76%     2339
   · simple                    4,382.74  0.2119  2.8712  0.2282  0.2259  0.4020  0.4154  0.4648  ±1.17%     2192
   · heavy                     4,278.80  0.2224  0.5003  0.2337  0.2330  0.4149  0.4403  0.4645  ±0.50%     2140
   · collection with cache       747.74  1.2559  4.3190  1.3374  1.3366  1.6999  2.1674  4.3190  ±1.34%      374
   · collection without cache    121.55  7.9514  9.3715  8.2273  8.2853  9.3715  9.3715  9.3715  ±0.94%       61

quantizor · 2025-10-29T19:33:14Z

Best of 3 with memory info from #620

main branch

Memory Usage Summary:
  init: 1.57 MB heap
  simple: 1016.39 KB heap
  heavy: 1.01 MB heap
  collection with cache: 2.52 MB heap
    Total footprint: 234.75 MB
    Operations: 1322
  collection without cache: 16.99 MB heap
    Total footprint: 235.13 MB
    Operations: 1322


 ✓ tests/tw-merge.benchmark.ts > twMerge 3136ms
     name                            hz     min      max    mean     p75      p99     p995     p999     rme  samples
   · init                      4,065.35  0.2175   1.8490  0.2460  0.2450   0.4969   0.5517   0.8136  ±1.12%     2033
   · simple                    4,062.11  0.2270   0.7383  0.2462  0.2448   0.4628   0.4756   0.5714  ±0.70%     2032
   · heavy                     3,837.87  0.2374   2.3023  0.2606  0.2597   0.4791   0.5047   0.8659  ±1.06%     1919
   · collection with cache       688.97  1.3773   2.2432  1.4514  1.4654   1.7979   1.9276   2.2432  ±0.68%      345
   · collection without cache    109.09  8.8340  10.2709  9.1664  9.2940  10.2709  10.2709  10.2709  ±0.73%       55

perf branch

Memory Usage Summary:
  init: 1.22 MB heap
  simple: 838.82 KB heap
  heavy: 647.81 KB heap
  collection with cache: 1.79 MB heap
    Total footprint: 171.70 MB
    Operations: 1322
  collection without cache: 13.72 MB heap
    Total footprint: 238.06 MB
    Operations: 1322


 ✓ tests/tw-merge.benchmark.ts > twMerge 3130ms
     name                            hz     min      max    mean     p75      p99     p995     p999     rme  samples
   · init                      4,581.74  0.1975   1.3568  0.2183  0.2220   0.3657   0.4523   0.6930  ±0.89%     2291
   · simple                    4,396.86  0.2133   0.5638  0.2274  0.2274   0.4307   0.4576   0.5137  ±0.55%     2199
   · heavy                     4,182.74  0.2232   0.5723  0.2391  0.2393   0.4502   0.4675   0.5242  ±0.58%     2092
   · collection with cache       724.98  1.3007   3.3782  1.3793  1.4092   1.6660   2.2602   3.3782  ±1.02%      363
   · collection without cache    110.72  8.6614  10.1187  9.0318  9.2184  10.1187  10.1187  10.1187  ±1.10%       56

further-improvements branch

Memory Usage Summary:
  init: 1.24 MB heap
  simple: 786.39 KB heap
  heavy: 637.91 KB heap
  collection with cache: 1.59 MB heap
    Total footprint: 200.55 MB
    Operations: 1322
  collection without cache: 10.19 MB heap
    Total footprint: 201.14 MB
    Operations: 1322


 ✓ tests/tw-merge.benchmark.ts > twMerge 3058ms
     name                            hz     min      max    mean     p75      p99     p995     p999     rme  samples
   · init                      4,339.17  0.2006   5.1748  0.2305  0.2191   0.6207   0.6480   0.7138  ±2.32%     2171
   · simple                    4,653.32  0.2008   1.2895  0.2149  0.2137   0.3652   0.3806   0.4122  ±0.68%     2327
   · heavy                     4,482.96  0.2108   0.5238  0.2231  0.2236   0.3895   0.4034   0.4409  ±0.49%     2242
   · collection with cache       771.76  1.2577   1.5958  1.2957  1.3007   1.5194   1.5574   1.5958  ±0.41%      386
   · collection without cache    118.02  8.0855  10.1479  8.4732  8.5805  10.1479  10.1479  10.1479  ±1.33%       60

This branch brings the overall throughput improvement (hz) over main to:

simple: 14.55% faster
heavy: 16.81% faster
collection with cache: 12.02% faster
collection without cache: 8.19% faster
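The percentages above are throughput ratios taken from the hz column (new hz / main hz, minus 1). A quick TypeScript check of that arithmetic, using the hz values from the main and further-improvements tables; `throughputGain` and `timeReduction` are illustrative helper names, not part of the benchmark suite. Expressed as a reduction in mean time per operation the same gains come out somewhat smaller, since time per operation is 1 / hz.

```typescript
// Throughput (hz) from the best-of-3 tables: [main, further-improvements].
const hz: Record<string, [number, number]> = {
  simple: [4062.11, 4653.32],
  heavy: [3837.87, 4482.96],
  'collection with cache': [688.97, 771.76],
  'collection without cache': [109.09, 118.02],
}

// Percentage increase in operations per second relative to main.
function throughputGain(baseHz: number, headHz: number): number {
  return (headHz / baseHz - 1) * 100
}

// Percentage reduction in mean time per operation (time = 1 / hz).
function timeReduction(baseHz: number, headHz: number): number {
  return (1 - baseHz / headHz) * 100
}

for (const [name, [base, head]] of Object.entries(hz)) {
  const gain = throughputGain(base, head).toFixed(2)
  const saved = timeReduction(base, head).toFixed(2)
  console.log(`${name}: +${gain}% ops/s, -${saved}% time/op`)
}
```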

Not bad

dcastil · 2025-11-02T13:45:46Z

Can you rebase the branch on top of the newest main in tailwind-merge? For some reason the diff shows all the changes from #547, which makes it difficult to see which changes are unique to this PR. Alternatively, you can cherry-pick the new commits onto a new branch to fix it.

Most Tailwind CSS classes have zero or one modifier (e.g., 'p-4' or 'hover:bg-red-500'), making this a very common case. The previous implementation always called sortModifiers() and join(), even when no sorting or joining was needed. This optimization:

- Avoids function call overhead for modifiers.length === 0
- Avoids array allocation and join overhead for modifiers.length === 1
- Reduces work for the most common path through the code

Benchmark results show a ~4.2% improvement on the 'collection without cache' benchmark (from 105.39 hz to 109.86 hz).
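A minimal sketch of the zero/one-modifier fast path described in this commit. `getModifierKey` and `sortModifiersStub` are hypothetical names; the library's real sortModifiers() has its own ordering rules, so a plain alphabetical sort stands in for it, and the ':' join separator is also an assumption.

```typescript
// Stand-in for the library's sortModifiers(); a plain alphabetical sort
// is an assumption for this sketch, not the real ordering rules.
function sortModifiersStub(modifiers: string[]): string[] {
  return [...modifiers].sort()
}

// Hypothetical helper: builds the key used to group classes by modifiers.
function getModifierKey(modifiers: string[]): string {
  // Fast path: no modifiers, so no function call and no allocation.
  if (modifiers.length === 0) return ''
  // Fast path: a single modifier needs neither sorting nor joining.
  if (modifiers.length === 1) return modifiers[0]
  // General path: copy, sort, then join (separator assumed to be ':').
  return sortModifiersStub(modifiers).join(':')
}
```

With this shape, `getModifierKey([])` returns `''`, `getModifierKey(['hover'])` returns `'hover'`, and only `getModifierKey(['hover', 'dark'])` pays for the copy, sort, and join, returning `'dark:hover'`.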

Refactor getGroupRecursive to use index-based traversal instead of array slicing, and eliminate array mutation from shift().

Previous implementation issues:
- classParts.slice(1) created a new array on every recursive call, causing O(n) allocations for deep class name lookups
- classParts.shift() mutated the array and moved all remaining elements, causing O(n) element movement and potential V8 deoptimization

Optimizations:
- Added a startIndex parameter to getGroupRecursive() to track position without array slicing
- Replaced shift() with an index offset calculation, eliminating array mutation
- Only slice the array when building the classRest string for validators (the less frequent path), with an early check for startIndex === 0

This maintains monomorphic call sites (important for V8 optimization) while reducing memory allocations during class group lookups. Benchmark results show a ~1.6% improvement on the 'collection without cache' benchmark (when combined with the fast path optimization).
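A sketch of the index-based traversal, under assumptions: the trie node shape (`nextPart`, `validators`, `classGroupId`) is hypothetical and simplified, not the library's exact types. The point it illustrates is that recursion advances `startIndex` instead of slicing or shifting `classParts`, and the `classRest` string is only built when validators actually need it.

```typescript
// Hypothetical, simplified trie node for class-name parts.
interface ClassPartNode {
  nextPart: Map<string, ClassPartNode>
  validators: { classGroupId: string; validator: (rest: string) => boolean }[]
  classGroupId?: string
}

function getGroupRecursive(
  classParts: string[],
  startIndex: number,
  node: ClassPartNode,
): string | undefined {
  // All parts consumed: this node's group (if any) is the answer.
  if (startIndex === classParts.length) return node.classGroupId

  // Descend by advancing the index; classParts is never copied or mutated.
  const next = node.nextPart.get(classParts[startIndex])
  if (next) {
    const result = getGroupRecursive(classParts, startIndex + 1, next)
    if (result) return result
  }

  // Less frequent path: only now build the remaining string for validators.
  if (node.validators.length === 0) return undefined
  const classRest =
    startIndex === 0 ? classParts.join('-') : classParts.slice(startIndex).join('-')
  for (const { classGroupId, validator } of node.validators) {
    if (validator(classRest)) return classGroupId
  }
  return undefined
}
```

A lookup for 'p-4' would call `getGroupRecursive('p-4'.split('-'), 0, root)`; the only allocation on that path is the `classRest` string handed to the validator.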

Replace localeCompare() calls with direct string comparison for alphabetical sorting of modifiers.

Motivation:
- localeCompare() performs locale-aware comparison, which involves:
  - Locale processing overhead
  - More complex string comparison logic
  - Potential locale string allocations
- Tailwind CSS modifiers are ASCII identifiers (e.g., 'hover', 'focus', 'dark'), making locale-aware comparison unnecessary
- Direct comparison (a < b ? -1 : a > b ? 1 : 0) is simpler and faster, leveraging V8's optimized string comparison primitives

This change affects modifier sorting when multiple modifiers need to be sorted alphabetically. For modifier arrays with 2+ elements that require sorting, this provides a measurable performance improvement. Benchmark results show a ~0.4% improvement on the 'collection without cache' benchmark (when combined with previous optimizations).
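The direct comparator from this commit, as a small sketch (`compareModifiers` is an illustrative name). For arrays of strings, Array.prototype.sort() called with no argument already orders elements by UTF-16 code unit values, so it yields the same order as this comparator; the check below compares the two.

```typescript
// Plain code-unit comparison: no locale data or collator involved.
function compareModifiers(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0
}

const modifiers = ['hover', 'dark', 'focus', 'aria-checked']

// Copy before sorting so the input order stays untouched.
const viaComparator = [...modifiers].sort(compareModifiers)

// The default comparator also compares UTF-16 code units for strings.
const viaDefault = [...modifiers].sort()
```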

Pre-compute conflict arrays in Maps at initialization time instead of concatenating arrays at runtime on every call to getConflictingClassGroupIds.

Architectural improvement:
- Build a conflictsWithoutPostfix Map for all class groups with conflicts
- Build a conflictsWithPostfix Map with pre-merged arrays for classes that have both base conflicts and modifier conflicts
- Eliminate runtime concatArrays() calls, replacing them with O(1) Map lookups

This moves work from the hot path (called for every Tailwind class) to initialization time (called once). The concatArrays operation was creating new arrays and copying elements on every conflict check. Benchmark results show a ~2.6% improvement on 'collection without cache' (from 114.02 hz to 116.97 hz).
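A sketch of the precomputation idea from this commit, assuming a simplified, hypothetical config shape (class group id mapped to an array of conflicting ids) and a hypothetical `createConflictLookup` helper. Note that later in this PR, following review, these maps were removed again in favor of computing conflicts on the fly.

```typescript
// Hypothetical, simplified config shape: class group id -> conflicting ids.
type ConflictConfig = Record<string, readonly string[] | undefined>

const EMPTY: readonly string[] = []

// Runs once at init; the returned lookup never allocates per call.
function createConflictLookup(conflicts: ConflictConfig, postfixConflicts: ConflictConfig) {
  const withoutPostfix = new Map<string, readonly string[]>()
  const withPostfix = new Map<string, readonly string[]>()
  // Duplicate ids are harmless: Map.set overwrites with an equivalent value.
  const ids = Object.keys(conflicts).concat(Object.keys(postfixConflicts))
  for (const id of ids) {
    const base = conflicts[id] ?? EMPTY
    const extra = postfixConflicts[id]
    withoutPostfix.set(id, base)
    // Merge base + modifier conflicts once instead of on every lookup.
    withPostfix.set(id, extra ? [...base, ...extra] : base)
  }
  return (classGroupId: string, hasPostfixModifier: boolean): readonly string[] =>
    (hasPostfixModifier ? withPostfix : withoutPostfix).get(classGroupId) ?? EMPTY
}
```

The trade-off is memory and init time for the two Maps versus one concat per conflict check on the hot path.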

quantizor · 2025-11-02T14:16:14Z

@dcastil all set!

codspeed-hq · 2025-11-02T14:17:18Z

CodSpeed Performance Report

Merging #619 will not alter performance

Comparing quantizor:further-improvements (1bafc9c) with main (57372fa)

Summary

✅ 5 untouched
🆕 2 new

Benchmarks breakdown

    Benchmark                                                BASE  HEAD     Change
🆕  ultra long class list with many conflicts with cache     N/A   16.1 ms  N/A
🆕  ultra long class list with many conflicts without cache  N/A   16.1 ms  N/A

dcastil

Thanks, these look solid! Also many thanks for the detailed commit descriptions, they were quite helpful.

I think we can remove the conflictsWithoutPostfix and the argument to .sort(), but otherwise it all looks good!

src/lib/class-group-utils.ts

src/lib/sort-modifiers.ts

quantizor · 2025-11-06T21:47:17Z

@dcastil all set!

dcastil

Looks good, many thanks again! 🚀

github-actions · 2025-11-09T12:34:25Z

This was addressed in release v3.4.0.

github-actions bot added the context-v3 label (Related to tailwind-merge v3) Oct 29, 2025

dcastil added the feature label (Is new feature) Nov 2, 2025

quantizor added 4 commits November 2, 2025 09:15

quantizor force-pushed the further-improvements branch from 246804c to 7831c8e November 2, 2025 14:15

dcastil requested changes Nov 2, 2025

src/lib/class-group-utils.ts (outdated, resolved)

src/lib/sort-modifiers.ts (outdated, resolved)

dcastil reviewed Nov 3, 2025

src/lib/sort-modifiers.ts (outdated, resolved)

quantizor added 2 commits November 6, 2025 16:31

Remove unnecessary pre-computed conflict maps

87baba3

Remove conflictsWithoutPostfix and conflictsWithPostfix maps and compute conflicts on-the-fly directly from config objects instead.
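A minimal sketch of computing conflicts on the fly from the config objects, assuming a simplified, hypothetical config shape. The function name getConflictingClassGroupIds appears in this PR, but the signature below is an assumption, not the library's actual code. Only the branch where both base and modifier conflicts exist allocates a new array.

```typescript
// Hypothetical, simplified config shape: class group id -> conflicting ids.
type ConflictConfig = Record<string, readonly string[] | undefined>

const EMPTY: readonly string[] = []

function getConflictingClassGroupIds(
  classGroupId: string,
  hasPostfixModifier: boolean,
  conflicts: ConflictConfig,
  postfixConflicts: ConflictConfig,
): readonly string[] {
  const base = conflicts[classGroupId]
  if (hasPostfixModifier) {
    const extra = postfixConflicts[classGroupId]
    if (extra) {
      // Allocate only when both lists exist and must be merged.
      return base ? [...base, ...extra] : extra
    }
  }
  // Common case: return the config array directly, no copying.
  return base ?? EMPTY
}
```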

test: add ultra long class list benchmark

1927858

Add benchmark for ultra long class lists with many conflicts to demonstrate performance characteristics with large class sets.

revert: remove array-based string building optimization

0799c12

The optimization provides no benefit since the function is only called with >1 strings, making the array-based approach unnecessary overhead.

quantizor force-pushed the further-improvements branch from 62a3175 to 0799c12 November 6, 2025 21:47

Make benchmark test names consistent

1bafc9c

dcastil approved these changes Nov 9, 2025

dcastil merged commit 75e9aef into dcastil:main Nov 9, 2025
5 checks passed
