quantizor (Contributor) commented Oct 29, 2025

This PR builds on top of #547 with a few additional commits. Each change was verified with best-of-3 benchmark runs, and the rationale is in each commit body.

Results (all best-of-3)

Main branch baseline

Memory Usage Summary:
  init: 1.57 MB heap
  simple: 1016.39 KB heap
  heavy: 1.01 MB heap
  collection with cache: 2.52 MB heap
    Total footprint: 234.75 MB
    Operations: 1322
  collection without cache: 16.99 MB heap
    Total footprint: 235.13 MB
    Operations: 1322


 ✓ tests/tw-merge.benchmark.ts > twMerge 3136ms
     name                            hz     min      max    mean     p75      p99     p995     p999     rme  samples
   · init                      4,065.35  0.2175   1.8490  0.2460  0.2450   0.4969   0.5517   0.8136  ±1.12%     2033
   · simple                    4,062.11  0.2270   0.7383  0.2462  0.2448   0.4628   0.4756   0.5714  ±0.70%     2032
   · heavy                     3,837.87  0.2374   2.3023  0.2606  0.2597   0.4791   0.5047   0.8659  ±1.06%     1919
   · collection with cache       688.97  1.3773   2.2432  1.4514  1.4654   1.7979   1.9276   2.2432  ±0.68%      345
   · collection without cache    109.09  8.8340  10.2709  9.1664  9.2940  10.2709  10.2709  10.2709  ±0.73%       55

Perf branch

Memory Usage Summary:
  init: 1.22 MB heap
  simple: 838.82 KB heap
  heavy: 647.81 KB heap
  collection with cache: 1.79 MB heap
    Total footprint: 171.70 MB
    Operations: 1322
  collection without cache: 13.72 MB heap
    Total footprint: 238.06 MB
    Operations: 1322


 ✓ tests/tw-merge.benchmark.ts > twMerge 3130ms
     name                            hz     min      max    mean     p75      p99     p995     p999     rme  samples
   · init                      4,581.74  0.1975   1.3568  0.2183  0.2220   0.3657   0.4523   0.6930  ±0.89%     2291
   · simple                    4,396.86  0.2133   0.5638  0.2274  0.2274   0.4307   0.4576   0.5137  ±0.55%     2199
   · heavy                     4,182.74  0.2232   0.5723  0.2391  0.2393   0.4502   0.4675   0.5242  ±0.58%     2092
   · collection with cache       724.98  1.3007   3.3782  1.3793  1.4092   1.6660   2.2602   3.3782  ±1.02%      363
   · collection without cache    110.72  8.6614  10.1187  9.0318  9.2184  10.1187  10.1187  10.1187  ±1.10%       56

Further optimizations branch

Memory Usage Summary:
  init: 1.29 MB heap
  simple: 684.23 KB heap
  heavy: 651.05 KB heap
  collection with cache: 1.87 MB heap
    Total footprint: 177.91 MB
    Operations: 1322
  collection without cache: 11.69 MB heap
    Total footprint: 237.63 MB
    Operations: 1322


 ✓ tests/tw-merge.benchmark.ts > twMerge 3117ms
     name                            hz     min     max    mean     p75     p99    p995    p999     rme  samples
   · init                      4,676.21  0.1966  0.9308  0.2138  0.2122  0.4457  0.4768  0.6357  ±0.76%     2339
   · simple                    4,382.74  0.2119  2.8712  0.2282  0.2259  0.4020  0.4154  0.4648  ±1.17%     2192
   · heavy                     4,278.80  0.2224  0.5003  0.2337  0.2330  0.4149  0.4403  0.4645  ±0.50%     2140
   · collection with cache       747.74  1.2559  4.3190  1.3374  1.3366  1.6999  2.1674  4.3190  ±1.34%      374
   · collection without cache    121.55  7.9514  9.3715  8.2273  8.2853  9.3715  9.3715  9.3715  ±0.94%       61

@github-actions github-actions bot added the context-v3 Related to tailwind-merge v3 label Oct 29, 2025
quantizor (Contributor, Author) commented Oct 29, 2025

Best of 3 with memory info from #620

further-improvements branch

Memory Usage Summary:
  init: 1.24 MB heap
  simple: 786.39 KB heap
  heavy: 637.91 KB heap
  collection with cache: 1.59 MB heap
    Total footprint: 200.55 MB
    Operations: 1322
  collection without cache: 10.19 MB heap
    Total footprint: 201.14 MB
    Operations: 1322


 ✓ tests/tw-merge.benchmark.ts > twMerge 3058ms
     name                            hz     min      max    mean     p75      p99     p995     p999     rme  samples
   · init                      4,339.17  0.2006   5.1748  0.2305  0.2191   0.6207   0.6480   0.7138  ±2.32%     2171
   · simple                    4,653.32  0.2008   1.2895  0.2149  0.2137   0.3652   0.3806   0.4122  ±0.68%     2327
   · heavy                     4,482.96  0.2108   0.5238  0.2231  0.2236   0.3895   0.4034   0.4409  ±0.49%     2242
   · collection with cache       771.76  1.2577   1.5958  1.2957  1.3007   1.5194   1.5574   1.5958  ±0.41%      386
   · collection without cache    118.02  8.0855  10.1479  8.4732  8.5805  10.1479  10.1479  10.1479  ±1.33%       60

This branch brings the overall performance improvement over main to:

  • simple: 14.5543% faster
  • heavy: 16.8085% faster
  • collection with cache: 12.0165% faster
  • collection without cache: 8.1859% faster
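
The percentages above follow from the hz (operations per second) columns of the main and further-improvements tables. A minimal sketch of that arithmetic, assuming "faster" means relative throughput gain; `percentFaster` and the two lookup objects are names chosen for this example:

```typescript
// Throughput in hz (operations per second), copied from the tables above.
const mainHz: Record<string, number> = {
  simple: 4062.11,
  heavy: 3837.87,
  "collection with cache": 688.97,
  "collection without cache": 109.09,
};
const branchHz: Record<string, number> = {
  simple: 4653.32,
  heavy: 4482.96,
  "collection with cache": 771.76,
  "collection without cache": 118.02,
};

// Relative throughput gain of the candidate over the baseline, in percent.
function percentFaster(baselineHz: number, candidateHz: number): number {
  return (candidateHz / baselineHz - 1) * 100;
}

for (const name of Object.keys(mainHz)) {
  const gain = percentFaster(mainHz[name]!, branchHz[name]!);
  console.log(`${name}: ${gain.toFixed(4)}% faster`);
}
```

The gains are of the same order as the reported rme values (roughly ±0.4% to ±2.3%), so the trailing decimals should not be read as significant.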

Not bad

@dcastil dcastil added the feature Is new feature label Nov 2, 2025
dcastil (Owner) commented Nov 2, 2025

Can you rebase the branch on top of the newest main in tailwind-merge? For some reason the diff shows all the changes from #547 which makes it difficult to understand which changes are unique to this PR. Alternatively you can cherry-pick the new commits onto a new branch to fix it.

Most Tailwind CSS classes have zero or one modifier (e.g., 'p-4' or 'hover:bg-red-500'),
making this a very common case. The previous implementation always called sortModifiers()
and join(), even when no sorting or joining was needed.

This optimization:
- Avoids function call overhead for modifiers.length === 0
- Avoids array allocation and join overhead for modifiers.length === 1
- Reduces work for the most common path through the code

Benchmark results show ~4.2% improvement on 'collection without cache' benchmark
(from 105.39 hz to 109.86 hz).
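
The fast path described in this commit message can be sketched roughly as follows. This is an illustration, not the library's code: `buildModifierKey` is a hypothetical helper, and the plain `sort()` plus `':'` separator are simplifying assumptions standing in for the real `sortModifiers()` and join logic.

```typescript
// Hypothetical helper (not tailwind-merge's actual API): builds an
// order-independent key from the modifiers of a class name.
function buildModifierKey(modifiers: string[]): string {
  // Fast path: no modifiers (e.g. 'p-4'), so there is nothing to sort or join.
  if (modifiers.length === 0) return "";
  // Fast path: one modifier (e.g. 'hover:bg-red-500') is trivially sorted,
  // so no array copy and no join() call are needed.
  if (modifiers.length === 1) return modifiers[0]!;
  // General path: copy so the caller's array is not mutated, then sort and join.
  return [...modifiers].sort().join(":");
}

console.log(buildModifierKey([]));
console.log(buildModifierKey(["hover"]));
console.log(buildModifierKey(["hover", "focus"]));
```

The two early returns only pay off because, as the message notes, zero or one modifier is the common case.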

Refactor getGroupRecursive to use index-based traversal instead of array slicing,
and eliminate array mutation from shift().

Previous implementation issues:
- classParts.slice(1) created a new array on every recursive call, causing
  O(n) allocations for deep class name lookups
- classParts.shift() mutated the array and moved all elements, causing O(n)
  element movement and potential V8 deoptimization

Optimizations:
- Added startIndex parameter to getGroupRecursive() to track position without
  array slicing
- Replaced shift() with index offset calculation, eliminating array mutation
- Only slice array when building classRest string for validators (less frequent
  path), and optimize with early check for startIndex === 0

This maintains monomorphic call sites (important for V8 optimization) while
significantly reducing memory allocations during class group lookups.

Benchmark results show ~1.6% improvement on 'collection without cache' benchmark
(when combined with fast path optimization).
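
A minimal sketch of index-based traversal, under stated assumptions: `TrieNode`, `insertGroup`, `lookupGroup` and `classRest` are hypothetical names for this example, and the real class map also carries validators that are omitted here.

```typescript
// Hypothetical trie node: one level per '-'-separated part of a class name.
interface TrieNode {
  groupId?: string;
  next: Map<string, TrieNode>;
}

// Registers a class name under a group id (helper for this example only).
function insertGroup(root: TrieNode, className: string, groupId: string): void {
  let node = root;
  for (const part of className.split("-")) {
    let child = node.next.get(part);
    if (child === undefined) {
      child = { next: new Map() };
      node.next.set(part, child);
    }
    node = child;
  }
  node.groupId = groupId;
}

// Walks the trie by advancing startIndex instead of calling parts.slice(1),
// so no new array is allocated per recursion level and parts is not mutated.
function lookupGroup(parts: string[], startIndex: number, node: TrieNode): string | undefined {
  if (startIndex === parts.length) return node.groupId;
  const child = node.next.get(parts[startIndex]!);
  return child === undefined ? undefined : lookupGroup(parts, startIndex + 1, child);
}

// Slice only on the less frequent path that needs the remaining string,
// and skip the slice entirely when nothing has been consumed yet.
function classRest(parts: string[], startIndex: number): string {
  return startIndex === 0 ? parts.join("-") : parts.slice(startIndex).join("-");
}

const root: TrieNode = { next: new Map() };
insertGroup(root, "bg-red-500", "bg-color");
console.log(lookupGroup("bg-red-500".split("-"), 0, root));
```

Keeping the same three arguments at every level is what lets the recursive call site stay monomorphic.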

Replace localeCompare() calls with direct string comparison for alphabetical
sorting of modifiers.

Motivation:
- localeCompare() performs locale-aware comparison, which involves:
  - Locale processing overhead
  - More complex string comparison logic
  - Potential locale string allocations
- Tailwind CSS modifiers are ASCII identifiers (e.g., 'hover', 'focus', 'dark'),
  making locale-aware comparison unnecessary
- Direct comparison (a < b ? -1 : a > b ? 1 : 0) is simpler and faster,
  leveraging V8's optimized string comparison primitives

This change affects modifier sorting when multiple modifiers need to be sorted
alphabetically. For modifier arrays with 2+ elements that require sorting,
this provides measurable performance improvement.

Benchmark results show ~0.4% improvement on 'collection without cache' benchmark
(when combined with previous optimizations).
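
The comparator quoted in this message, written out as a small sketch; `compareCodeUnits` is a name chosen here, and the modifier list is illustrative.

```typescript
// Orders strings with the relational operators, which compare by UTF-16
// code units, with no locale processing involved.
const compareCodeUnits = (a: string, b: string): number => (a < b ? -1 : a > b ? 1 : 0);

const modifiers = ["hover", "dark", "focus"];

// Copy before sorting so the original array stays untouched.
const sorted = [...modifiers].sort(compareCodeUnits);
console.log(sorted);

// For an array of strings, the default sort() uses the same code-unit ordering.
const sortedByDefault = [...modifiers].sort();
console.log(sortedByDefault);
```

Code-unit order and locale order can differ for mixed-case or non-ASCII input (for example, 'Z' sorts before 'a' by code unit), so the swap relies on the assumption stated above that modifiers are ASCII identifiers and that only a consistent order is required.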

Pre-compute conflict arrays in Maps at initialization time instead of
concatenating arrays at runtime on every call to getConflictingClassGroupIds.

Architectural improvement:
- Build conflictsWithoutPostfix Map for all class groups with conflicts
- Build conflictsWithPostfix Map with pre-merged arrays for classes
  that have both base conflicts and modifier conflicts
- Eliminates runtime concatArrays() calls, replacing with O(1) Map lookups

This moves work from the hot path (called for every Tailwind class)
to initialization time (called once). The concatArrays operation was
creating new arrays and copying elements on every conflict check.

Benchmark results show ~2.6% improvement on 'collection without cache'
(from 114.02 hz to 116.97 hz).
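
A rough sketch of the precomputation idea, with assumed shapes: `createConflictLookup`, the config parameter names and the sample data are illustrative, not the library's actual structures.

```typescript
type ConflictConfig = Record<string, string[]>;

// Builds both Maps once, then answers every lookup without allocating.
function createConflictLookup(
  conflictingClassGroups: ConflictConfig,
  conflictingClassGroupModifiers: ConflictConfig,
): (groupId: string, hasPostfixModifier: boolean) => readonly string[] {
  const withoutPostfix = new Map<string, readonly string[]>();
  const withPostfix = new Map<string, readonly string[]>();

  for (const [groupId, conflicts] of Object.entries(conflictingClassGroups)) {
    withoutPostfix.set(groupId, conflicts);
  }

  // Pre-merge base conflicts and modifier conflicts at initialization time.
  const allIds = new Set([
    ...Object.keys(conflictingClassGroups),
    ...Object.keys(conflictingClassGroupModifiers),
  ]);
  for (const groupId of allIds) {
    const base = conflictingClassGroups[groupId] ?? [];
    const fromModifiers = conflictingClassGroupModifiers[groupId] ?? [];
    withPostfix.set(groupId, [...base, ...fromModifiers]);
  }

  const empty: readonly string[] = [];
  // Hot path: a single Map lookup, no array concatenation.
  return (groupId, hasPostfixModifier) =>
    (hasPostfixModifier ? withPostfix : withoutPostfix).get(groupId) ?? empty;
}

// Illustrative data only.
const getConflicts = createConflictLookup(
  { inset: ["top", "bottom"] },
  { "font-size": ["leading"] },
);
console.log(getConflicts("font-size", true));
```

The trade-off is extra memory and setup work at initialization in exchange for allocation-free lookups on the hot path.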

@quantizor quantizor force-pushed the further-improvements branch from 246804c to 7831c8e Compare November 2, 2025 14:15
quantizor (Contributor, Author)

@dcastil all set!

codspeed-hq bot commented Nov 2, 2025

CodSpeed Performance Report

Merging #619 will not alter performance

Comparing quantizor:further-improvements (1bafc9c) with main (57372fa)

Summary

✅ 5 untouched
🆕 2 new

Benchmarks breakdown

Benchmark BASE HEAD Change
🆕 ultra long class list with many conflicts with cache N/A 16.1 ms N/A
🆕 ultra long class list with many conflicts without cache N/A 16.1 ms N/A

dcastil (Owner) left a comment

Thanks, these look solid! Also many thanks for the detailed commit descriptions, they were quite helpful.

I think we can remove the conflictsWithoutPostfix and the argument to .sort(), but otherwise it all looks good!

Remove conflictsWithoutPostfix and conflictsWithPostfix maps and compute conflicts on-the-fly directly from config objects instead.
Add benchmark for ultra long class lists with many conflicts to demonstrate performance characteristics with large class sets.
quantizor (Contributor, Author)

@dcastil all set!

The optimization provides no benefit since the function is only called with >1 strings, making the array-based approach unnecessary overhead.
@quantizor quantizor force-pushed the further-improvements branch from 62a3175 to 0799c12 Compare November 6, 2025 21:47
dcastil (Owner) left a comment

Looks good, many thanks again! 🚀

@dcastil dcastil merged commit 75e9aef into dcastil:main Nov 9, 2025
5 checks passed
github-actions bot commented Nov 9, 2025

This was addressed in release v3.4.0.
