
WS: Hardware-accelerated WebSocket masking via SIMD #251

Open

Sh3llcod3 wants to merge 11 commits into lexiforest:main from Sh3llcod3:ws-send-patch

Conversation

Sh3llcod3 commented Apr 23, 2026

Overview

@lexiforest When I took cProfile captures of WebSocket send benchmarks, I found a severe CPU bottleneck in libcurl's WebSocket code: ws_enc_write_payload applies RFC 6455 XOR masking byte by byte before invoking Curl_bufq_write, which caps transmit speeds:

[image: cProfile capture of the WebSocket send benchmark]
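
For context, RFC 6455 masking XORs every payload byte with a rotating 4-byte key, so a scalar implementation pays one XOR and one key lookup per byte. A simplified loop with the same shape as the hot path (an illustration, not the actual libcurl code) looks roughly like this:

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative byte-by-byte RFC 6455 masking loop (not libcurl's code).
 * `offset` tracks the position within the 4-byte mask across calls. */
static void ws_mask_scalar(uint8_t *buf, size_t len,
                           const uint8_t mask[4], size_t *offset)
{
  size_t i;
  for(i = 0; i < len; i++)
    buf[i] ^= mask[(*offset + i) & 3];
  *offset = (*offset + len) & 3;
}
```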

Changes

  • SIMD Hardware Acceleration: To fix this, I added AVX-512, AVX2, and ARM NEON vectorized XOR masking (a simplified sketch follows this list). Fallback scalar paths are optimized and gated safely behind macros for runtime dispatch, maintaining static-binary portability. I've also aligned the XOR buffer (xbuf) to 64-byte cache lines to reduce memory-latency penalties.
  • Increased Buffer Size: Increased WS_CHUNK_SIZE from 64KB to 128KB to reduce recv()/send() syscall overhead.
  • State Machine Hardening:
    • Fixed an issue in ws_flush where partial-write progress was dropped if the socket returned CURLE_AGAIN on the same cycle, preventing frame corruption.
    • Fixed ws_send_raw_blocking so that it accurately reports partial bytes sent to the callback layer if a connection dies mid-stream.
    • Improved End-of-Stream validation in ws_cw_write to ensure the decoder is cleanly reset before accepting stream termination.
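
To make the SIMD idea concrete, here is a minimal sketch of an AVX2 masking path with a scalar tail, compiled per-function via a GCC/Clang target attribute so no global build flags are required. The function name and the assumption that the mask offset is zero on entry are mine; the actual patch also covers AVX-512 and NEON and handles arbitrary offsets and the 64-byte-aligned xbuf:

```c
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sketch of an AVX2 masking path (hypothetical, not the patch itself).
 * Assumes the mask offset is zero on entry, so broadcasting the 4-byte
 * key across a 256-bit register lines up with the payload. */
#if defined(__GNUC__) || defined(__clang__)
__attribute__((target("avx2")))
#endif
static void ws_mask_avx2(uint8_t *buf, size_t len, const uint8_t mask[4])
{
  uint32_t key;
  __m256i vmask;
  size_t i = 0;

  memcpy(&key, mask, 4);
  vmask = _mm256_set1_epi32((int32_t)key);

  /* XOR 32 payload bytes per iteration. */
  for(; i + 32 <= len; i += 32) {
    __m256i chunk = _mm256_loadu_si256((const __m256i *)(buf + i));
    chunk = _mm256_xor_si256(chunk, vmask);
    _mm256_storeu_si256((__m256i *)(buf + i), chunk);
  }

  /* Scalar tail for the remaining bytes. */
  for(; i < len; i++)
    buf[i] ^= mask[i & 3];
}
```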

This PR accompanies the lexiforest/curl_cffi#749 PR.

Impact

Once the patch is applied, WebSocket transmit speeds improve dramatically.

Sending is no longer CPU-bound, and the gains are large (more than 10x on my server); the send side now reaches multi-gigabit throughput:

[image: send-side throughput benchmark after the patch]

In fact, the bottleneck is now AIOHTTP!

[image: benchmark with AIOHTTP as the receiving server]

When acting as the receiving server, AIOHTTP is pinned at 100% CPU.

AVX-512 can cause downclocking on some CPUs, but even with the clock drop, performance remains strong. The best part is that the patches are written in a highly portable way: if your CPU does not support AVX-512/AVX2/NEON SIMD instructions, the code falls back to a fast scalar loop that is still much faster than the original. CPU feature detection is done via runtime dispatch, so the library can still be statically linked and run as normal.

No additional compiler flags or build step changes are needed.
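
As a rough illustration of that runtime dispatch on x86 with GCC/Clang (helper names and signatures are hypothetical; the real gate in the patch is broader and also covers ARM/NEON and other toolchains and OSes):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-ISA helpers, selected once the CPU has been probed. */
void ws_mask_avx512(uint8_t *buf, size_t len, const uint8_t mask[4]);
void ws_mask_avx2(uint8_t *buf, size_t len, const uint8_t mask[4]);
void ws_mask_scalar(uint8_t *buf, size_t len, const uint8_t mask[4]);

/* Pick the widest SIMD path the running CPU supports, falling back to
 * the scalar loop.  Works in a statically linked binary because the
 * check happens at run time, not at build time. */
void ws_mask_dispatch(uint8_t *buf, size_t len, const uint8_t mask[4])
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
  if(__builtin_cpu_supports("avx512f"))
    ws_mask_avx512(buf, len, mask);
  else if(__builtin_cpu_supports("avx2"))
    ws_mask_avx2(buf, len, mask);
  else
#endif
    ws_mask_scalar(buf, len, mask);
}
```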

Sh3llcod3 (Author)

@lexiforest Almost there with the pipeline, I think we are getting close.

Sh3llcod3 (Author) commented Apr 25, 2026

@lexiforest Haha - I think that's fixed the issue, though the pipeline needs another re-run:

curl: (28) Failed to connect to ftp.gnu.org port 443 after 136195 ms: Couldn't connect to server
make: *** [Makefile:422: libidn2-2.3.7.tar.gz] Error 28
Error: Process completed with exit code 2.

You know it's an interesting day when GNU is down...

Speeds are good, no regressions:
[image: benchmark after the pipeline fix, showing no regressions]

lexiforest (Owner) commented Apr 25, 2026

GNU libunistring is so unique (annoying, actually); it also stands in the way of my migration to CMake. I was considering prebuilding a binary in another repository.

  • The real goal is to re-run the pipeline.

Sh3llcod3 (Author) commented Apr 25, 2026

Yeah - I think it looks good from my end; it would be good if you test as well, in case something crops up. I'll bring out the old Raspberry Pi 4 and make sure the NEON SIMD path works too (an Apple M-series chip could probably test this better).

lexiforest (Owner)

Thanks, I will review it in the next couple of days.

Sh3llcod3 (Author) commented Apr 30, 2026

I've improved the SIMD gate to support as many CPUs and OSes as I could.
