[perf] Make an ASCII check faster by enabling the compiler to leverage SIMD instructions #3469

MahdiBM · 2025-12-26T18:14:24Z

Motivation:

Faster code.

Modifications:

Turn the "isASCII" check into a simple bitwise operation in a loop, which enables the compiler to compile the code into SIMD instructions and, in turn, make the code faster.
I did a quick check to get some numbers before proposing this PR; it looks like a ~400x improvement, even for small strings of 5 bytes.
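A minimal sketch of the idea, assuming the check runs over a sequence of UTF-8 bytes. The name `isASCII` is hypothetical and this is not the PR's actual diff. Every ASCII byte has its high bit (0x80) clear, so OR-ing all bytes together and testing that bit once at the end replaces a per-byte early exit with a branch-free loop. Whether the compiler actually vectorizes it depends on the collection type the loop runs over.

```swift
/// Hypothetical helper (not NIO's actual code): returns true when every byte is ASCII.
/// ASCII bytes are 0x00...0x7F, so their high bit (0x80) is always clear. OR-ing all
/// bytes together and testing the high bit once at the end removes the per-byte
/// early-exit branch, which can make the loop easier for the optimizer to vectorize.
func isASCII<Bytes: Sequence>(_ bytes: Bytes) -> Bool where Bytes.Element == UInt8 {
    var combined: UInt8 = 0
    for byte in bytes {
        combined |= byte
    }
    return combined & 0x80 == 0
}

// Usage:
print(isASCII("hello".utf8))              // true
print(isASCII("h\u{E9}llo".utf8))         // false: U+00E9 is encoded as 0xC3 0xA9
print(isASCII([0x41, 0x7F] as [UInt8]))   // true
```

The trade-off is that a non-ASCII byte early in the input no longer short-circuits the loop; the whole buffer is always scanned.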

Result:

Faster code.

Lukasa · 2026-01-02T15:58:16Z

Thanks for this!

I think the idea here is great, but I'm not sure this is the right execution on it. We can plausibly merge it anyway, though I think we should get solid numbers on this specific change before we do so.

The reason I'm hesitant is that, as far as I can see, this does not trigger autovectorization in the compiler. Even passing -target-cpu raptorlake doesn't meaningfully change that. This is in contrast to running the same function directly on an UnsafeRawBufferPointer, where we do see autovectorization.

So I'd like to confirm:

Did you see a 400x improvement in this exact use-case, or something closely related? Can you share a benchmark that compares the two strategies?
Should we consider whether we need to be putting together a Swift package whose focus is high performance string operations? You've got two PRs open on NIO projects to do fairly similar things, I wonder if this suggests a broader need.
In this case, should we drop down to a fast path if contiguousStorageIfAvailable is available, and then just leave the existing code as the slow path?
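A rough sketch of that split, assuming a `String` input. The names `isASCIIFast`, `isASCIISlow`, and `isASCII` are hypothetical, and the slow path is only a stand-in for whatever the existing code does. The standard library method behind this is `withContiguousStorageIfAvailable`, which hands the closure an `UnsafeBufferPointer<UInt8>` when the UTF-8 bytes are stored contiguously and returns `nil` otherwise.

```swift
/// Hypothetical fast path: branch-free OR accumulation over contiguous bytes,
/// the shape that was observed to autovectorize on raw buffers.
func isASCIIFast(_ bytes: UnsafeBufferPointer<UInt8>) -> Bool {
    var combined: UInt8 = 0
    for byte in bytes {
        combined |= byte
    }
    return combined & 0x80 == 0
}

/// Hypothetical slow path: stand-in for the existing element-by-element check.
func isASCIISlow<Bytes: Sequence>(_ bytes: Bytes) -> Bool where Bytes.Element == UInt8 {
    for byte in bytes where byte >= 0x80 {
        return false
    }
    return true
}

func isASCII(_ string: String) -> Bool {
    // withContiguousStorageIfAvailable returns nil when the UTF-8 bytes are not
    // available as one contiguous buffer (possibly, e.g., some bridged NSString
    // values on Apple platforms); in that case fall back to the slow path.
    if let result = string.utf8.withContiguousStorageIfAvailable(isASCIIFast) {
        return result
    }
    return isASCIISlow(string.utf8)
}

// Usage:
print(isASCII("Content-Type"))   // true
print(isASCII("caf\u{E9}"))      // false
```

Both paths return the same answer, so the fallback only affects speed, never correctness.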

MahdiBM · 2026-01-02T16:30:33Z

Did you see a 400x improvement in this exact use-case, or something closely related? Can you share a benchmark that compares the two strategies?

No. Good catch. I usually just operate on raw bytes, so I did not notice this doesn't work directly on a UTF8View.
Looking at the underlying code, though, it does make sense.

Should we consider whether we need to be putting together a Swift package whose focus is high performance string operations? You've got two PRs open on NIO projects to do fairly similar things, I wonder if this suggests a broader need.

Yes, I guess Swift could have something similar to https://github.com/simdutf/simdutf.
Though even if I were to build such a package, there would still be the problem that NIO cannot depend on a package that primarily uses Spans instead of the pointer types. Of course, I could just not "Primarily use Spans", but I can't say I like the idea. It's only "meh", though, so not impossible.

In this case, should we drop down to a fast path if contiguousStorageIfAvailable is available, and then just leave the existing code as the slow path?

Yes, that's the solution, although it does look a bit ugly.

Lukasa · 2026-01-02T17:35:09Z

Of course I could just not "Primarily use Spans" but I can't say I like the idea. It's only "meh" though so not impossible.

I'd be inclined to say that you'd offer two variations of each method, one based on raw pointers and one on spans. NIO will be able to rely entirely on spans in relatively short order: when Swift 6.4 ships, Swift 6.2 will become our floor and we can then rely entirely on Span-taking APIs.

In fact, if we wanted to really go down this road of thought exercise: should the thing we're discussing just be simdutf? If someone (nudge nudge) wanted to engage with the simdutf folks about whether they were interested in adopting the __counted_by and __noescape flags, then conceptually C++ interop gets the rest of the way to a useful Swift package that basically just makes simdutf available.

Unfortunately, simdutf doesn't offer the specific algorithm I was hoping to find (a vectorized case-insensitive ASCII comparison), so some custom work would still be necessary, which again raises the question of whether we should write something specifically for this use-case for us.
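For reference, a branch-light case-insensitive ASCII comparison could look like the sketch below. It is a hypothetical helper, not an existing simdutf or NIO API, and whether the loop actually autovectorizes would need to be confirmed in the generated assembly. Only the letters A through Z are folded; every other byte, including non-ASCII bytes, must match exactly.

```swift
/// Hypothetical helper: maps 'A'...'Z' to 'a'...'z' and leaves every other byte unchanged.
func lowercasedASCII(_ byte: UInt8) -> UInt8 {
    // 'A'...'Z' is 0x41...0x5A. Wrapping subtraction of 0x41 maps that range to 0...25
    // and every other byte to a value >= 26. Setting bit 0x20 lowercases a letter.
    let isUpper = (byte &- 0x41) < 26
    return byte | (isUpper ? 0x20 : 0)
}

/// Hypothetical helper: case-insensitive ASCII equality without per-byte early exits.
/// Differences are OR-accumulated and checked once after the loop.
func equalsCaseInsensitiveASCII(_ a: UnsafeRawBufferPointer, _ b: UnsafeRawBufferPointer) -> Bool {
    guard a.count == b.count else { return false }
    var difference: UInt8 = 0
    for i in 0..<a.count {
        difference |= lowercasedASCII(a[i]) ^ lowercasedASCII(b[i])
    }
    return difference == 0
}

// Usage:
let header = Array("Content-Type".utf8)
let candidate = Array("content-type".utf8)
let same = header.withUnsafeBytes { h in
    candidate.withUnsafeBytes { c in equalsCaseInsensitiveASCII(h, c) }
}
print(same) // true
```

Pairs such as '@' (0x40) and '`' (0x60), which differ only in bit 0x20 but are not letters, are correctly treated as different because the fold is applied only inside the A through Z range.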

MahdiBM · 2026-01-03T08:54:21Z

NIO will be able to rely entirely on spans in relatively short order: when Swift 6.4 ships, Swift 6.2 will become our floor

Unfortunately that's not correct. The macOS requirements for Span will still be too high.

In fact, if we wanted to really go down this road of thought exercise: should the thing we're discussing just be simdutf?

That's also a good idea. There would still be the problem that NIO cannot easily depend on Span.
I also wonder whether the maintainers would be fine with adding a Package.swift to their project. I haven't tried, but I think that would let us rely on simdutf as if it were a normal Swift package, for the most part.

Unfortunately, simdutf doesn't offer the specific algorithm I was hoping to find (a vectorized case-insensitive ASCII comparison)

Perhaps they'll be fine with adding such a function, assuming it really does not exist. A case-insensitive ASCII check is pretty common.

EDIT:

Previously this post also contained this, which is fake news and should be disregarded:

I had not taken a look at the code on machines with AMD architecture. Apparently the auto-vectorization only kicks in on ARM machines for some reason (we'll still need to access some kind of trivial form of the string, such as accessing the pointer to the bytes directly).

MahdiBM · 2026-01-03T09:32:06Z

Apparently simdutf supports std::span:
https://github.com/simdutf/simdutf?tab=readme-ov-file#c20-and-stdspan-usage-in-simdutf

I think Swift takes that into account, so we might not need any changes for the Span support:
https://www.swift.org/documentation/cxx-interop/#working-with-c-references-and-view-types-in-swift

MahdiBM · 2026-01-04T07:43:20Z

@Lukasa See simdutf/simdutf#893

MahdiBM · 2026-01-04T10:58:01Z

Related discussion: https://forums.swift.org/t/making-simdutf-c-c-seamlessly-interoperate-with-swift/83940

Make an ASCII check faster by enabling the compiler to leverage SIMD

3a219f0

MahdiBM mentioned this pull request Jan 4, 2026:

Make simdutf trivial to use in Swift, through Swift's C++ interoperability simdutf/simdutf#893 (Draft, 12 tasks)
