@MahdiBM MahdiBM commented Dec 26, 2025

Motivation:

Faster code.

Modifications:

Turn the "isASCII" check into a simple bitwise operation in a loop, which enables the compiler to compile the code into SIMD instructions, and in turn makes the code faster.
I did a quick check to have some numbers before I propose this PR; looks like this is a ~400x improvement even for small strings of 5 bytes.
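
For illustration, here is a minimal sketch of the kind of loop described above. It is not the diff from this PR: the function name `isASCIIByOrReduction` is hypothetical, and whether the optimizer actually turns it into SIMD instructions depends on the concrete sequence type and the build settings, so it should be measured rather than assumed.

```swift
/// Hypothetical helper, not the code from this PR.
/// ORs every byte into one accumulator and inspects the high bit once at the end,
/// so the loop body has no early exit and no data-dependent branch.
func isASCIIByOrReduction<Bytes: Sequence>(_ bytes: Bytes) -> Bool
where Bytes.Element == UInt8 {
    var accumulator: UInt8 = 0
    for byte in bytes {
        accumulator |= byte
    }
    // ASCII bytes are 0x00...0x7F, so any set high bit means non-ASCII input.
    return (accumulator & 0x80) == 0
}
```

The trade-off is that the loop always reads the whole input, even when the first byte is already non-ASCII.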

Result:

Faster code.

Lukasa commented Jan 2, 2026

Thanks for this!

I think the idea here is great, but I'm not sure this is the right execution on it. We can plausibly merge it anyway, though I think we should get solid numbers on this specific change before we do so.

The reason I'm hesitant is that so far as I can see this does not trigger autovectorization in the compiler. Even passing -target-cpu raptorlake doesn't meaningfully change that. This is as opposed to doing the same function directly on UnsafeRawBufferPointer, where we do see autovec.

So I'd like to confirm:

  1. Did you see a 400x improvement in this exact use-case, or something closely related? Can you share a benchmark that compares the two strategies?
  2. Should we consider whether we need to be putting together a Swift package whose focus is high performance string operations? You've got two PRs open on NIO projects to do fairly similar things, I wonder if this suggests a broader need.
  3. In this case, should we drop down to a fast path if contiguousStorageIfAvailable is available, and then just leave the existing code as the slow path?
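
A rough sketch of what point 3 could look like, assuming the standard library method `withContiguousStorageIfAvailable(_:)` on `String.UTF8View`. The function name `isASCIIWithFastPath` is hypothetical, and the `allSatisfy` fallback is only a stand-in for the existing code, which is not shown in this conversation.

```swift
/// Hypothetical helper showing a fast path plus a fallback.
func isASCIIWithFastPath(_ utf8: String.UTF8View) -> Bool {
    // Fast path: when the view exposes contiguous storage, run the branch-free
    // OR reduction directly over the buffer.
    let fastResult = utf8.withContiguousStorageIfAvailable { buffer -> Bool in
        var accumulator: UInt8 = 0
        for byte in buffer {
            accumulator |= byte
        }
        return (accumulator & 0x80) == 0
    }
    if let result = fastResult {
        return result
    }
    // Slow path: stand-in for the existing element-by-element check.
    return utf8.allSatisfy { ($0 & 0x80) == 0 }
}
```

`withContiguousStorageIfAvailable(_:)` returns `nil` when no contiguous buffer is available, which is why the slow path has to stay.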

MahdiBM commented Jan 2, 2026

> Did you see a 400x improvement in this exact use-case, or something closely related? Can you share a benchmark that compares the two strategies?

No. Good catch. I usually just operate on raw bytes, so I did not notice that this doesn't work directly on a UTF8View.
Looking at the underlying code, though, it does make sense.

> Should we consider whether we need to be putting together a Swift package whose focus is high performance string operations? You've got two PRs open on NIO projects to do fairly similar things, I wonder if this suggests a broader need.

Yes, I guess Swift could have something similar to https://github.com/simdutf/simdutf.
Though I think if I were to do such a package, there would still be the problem that NIO cannot depend on a package that primarily uses Spans instead of the pointer types. Of course I could just not "Primarily use Spans" but I can't say I like the idea. It's only "meh" though so not impossible.

> In this case, should we drop down to a fast path if contiguousStorageIfAvailable is available, and then just leave the existing code as the slow path?

Yes, that's the solution, although it does look a bit ugly.

Lukasa commented Jan 2, 2026

> Of course I could just not "Primarily use Spans" but I can't say I like the idea. It's only "meh" though so not impossible.

I'd be inclined to say that you'd offer two variations of each method, one based on raw pointers and one on spans. NIO will be able to rely entirely on spans in relatively short order: when Swift 6.4 ships, Swift 6.2 will become our floor and we can then rely entirely on Span-taking APIs.

In fact, if we wanted to really go down this road of thought exercise: should the thing we're discussing just be simdutf? If someone (nudge nudge) wanted to engage with the simdutf folks about whether they were interested in adopting the __counted_by and __noescape flags, then conceptually C++ interop gets the rest of the way to a useful Swift package that basically just makes simdutf available.

Unfortunately, simdutf doesn't offer the specific algorithm I was hoping to find (a vectorized case-insensitive ASCII comparison), so some custom work would still be necessary, which again raises the question of whether we should write something specifically for this use-case for us.
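
To make the missing algorithm concrete, here is a minimal sketch of a branch-free case-insensitive ASCII comparison over raw bytes. All names are hypothetical, this is not code from NIO or simdutf, and it has not been benchmarked; whether the compiler vectorizes it would need to be checked on the target toolchain.

```swift
/// Hypothetical helper: maps ASCII "A"..."Z" to lowercase and leaves every
/// other byte (including non-ASCII bytes) unchanged.
@inline(__always)
func foldASCIIUppercase(_ byte: UInt8) -> UInt8 {
    // (byte &- 0x41) wraps around for bytes below "A", so the comparison
    // is true only for 0x41...0x5A.
    let caseBit: UInt8 = ((byte &- 0x41) < 26) ? 0x20 : 0
    return byte | caseBit
}

/// Compares two byte buffers, ignoring ASCII case, without exiting early.
func asciiCaseInsensitiveEquals(
    _ lhs: UnsafeRawBufferPointer,
    _ rhs: UnsafeRawBufferPointer
) -> Bool {
    guard lhs.count == rhs.count else { return false }
    var difference: UInt8 = 0
    for index in 0..<lhs.count {
        // XOR is zero only when the folded bytes match; OR keeps any mismatch.
        difference |= foldASCIIUppercase(lhs[index]) ^ foldASCIIUppercase(rhs[index])
    }
    return difference == 0
}

/// Convenience wrapper for trying it out; it copies both strings into arrays.
func asciiCaseInsensitiveEquals(_ lhs: String, _ rhs: String) -> Bool {
    let left = Array(lhs.utf8)
    let right = Array(rhs.utf8)
    return left.withUnsafeBytes { l in
        right.withUnsafeBytes { r in
            asciiCaseInsensitiveEquals(l, r)
        }
    }
}
```

Only the 26 ASCII letters are folded, so bytes such as `@` and the backtick, which differ by the same 0x20 bit, still compare as different.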

MahdiBM commented Jan 3, 2026

> NIO will be able to rely entirely on spans in relatively short order: when Swift 6.4 ships, Swift 6.2 will become our floor

Unfortunately that's not correct. The macOS requirements for Span will still be too high.

> In fact, if we wanted to really go down this road of thought exercise: should the thing we're discussing just be simdutf?

That's also a good idea. There would still be the problem that NIO cannot easily depend on Span.
I also wonder if the maintainers will be fine with adding a Package.swift to their project. I haven't tried, but I think that would let us rely on simdutf as if it were a normal Swift package, for the most part.

> Unfortunately, simdutf doesn't offer the specific algorithm I was hoping to find (a vectorized case-insensitive ASCII comparison)

Perhaps they'll be fine with adding such a function, assuming it really does not exist. A case-insensitive ASCII check is pretty common.

EDIT:

Previously this post also contained this, which is fake news and should be disregarded:

> I had not taken a look at the code on machines with amd architecture. Apparently the auto-vectorization only kicks in on arm machines for some reason (We'll still need to access some kind of trivial form of the string, such as with accessing the pointer of bytes directly).

MahdiBM commented Jan 3, 2026

Apparently simdutf supports std::span:
https://github.com/simdutf/simdutf?tab=readme-ov-file#c20-and-stdspan-usage-in-simdutf

I think Swift takes that into account, so we might not need any changes for the Span support:
https://www.swift.org/documentation/cxx-interop/#working-with-c-references-and-view-types-in-swift

MahdiBM commented Jan 4, 2026

@Lukasa See simdutf/simdutf#893

MahdiBM commented Jan 4, 2026
