Optimize decoding utf8 buffers to strings. #2062

Open · wants to merge 3 commits into master

Conversation

@adamfaulkner adamfaulkner commented Apr 17, 2025

Replace the existing utf8_read function with a higher-performance approach that switches between three different implementations depending on the length and contents of the string (see the sketch after this list):

  • If the string is long and TextDecoder is available, use TextDecoder.
  • If the string is short and ASCII only, use String.fromCharCode, reading 8 bytes at a time.
  • Otherwise, use the old implementation, which works everywhere and can handle non-ASCII input.
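
For illustration, here is a minimal sketch of that three-way dispatch. The length threshold, the `isAscii` helper, and using `@protobufjs/utf8` as the fallback are assumptions made for the sketch, not the exact code in this PR; the real ASCII path also reads 8 bytes per iteration rather than 1 as shown here.

```js
"use strict";

var utf8Fallback = require("@protobufjs/utf8"); // existing decoder, works everywhere

var decoder = typeof TextDecoder !== "undefined"
    ? new TextDecoder("utf-8")
    : null;

function isAscii(buffer, start, end) {
    for (var i = start; i < end; ++i)
        if (buffer[i] > 0x7f)
            return false;
    return true;
}

function utf8Read(buffer, start, end) {
    var length = end - start;
    if (length < 1)
        return "";

    // Long strings: TextDecoder's per-call overhead is amortized.
    if (decoder !== null && length > 32 /* assumed threshold */)
        return decoder.decode(buffer.subarray(start, end));

    // Short ASCII-only strings: one code unit per byte via String.fromCharCode.
    if (isAscii(buffer, start, end)) {
        var codes = [];
        for (var i = start; i < end; ++i)
            codes.push(buffer[i]);
        return String.fromCharCode.apply(String, codes);
    }

    // Everything else: the original implementation handles multi-byte sequences.
    return utf8Fallback.read(buffer, start, end);
}
```

Dispatching on length matters because TextDecoder wins handily on large inputs but its per-call overhead dominates on very short strings, as the benchmark output further down shows.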

Here are the results of bench/index.js on my machine. Note that I tweaked bench/index.js to use Uint8Array rather than Node's Buffer, since Buffer is not available in browsers.
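
For reference, the tweak amounts to something like the following self-contained sketch (the message and payload here are stand-ins, not the benchmark's actual schema and data): decode from a plain Uint8Array rather than a Node Buffer so browsers exercise the same path.

```js
var protobuf = require("protobufjs");

// Stand-in schema; bench/index.js uses its own Test message and payload.
var root = protobuf.parse(
    "syntax = \"proto3\"; message Test { string name = 1; uint32 id = 2; }"
).root;
var Test = root.lookupType("Test");

var buf   = Test.encode({ name: "hello", id: 42 }).finish();         // a Buffer under Node
var bytes = new Uint8Array(buf.buffer, buf.byteOffset, buf.length);  // zero-copy Uint8Array view

console.log(Test.decode(bytes)); // the decode suites operate on `bytes` instead of `buf`
```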

Before:

adamf@fedora ~/o/protobuf.js (master)> node bench/index.js
benchmarking decoding performance ...

protobuf.js (reflect) x 2,293,860 ops/sec ±0.21% (95 runs sampled)
protobuf.js (static) x 2,666,338 ops/sec ±0.48% (97 runs sampled)
JSON (string) x 1,025,987 ops/sec ±0.66% (96 runs sampled)
JSON (buffer) x 842,511 ops/sec ±0.73% (94 runs sampled)
google-protobuf x 656,314 ops/sec ±1.03% (91 runs sampled)

   protobuf.js (static) was fastest
  protobuf.js (reflect) was 13.7% ops/sec slower (factor 1.2)
          JSON (string) was 61.6% ops/sec slower (factor 2.6)
          JSON (buffer) was 68.5% ops/sec slower (factor 3.2)
        google-protobuf was 75.5% ops/sec slower (factor 4.1)

After optimizations applied:

benchmarking decoding performance ...

protobuf.js (reflect) x 2,740,020 ops/sec ±0.43% (93 runs sampled)
protobuf.js (static) x 2,896,863 ops/sec ±0.47% (98 runs sampled)
JSON (string) x 1,077,209 ops/sec ±0.29% (99 runs sampled)
JSON (buffer) x 932,395 ops/sec ±0.33% (97 runs sampled)
google-protobuf x 710,377 ops/sec ±0.35% (99 runs sampled)

   protobuf.js (static) was fastest
  protobuf.js (reflect) was 5.4% ops/sec slower (factor 1.1)
          JSON (string) was 62.7% ops/sec slower (factor 2.7)
          JSON (buffer) was 67.8% ops/sec slower (factor 3.1)
        google-protobuf was 75.4% ops/sec slower (factor 4.1)

The improvement is larger for protobuf messages that contain more string fields than the benchmark message does.

This change also adds a benchmark of these approaches to help confirm that this is an effective strategy.

Output of the benchmark program
benchmarking ascii decoding - very small strings (7 bytes) performance ...

Fallback implementation x 26,555,965 ops/sec ±1.26% (94 runs sampled)
Ascii optimized implementation x 24,544,739 ops/sec ±2.62% (85 runs sampled)
Optimized implementation x 26,341,753 ops/sec ±1.35% (91 runs sampled)
Node Buffer.toString x 9,217,721 ops/sec ±0.84% (93 runs sampled)
TextDecoder x 17,698,933 ops/sec ±0.81% (92 runs sampled)

Fallback implementation was fastest
Optimized implementation was 0.9% ops/sec slower (factor 1.0)
Ascii optimized implementation was 8.8% ops/sec slower (factor 1.1)
            TextDecoder was 33.1% ops/sec slower (factor 1.5)
   Node Buffer.toString was 65.1% ops/sec slower (factor 2.9)

benchmarking nonAscii decoding - very small strings (7 bytes) performance ...

Fallback implementation x 32,865,893 ops/sec ±1.74% (90 runs sampled)
Ascii optimized implementation x 29,602,839 ops/sec ±1.92% (91 runs sampled)
Optimized implementation x 28,114,949 ops/sec ±1.41% (89 runs sampled)
Node Buffer.toString x 7,758,577 ops/sec ±0.66% (93 runs sampled)
TextDecoder x 14,199,077 ops/sec ±1.38% (95 runs sampled)

Fallback implementation was fastest
Ascii optimized implementation was 10.1% ops/sec slower (factor 1.1)
Optimized implementation was 14.2% ops/sec slower (factor 1.2)
            TextDecoder was 56.6% ops/sec slower (factor 2.3)
   Node Buffer.toString was 76.1% ops/sec slower (factor 4.2)

benchmarking ascii decoding - small strings (20 bytes) performance ...

Fallback implementation x 9,660,584 ops/sec ±1.90% (91 runs sampled)
Ascii optimized implementation x 19,055,980 ops/sec ±1.29% (91 runs sampled)
Optimized implementation x 18,972,682 ops/sec ±0.78% (93 runs sampled)
Node Buffer.toString x 9,053,070 ops/sec ±0.71% (94 runs sampled)
TextDecoder x 17,069,260 ops/sec ±1.38% (94 runs sampled)

Optimized implementation was fastest
Ascii optimized implementation was 0.1% ops/sec slower (factor 1.0)
            TextDecoder was 10.6% ops/sec slower (factor 1.1)
Fallback implementation was 49.6% ops/sec slower (factor 2.0)
   Node Buffer.toString was 52.3% ops/sec slower (factor 2.1)

benchmarking nonAscii decoding - small strings (20 bytes) performance ...

Fallback implementation x 13,637,013 ops/sec ±0.48% (94 runs sampled)
Ascii optimized implementation x 11,266,322 ops/sec ±0.42% (97 runs sampled)
Optimized implementation x 11,912,698 ops/sec ±0.51% (97 runs sampled)
Node Buffer.toString x 6,265,259 ops/sec ±0.94% (95 runs sampled)
TextDecoder x 10,559,849 ops/sec ±0.45% (97 runs sampled)

Fallback implementation was fastest
Optimized implementation was 12.7% ops/sec slower (factor 1.1)
Ascii optimized implementation was 17.3% ops/sec slower (factor 1.2)
            TextDecoder was 22.5% ops/sec slower (factor 1.3)
   Node Buffer.toString was 54.3% ops/sec slower (factor 2.2)

benchmarking ascii decoding - medium strings (100 bytes) performance ...

Fallback implementation x 2,349,676 ops/sec ±1.88% (90 runs sampled)
Ascii optimized implementation x 5,222,330 ops/sec ±0.70% (93 runs sampled)
Optimized implementation x 15,518,384 ops/sec ±0.99% (95 runs sampled)
Node Buffer.toString x 7,921,649 ops/sec ±1.31% (90 runs sampled)
TextDecoder x 15,861,131 ops/sec ±0.78% (93 runs sampled)

            TextDecoder was fastest
Optimized implementation was 2.4% ops/sec slower (factor 1.0)
   Node Buffer.toString was 50.3% ops/sec slower (factor 2.0)
Ascii optimized implementation was 67.1% ops/sec slower (factor 3.0)
Fallback implementation was 85.3% ops/sec slower (factor 6.8)

benchmarking nonAscii decoding - medium strings (100 bytes) performance ...

Fallback implementation x 3,921,465 ops/sec ±1.76% (88 runs sampled)
Ascii optimized implementation x 3,273,387 ops/sec ±1.69% (91 runs sampled)
Optimized implementation x 3,885,206 ops/sec ±0.37% (97 runs sampled)
Node Buffer.toString x 3,050,990 ops/sec ±0.53% (98 runs sampled)
TextDecoder x 3,897,128 ops/sec ±0.36% (93 runs sampled)

            TextDecoder was fastest
Optimized implementation was 0.3% ops/sec slower (factor 1.0)
Fallback implementation was 0.8% ops/sec slower (factor 1.0)
Ascii optimized implementation was 17.1% ops/sec slower (factor 1.2)
   Node Buffer.toString was 21.8% ops/sec slower (factor 1.3)

benchmarking ascii decoding - large strings (1000 bytes) performance ...

Fallback implementation x 255,500 ops/sec ±0.80% (94 runs sampled)
Ascii optimized implementation x 561,257 ops/sec ±1.10% (91 runs sampled)
Optimized implementation x 7,401,210 ops/sec ±5.17% (81 runs sampled)
Node Buffer.toString x 2,365,643 ops/sec ±2.59% (85 runs sampled)
TextDecoder x 8,576,388 ops/sec ±0.98% (92 runs sampled)

            TextDecoder was fastest
Optimized implementation was 17.1% ops/sec slower (factor 1.2)
   Node Buffer.toString was 72.8% ops/sec slower (factor 3.7)
Ascii optimized implementation was 93.5% ops/sec slower (factor 15.3)
Fallback implementation was 97.0% ops/sec slower (factor 33.5)

benchmarking nonAscii decoding - large strings (1000 bytes) performance ...

Fallback implementation x 403,076 ops/sec ±0.43% (93 runs sampled)
Ascii optimized implementation x 375,654 ops/sec ±0.39% (99 runs sampled)
Optimized implementation x 450,154 ops/sec ±0.32% (99 runs sampled)
Node Buffer.toString x 398,743 ops/sec ±0.66% (96 runs sampled)
TextDecoder x 454,739 ops/sec ±0.16% (101 runs sampled)

            TextDecoder was fastest
Optimized implementation was 1.2% ops/sec slower (factor 1.0)
Fallback implementation was 11.6% ops/sec slower (factor 1.1)
   Node Buffer.toString was 12.7% ops/sec slower (factor 1.1)
Ascii optimized implementation was 17.6% ops/sec slower (factor 1.2)
