Optimize decoding utf8 buffers to strings. #2062

Open · wants to merge 3 commits into master

Conversation

@adamfaulkner adamfaulkner commented Apr 17, 2025

Replace the existing utf8_read function with a higher-performance approach that switches between three different implementations depending on the length and contents of the string (see the sketch after this list):

  • If the string is long and TextDecoder is available, use TextDecoder.
  • If the string is short and ASCII only, use String.fromCharCode, reading 8 bytes at a time.
  • Otherwise, use the old implementation, which works everywhere and can handle non-ASCII input.
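
For illustration, here is a minimal sketch of that three-way dispatch. The length threshold, the `isAscii` helper, and using `@protobufjs/utf8` as the fallback are assumptions made for the sketch, not the exact code in this PR; the real ASCII path also reads 8 bytes per iteration rather than 1 as shown here.

```js
"use strict";

var utf8Fallback = require("@protobufjs/utf8"); // existing decoder, works everywhere

var decoder = typeof TextDecoder !== "undefined"
    ? new TextDecoder("utf-8")
    : null;

function isAscii(buffer, start, end) {
    for (var i = start; i < end; ++i)
        if (buffer[i] > 0x7f)
            return false;
    return true;
}

function utf8Read(buffer, start, end) {
    var length = end - start;
    if (length < 1)
        return "";

    // Long strings: TextDecoder's per-call overhead is amortized.
    if (decoder !== null && length > 32 /* assumed threshold */)
        return decoder.decode(buffer.subarray(start, end));

    // Short ASCII-only strings: one code unit per byte via String.fromCharCode.
    if (isAscii(buffer, start, end)) {
        var codes = [];
        for (var i = start; i < end; ++i)
            codes.push(buffer[i]);
        return String.fromCharCode.apply(String, codes);
    }

    // Everything else: the original implementation handles multi-byte sequences.
    return utf8Fallback.read(buffer, start, end);
}
```

Dispatching on length matters because TextDecoder wins handily on large inputs but its per-call overhead dominates on very short strings, as the benchmark output further down shows.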

Here are the results of bench/index.js on my machine. Note that I tweaked bench/index.js to use Uint8Array rather than Node's Buffer, since Buffer is not available in browsers.
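
For reference, the tweak amounts to something like the following self-contained sketch (the message and payload here are stand-ins, not the benchmark's actual schema and data): decode from a plain Uint8Array rather than a Node Buffer so browsers exercise the same path.

```js
var protobuf = require("protobufjs");

// Stand-in schema; bench/index.js uses its own Test message and payload.
var root = protobuf.parse(
    "syntax = \"proto3\"; message Test { string name = 1; uint32 id = 2; }"
).root;
var Test = root.lookupType("Test");

var buf   = Test.encode({ name: "hello", id: 42 }).finish();         // a Buffer under Node
var bytes = new Uint8Array(buf.buffer, buf.byteOffset, buf.length);  // zero-copy Uint8Array view

console.log(Test.decode(bytes)); // the decode suites operate on `bytes` instead of `buf`
```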

Before:

adamf@fedora ~/o/protobuf.js (master)> node bench/index.js
benchmarking decoding performance ...

protobuf.js (reflect) x 2,293,860 ops/sec ±0.21% (95 runs sampled)
protobuf.js (static) x 2,666,338 ops/sec ±0.48% (97 runs sampled)
JSON (string) x 1,025,987 ops/sec ±0.66% (96 runs sampled)
JSON (buffer) x 842,511 ops/sec ±0.73% (94 runs sampled)
google-protobuf x 656,314 ops/sec ±1.03% (91 runs sampled)

   protobuf.js (static) was fastest
  protobuf.js (reflect) was 13.7% ops/sec slower (factor 1.2)
          JSON (string) was 61.6% ops/sec slower (factor 2.6)
          JSON (buffer) was 68.5% ops/sec slower (factor 3.2)
        google-protobuf was 75.5% ops/sec slower (factor 4.1)

After optimizations applied:

benchmarking decoding performance ...

protobuf.js (reflect) x 2,740,020 ops/sec ±0.43% (93 runs sampled)
protobuf.js (static) x 2,896,863 ops/sec ±0.47% (98 runs sampled)
JSON (string) x 1,077,209 ops/sec ±0.29% (99 runs sampled)
JSON (buffer) x 932,395 ops/sec ±0.33% (97 runs sampled)
google-protobuf x 710,377 ops/sec ±0.35% (99 runs sampled)

   protobuf.js (static) was fastest
  protobuf.js (reflect) was 5.4% ops/sec slower (factor 1.1)
          JSON (string) was 62.7% ops/sec slower (factor 2.7)
          JSON (buffer) was 67.8% ops/sec slower (factor 3.1)
        google-protobuf was 75.4% ops/sec slower (factor 4.1)

The improvement is larger for protobuf messages that contain more string fields than the benchmark message does.

This change also adds a benchmark of these approaches to help confirm that this is an effective strategy.

Output of the benchmark program
benchmarking ascii decoding - very small strings (7 bytes) performance ...

Fallback implementation x 26,555,965 ops/sec ±1.26% (94 runs sampled)
Ascii optimized implementation x 24,544,739 ops/sec ±2.62% (85 runs sampled)
Optimized implementation x 26,341,753 ops/sec ±1.35% (91 runs sampled)
Node Buffer.toString x 9,217,721 ops/sec ±0.84% (93 runs sampled)
TextDecoder x 17,698,933 ops/sec ±0.81% (92 runs sampled)

Fallback implementation was fastest
Optimized implementation was 0.9% ops/sec slower (factor 1.0)
Ascii optimized implementation was 8.8% ops/sec slower (factor 1.1)
            TextDecoder was 33.1% ops/sec slower (factor 1.5)
   Node Buffer.toString was 65.1% ops/sec slower (factor 2.9)

benchmarking nonAscii decoding - very small strings (7 bytes) performance ...

Fallback implementation x 32,865,893 ops/sec ±1.74% (90 runs sampled)
Ascii optimized implementation x 29,602,839 ops/sec ±1.92% (91 runs sampled)
Optimized implementation x 28,114,949 ops/sec ±1.41% (89 runs sampled)
Node Buffer.toString x 7,758,577 ops/sec ±0.66% (93 runs sampled)
TextDecoder x 14,199,077 ops/sec ±1.38% (95 runs sampled)

Fallback implementation was fastest
Ascii optimized implementation was 10.1% ops/sec slower (factor 1.1)
Optimized implementation was 14.2% ops/sec slower (factor 1.2)
            TextDecoder was 56.6% ops/sec slower (factor 2.3)
   Node Buffer.toString was 76.1% ops/sec slower (factor 4.2)

benchmarking ascii decoding - small strings (20 bytes) performance ...

Fallback implementation x 9,660,584 ops/sec ±1.90% (91 runs sampled)
Ascii optimized implementation x 19,055,980 ops/sec ±1.29% (91 runs sampled)
Optimized implementation x 18,972,682 ops/sec ±0.78% (93 runs sampled)
Node Buffer.toString x 9,053,070 ops/sec ±0.71% (94 runs sampled)
TextDecoder x 17,069,260 ops/sec ±1.38% (94 runs sampled)

Optimized implementation was fastest
Ascii optimized implementation was 0.1% ops/sec slower (factor 1.0)
            TextDecoder was 10.6% ops/sec slower (factor 1.1)
Fallback implementation was 49.6% ops/sec slower (factor 2.0)
   Node Buffer.toString was 52.3% ops/sec slower (factor 2.1)

benchmarking nonAscii decoding - small strings (20 bytes) performance ...

Fallback implementation x 13,637,013 ops/sec ±0.48% (94 runs sampled)
Ascii optimized implementation x 11,266,322 ops/sec ±0.42% (97 runs sampled)
Optimized implementation x 11,912,698 ops/sec ±0.51% (97 runs sampled)
Node Buffer.toString x 6,265,259 ops/sec ±0.94% (95 runs sampled)
TextDecoder x 10,559,849 ops/sec ±0.45% (97 runs sampled)

Fallback implementation was fastest
Optimized implementation was 12.7% ops/sec slower (factor 1.1)
Ascii optimized implementation was 17.3% ops/sec slower (factor 1.2)
            TextDecoder was 22.5% ops/sec slower (factor 1.3)
   Node Buffer.toString was 54.3% ops/sec slower (factor 2.2)

benchmarking ascii decoding - medium strings (100 bytes) performance ...

Fallback implementation x 2,349,676 ops/sec ±1.88% (90 runs sampled)
Ascii optimized implementation x 5,222,330 ops/sec ±0.70% (93 runs sampled)
Optimized implementation x 15,518,384 ops/sec ±0.99% (95 runs sampled)
Node Buffer.toString x 7,921,649 ops/sec ±1.31% (90 runs sampled)
TextDecoder x 15,861,131 ops/sec ±0.78% (93 runs sampled)

            TextDecoder was fastest
Optimized implementation was 2.4% ops/sec slower (factor 1.0)
   Node Buffer.toString was 50.3% ops/sec slower (factor 2.0)
Ascii optimized implementation was 67.1% ops/sec slower (factor 3.0)
Fallback implementation was 85.3% ops/sec slower (factor 6.8)

benchmarking nonAscii decoding - medium strings (100 bytes) performance ...

Fallback implementation x 3,921,465 ops/sec ±1.76% (88 runs sampled)
Ascii optimized implementation x 3,273,387 ops/sec ±1.69% (91 runs sampled)
Optimized implementation x 3,885,206 ops/sec ±0.37% (97 runs sampled)
Node Buffer.toString x 3,050,990 ops/sec ±0.53% (98 runs sampled)
TextDecoder x 3,897,128 ops/sec ±0.36% (93 runs sampled)

            TextDecoder was fastest
Optimized implementation was 0.3% ops/sec slower (factor 1.0)
Fallback implementation was 0.8% ops/sec slower (factor 1.0)
Ascii optimized implementation was 17.1% ops/sec slower (factor 1.2)
   Node Buffer.toString was 21.8% ops/sec slower (factor 1.3)

benchmarking ascii decoding - large strings (1000 bytes) performance ...

Fallback implementation x 255,500 ops/sec ±0.80% (94 runs sampled)
Ascii optimized implementation x 561,257 ops/sec ±1.10% (91 runs sampled)
Optimized implementation x 7,401,210 ops/sec ±5.17% (81 runs sampled)
Node Buffer.toString x 2,365,643 ops/sec ±2.59% (85 runs sampled)
TextDecoder x 8,576,388 ops/sec ±0.98% (92 runs sampled)

            TextDecoder was fastest
Optimized implementation was 17.1% ops/sec slower (factor 1.2)
   Node Buffer.toString was 72.8% ops/sec slower (factor 3.7)
Ascii optimized implementation was 93.5% ops/sec slower (factor 15.3)
Fallback implementation was 97.0% ops/sec slower (factor 33.5)

benchmarking nonAscii decoding - large strings (1000 bytes) performance ...

Fallback implementation x 403,076 ops/sec ±0.43% (93 runs sampled)
Ascii optimized implementation x 375,654 ops/sec ±0.39% (99 runs sampled)
Optimized implementation x 450,154 ops/sec ±0.32% (99 runs sampled)
Node Buffer.toString x 398,743 ops/sec ±0.66% (96 runs sampled)
TextDecoder x 454,739 ops/sec ±0.16% (101 runs sampled)

            TextDecoder was fastest
Optimized implementation was 1.2% ops/sec slower (factor 1.0)
Fallback implementation was 11.6% ops/sec slower (factor 1.1)
   Node Buffer.toString was 12.7% ops/sec slower (factor 1.1)
Ascii optimized implementation was 17.6% ops/sec slower (factor 1.2)
