Performance comparison between the new Stream API, the Web Streams API, and Node.js Streams.
# Run all benchmarks
npx tsx benchmarks/run-all.ts
# Run a specific benchmark
npx tsx benchmarks/01-throughput.ts
npx tsx benchmarks/02-push-streams.ts
npx tsx benchmarks/03-transforms.ts
npx tsx benchmarks/04-pipelines.ts
npx tsx benchmarks/06-consumption.tsThese benchmarks are designed for fair comparisons between the APIs:
- Equivalent operations: All three APIs perform the same work (iteration, counting, transforms)
- Same consumption pattern: All use async iteration where possible, not mixing
pipeTo()withfor await - No extra work: New Streams benchmarks don't use
bytes()collection when other APIs just count - Node.js Streams: Uses
Readable,Transform, andPassThroughfromnode:streamwith async iteration
Raw Throughput - Measures basic data flow speed through streams.
| Scenario | Description |
|---|---|
| Large chunks (64KB) | Fewer, larger transfers |
| Medium chunks (8KB) | Balanced chunk size |
| Small chunks (1KB) | Many small transfers |
| Tiny chunks (100B) | High per-chunk overhead scenario |
| Async iteration | Direct for await consumption |
| Generator source | Async generator as source |
Push Stream Performance - Tests producer/consumer patterns with backpressure.
| Scenario | Description |
|---|---|
| Concurrent push | Interleaved write/read with medium chunks |
| Many small writes | High-frequency small data (64B) |
| Batch writes (writev) | Multiple chunks per write call |
| Push + async iter | Push stream consumed via async iteration |
Transform Performance - Data transformation speed.
| Scenario | Description |
|---|---|
| Identity transform | Pass-through (baseline overhead) |
| XOR transform | Byte manipulation (CPU-bound) |
| Expanding transform | 1:2 output (generator-based) |
| Chained transforms | 3 transform stages |
| Async transform | Transforms returning promises |
Pipeline Performance - Full source → transform → destination flows.
| Scenario | Description |
|---|---|
| Simple pipeline | Direct source to iteration |
| Pipeline + XOR | Single CPU-bound transform |
| Multi-stage pipeline | 3 identity transform stages |
| High-frequency | Many small chunks (64B x 20000) |
Consumption Methods - Different ways to read stream data.
| Scenario | Description |
|---|---|
| bytes() | Collect and concatenate all data |
| text() | Collect and decode to string |
| Async iteration | for await...of loop |
| Iterator loop | Manual iterator.next() calls |
Results from Node.js on the reference implementation. The new API uses batched iteration (Uint8Array[]) which amortizes async overhead.
Note: These results now include three-way comparisons between New Streams, Web Streams, and Node.js Streams.
Scenario | New Stream | Web Stream | Node Stream | New vs Web | New vs Node
---------------------------------+----------------+----------------+----------------+--------------+-------------
Large chunks (64KB x 500) | ~5 GB/s | ~6 GB/s | ~5 GB/s | ~same | ~same
Medium chunks (8KB x 2000) | ~6 GB/s | ~5 GB/s | ~4 GB/s | 1.2x faster | 1.5x faster
Small chunks (1KB x 5000) | ~5 GB/s | ~2.5 GB/s | ~2 GB/s | 2x faster | 2.5x faster
Tiny chunks (100B x 10000) | ~1.6 GB/s | ~316 MB/s | ~400 MB/s | 5x faster | 4x faster
Async iteration (8KB x 1000) | ~270 GB/s | ~22 GB/s | ~18 GB/s | 12x faster | 15x faster
Generator source (8KB x 1000) | ~23 GB/s | ~16 GB/s | ~14 GB/s | 1.4x faster | 1.6x faster
Scenario | New Stream | Web Stream | Node Stream | New vs Web | New vs Node
---------------------------------+----------------+----------------+----------------+--------------+-------------
Concurrent push (4KB x 1000) | ~140 MB/s | ~140 MB/s | ~150 MB/s | ~same | ~same
Many small writes (64B x 10000) | ~75 MB/s | ~73 MB/s | ~80 MB/s | ~same | ~same
Batch writes (512B x 20 x 200) | ~140 MB/s | ~140 MB/s | ~160 MB/s | ~same | ~same
Push + async iter (2KB x 1000) | ~165 MB/s | ~165 MB/s | ~170 MB/s | ~same | ~same
Scenario | New Stream | Web Stream | Node Stream | New vs Web | New vs Node
---------------------------------+----------------+----------------+----------------+--------------+-------------
Identity transform (8KB x 1000) | ~310 GB/s | ~5 GB/s | ~6 GB/s | 63x faster | 52x faster
XOR transform (8KB x 500) | ~1.3 GB/s | ~1.1 GB/s | ~1.2 GB/s | ~same | ~same
Expanding 1:2 (4KB x 500) | ~63 GB/s | ~4 GB/s | ~4.5 GB/s | 15x faster | 14x faster
Chained 3x (8KB x 500) | ~175 GB/s | ~2 GB/s | ~2.5 GB/s | 86x faster | 70x faster
Async transform (8KB x 300) | ~133 GB/s | ~5 GB/s | ~5.5 GB/s | 29x faster | 24x faster
Scenario | New Stream | Web Stream | Node Stream | New vs Web | New vs Node
---------------------------------+----------------+----------------+----------------+--------------+-------------
Simple pipeline (8KB x 1000) | ~175 GB/s | ~9 GB/s | ~8 GB/s | 19x faster | 22x faster
Pipeline + XOR (8KB x 1000) | ~1.3 GB/s | ~1.1 GB/s | ~1.2 GB/s | ~same | ~same
Multi-stage 3x (8KB x 500) | ~200 GB/s | ~2 GB/s | ~2.5 GB/s | 104x faster | 80x faster
High-freq (64B x 20000) | ~3.8 GB/s | ~200 MB/s | ~250 MB/s | 19x faster | 15x faster
Scenario | New Stream | Web Stream | Node Stream | New vs Web | New vs Node
---------------------------------+----------------+----------------+----------------+--------------+-------------
bytes() (16KB x 500) | ~4.5 GB/s | ~8 GB/s | ~6 GB/s | ~same | ~same
text() (1KB chunks) | ~1.6 GB/s | ~1 GB/s | ~1.1 GB/s | 1.6x faster | 1.4x faster
Async iteration (8KB x 1000) | ~280 GB/s | ~22 GB/s | ~18 GB/s | 13x faster | 15x faster
Iterator loop (8KB x 1000) | ~410 GB/s | ~27 GB/s | ~22 GB/s | 15x faster | 18x faster
- Chained transforms (70-104x faster): Batching dramatically reduces async overhead when data flows through multiple stages
- Simple pipelines (19-22x faster): Less machinery for basic iteration
- Small/tiny chunks (2-5x faster): Batching amortizes per-iteration overhead
- Async iteration (12-18x faster): Native async iterable protocol combined with batching is extremely efficient
- Identity/expanding transforms (14-63x faster): When transforms don't do heavy computation, batching dominates
- CPU-bound transforms (XOR): When the transform does significant work per byte, the overhead difference is masked
- Push streams: All three APIs have similar backpressure overhead
- Large chunk throughput: When chunks are large, all APIs are limited by memory bandwidth
- bytes() collection: All require the same concatenation work
Node.js Streams and Web Streams generally perform similarly, with Node.js Streams having a slight edge in some push-based scenarios due to its mature backpressure implementation. Both are significantly slower than the New Streams API in scenarios where async overhead dominates.
-
Batched model tradeoff: The new API yields
Uint8Array[]batches instead of singleUint8Arraychunks. This amortizes async overhead but adds array allocation for the batch wrapper. -
GC pressure in benchmarks: Rapid benchmarking (100+ iterations) can trigger GC pauses that disproportionately affect the New Streams API due to its object allocation patterns. Real-world usage with longer intervals between stream operations doesn't show this effect.
-
Node.js environment: These benchmarks run in Node.js. Browser performance may differ due to different stream implementations.
-
Variance: Results can vary by 10-20% between runs due to GC, JIT warmup, and system load. "Within noise" or "~same" means the difference is not statistically significant.
-
Reference implementation: This is not a production-optimized implementation. A native implementation could be faster.
-
Benchmark ordering: CPU-intensive benchmarks (like XOR transform) use global warmup of all three implementations before measurement to avoid JIT compilation bias.
-
Node.js Streams specifics: Node.js Streams use
Readable,Transform,PassThrough, andWritablefrom thenode:streammodule. For push-based scenarios,PassThroughwithcork()/uncork()is used for batch writes.