This section focuses on memory performance. When processing huge files, the goal is to keep the memory baseline flat and GC pauses to an absolute minimum, entirely independent of the input size.
Note
The following examples were run on Node.js against a 1 GB JSON file. Performance profiles were
generated with clinic.
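
For a quick check without a profiler, heap usage can also be sampled from inside the processing loop. The following is a minimal sketch, not the method used to produce the figures in this section (those come from clinic). `createHeapSampler` is a hypothetical helper name; it relies only on Node's built-in `process.memoryUsage()`.

```javascript
// Hypothetical helper (not part of jsontext): tracks the peak value of
// process.memoryUsage().heapUsed across repeated samples.
function createHeapSampler() {
  let peak = 0;
  let samples = 0;
  return {
    // Call once per chunk (or every N chunks) inside the processing loop.
    sample() {
      const { heapUsed } = process.memoryUsage();
      if (heapUsed > peak) peak = heapUsed;
      samples++;
    },
    // Peak heap usage observed so far, in MB (1 MB = 1024 * 1024 bytes).
    peakMB() {
      return peak / (1024 * 1024);
    },
    // Number of samples taken so far.
    count() {
      return samples;
    },
  };
}

const sampler = createHeapSampler();
sampler.sample();
console.log(`Peak heap used: ${sampler.peakMB().toFixed(1)} MB`);
```

If the peak stays roughly constant as the input file grows, the memory baseline is independent of input size, which is the goal stated above.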
This scenario demonstrates the absolute base cost of parsing. We use the core JSONTextDecoder to
read chunks from a 1 GB file, tokenize them, and immediately discard the tokens.

    import { createReadStream } from "node:fs";
    import { JSONTextDecoder } from "jsontext";

    const decoder = new JSONTextDecoder();
    const stream = createReadStream("data.json");

    for await (const chunk of stream) {
      decoder.push(chunk);
      while (decoder.readToken() !== undefined) {
        /** Drain */
      }
    }
    decoder.end();
    decoder.checkEOF();

This scenario represents a full I/O cycle. We stream bytes from the 1 GB file, decode them into
tokens using the core JSONTextDecoder, immediately feed those tokens into JSONTextEncoder, and
write the re-encoded bytes to /dev/null.

    import { createReadStream, createWriteStream } from "node:fs";
    import { JSONTextDecoder, JSONTextEncoder } from "jsontext";

    const input = createReadStream("data.json");
    const output = createWriteStream("/dev/null");
    const decoder = new JSONTextDecoder();
    const encoder = new JSONTextEncoder();

    for await (const chunk of input) {
      decoder.push(chunk);
      for (let token; (token = decoder.readToken()) !== undefined;) {
        encoder.writeToken(token);
      }
      const bytes = encoder.takeBytes();
      if (bytes.length > 0) {
        output.write(bytes);
      }
    }
    decoder.end();
    decoder.checkEOF();
    output.end();

Important
Using JSONTextDecoderStream and JSONTextEncoderStream directly in Node.js requires .toWeb()
to convert to Web Streams, which adds an extra buffering layer and can push Heap Used up to 300 MB
before triggering GC in this scenario.
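
The round-trip loop above calls `output.write()` without checking its return value, which is harmless when the destination is /dev/null. With a slower destination, Node.js buffers unwritten bytes in memory, so a flat baseline may require waiting for the `drain` event. The following is a minimal sketch under that assumption; `writeChunk` is a hypothetical helper, not part of jsontext, and it omits error handling for brevity.

```javascript
// Hypothetical helper: writes one chunk and, if the stream's internal buffer
// is full, waits for "drain" before resolving.
function writeChunk(writable, bytes) {
  // write() returns false once buffered data reaches the highWaterMark.
  if (writable.write(bytes)) {
    return Promise.resolve();
  }
  return new Promise((resolve) => writable.once("drain", resolve));
}

// Usage inside the round-trip loop, replacing `output.write(bytes)`:
//   await writeChunk(output, bytes);
```

Awaiting the helper pauses the read loop while the destination catches up, so the amount of buffered output stays bounded by the stream's `highWaterMark` rather than by the input size.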
This scenario demonstrates a data querying use case. We use JSONTextSelectorStream with a
descendant JSON Path expression $..id to scan the entire 1 GB file. For each match, we call
json() to decode the value into a JavaScript object, and keep a count of the total matches.
Since JSONTextSelectorStream is a Web Streams TransformStream, we use .toWeb() to bridge
Node.js streams.
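
To make the `$..id` selection concrete, the sketch below counts the same matches on a small in-memory value: every member named `id` at any depth. It only illustrates the semantics of the descendant expression and says nothing about how the library implements it; `countDescendantMembers` and the sample document are made up for this example.

```javascript
// Counts every object member named `key` at any depth, which corresponds to
// the values a descendant expression such as `$..id` selects.
function countDescendantMembers(value, key) {
  // Primitives (and null) have no members to inspect.
  if (value === null || typeof value !== "object") return 0;
  let count = 0;
  if (Array.isArray(value)) {
    for (const item of value) count += countDescendantMembers(item, key);
    return count;
  }
  for (const [name, child] of Object.entries(value)) {
    if (name === key) count++;
    // Keep descending: a matched value may itself contain further matches.
    count += countDescendantMembers(child, key);
  }
  return count;
}

const doc = JSON.parse(
  '{"id": 1, "items": [{"id": 2, "tags": ["a"]}, {"name": "x", "owner": {"id": 3}}]}'
);
console.log(countDescendantMembers(doc, "id")); // 3
```

This version needs the whole document in memory, which is not feasible for a 1 GB file; the streaming selector avoids that by emitting matches as the input is read.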

    import { JSONTextSelectorStream } from "jsontext";
    import { createReadStream } from "node:fs";
    import { Readable } from "node:stream";

    const stream = createReadStream("data.json");
    const selector = new JSONTextSelectorStream("$..id");

    let count = 0;
    for await (const value of Readable.toWeb(stream).pipeThrough(selector)) {
      value.json();
      count++;
    }
    console.log(`Total values: ${count}`);
    // Total values: 565255 for the 1 GB file used in this example

Tip
JSONTextSelectorStream only emits matched values, so the frequency of .enqueue() calls is low:
it is bounded by the number of matches, not the number of tokens. This makes the microtask overhead from
Web Streams acceptable here. In the Round Trip scenario every token triggers an .enqueue(),
which creates enough microtask pressure to push Heap Used to 300 MB and trigger GC. That is why we
use the core APIs directly there instead.
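
The difference in `.enqueue()` frequency can be seen in a toy model that uses only the Web Streams globals available in Node.js 18 and later. This is a sketch for illustration: `countEnqueues` and the token list are invented here and do not reflect the library's actual token types or internals.

```javascript
// Toy model: pipes `tokens` through a TransformStream and reports how many
// times the transformer called controller.enqueue().
async function countEnqueues(tokens, shouldEmit) {
  let enqueues = 0;
  const source = new ReadableStream({
    start(controller) {
      for (const token of tokens) controller.enqueue(token);
      controller.close();
    },
  });
  const filter = new TransformStream({
    transform(token, controller) {
      if (shouldEmit(token)) {
        enqueues++;
        controller.enqueue(token);
      }
    },
  });
  const reader = source.pipeThrough(filter).getReader();
  while (!(await reader.read()).done) {
    // Drain the readable side so the pipe runs to completion.
  }
  return enqueues;
}

// Made-up token sequence standing in for one small JSON object.
const tokens = ["{", "id", 1, "name", "a", "tags", "[", "x", "y", "]", "}"];

countEnqueues(tokens, () => true)
  .then((perToken) => {
    console.log("enqueue per token:", perToken);
    return countEnqueues(tokens, (t) => t === 1);
  })
  .then((perMatch) => console.log("enqueue per match:", perMatch));
```

Emitting every token produces one enqueue per token, while emitting only the matched value produces one enqueue per match. On a 1 GB input the first count grows with the token total, which is the source of the microtask pressure described above.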


