Improve array parsing performance #544

Open

krzysdz wants to merge 3 commits into ljharb:main from krzysdz:array-perf

Conversation

@krzysdz krzysdz commented Jan 26, 2026

This PR improves performance by a few orders of magnitude in my benchmarks, mostly by replacing O(n^2) code with something linear. There are two separate cases, and I can split one of them into a separate PR for easier review.

Duplicate keys

This is something that was recently noticed in #539, but there had already been some work in unmerged PRs from @elidoran - #185 and #189.

The [].concat(a, b) call makes a copy of a continually larger array every time combine() is called. There are (were) tests that check that combine() does not mutate its input arguments, but 3086902 introduced new test cases that check that the first input (a) is mutated if it is an overflow object. I know that mutating inputs was one of the objections in #185 and #189, but I believe the performance improvement is large enough to justify it.

This PR does not change arrayLimit behaviour, which is wrong (index vs length - #540) and probably should not even be there (#537 (comment), #294).

This change should slightly reduce memory usage by making fewer copies (it reduces the number of allocations, but peak memory usage depends on GC; GHSA-6rw7-vpxm-498p mentions memory exhaustion, but it is not clear how that could happen) and makes the performance comparable to 6.14.1 with arrayLimit: -1 (in prior versions duplicates are parsed as arrays regardless of arrayLimit and parseArrays - #543).

Memory usage change

This was tested by parsing a string with 100000 a[]=b...b elements, where each b...b string was 1000 characters long - the input is 100 MB of text. Options: { parameterLimit: 100_000, arrayLimit: 100_000 }

16.4.0 "Allocation" view from "Allocation timeline":

parseQueryStringValues allocations total count 890151 (live 98345) with size 1474 MB (live 99.9 MB); combine allocations total count 3327 (live 3) with size 5.8 MB (live 168 B)

This PR (f8ee66f):

parseQueryStringValues allocations total count 592909 (live 74896) with size 128 MB (live 76.0 MB); combine allocations total count 17 (live 5) with size 35.4 kB (live 544 B)

My benchmark results are variable, because I have a noisy environment (lots of things open, CPU frequency and core assignment not locked), but the general order of magnitude tells the difference.

| N | 6.14.1 (ms) | PR f8ee66f (ms) |
| --- | --- | --- |
| 100 | 0.48 | 0.20 |
| 1000 | 3.3 | 0.93 |
| 10000 | 220 | 8.7 |
| 100000 | 29576 | 68 |
| 125000 | 45686 | 78 |
| 1000000 | at least 15 minutes, Ctrl+C | call stack size exceeded |

Unfortunately, this may cause regressions with huge arrays, because Function.prototype.apply() pushes the arguments (all elements of array b in this case) onto the stack. I don't think this is a huge problem, given that it happens at over 100k elements, which previously took about ~30 seconds to parse. Exactly when this fails depends on the available stack size and may be hard to predict.
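One possible mitigation, not part of this PR (the pushAll name and the chunk size are purely illustrative), is appending in bounded chunks so apply() never receives more arguments than the stack allows:

```js
// Hypothetical helper, not part of this PR: append `source` to `target`
// in fixed-size chunks so Function.prototype.apply never receives more
// arguments than the engine's stack can hold.
function pushAll(target, source) {
    var CHUNK = 32768; // illustrative; well below typical argument limits
    for (var i = 0; i < source.length; i += CHUNK) {
        Array.prototype.push.apply(target, source.slice(i, i + CHUNK));
    }
    return target;
}

var big = new Array(200000).fill('b');
var out = pushAll([], big);
console.log(out.length); // 200000
```

The slice-per-chunk costs extra allocations, so this trades a bit of the speedup for predictable behaviour on very large arrays.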

Benchmark code:

```js
const {
  timerify,
  performance,
  PerformanceObserver,
} = require("node:perf_hooks");
const qs = require(".");

const N = 10_000;
const query = Array(N).fill("a[]=b").join("&");
// const query = Array(N).fill().map((_,i)=>`a[${i}]=b`).join("&");
// const query = Array(N).fill(`a[]=${Array(1000).fill("b").join("")}`).join("&");

function parseArray() {
    return qs.parse(query, {
        parameterLimit: N,
        arrayLimit: N
    });
}
const timedParseArray = timerify(parseArray);

const obs = new PerformanceObserver((list) => {
    for (const { name, duration } of list.getEntries()) {
        console.log(`Execution of ${name} took ${duration} ms`);
    }

    performance.clearMarks();
    performance.clearMeasures();
    obs.disconnect();
});
obs.observe({ entryTypes: ["function"] });

// warmup (not registered)
parseArray();

const arr = timedParseArray().a;
console.assert(Array.isArray(arr), "arr not an array");
```

Possible input combinations are:

  • a is an array, b is not an array - a.push(b)
  • a is an array, b is an array - a.push.apply(a, b)
  • a is not an array, b is not an array - [].concat(a, b)
  • a is not an array, b is an array - [].concat(a, b) - detected only once in the whole test suite, so I don't think b.unshift(a) makes sense

This part includes first 2 commits:

  • 2870c9c - replace .concat() with push() if possible
  • 15b1e5e - allow mutation of the first argument in tests that did not allow it
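The push/concat case analysis above can be sketched roughly like this (illustrative code, not the exact commit; the real combine() also handles arrayLimit overflow):

```js
// Sketch of the combine() strategy described above: mutate `a` in place
// when it is already an array, fall back to concat otherwise.
function combine(a, b) {
    if (Array.isArray(a)) {
        if (Array.isArray(b)) {
            a.push.apply(a, b); // a is an array, b is an array
        } else {
            a.push(b); // a is an array, b is not
        }
        return a;
    }
    return [].concat(a, b); // a is not an array
}

console.log(combine([1], [2, 3])); // [ 1, 2, 3 ]
console.log(combine('a', 'b'));    // [ 'a', 'b' ]
```

Because the array case appends to `a` instead of copying it, repeated calls for the same key do O(1) amortized work per element instead of O(n).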

Indexed arrays

Here the problem lies in merge() and the source.forEach() call. When a string like a[0]=b&a[1]=b&... is parsed, merge() is called with target and source values like those below:

[ 'b' ] [ <1 empty item>, 'b' ]
[ 'b', 'b' ] [ <2 empty items>, 'b' ]
[ 'b', 'b', 'b' ] [ <3 empty items>, 'b' ]
[ 'b', 'b', 'b', 'b' ] [ <4 empty items>, 'b' ]

While the callback of .forEach() is executed only for non-empty items of a sparse array, the iteration itself still appears to be linear in the length of the array, not in the number of present items. Since the execution time of .forEach() and the number of calls to merge() both scale with N (the number of elements in an array with growing indices), this has O(n^2) time complexity.

The idea I had is to iterate over own properties, using the fact that for arrays these are always the element indices followed by 'length'.
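A rough sketch of that idea, using a hypothetical mergeSparse() helper (the actual code in f8ee66f may differ):

```js
// Sketch: merge a sparse `source` array into `target` by visiting only
// source's own properties (element indices, then 'length') instead of
// scanning every index up to source.length. Illustrative only.
function mergeSparse(target, source) {
    var keys = Object.getOwnPropertyNames(source);
    for (var i = 0; i < keys.length; i += 1) {
        var key = keys[i];
        if (key === 'length') { continue; } // last own property on arrays
        target[key] = source[key]; // assigning past target.length extends it
    }
    return target;
}

var sparse = [];
sparse[99999] = 'b'; // length 100000, one present element

console.log(mergeSparse(['b'], sparse).length); // 100000
```

The work here scales with the number of present elements in `source`, so N calls with one-element sparse sources stay linear overall.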

Benchmarks use the same code as before, but the second query is uncommented.

| N | 6.14.1 (ms) | PR f8ee66f (ms) |
| --- | --- | --- |
| 100 | 0.30 | 0.31 |
| 1000 | 2.8 | 3.3 |
| 5000 | 224 | 11 |
| 10000 | 911 | 22 |
| 50000 | 22683 | 106 |
| 100000 | 90571 | 221 |
| 1000000 | did not even try | 2193 |

I'm not entirely convinced that this is worth changing, because for reasonable arrayLimit (which bounds the maximum .forEach() execution time) and parameterLimit (which bounds the number of merge() calls) values, the current code performs slightly better. If there are people who raise both of those limits, then this may be a real problem.

This part is in the third commit - f8ee66f.


```js
if (isArray(target) && isArray(source)) {
source.forEach(function (item, i) {
var sourceOwnProperties = Object.getOwnPropertyNames(source);
```
Owner

we can't rely on Object.getOwnPropertyNames existing. additionally, source is an array here, so why wouldn't we want to just iterate from 0 to source.length?

Author

@krzysdz krzysdz Jan 27, 2026

I did not notice that qs supports Node.js versions older than 0.10, sorry.

why wouldn't we want to just iterate from 0 to source.length?

As I wrote in the description, when parsing indexed arrays, merge() is called for each element, with target being the "accumulated" array and source a sparse array with only one element. .forEach internally is for (let k = 0; k < arr.length; k++) { if (arr.hasOwn(k)) { cb(arr[k], k, arr); } } (ArrayForEachLoopContinuation in V8). With large sparse arrays this takes some time, and combined with repeatedly calling merge() it is O(n^2).

This is the less practical part of the PR, because for values of parameterLimit and arrayLimit up to 1000 (arrayLimit is smaller by default), the current code is still faster.
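A quick way to see that cost in isolation (a standalone sketch, not from the PR):

```js
// Illustrates the cost discussed above: Array.prototype.forEach walks the
// full index range of a sparse array even though the callback only fires
// for elements that actually exist. Timings are indicative only.
var sparse = [];
sparse[10000000] = 'b'; // length 10000001, but only one element present

var calls = 0;
var t0 = process.hrtime.bigint();
sparse.forEach(function () { calls += 1; });
var t1 = process.hrtime.bigint();

console.log('callback calls:', calls); // 1
console.log('forEach ms:', Number(t1 - t0) / 1e6); // grows with length, not with calls
```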

Comment on lines +293 to +303
```js
var length;
var result;
if (Array.isArray(a)) {
    length = Array.isArray(b) ? a.push.apply(a, b) : a.push(b);
    result = a;
} else {
    result = [].concat(a, b);
    length = result.length;
}
if (length > arrayLimit) {
    return markOverflow(arrayToObject(result, { plainObjects: plainObjects }), length - 1);
```
Owner

changing from "only relying on .concat" to "relying on push and apply and concat" isn't an improvement, unfortunately. also, we can't rely on .apply being present.

Author

I agree that the code isn't the most beautiful, with 3 different cases of combining arrays, but it is much faster. I can't think of a faster way to combine 2 arrays without relying on Node.js 0.10+ features :(

```diff
 var combined = utils.combine(a, b);

-st.deepEqual(a, [1], 'a is not mutated');
+st.deepEqual(a, [1, 2], 'a is mutated');
```
Owner

if tests are changed, it's a breaking change. tests should not be changed.

Author

a already is mutated in some cases and there is a test which checks for this behaviour:

qs/test/utils.js, lines 235 to 236 in 6bdfaf5:

```js
var combined = utils.combine(overflow, 'c', 10, false);
s2t.equal(combined, overflow, 'returns the same object (mutated)');
```

I believe that in #185 (comment) you were willing to accept mutation if you saw real performance improvements, and there certainly is one for 100 elements (over 2x for the whole qs.parse()). For 20 elements it is only about 4%, but I have trouble measuring it precisely (I can try improving the measurements if you want).

I'll admit that there may be some way to break this with a custom decoder that returns an array with an overridden .push(). Array.prototype.push.call(a, b) and Array.prototype.push.apply(a, b) might be better in this case.
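That hardening could look roughly like this (illustrative sketch, not PR code):

```js
// Sketch: calling the intrinsic Array.prototype.push directly, so an
// array with an own (overridden) .push property cannot hijack the
// append. Illustrative only.
var evil = [1];
evil.push = function () { throw new Error('hijacked'); };

// evil.push(2) would throw, but the intrinsic still works:
Array.prototype.push.call(evil, 2);
Array.prototype.push.apply(evil, [3, 4]);

console.log(evil.slice()); // [ 1, 2, 3, 4 ]
```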
