Conversation
Commit 3086902 already made `combine()` mutate overflow objects.
```js
if (isArray(target) && isArray(source)) {
    source.forEach(function (item, i) {
        var sourceOwnProperties = Object.getOwnPropertyNames(source);
```
we can't rely on `Object.getOwnPropertyNames` existing. additionally, `source` is an array here, so why wouldn't we want to just iterate from 0 to `source.length`?
I did not notice that qs supports Node.js versions older than 0.10, sorry.
> why wouldn't we want to just iterate from 0 to `source.length`?

As I wrote in the description, when parsing indexed arrays `merge()` is called for each element, with `target` being the "accumulated" array and `source` a sparse array with only one element. Internally, `.forEach` is roughly `for (var k = 0; k < arr.length; k++) { if (Object.prototype.hasOwnProperty.call(arr, k)) { cb(arr[k], k, arr); } }` (`ArrayForEachLoopContinuation` in V8). With large sparse arrays this takes some time, and combined with repeatedly calling `merge()` it is O(n^2).
This is the less practical part of the PR, because for values of `parameterLimit` and `arrayLimit` up to 1000 (`arrayLimit` is smaller by default), the current code is still faster.
```js
var length;
var result;
if (Array.isArray(a)) {
    length = Array.isArray(b) ? a.push.apply(a, b) : a.push(b);
    result = a;
} else {
    result = [].concat(a, b);
    length = result.length;
}
if (length > arrayLimit) {
    return markOverflow(arrayToObject(result, { plainObjects: plainObjects }), length - 1);
```
changing from "only relying on `.concat`" to "relying on `push` and `apply` and `concat`" isn't an improvement, unfortunately. also, we can't rely on `.apply` being present.
I agree that the code isn't the most beautiful with 3 different cases of combining arrays, but it is much faster. I can't think of a faster way to combine 2 arrays without relying on Node.js 0.10+ features :(
```diff
 var combined = utils.combine(a, b);
-st.deepEqual(a, [1], 'a is not mutated');
+st.deepEqual(a, [1, 2], 'a is mutated');
```
if tests are changed, it's a breaking change. tests should not be changed.
`a` already is mutated in some cases and there is a test which checks for this behaviour:
Lines 235 to 236 in 6bdfaf5
I believe that in #185 (comment) you were willing to accept mutation if you saw real performance improvements, and there certainly is one for 100 elements (over 2x for the whole `qs.parse()`). For 20 elements it is only about 4%, but I have trouble measuring it precisely (I can try improving the measurements if you want).
I'll admit that there may be some way to break this with a custom decoder that returns an array with an overridden `.push()`. `Array.prototype.push.call(a, b)` and `Array.prototype.push.apply(a, b)` might be better in this case.
This PR improves the performance by a few orders of magnitude in my benchmarks, mostly by changing O(n^2) code to something linear. There are 2 separate cases, and I can split one of them out into a separate PR for easier review.
Duplicate keys
This is something that was recently noticed in #539, but there had already been some work in unmerged PRs from @elidoran - #185 and #189.
The `[].concat(a, b)` makes copies of continually larger arrays every time `combine()` is called. There are (were) tests that check that `combine()` does not mutate input arguments, but 3086902 introduced new test cases that check that the first (`a`) input is mutated if it is an overflow object. I know that mutating inputs was one of the objections in #185 and #189, but I believe that the performance improvement is large enough to allow it.

This PR does not change `arrayLimit` behaviour, which is wrong (index vs length - #540) and probably should not even be there (#537 (comment), #294).

This change should slightly reduce memory usage by making fewer copies (it reduces the number of allocations, but peak memory usage depends on GC; GHSA-6rw7-vpxm-498p mentions memory exhaustion, but it is not clear how that could happen) and makes the performance comparable to 6.14.1 with `arrayLimit: -1` (in prior versions duplicates are parsed as arrays regardless of `arrayLimit` and `parseArrays` - #543).

Memory usage change
This was tested by parsing a string with 100000 `a[]=b...b` elements, where each `b...b` string had 1000 characters - the input is 100 MB of text. Options: `{ parameterLimit: 100_000, arrayLimit: 100_000 }`

16.4.0 "Allocation" view from "Allocation timeline":
This PR (f8ee66f):
My benchmark results are variable, because I have a noisy environment (lots of things open, CPU frequency and core assignment not locked), but the general order of magnitude tells the difference.
Unfortunately, this may cause regressions with huge arrays, because `Function.apply()` pushes the arguments (all elements of array `b` in this case) onto the stack. I don't think that this is a huge problem, given that this happens at over 100k elements that previously would be parsed in about ~30 seconds. When exactly this will fail depends on the available stack size and may be hard to predict.

Benchmark code
Possible input combinations are:

- `a` is an array, `b` is not an array - `a.push(b)`
- `a` is an array, `b` is an array - `a.push.apply(a, b)`
- `a` is not an array, `b` is not an array - `[].concat(a, b)`
- `a` is not an array, `b` is an array - `[].concat(a, b)` - detected only once in the whole test suite, so I don't think `b.unshift(a)` makes sense

This part includes the first 2 commits:

- replacing `.concat()` with `push()` if possible

Indexed arrays
Here the problem lies in `merge()` and the `source.forEach()` call. When a string like `a[0]=b&a[1]=b&...` is parsed, the `merge()` function is called with values of `target` and `source` like those below:

While the callback of `.forEach()` is executed only for non-empty array items in sparse arrays, it looks like it still is linear in the size of the array and not in the number of existing items. Since the execution time of `.forEach()` and the number of calls to `merge()` both scale with N (the number of elements in an array with growing indices), this has O(n^2) time complexity.

The idea that I had is iterating over own properties, using the fact that for arrays they are always the indices of existing elements followed by `'length'`.

Benchmarks use the same code as before, but the second query is uncommented.
I'm not entirely convinced that this is something worth changing, because for reasonable `arrayLimit` (affects max `.forEach()` execution time) and `parameterLimit` (affects the number of `merge()` calls) values the current code performs slightly better. If there are people who change both of those limits, then this may be a real problem.

This part is in the third commit - f8ee66f.