
@nwinn-student commented Nov 1, 2025

`inflate` lazily doubles the buffer's size on demand instead of computing the buffer's exact length up front.

The initial buffer size and the growth multiplier can be tuned further; I do not have enough datasets to gauge the optimal values.

I initialized the buffer size to 64 bytes since I expect most MessagePack users to encode tables rather than individual values.

Reasoning:
`computeLength` does redundant work: it must determine each value's type and check each table's size, both moderately expensive operations that the encoder repeats later.

`inflate` removes this redundancy, albeit at a cost: less is known about the data up front when encoding.
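
For context, here is a minimal sketch of the lazy-doubling idea written against Luau's `buffer` library; the name `ensureCapacity`, the constant, and the growth loop are illustrative, not the PR's actual code:

```lua
-- Illustrative sketch, not the PR's code: grow a buffer by doubling until it
-- can hold `needed` more bytes past the current write offset.
local INITIAL_SIZE = 64 -- most users encode tables, so start above tiny scalars

local function ensureCapacity(buf: buffer, offset: number, needed: number): buffer
	local size = buffer.len(buf)
	if offset + needed <= size then
		return buf -- already large enough
	end
	repeat
		size *= 2
	until offset + needed <= size
	local bigger = buffer.create(size)
	buffer.copy(bigger, 0, buf, 0, offset) -- preserve the bytes written so far
	return bigger
end

-- Hypothetical usage inside an encoder loop:
-- buf = ensureCapacity(buf, offset, 9); buffer.writef64(buf, offset + 1, n)
local buf = buffer.create(INITIAL_SIZE)
```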

Pros:

  • 29% faster on the provided dataset; some other cases are 15-20% faster

Cons:

  • 134% more memory used on the provided dataset; some other cases use 150-200% more memory
  • Code is less readable
  • Performance is less stable on the provided dataset, and some cases become slower

Future Work:

  • Should this PR be accepted, look into further improving readability and performance.
  • Should this PR be rejected, look into harnessing the knowledge from `computeLength` to improve encoding performance, if possible.
    • I do not expect fruitful results, but passing the table size has potential.

Benchmarks:
For benchmarking, the datasets listed in the tables below were used.

If the websites hosting the datasets go down, all of them aside from the one provided here, msgpack-default, and others... can be found at awesome-json-datasets.

**old**

| Dataset Name | Time (s) | Space (kB) | Output Size (B) |
| --- | --- | --- | --- |
| benchdata | 4.374e-3 ± 7.5e-5 | 17 ± 0 | 17020 |
| circleciblank | 3.099e-6 ± 1.0e-7 | 0 ± 0 | 10 |
| circlecimatrix | 2.270e-5 ± 5.0e-7 | 1 ± 1 | 72 |
| commitlint | 1.550e-5 ± 4.0e-7 | 0 ± 1 | 74 |
| commitlintbasic | 2.699e-6 ± 9.9e-8 | 0 ± 0 | 17 |
| epr | 4.360e-5 ± 2.8e-6 | 1 ± 0 | 412 |
| eslintrc | 7.679e-5 ± 4.6e-6 | 1 ± 1 | 971 |
| esmrc | 9.799e-6 ± 1.0e-7 | 0 ± 0 | 64 |
| food-facts | 2.761e-3 ± 5.2e-5 | 49 ± 1 | 34204 |
| geojson | 7.049e-5 ± 1.2e-5 | 1 ± 0 | 162 |
| githubfundingblank | 2.999e-6 ± 9.9e-8 | 0 ± 0 | 24 |
| githubworkflow | 3.340e-5 ± 2.8e-6 | 1 ± 0 | 287 |
| gruntcontribclean | 1.580e-5 ± 3.1e-6 | 0 ± 1 | 60 |
| imageoptimizerwebjob | 1.190e-5 ± 3.9e-7 | 0 ± 1 | 61 |
| jsonereversesort | 1.909e-5 ± 1.1e-5 | 0 ± 1 | 52 |
| jsonesort | 9.200e-6 ± 1.0e-7 | 0 ± 0 | 21 |
| jsonfeed | 2.280e-5 ± 8.9e-7 | 1 ± 1 | 517 |
| jsonresume | 1.376e-4 ± 7.2e-6 | 4 ± 1 | 2749 |
| msgpack-default | 8.072e-4 ± 2.4e-5 | 196 ± 0 | 192245 |
| netcoreproject | 5.079e-5 ± 5.9e-6 | 1 ± 0 | 919 |
| nightwatch | 8.679e-5 ± 2.2e-5 | 2 ± 1 | 1037 |
| openweathermap | 7.300e-5 ± 5.5e-6 | 1 ± 1 | 382 |
| openweatherroadrisk | 5.829e-5 ± 1.6e-6 | 1 ± 0 | 339 |
| packagejson | 9.499e-5 ± 5.4e-5 | 2 ± 1 | 1995 |
| packagejsonlintrc | 7.139e-5 ± 3.6e-6 | 2 ± 1 | 989 |
| pokedex | 5.350e-3 ± 1.0e-4 | 76 ± 0 | 45094 |
| prize | 1.251e-2 ± 1.7e-4 | 325 ± 0 | 201437 |
| sapcloudsdkpipeline | 1.699e-6 ± 1.0e-7 | 0 ± 0 | 1 |
| travisnotifications | 2.789e-5 ± 1.9e-6 | 1 ± 0 | 627 |
| tslintbasic | 8.799e-6 ± 3.9e-7 | 0 ± 0 | 51 |
| tslintextend | 5.000e-6 ± 1.9e-7 | 0 ± 0 | 55 |
| tslintmulti | 1.440e-5 ± 1.6e-6 | 0 ± 1 | 68 |
**new**

| Dataset Name | Time (s) | Space (kB) | Output Size (B) |
| --- | --- | --- | --- |
| benchdata | 3.853e-3 ± 9.4e-5 | 64 ± 1 | 17020 |
| circleciblank | 2.199e-6 ± 1.0e-7 | 0 ± 0 | 10 |
| circlecimatrix | 1.790e-5 ± 4.9e-7 | 1 ± 1 | 72 |
| commitlint | 1.169e-5 ± 2.0e-7 | 0 ± 1 | 74 |
| commitlintbasic | 1.999e-6 ± 1.0e-7 | 0 ± 0 | 17 |
| epr | 3.330e-5 ± 2.3e-6 | 2 ± 1 | 412 |
| eslintrc | 5.940e-5 ± 1.9e-6 | 2 ± 1 | 971 |
| esmrc | 7.899e-6 ± 3.8e-6 | 0 ± 0 | 64 |
| food-facts | 2.218e-3 ± 6.9e-5 | 144 ± 0 | 34204 |
| geojson | 5.350e-5 ± 1.0e-6 | 2 ± 1 | 162 |
| githubfundingblank | 2.300e-6 ± 1.0e-7 | 0 ± 0 | 24 |
| githubworkflow | 2.719e-5 ± 2.2e-6 | 2 ± 1 | 287 |
| gruntcontribclean | 1.169e-5 ± 4.9e-7 | 0 ± 1 | 60 |
| imageoptimizerwebjob | 8.799e-6 ± 7.0e-7 | 0 ± 1 | 61 |
| jsonereversesort | 1.379e-5 ± 2.9e-7 | 0 ± 1 | 52 |
| jsonesort | 5.999e-6 ± 1.0e-7 | 0 ± 0 | 21 |
| jsonfeed | 2.020e-5 ± 9.9e-7 | 2 ± 0 | 517 |
| jsonresume | 1.139e-4 ± 7.1e-6 | 9 ± 0 | 2749 |
| msgpack-default | 7.458e-4 ± 2.7e-4 | 460 ± 1 | 192245 |
| netcoreproject | 4.339e-5 ± 4.3e-6 | 2 ± 1 | 919 |
| nightwatch | 7.130e-5 ± 5.9e-6 | 5 ± 1 | 1037 |
| openweathermap | 5.989e-5 ± 8.7e-6 | 1 ± 1 | 382 |
| openweatherroadrisk | 4.549e-5 ± 3.3e-6 | 2 ± 1 | 339 |
| packagejson | 7.789e-5 ± 5.7e-6 | 5 ± 1 | 1995 |
| packagejsonlintrc | 5.840e-5 ± 5.0e-6 | 3 ± 1 | 989 |
| pokedex | 4.327e-3 ± 9.3e-5 | 160 ± 0 | 45094 |
| prize | 9.974e-3 ± 2.2e-4 | 640 ± 0 | 201437 |
| sapcloudsdkpipeline | 1.200e-6 ± 9.9e-8 | 0 ± 0 | 1 |
| travisnotifications | 2.440e-5 ± 4.9e-6 | 3 ± 1 | 627 |
| tslintbasic | 6.700e-6 ± 3.0e-7 | 0 ± 0 | 51 |
| tslintextend | 3.999e-6 ± 1.9e-7 | 0 ± 0 | 55 |
| tslintmulti | 1.070e-5 ± 3.9e-7 | 0 ± 1 | 68 |

inflate lazily doubles the buffer's size on demand. While memory does suffer, speed improves.
@cipharius (Owner) commented
Thank you for measuring the performance impact of the buffer size computation!

How about, instead of inflating, we just create a buffer that is the same size as the input string? MessagePack should always produce output shorter than the input JSON, so that is a safe upper bound for the buffer.

The memory overhead in this context is practically meaningless; what matters is the output size.

@nwinn-student (Author) commented

> How about, instead of inflating, we just create a buffer that is the same size as the input string? MessagePack should always produce output shorter than the input JSON, so that is a safe upper bound for the buffer.

I presume you are saying: "We can speed up `computeLength` instead by approximating the length."

I checked it out; json-size from json-joy tackles this issue in TypeScript. Adopting some of its approaches let me speed up `computeLength` by 1-4%, but the table-handling section remained mostly the same.

Sadly, the speedup wasn't enough to outweigh the cost of the larger buffer. In the end it causes a very slight slowdown in some cases and a very slight speedup in others, leaving overall performance about the same.

The code can be found here.
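
For illustration only, this is a sketch of the general approximation idea, not json-size's implementation or the code linked above: overestimate each value's encoded size with cheap worst-case bounds so the exact MessagePack header-width logic can be skipped.

```lua
-- Hypothetical sketch: cheap worst-case upper bound on a value's encoded size.
local function approximateLength(data: any): number
	local t = type(data)
	if t == "nil" or t == "boolean" then
		return 1 -- single marker byte
	elseif t == "number" then
		return 9 -- worst case: float64 marker + 8-byte payload
	elseif t == "string" then
		return 5 + #data -- worst case: str32 marker + 4-byte length + bytes
	elseif t == "table" then
		local length = 5 -- worst case: array32/map32 marker + 4-byte count
		for key, value in data do
			length += approximateLength(key) + approximateLength(value)
		end
		return length
	end
	error("unsupported type: " .. t)
end
```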

@cipharius (Owner) commented
Never mind the previous remark; I forgot that `encode` takes a Luau structure and produces a MessagePack-encoded string, rather than converting JSON to MessagePack.

I will see how to incorporate these findings. But first I'd like to improve the test and benchmark harness to use the standalone Luau interpreter, so that tests and benchmarks can run automatically without requiring an environment that supports Roblox Studio and the benchmarking plugin.
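
As a rough sketch of what that could look like under the standalone `luau` CLI (the module path, API surface, and output format here are assumptions, not the repository's actual harness):

```lua
-- Hypothetical harness for the standalone Luau interpreter; assumes a
-- `msgpack` module with an `encode` function reachable via require-by-string.
local msgpack = require("./msgpack")

local function bench(name: string, data: any, iterations: number)
	local start = os.clock()
	for _ = 1, iterations do
		msgpack.encode(data)
	end
	local elapsed = os.clock() - start
	print(string.format("%s: %.3e s per encode", name, elapsed / iterations))
end

bench("sample", { hello = "world", list = { 1, 2, 3 } }, 10000)
```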

@nwinn-student (Author) commented Nov 26, 2025

The benchmarks themselves needed fleshing out. There was not enough variety between the datasets, so I added others.... I am uncertain of the best way to visualize the results, so bear with me.

On another note, while profiling I observed that `computeLength` takes up a smaller share of the overall time when serializing large datasets. I expected that to mean larger datasets would fare better under the `computeLength` approach, but to my surprise the `inflate` approach performed better.
