Bug report
Bug description:
Experimentally I measured a huge performance improvement when I switched my code from

`json.dump(x, f, **kwargs)`

to

`f.write(json.dumps(x, **kwargs))`
Method
I essentially wrote the same contents to a sequence of files and measured the total time taken. The JSON contents had 1, 300, and 400 entries per level, and 1, 5, and 6 levels of depth. There's quite a bit of variance here, but this wasn't what I was trying to measure in the first place. I discovered it by chance, so forgive the lack of precision. I also no longer have the source code, because I wasn't originally planning to report this.
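For reference, a minimal sketch of the kind of comparison I ran looks roughly like this. The payload shape and file paths here are placeholders, not the actual data from my runs:

```python
import json
import time

# Placeholder nested payload; the real one had 1/300/400 entries per level
# and up to 6 levels of depth.
payload = {f"key{i}": {"nested": list(range(50))} for i in range(400)}

def time_dump(prefix, n):
    # Write n consecutive files with json.dump() and return elapsed µs.
    start = time.perf_counter()
    for i in range(n):
        with open(f"{prefix}_{i}.json", "w") as f:
            json.dump(payload, f)
    return (time.perf_counter() - start) * 1e6

def time_dumps(prefix, n):
    # Same, but serialize to a string first and issue a single write.
    start = time.perf_counter()
    for i in range(n):
        with open(f"{prefix}_{i}.json", "w") as f:
            f.write(json.dumps(payload))
    return (time.perf_counter() - start) * 1e6

for n in (1, 2, 4, 8):
    print(n, round(time_dump("dump", n)), round(time_dumps("dumps", n)))
```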
Results
| File Size (bytes) | Consecutive Files | `dump` (µs) | `dumps` (µs) |
|---|---|---|---|
| 74 | 1 | 508 | 581 |
| 74 | 2 | 520 | 541 |
| 74 | 4 | 1153 | 1151 |
| 74 | 8 | 1930 | 1750 |
| 39184 | 1 | 6363 | 1086 |
| 39184 | 2 | 11261 | 1821 |
| 39184 | 4 | 38126 | 3521 |
| 39184 | 8 | 80411 | 6466 |
| 468218 | 1 | 82821 | 11921 |
| 468218 | 2 | 150234 | 38017 |
| 468218 | 4 | 302357 | 42137 |
| 468218 | 8 | 573450 | 78545 |
Conclusion
A cursory investigation into the CPython code suggests that the slow part is the sequential writing of the chunks yielded by `iterencode()`; the chunks are quite small.
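Roughly speaking, `json.dump()` loops over the output of `JSONEncoder.iterencode()` and issues one `fp.write()` call per (often very small) chunk, while `dumps()` joins everything into a single string first. A simplified sketch of the difference:

```python
import json

def dump_like(obj, fp):
    # Approximately what json.dump() does: many small writes,
    # one per chunk yielded by iterencode().
    for chunk in json.JSONEncoder().iterencode(obj):
        fp.write(chunk)

def dumps_then_write(obj, fp):
    # The faster pattern from this report: build the whole
    # string in memory, then issue a single write.
    fp.write(json.dumps(obj))
```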
CPython versions tested on:
3.10
Operating systems tested on:
macOS