Fix scaling problems #197

Open
frankharkins wants to merge 4 commits into wooorm:main from frankharkins:FH/edit-map-btree

@frankharkins frankharkins commented Feb 7, 2026

I believe this fixes #113 by switching EditMap from a Vec to a BTreeMap.

I created a version of @robsimmons's benchmark and added it to benches. Here's the result of running the large_jsx_expressions benchmark against the main branch (Vec) and my BTreeMap implementation (this PR), varying the num_jsx_lines_per_component variable.

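[Plot: large_jsx_expressions benchmark results for main (Vec) and this PR (BTreeMap) as num_jsx_lines_per_component varies]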

}
fn uuids(len: usize) -> String {
(0..len)
.map(|_| "770f93e8-b4ee-4ce8-ab0f-4ece7d8c1090")
Author

Hardcoding the same UUID seemed to reproduce the scaling behaviour, so I left it like this to avoid adding another dependency.

Comment on lines +5 to +6
fn tiny_markdown_string(c: &mut Criterion) {
let doc = "A *single* [markdown](/path) string!".to_owned();
Author
@frankharkins frankharkins Feb 7, 2026

I noticed a very small slowdown in this tiny test case (~8.9us -> ~9.1us; ~3%). This seems to even out very quickly as size grows. I tried creating an adaptive data structure that switched from Vec to BTreeMap at a certain size, but the complexity was quite high and the performance improvement was basically negligible.

Comment on lines 12 to 13
fn readme(c: &mut Criterion) {
let doc = fs::read_to_string("readme.md").unwrap();
Author

Criterion reported no difference for this benchmark.

Comment on lines -79 to -80
self.map
.sort_unstable_by(|a, b| a.0.partial_cmp(&b.0).unwrap());
Author

BTreeMap.iter() is sorted already
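
A minimal standard-library sketch of why the explicit sort can be dropped. The `BTreeMap<usize, usize>` here is a simplified stand-in for the real edit records, and `keys_in_order` / `keys_in_reverse` are hypothetical helper names used only for illustration, not functions from this PR.

```rust
use std::collections::BTreeMap;

/// Collects keys in ascending order. No explicit sort is needed
/// because `BTreeMap::iter` yields entries sorted by key.
fn keys_in_order(map: &BTreeMap<usize, usize>) -> Vec<usize> {
    map.iter().map(|(index, _)| *index).collect()
}

/// Collects keys in descending order. `BTreeMap::iter` is a
/// double-ended iterator, so `.rev()` walks it backwards.
fn keys_in_reverse(map: &BTreeMap<usize, usize>) -> Vec<usize> {
    map.iter().rev().map(|(index, _)| *index).collect()
}

fn main() {
    let mut map = BTreeMap::new();
    // Insert out of order on purpose.
    map.insert(30, 1);
    map.insert(10, 2);
    map.insert(20, 0);

    println!("{:?}", keys_in_order(&map));
    println!("{:?}", keys_in_reverse(&map));
}
```

Iteration order depends only on the keys, not on insertion order, which is what makes a separate sorting pass redundant.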

Comment on lines -130 to +122
while index < edit_map.map.len() {
if edit_map.map[index].0 == at {
edit_map.map[index].1 += remove;

match edit_map.map.get_mut(&at) {
Author

I think this was the source of the quadratic behaviour: previously we iterated through the Vec to find an element, which is $O(n)$ per lookup and therefore $O(n^2)$ across $n$ edits. With BTreeMap, we can get an element in $O(\log n)$ time.
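
To illustrate the difference, here is a simplified sketch that assumes each entry is just an `(at, remove)` pair; the real `EditMap` entries likely carry more data. `add_vec` and `add_btree` are hypothetical names for this example, not the functions in `src/util/edit_map.rs`.

```rust
use std::collections::BTreeMap;

/// Vec-backed version: finding an existing entry for `at` is a linear
/// scan, so `n` calls can cost O(n^2) in total.
fn add_vec(map: &mut Vec<(usize, usize)>, at: usize, remove: usize) {
    for entry in map.iter_mut() {
        if entry.0 == at {
            entry.1 += remove;
            return;
        }
    }
    map.push((at, remove));
}

/// BTreeMap-backed version: lookup and insertion are both O(log n).
fn add_btree(map: &mut BTreeMap<usize, usize>, at: usize, remove: usize) {
    match map.get_mut(&at) {
        Some(existing) => *existing += remove,
        None => {
            map.insert(at, remove);
        }
    }
}

fn main() {
    let mut vec_map: Vec<(usize, usize)> = Vec::new();
    let mut tree_map: BTreeMap<usize, usize> = BTreeMap::new();

    // Two edits land on index 5 and should be merged into one entry.
    for &(at, remove) in &[(5, 1), (2, 3), (5, 4)] {
        add_vec(&mut vec_map, at, remove);
        add_btree(&mut tree_map, at, remove);
    }

    println!("{:?}", vec_map);
    println!("{:?}", tree_map);
}
```

Both versions merge repeated edits at the same index; the difference is the lookup cost and that the map keeps its entries ordered by key.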

@frankharkins frankharkins marked this pull request as ready for review February 7, 2026 22:27
@Murderlon Murderlon requested a review from Copilot February 8, 2026 09:28
@Murderlon
Collaborator

I'm not knowledgeable enough in this codebase to have opinions, but I just wanted to say this looks like a great optimization 🙏

CI is still failing btw

Copilot AI left a comment

Pull request overview

This PR addresses the scaling/performance issues reported in #113 by changing EditMap’s internal storage from a linear Vec to an ordered BTreeMap, and adds a benchmark that stresses large MDX/JSX expressions to measure the improvement.

Changes:

  • Switch EditMap from Vec-backed storage to BTreeMap to avoid O(n²) behavior when accumulating edits.
  • Update EditMap::consume to iterate edits in key order (and reverse key order for application) without sorting.
  • Add Criterion benchmarks (including a large JSX-expression case) and add itertools as a dev dependency for string generation.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/util/edit_map.rs | Replaces the edit record with a BTreeMap and updates edit accumulation/application logic accordingly. |
| benches/bench.rs | Adds new benchmarks, including a large JSX/MDX-style stress case for parser scaling. |
| Cargo.toml | Adds itertools as a dev-dependency to support benchmark string generation. |

@frankharkins
Author

Other than the Clippy lint, I believe CI is failing because our version of swc_common uses serde::__private, which shouldn't have been relied on and has been renamed in more recent patch versions.

I tried upgrading swc_common but it required quite a few changes to the codebase, including changing user-facing error messages. Instead, I've manually edited the lockfile to use the version of serde that was available when this version of swc_common was released.

Phaqui commented Feb 16, 2026

What's the holdup on this? Is there anything I can do to help? For my use case, a tool that checks links in markdown files, the runtime went from around 2 minutes to 2 seconds with this PR, with exactly the same behavior.

@Murderlon Murderlon requested a review from wooorm February 16, 2026 17:20
@frankharkins
Author

Glad to hear it worked well! I imagine the maintainers are busy and this isn't the highest priority thing on their plate.

To maintainers: If you have a working lockfile, feel free to push it over mine to make review easier.

@Murderlon
Collaborator

friendly ping @wooorm

Development

Successfully merging this pull request may close these issues.

Performance: larger MDX files are unmanagably slow to parse

4 participants