-
Notifications
You must be signed in to change notification settings - Fork 5.9k
perf(fmt): speed up file diffing #30644
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
+191
−137
Merged
Changes from 17 commits
Commits
Show all changes
25 commits
Select commit
Hold shift + click to select a range
e895125
perf: speed up file diffing
KnorpelSenf 9aab445
style: fix lints
KnorpelSenf 0770d7f
Merge branch 'main' into fast-file-diff
KnorpelSenf d257d1a
build: enabled unified_diff feature for imara
KnorpelSenf 4bd1ca6
Merge branch 'main' into fast-file-diff
KnorpelSenf 1b1496c
Merge remote-tracking branch 'fork/fast-file-diff' into fast-file-diff
KnorpelSenf f5d0f2b
Merge branch 'main' into fast-file-diff
KnorpelSenf b834694
feat: restore first bits of diff formatting
KnorpelSenf 7a863b4
build: disable unused imara diff feature
KnorpelSenf 0332a9a
test: revert temporary changes
KnorpelSenf 85057ca
fix: bad rename
KnorpelSenf ed4188a
fix: lints
KnorpelSenf ebc2b9c
Merge branch 'main' into fast-file-diff
KnorpelSenf b2fb495
Merge branch 'main' into fast-file-diff
KnorpelSenf 15bfdd9
Merge branch 'main' into fast-file-diff
KnorpelSenf 60ed8e5
Merge branch 'main' into fast-file-diff
bartlomieju c151892
fix: properly handle multi-hunk diffs and line number tracking
bartlomieju b7c83b1
fix: show context lines between hunks, fix Cargo.lock
bartlomieju 057583a
fix: interleave delete/insert pairs and run formatter
bartlomieju ae9f49b
fix: resolve clippy warnings and remove context lines between hunks
bartlomieju 6ac9249
fix: update frozen lockfile test expectations for imara-diff
bartlomieju 8d30685
fix: preserve newline-only changes in diff output
bartlomieju f9f56af
fix: add separator between non-contiguous hunks in diff output
bartlomieju a24eb00
update tests
bartlomieju 1311270
fix the test
bartlomieju File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Some comments aren't visible on the classic Files Changed page.
There are no files selected for viewing
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Oops, something went wrong.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this sounds good. I kind of wonder if there's a diffing library that allows bailing after X many differences though as it would work well for incredibly large files. I wonder if we could contribute that to dissimilar and if they'd take a patch that does that (maybe it's not too difficult?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I opened dtolnay/dissimilar#21 -- it might be more worthwhile to pursue this path than rewrite to imara-diff, which still might not be fast enough with very large files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think both dissimilar and imara expose an iterator over the patches, so I would assume that we can just stop iterating and thereby abort the computation of the diff early.
I have yet to check if my assumption is correct, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason beyond the cost of migration why you'd like to stay with dissimilar? From my superficial understanding, it looks like imara is simply a better (=faster) diffing lib in all respects that are relevant for deno.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We know the diff output of dissimilar is ok, but not sure yet about imara. Generally diffs are only shown in error cases so perf doesn't matter too much, but obviously several minutes is not acceptable 😅. How much faster is imara for this diff? I guess if it's fast enough on this case then maybe that's good enough and we don't need to worry about doing some iterator or max results approach.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On my machine, deno 2.5.0 needs around 36 minutes (!) to check the formatting of the file and print out the diff. My branch (still in debug build, did not compile with
-ryet) cuts it down to 0.4 seconds.I did not try larger files using Deno 2.5.0 but I tried them with this branch. The results are as follows:
(not evaluated this very scientifically, please take it with a grain of salt)
All files had a similar format as shown #30634.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have started working on bringing back properly formatted diffs. One thing I have noticed is that imara is extremely good at finding line diffs, but it does not have built-in word-diffing (see pascalkuthe/imara-diff#1). I will ask if they accept contributions, but otherwise I'm afraid we will have to add the complexity here. This is something that dissimilar provides out of the box, but they seem to do it by not even tokenizing the input at all, which explains why it is so slow. (Note that this also means that imara might get a lot slower once we run word diffs with it.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Upstream work continues: pascalkuthe/imara-diff#33
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If my upstream work gets merged, the debug build of imara will be able to diff this same string is less then 1 second including the full word diff of the file (down from 36 minutes on Deno's main branch built in release mode)