Merged
Changes from 17 commits
25 commits
e895125
perf: speed up file diffing
KnorpelSenf Sep 8, 2025
9aab445
style: fix lints
KnorpelSenf Sep 8, 2025
0770d7f
Merge branch 'main' into fast-file-diff
KnorpelSenf Sep 8, 2025
d257d1a
build: enabled unified_diff feature for imara
KnorpelSenf Sep 8, 2025
4bd1ca6
Merge branch 'main' into fast-file-diff
KnorpelSenf Sep 10, 2025
1b1496c
Merge remote-tracking branch 'fork/fast-file-diff' into fast-file-diff
KnorpelSenf Sep 18, 2025
f5d0f2b
Merge branch 'main' into fast-file-diff
KnorpelSenf Sep 18, 2025
b834694
feat: restore first bits of diff formatting
KnorpelSenf Sep 18, 2025
7a863b4
build: disable unused imara diff feature
KnorpelSenf Sep 18, 2025
0332a9a
test: revert temporary changes
KnorpelSenf Sep 18, 2025
85057ca
fix: bad rename
KnorpelSenf Sep 18, 2025
ed4188a
fix: lints
KnorpelSenf Sep 18, 2025
ebc2b9c
Merge branch 'main' into fast-file-diff
KnorpelSenf Sep 27, 2025
b2fb495
Merge branch 'main' into fast-file-diff
KnorpelSenf Nov 7, 2025
15bfdd9
Merge branch 'main' into fast-file-diff
KnorpelSenf Nov 22, 2025
60ed8e5
Merge branch 'main' into fast-file-diff
bartlomieju Mar 12, 2026
c151892
fix: properly handle multi-hunk diffs and line number tracking
bartlomieju Mar 12, 2026
b7c83b1
fix: show context lines between hunks, fix Cargo.lock
bartlomieju Mar 12, 2026
057583a
fix: interleave delete/insert pairs and run formatter
bartlomieju Mar 12, 2026
ae9f49b
fix: resolve clippy warnings and remove context lines between hunks
bartlomieju Mar 12, 2026
6ac9249
fix: update frozen lockfile test expectations for imara-diff
bartlomieju Mar 12, 2026
8d30685
fix: preserve newline-only changes in diff output
bartlomieju Mar 12, 2026
f9f56af
fix: add separator between non-contiguous hunks in diff output
bartlomieju Mar 12, 2026
a24eb00
update tests
bartlomieju Mar 12, 2026
1311270
fix the test
bartlomieju Mar 12, 2026
16 changes: 13 additions & 3 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions Cargo.toml
@@ -362,6 +362,7 @@ dprint-plugin-markdown = "=0.20.0"
 dprint-plugin-typescript = "=0.95.15"
 env_logger = "=0.11.6"
 fancy-regex = "=0.14.0"
+imara-diff = "=0.2.0"
Member

@dsherret dsherret Sep 15, 2025

> I'm willing to fix this up if I get an OK about the general direction.

I think this sounds good. I kind of wonder if there's a diffing library that allows bailing after X many differences though as it would work well for incredibly large files. I wonder if we could contribute that to dissimilar and if they'd take a patch that does that (maybe it's not too difficult?).

Member

I opened dtolnay/dissimilar#21 -- it might be more worthwhile to pursue this path than rewrite to imara-diff, which still might not be fast enough with very large files.

Contributor Author

> I'm willing to fix this up if I get an OK about the general direction.
>
> I think this sounds good. I kind of wonder if there's a diffing library that allows bailing after X many differences though as it would work well for incredibly large files. I wonder if we could contribute that to dissimilar and if they'd take a patch that does that (maybe it's not too difficult?).

I think both dissimilar and imara expose an iterator over the patches, so I would assume that we can just stop iterating and thereby abort the computation of the diff early.

I have yet to check if my assumption is correct, though.
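
A minimal sketch of the early-exit idea, using only the Rust standard library and a naive position-by-position line comparison instead of either crate's API. The helper names `differing_lines` and `first_differences` are hypothetical. It only shows that stopping iteration saves work when differences are produced lazily; whether dissimilar or imara-diff actually compute their results lazily is the assumption that still needs checking.

```rust
/// Lazily yields the 1-based numbers of lines that differ at the same position.
/// This is a naive positional comparison, NOT a real diff algorithm.
fn differing_lines<'a>(
    orig: &'a str,
    edit: &'a str,
) -> impl Iterator<Item = usize> + 'a {
    orig.lines()
        .zip(edit.lines())
        .enumerate()
        .filter(|(_, (a, b))| a != b)
        .map(|(i, _)| i + 1)
}

/// Collects at most `cap` differences. `take` stops polling the lazy
/// iterator once the cap is reached, so later lines are never compared.
fn first_differences(orig: &str, edit: &str, cap: usize) -> Vec<usize> {
    differing_lines(orig, edit).take(cap).collect()
}

fn main() {
    let orig = "a\nb\nc\nd\ne";
    let edit = "a\nX\nc\nY\nZ";
    // Lines 2, 4 and 5 differ, but only the first two are reported.
    println!("{:?}", first_differences(orig, edit, 2));
}
```

With these inputs, `first_differences(orig, edit, 2)` returns after the second mismatch and never inspects line 5. If a library builds the full result eagerly (for example, by returning a `Vec`), dropping the result early would not save any computation.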

Contributor Author

Is there a reason beyond the cost of migration why you'd like to stay with dissimilar? From my superficial understanding, it looks like imara is simply a better (=faster) diffing lib in all respects that are relevant for deno.

Member

> Is there a reason beyond the cost of migration why you'd like to stay with dissimilar?

We know the diff output of dissimilar is ok, but not sure yet about imara. Generally diffs are only shown in error cases so perf doesn't matter too much, but obviously several minutes is not acceptable 😅. How much faster is imara for this diff? I guess if it's fast enough on this case then maybe that's good enough and we don't need to worry about doing some iterator or max results approach.

Contributor Author

@KnorpelSenf KnorpelSenf Sep 18, 2025

On my machine, deno 2.5.0 needs around 36 minutes (!) to check the formatting of the file and print out the diff. My branch (still in debug build, did not compile with -r yet) cuts it down to 0.4 seconds.

I did not try larger files using Deno 2.5.0 but I tried them with this branch. The results are as follows:

  • 1 MB file: 0.4 seconds
  • 10 MB file: 4 seconds
  • 100 MB file: 40 seconds

(I have not evaluated this very scientifically, so please take it with a grain of salt.)

All files had a similar format to the one shown in #30634.
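
As a rough sanity check on the numbers above, the following sketch computes the throughput for each measurement and the speedup relative to the 36-minute run. The function names are hypothetical, and the results inherit the caveat that the measurements were informal.

```rust
/// Throughput in MB/s for one measurement.
fn throughput_mb_per_s(size_mb: f64, seconds: f64) -> f64 {
    size_mb / seconds
}

/// How many times faster the new run is compared to the old one.
fn speedup(old_seconds: f64, new_seconds: f64) -> f64 {
    old_seconds / new_seconds
}

fn main() {
    // (file size in MB, wall time in seconds) as reported in the comment.
    let samples = [(1.0, 0.4), (10.0, 4.0), (100.0, 40.0)];
    for (size_mb, seconds) in samples {
        println!(
            "{size_mb} MB: {:.2} MB/s",
            throughput_mb_per_s(size_mb, seconds)
        );
    }
    // 36 minutes on deno 2.5.0 versus 0.4 seconds on the branch.
    println!("speedup: {:.0}x", speedup(36.0 * 60.0, 0.4));
}
```

All three measurements work out to about 2.5 MB/s, which suggests the runtime grew roughly linearly with file size over this range, and the reported 36 minutes versus 0.4 seconds corresponds to a speedup of roughly 5400x.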

Contributor Author

@KnorpelSenf KnorpelSenf Sep 18, 2025

I have started working on bringing back properly formatted diffs. One thing I have noticed is that imara is extremely good at finding line diffs, but it does not have built-in word-diffing (see pascalkuthe/imara-diff#1). I will ask if they accept contributions, but otherwise I'm afraid we will have to add the complexity here. This is something that dissimilar provides out of the box, but they seem to do it by not even tokenizing the input at all, which explains why it is so slow. (Note that this also means that imara might get a lot slower once we run word diffs with it.)
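
To illustrate why token granularity matters for speed, here is a small standard-library sketch that splits the same text into line tokens and word tokens. The helpers `line_tokens` and `word_tokens` are hypothetical and do not reflect how either crate tokenizes internally; the point is only that a finer tokenization gives a diff algorithm more units to compare.

```rust
/// Splits text into line tokens, keeping each trailing newline.
fn line_tokens(text: &str) -> Vec<&str> {
    text.split_inclusive('\n').collect()
}

/// Splits text into whitespace-separated word tokens.
fn word_tokens(text: &str) -> Vec<&str> {
    text.split_whitespace().collect()
}

fn main() {
    let text = "console.log(\n'Hello World'\n)\n";
    println!("line tokens: {}", line_tokens(text).len());
    println!("word tokens: {}", word_tokens(text).len());
    println!("characters:  {}", text.chars().count());
}
```

One possible design, stated here as an assumption rather than the plan of this PR, is to run the line diff first and apply a word-level pass only to the lines inside changed hunks, so the finer tokenization never touches unchanged text.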

Contributor Author

Upstream work continues: pascalkuthe/imara-diff#33

Contributor Author

@KnorpelSenf KnorpelSenf Nov 8, 2025

If my upstream work gets merged, the debug build of imara will be able to diff this same string in less than 1 second, including the full word diff of the file (down from 36 minutes on Deno's main branch built in release mode).

 libsui = "0.12.6"
 malva = "=0.12.1"
 markup_fmt = "=0.22.0"
4 changes: 2 additions & 2 deletions libs/resolver/Cargo.toml
@@ -49,9 +49,9 @@ deno_permissions = { workspace = true, optional = true }
 deno_semver.workspace = true
 deno_terminal.workspace = true
 deno_unsync.workspace = true
-dissimilar.workspace = true
 futures.workspace = true
 http = { workspace = true, optional = true }
+imara-diff.workspace = true
 import_map.workspace = true
 indexmap.workspace = true
 jsonc-parser.workspace = true
@@ -69,5 +69,5 @@ url.workspace = true

 [dev-dependencies]
 node_resolver.workspace = true
-sys_traits = { workspace = true, features = ["memory", "real", "serde_json"] }
+sys_traits = { workspace = true, features = ["getrandom", "memory", "real", "serde_json"] }
 test_util.workspace = true
173 changes: 67 additions & 106 deletions libs/resolver/display.rs
@@ -6,8 +6,8 @@
 use std::fmt::Write as _;

 use deno_terminal::colors;
-use dissimilar::Chunk;
-use dissimilar::diff as difference;
+use imara_diff::Diff;
+use imara_diff::InternedInput;

 /// Print diff of the same file_path, before and after formatting.
 ///
@@ -28,25 +28,25 @@ pub fn diff(orig_text: &str, edit_text: &str) -> String {
   DiffBuilder::build(&orig_text, &edit_text)
 }

-struct DiffBuilder {
+struct DiffBuilder<'a> {
+  input: InternedInput<&'a str>,
   output: String,
   line_number_width: usize,
   orig_line: usize,
   edit_line: usize,
-  orig: String,
-  edit: String,
-  has_changes: bool,
 }

-impl DiffBuilder {
-  pub fn build(orig_text: &str, edit_text: &str) -> String {
-    let mut diff_builder = DiffBuilder {
+impl<'a> DiffBuilder<'a> {
+  pub fn build(orig_text: &'a str, edit_text: &'a str) -> String {
+    let input = InternedInput::new(orig_text, edit_text);
+    let mut diff = Diff::compute(imara_diff::Algorithm::Histogram, &input);
+    diff.postprocess_lines(&input);
+
+    let diff_builder = DiffBuilder {
+      input,
       output: String::new(),
       orig_line: 1,
       edit_line: 1,
-      orig: String::new(),
-      edit: String::new(),
-      has_changes: false,
       line_number_width: {
         let line_count = std::cmp::max(
           orig_text.split('\n').count(),
@@ -55,108 +55,75 @@ impl DiffBuilder {
         line_count.to_string().chars().count()
       },
     };

-    let chunks = difference(orig_text, edit_text);
-    diff_builder.handle_chunks(chunks);
-    diff_builder.output
+    diff_builder.handle_diff(diff)
   }

-  fn handle_chunks<'a>(&'a mut self, chunks: Vec<Chunk<'a>>) {
-    for chunk in chunks {
-      match chunk {
-        Chunk::Delete(s) => {
-          let split = s.split('\n').enumerate();
-          for (i, s) in split {
-            if i > 0 {
-              self.orig.push('\n');
-            }
-            self.orig.push_str(&fmt_rem_text_highlight(s));
-          }
-          self.has_changes = true
-        }
-        Chunk::Insert(s) => {
-          let split = s.split('\n').enumerate();
-          for (i, s) in split {
-            if i > 0 {
-              self.edit.push('\n');
-            }
-            self.edit.push_str(&fmt_add_text_highlight(s));
-          }
-          self.has_changes = true
-        }
-        Chunk::Equal(s) => {
-          let split = s.split('\n').enumerate();
-          for (i, s) in split {
-            if i > 0 {
-              self.flush_changes();
-            }
-            self.orig.push_str(&fmt_rem_text(s));
-            self.edit.push_str(&fmt_add_text(s));
-          }
-        }
+  fn handle_diff(mut self, diff: Diff) -> String {
+    let mut prev_before_end: u32 = 0;
+    let mut prev_after_end: u32 = 0;
+
+    for hunk in diff.hunks() {
+      // Skip unchanged lines between hunks
+      self.orig_line +=
+        (hunk.before.start - prev_before_end) as usize;
+      self.edit_line +=
+        (hunk.after.start - prev_after_end) as usize;
+
+      // Write deleted lines
+      for del in hunk.before.clone() {
+        let s = self.input.interner[self.input.before[del as usize]];
+        self.write_rem_line(s);
+      }
+      // Write inserted lines
+      for ins in hunk.after.clone() {
+        let s = self.input.interner[self.input.after[ins as usize]];
+        self.write_add_line(s);
       }
-    }
-
-    self.flush_changes();
-  }
-
-  fn flush_changes(&mut self) {
-    if self.has_changes {
-      self.write_line_diff();
-
-      self.orig_line += self.orig.split('\n').count();
-      self.edit_line += self.edit.split('\n').count();
-      self.has_changes = false;
-    } else {
-      self.orig_line += 1;
-      self.edit_line += 1;
+      prev_before_end = hunk.before.end;
+      prev_after_end = hunk.after.end;
     }

-    self.orig.clear();
-    self.edit.clear();
+    self.output
   }

-  fn write_line_diff(&mut self) {
-    let split = self.orig.split('\n').enumerate();
-    for (i, s) in split {
-      write!(
-        self.output,
-        "{:width$}{} ",
-        self.orig_line + i,
-        colors::gray(" |"),
-        width = self.line_number_width
-      )
-      .unwrap();
-      self.output.push_str(&fmt_rem());
-      self.output.push_str(s);
-      self.output.push('\n');
-    }
+  fn write_rem_line(&mut self, text: &str) {
+    let text = text.strip_suffix('\n').unwrap_or(text);
+    write!(
+      self.output,
+      "{:width$}{} ",
+      self.orig_line,
+      colors::gray(" |"),
+      width = self.line_number_width
+    )
+    .unwrap();
+    self.output.push_str(&fmt_rem());
+    self.output.push_str(&fmt_rem_text_highlight(text));
+    self.output.push('\n');
+    self.orig_line += 1;
+  }

-    let split = self.edit.split('\n').enumerate();
-    for (i, s) in split {
-      write!(
-        self.output,
-        "{:width$}{} ",
-        self.edit_line + i,
-        colors::gray(" |"),
-        width = self.line_number_width
-      )
-      .unwrap();
-      self.output.push_str(&fmt_add());
-      self.output.push_str(s);
-      self.output.push('\n');
-    }
+  fn write_add_line(&mut self, text: &str) {
+    let text = text.strip_suffix('\n').unwrap_or(text);
+    write!(
+      self.output,
+      "{:width$}{} ",
+      self.edit_line,
+      colors::gray(" |"),
+      width = self.line_number_width
+    )
+    .unwrap();
+    self.output.push_str(&fmt_add());
+    self.output.push_str(&fmt_add_text_highlight(text));
+    self.output.push('\n');
+    self.edit_line += 1;
   }
 }

 fn fmt_add() -> String {
   colors::green_bold("+").to_string()
 }

-fn fmt_add_text(x: &str) -> String {
-  colors::green(x).to_string()
-}
-
 fn fmt_add_text_highlight(x: &str) -> String {
   colors::black_on_green(x).to_string()
 }
@@ -165,10 +132,6 @@ fn fmt_rem() -> String {
   colors::red_bold("-").to_string()
 }

-fn fmt_rem_text(x: &str) -> String {
-  colors::red(x).to_string()
-}
-
 fn fmt_rem_text_highlight(x: &str) -> String {
   colors::white_on_red(x).to_string()
 }
@@ -268,11 +231,10 @@ mod tests {
         "2 | -\n",
         "3 | -\n",
         "4 | -\n",
-        "5 | -console.log(\n",
-        "1 | +console.log(\n",
         "6 | -'Hello World'\n",
+        "7 | -)\n",
         "2 | +\"Hello World\"\n",
-        "7 | -)\n3 | +);\n",
+        "3 | +);\n",
       ),
     );
   }
@@ -285,7 +247,6 @@ mod tests {
       concat!(
         "2 | -some line text test\n",
         "2 | +some line text test\n",
-        "3 | +\n",
       ),
     );
   }
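
To illustrate the line-number bookkeeping in `handle_diff` from this diff, here is a standalone sketch that uses only the standard library. The `Hunk` struct is a hypothetical stand-in that carries just the `before` and `after` line ranges, and the hunks in `main` are assumed by hand for the `console.log` test case rather than computed by imara-diff.

```rust
use std::ops::Range;

/// Hypothetical stand-in for a diff hunk: 0-based line ranges in each file.
struct Hunk {
    before: Range<u32>,
    after: Range<u32>,
}

/// Returns ('-' or '+', 1-based line number) for every changed line,
/// writing all deletions of a hunk before its insertions.
fn changed_line_numbers(hunks: &[Hunk]) -> Vec<(char, usize)> {
    let mut out = Vec::new();
    let (mut orig_line, mut edit_line) = (1usize, 1usize);
    let (mut prev_before_end, mut prev_after_end) = (0u32, 0u32);

    for hunk in hunks {
        // Skip the unchanged lines between the previous hunk and this one.
        orig_line += (hunk.before.start - prev_before_end) as usize;
        edit_line += (hunk.after.start - prev_after_end) as usize;

        for _ in hunk.before.clone() {
            out.push(('-', orig_line));
            orig_line += 1;
        }
        for _ in hunk.after.clone() {
            out.push(('+', edit_line));
            edit_line += 1;
        }

        prev_before_end = hunk.before.end;
        prev_after_end = hunk.after.end;
    }
    out
}

fn main() {
    // Assumed hunks: four leading blank lines removed, then the last two
    // lines of the original replaced by the last two lines of the edit.
    let hunks = [
        Hunk { before: 0..4, after: 0..0 },
        Hunk { before: 5..7, after: 1..3 },
    ];
    for (sign, line) in changed_line_numbers(&hunks) {
        println!("{line} | {sign}");
    }
}
```

For the assumed hunks, the second hunk yields deletions on lines 6 and 7 followed by insertions on lines 2 and 3, which matches the order of the updated test expectations above.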