Improve ldiff binary file detection

Annih · halostatue · commit bc14f1d55b30 · 2025-01-28T23:24:28.000-05:00
The former code ran scan on the first 4K of each file to see whether
either one contains a '\0' char, in which case the input is considered
binary.

This commit does not change this heuristic, just the implementation.
Instead of using the scan method with a regexp, use a simple include?.

This not only fixes compatibility issues with UTF-8 escape sequences,
but also improves performance:
  1. it does not invoke the Regexp engine.
  2. it stops at the first occurrence; the worst case is O(n).
  3. it does not build an array of matches.
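
A minimal sketch of the difference in what the two calls return. The
sample string is made up for illustration and is not taken from ldiff:

```ruby
# Illustrative input with two NUL bytes (an assumption, not ldiff data).
sample = "ab\0cd\0ef"

# scan walks the whole slice and returns an array with one entry per match.
matches = sample[0, 4096].scan(/\0/)

# include? returns a plain boolean and can stop at the first NUL it finds.
found = sample[0, 4096].include?("\0")
```

Here matches holds one element per NUL, while found is just true or
false.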

Also, instead of using .empty?, which signals a non-binary file, the
call to include? inverts the boolean test, so a true result now means
binary. IMHO it is clearer.
Note: the inversion alone could have been achieved by replacing .empty?
with .any?, but the other improvements listed above motivated the
change.
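
A quick way to check that the heuristic is unchanged is to compare both
forms side by side. This is only a sketch: the two helper names and the
sample strings are hypothetical, since ldiff inlines this logic in run.

```ruby
# Hypothetical helper mirroring the former implementation.
def binary_via_scan?(data_old, data_new)
  old_txt = data_old[0, 4096].scan(/\0/).empty?
  new_txt = data_new[0, 4096].scan(/\0/).empty?
  !old_txt || !new_txt
end

# Hypothetical helper mirroring the implementation in this commit.
def binary_via_include?(data_old, data_new)
  old_bin = data_old[0, 4096].include?("\0")
  new_bin = data_new[0, 4096].include?("\0")
  old_bin || new_bin
end

# Made-up inputs: plain text, an embedded NUL, an empty string, and a
# NUL that sits beyond the first 4096 characters.
samples = ["plain text\n", "nul\0inside", "", "a" * 5000 + "\0"]

# True when both helpers give the same answer for every pair of samples.
agree = samples.product(samples).all? do |a, b|
  binary_via_scan?(a, b) == binary_via_include?(a, b)
end
```
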
diff --git a/lib/diff/lcs/ldiff.rb b/lib/diff/lcs/ldiff.rb
@@ -104,9 +104,9 @@ def run(args, _input = $stdin, output = $stdout, error = $stderr) # :nodoc:
 
     # Test binary status
     if @binary.nil?
-      old_txt = data_old[0, 4096].scan(/\0/).empty?
-      new_txt = data_new[0, 4096].scan(/\0/).empty?
-      @binary = !old_txt || !new_txt
+      old_bin = data_old[0, 4096].include?("\0")
+      new_bin = data_new[0, 4096].include?("\0")
+      @binary = old_bin || new_bin
     end
 
     unless @binary