Data loss when writing long lines with File#print after File#set_encoding #16796
Open
Labels
kind:bug (A bug in the code. Does not apply to documentation, specs, etc.), topic:stdlib:text
Description
Writing long lines to a file can cause characters to be dropped in some situations. Specifically, if I use File#set_encoding with invalid set to :skip, then printing a very long line with File#print causes bytes to be dropped at 1K offsets.
I’m using Crystal 1.18.2 on Fedora Linux 43 (x86_64).
Here is a test program that demonstrates the problem: when I run it, it generates a file of 2999 bytes instead of the expected 3001. When I delete the set_encoding line, the problem goes away.
```crystal
longline = "0123456789" * 300

File.open("junk", "w") do |f|
  f.set_encoding("UTF-8", invalid: :skip) # This causes the data loss.
  f.print(longline)
  f.print("\n")
end

size = File.info("junk").size
puts "file junk has #{size} bytes, should be 3001"
```
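A possible workaround until this is fixed (my assumption, based on the repro above: the default UTF-8 write path is unaffected) is to sanitize the string up front instead of asking the IO to skip invalid bytes, and not call set_encoding at all:

```crystal
longline = "0123456789" * 300

File.open("junk2", "w") do |f|
  # No set_encoding: the default write path does not drop bytes here.
  # If the input may contain invalid byte sequences, clean it up front;
  # String#scrub replaces invalid sequences (a no-op for this plain
  # ASCII test data).
  f.print(longline.scrub)
  f.print("\n")
end

size = File.info("junk2").size
puts "file junk2 has #{size} bytes, should be 3001"
```

With this sketch the file comes out at the expected 3001 bytes on my machine, but it sidesteps the encoder rather than fixing it, so it only helps when pre-scrubbing the data is acceptable.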
This problem also happens with IO::Memory. In the following test program the "|" characters are dropped.
```crystal
longline = ("." * 1024 + "|") * 4

buffer = IO::Memory.new
buffer.set_encoding "UTF-8", invalid: :skip
buffer << longline

puts buffer
puts "buffer has #{buffer.bytesize} bytes, should be 4100"
```
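To make the drop pattern visible, a small check (a sketch; the 1024-byte spacing is my inference from the drops reported above) can count how many of the "|" separators survive the write:

```crystal
longline = ("." * 1024 + "|") * 4

buffer = IO::Memory.new
buffer.set_encoding "UTF-8", invalid: :skip
buffer << longline

# Compare what was written against the input. Each "|" sits just past a
# 1024-byte boundary, which appears to be where the encoder loses bytes.
written = buffer.to_s
puts "wrote #{written.bytesize} of #{longline.bytesize} bytes"
puts "separators surviving: #{written.count('|')} of #{longline.count('|')}"
```

If the bug is a buffer-boundary issue, every separator that lands on such a boundary should be missing from the output.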