Data loss when writing long lines with File#print after File#set_encoding #16796

@bloovis

Description

Writing long lines to a file can cause characters to be dropped in some situations. Specifically, if I use File#set_encoding and set invalid to :skip, then File#print of a very long line causes bytes to be dropped at 1K offsets.

I’m using Crystal 1.18.2 on Fedora Linux 43 (x86_64).

Here is a test program that demonstrates the problem: when I run it, it generates a file containing 2999 bytes instead of the expected 3001. When I delete the set_encoding line, the problem goes away.

```crystal
longline = "0123456789" * 300
File.open("junk", "w") do |f|
  f.set_encoding("UTF-8", invalid: :skip) # This causes the data loss.
  f.print(longline)
  f.print("\n")
end
size = File.info("junk").size
puts "file junk has #{size} bytes, should be 3001"
```

This problem also happens with IO::Memory. In the following test program the "|" characters are dropped.

```crystal
longline = ("." * 1024 + "|") * 4
buffer = IO::Memory.new
buffer.set_encoding "UTF-8", invalid: :skip
buffer << longline

puts buffer

puts "buffer has #{buffer.bytesize} bytes, should be 4100"
```
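The dropped characters line up with 1024-byte boundaries: each "|" sits at the end of a 1025-byte group, so the "|" bytes fall at offsets 1024, 2049, 3074, and 4099, each one byte past a multiple of 1024. A quick sketch (plain string arithmetic, no IO involved) confirms where those bytes sit in the input:

```crystal
longline = ("." * 1024 + "|") * 4
offsets = [] of Int32
longline.chars.each_with_index do |ch, i|
  offsets << i if ch == '|'
end
puts offsets.inspect # => [1024, 2049, 3074, 4099]
```

These are exactly the bytes missing from the buffer above, which is what suggests the encoder is mishandling data at some internal 1K slice boundary.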

Labels: kind:bug (A bug in the code. Does not apply to documentation, specs, etc.), topic:stdlib:text
