See logstash-plugins/logstash-codec-multiline#14
Preliminary:
- the line codec is replaced by the multiline codec.
- stdin is given data larger than 32K
Fault:
- the multiline codec expects line oriented data
- stdin input reads data in 32K chunks
- it is highly unlikely that the newline characters align on the 32K boundary.
- when the last character of the chunk is not a newline, the multiline codec assumes that the trailing fragment after the last newline (piece_of_line_in_this_side_of_32K_block) is a full line and buffers it as such. The rest of that line, at the start of the next 32K block, is also treated as a separate line.
- when the multiline codec combines these 'lines', one sees an extra newline in the middle of a natural line.
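
The fault can be reproduced in isolation with a small sketch. This is an illustration only, not the plugin's code: the chunk size is shrunk from 32K to a hypothetical 8 bytes, and `naive_lines` is an illustrative name for what a codec sees when each chunk is split on newlines independently.

```ruby
# Hypothetical illustration; CHUNK stands in for the 32K read size.
CHUNK = 8
data = "first line\nsecond line\n"

# Cut the stream into fixed-size chunks, as a chunked read would.
chunks = data.scan(/.{1,#{CHUNK}}/m)

# Naive handling: every chunk is split on newlines on its own, so a
# line that straddles a chunk boundary turns into two separate 'lines'.
naive_lines = chunks.flat_map { |chunk| chunk.split("\n") }

# Joining these 'lines' puts a newline in the middle of a natural line.
puts naive_lines.join("\n")
```

Here "first line" comes out as "first li" and "ne", which is the extra newline described above.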
Proposal:
- use FileWatch::BufferedTokenizer to split the input into complete lines before it is fed to the codec, and make the plain codec the default.
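
A minimal sketch of the buffering idea, assuming nothing about the real class beyond its role: `SketchTokenizer` is a hypothetical stand-in written for this example, not FileWatch::BufferedTokenizer itself. It holds back the trailing partial line until the next chunk completes it.

```ruby
# Hypothetical stand-in for a buffered tokenizer; not FileWatch's code.
class SketchTokenizer
  def initialize(delimiter = "\n")
    @delimiter = delimiter
    @buffer = +""
  end

  # Append a chunk and return only the lines completed so far.
  def extract(chunk)
    @buffer << chunk
    # limit -1 keeps a trailing empty string when the chunk ends in a delimiter
    lines = @buffer.split(@delimiter, -1)
    # the last element is an incomplete line (or ""), so keep it buffered
    @buffer = lines.pop || +""
    lines
  end

  # Return any leftover partial line, e.g. at end of input.
  def flush
    rest = @buffer
    @buffer = +""
    rest
  end
end

tokenizer = SketchTokenizer.new
chunks = ["first li", "ne\nsecon", "d line\n"]
lines = chunks.flat_map { |chunk| tokenizer.extract(chunk) }
leftover = tokenizer.flush
```

With the same three chunks, only whole lines reach the codec, regardless of where the chunk boundaries fall.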