Open
Description
In read mode, gzipped files are currently always re-read in full, even when they have already been read completely.
This is due to a call to sincedb_collection.clear_watched_file(key) after the zip read completes.
That call removes the file information from the sincedb entry, making it impossible for Logstash to recognize the file as already read on a re-run.
The problem can be fixed by adding a call to sincedb_collection.reading_completed(key) before the clear_watched_file call, as is done in the plain text handler.
The reading_completed call sets the @path_in_sincedb attribute, ensuring that the entry is correctly serialized even after the watched file is cleared.
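To illustrate why the order of the two calls matters, here is a minimal Ruby sketch. It is a simplified stand-in, not the plugin's actual code: the WatchedFile struct, the SincedbEntry class, and the byte count of 35 are assumptions made for this example, and only the method names reading_completed and clear_watched_file and the @path_in_sincedb attribute are taken from the description above.

```ruby
# Simplified stand-in for a sincedb entry; NOT the plugin's actual class.
# WatchedFile and SincedbEntry are hypothetical names used for illustration.
WatchedFile = Struct.new(:path, :bytes_read)

class SincedbEntry
  def initialize(watched_file)
    @watched_file = watched_file
    @position = 0
    @path_in_sincedb = nil
  end

  # Record the final position and remember the path independently
  # of the watched file reference.
  def reading_completed
    @position = @watched_file.bytes_read
    @path_in_sincedb = @watched_file.path
  end

  # Drop the reference to the watched file.
  def clear_watched_file
    @watched_file = nil
  end

  # Serialize the entry; the path is written only if it is still known.
  def to_s
    path = @watched_file ? @watched_file.path : @path_in_sincedb
    [@position, path].compact.join(" ")
  end
end

file = WatchedFile.new("/tmp/test.log.gz", 35)

# Order described as the current gzip behaviour: clear without completing.
buggy = SincedbEntry.new(file)
buggy.clear_watched_file
puts buggy.to_s # prints "0" (no path)

# Proposed order: mark reading as completed first, then clear.
fixed = SincedbEntry.new(file)
fixed.reading_completed
fixed.clear_watched_file
puts fixed.to_s # prints "35 /tmp/test.log.gz"
```

In this model the path survives serialization only when reading_completed runs before clear_watched_file, which mirrors the fix proposed above.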
Information
- Version: 4.2.4
- Operating System: Ubuntu Bionic (18.04)
- Config File:
# pipeline config
input {
  file {
    path => "/tmp/test.log.gz"
    mode => "read"
    sincedb_path => "/tmp/sincedb"
    file_completed_action => "log"
    file_completed_log_path => "/tmp/done.log"
    exit_after_read => true
  }
}
output {
  stdout {}
}
- Sample Data:
/tmp/test.log
Line 1
Line 2
Line 3
Line 4
Line 5
- Steps to Reproduce:
- Zip /tmp/test.log: gzip /tmp/test.log
- Run the pipeline the first time (this will create the sincedb file)
- Run the pipeline again and see that the zip is read again
- Inspect the sincedb file and see that the inode entry for /tmp/test.log.gz is missing the path
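The final inspection step can also be scripted. The following is a rough sketch only: records_missing_path is a hypothetical helper, the sample records are made up, and it assumes each sincedb record is a whitespace-separated line of the form "inode major minor position timestamp [path]".

```ruby
# Hypothetical helper: return the sincedb records that have no path field.
# Assumes whitespace-separated records of the form
#   inode major minor position timestamp [path]
def records_missing_path(sincedb_text)
  sincedb_text.each_line.map(&:strip).reject(&:empty?).select do |line|
    # Limit the split to 6 fields so a path containing spaces stays in one field.
    line.split(" ", 6).length < 6
  end
end

# Made-up sample records, for illustration only.
sample = <<~SINCEDB
  123456 0 2049 35 1550000000.0 /tmp/other.log.gz
  654321 0 2049 35 1550000000.0
SINCEDB

puts records_missing_path(sample)
```

Against the file from the config above, the helper could be called as records_missing_path(File.read("/tmp/sincedb")); any record it returns is one that was written without a path.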