clean-processed-folder.py in the filebeat container is responsible for removing old log files once they've been processed.
I discovered two problems:
- Windows Event file archives (evtx) are not cleaned because their MIME type is too generic (application/octet-stream)
- Some Zeek logs are misidentified as HTML files (e.g., HTML document, ASCII text, with very long lines, text/html)
We need to handle these two cases.
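
One possible way to handle both cases is to stop relying on the MIME type alone and also check the first bytes of the file. The following is a minimal sketch, not the actual logic of clean-processed-folder.py: the function and constant names are hypothetical, it assumes the MIME type has already been detected elsewhere, and it assumes Zeek writes TSV logs that begin with a `#separator` header line (JSON-format Zeek logs would need a different check).

```python
# Sketch only: all names here are hypothetical, not taken from clean-processed-folder.py.

# EVTX files begin with the 8-byte signature "ElfFile\0".
EVTX_SIGNATURE = b"ElfFile\x00"
# Assumption: Zeek TSV logs start with a "#separator" header line.
ZEEK_HEADER = b"#separator"


def is_evtx(mime_type: str, head: bytes) -> bool:
    """Generic binary MIME type, confirmed by the EVTX signature."""
    return mime_type == "application/octet-stream" and head.startswith(EVTX_SIGNATURE)


def is_zeek_log(mime_type: str, head: bytes) -> bool:
    """Accept text/html as well as text/plain, but only with a Zeek header."""
    return mime_type in ("text/plain", "text/html") and head.startswith(ZEEK_HEADER)


def is_cleanable(mime_type: str, head: bytes) -> bool:
    """True if the file looks like one of the log types that should be cleaned."""
    return is_evtx(mime_type, head) or is_zeek_log(mime_type, head)
```

Here `head` would be the first few bytes read from the file, and `mime_type` whatever the script's existing detection reports. Checking the content as well keeps unrelated application/octet-stream or text/html files from being deleted by accident.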