-
Notifications
You must be signed in to change notification settings - Fork 5k
filebeat: make GZIP GA and add compression config
#47893
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
🤖 GitHub commentsJust comment with:
|
|
This pull request does not have a backport label.
To fixup this pull request, you need to add the backport labels for the needed
|
676ad9b to
4543ad2
Compare
🔍 Preview links for changed docs |
4543ad2 to
a4509c4
Compare
a4509c4 to
9acf2f8
Compare
863acc4 to
6ce1b05
Compare
|
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane) |
colleenmcginnis
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor suggestions related to the changelog fragment.
changelog/fragments/1764777232-ingest-GZIP-logs-is-GA-and-enabled-by-default.yaml
Outdated
Show resolved
Hide resolved
changelog/fragments/1764777232-ingest-GZIP-logs-is-GA-and-enabled-by-default.yaml
Outdated
Show resolved
Hide resolved
|
@cmacknz @AndersonQ does enabling gzip files by default mean that previous users that use wildcards ( |
|
I got some feedback from PM (Bill) that we may not want to have this be enabled by default just to minimize any chance of user disruption. Since in general we want compatibility with filelog, we can follow it's approach here which is covered by the
So I would vote we add the same beats/filebeat/input/filestream/file.go Lines 220 to 223 in 92bde55
|
ok, makes sense.
Ok, so let me confirm the behaviour:
|
No. There is no reason for Filebeat to exit, you should only warn. You do not want to cause a data collection outage over this because Filebeat is very likely doing more data collection than just reading actively rotating gzipped logs.
Yes but please use filelog and test to confirm we have interpreted it's behaviour from its documentation correctly.
Yes the log input should not support compression.
I would just ignore this parameter and log that it's deprecated and explain what to do instead. We want to delete this parameter. It may be simpler to just delete it immediately (which will also cause it to be ignored). |
3960caa to
9427590
Compare
Promotes GZIP support in the filestream input from beta to General Availability. Deprecates the `gzip_experimental` option in favour of the new `compression` setting. Valid values: - `""`: No compression (default). - `"gzip"`: Treat all files as GZIP. - `"auto"`: Auto-detect based on magic bytes. Note: GZIP decoding requires `fingerprint` file identity for accurate offset tracking. A warning is now logged if compression is enabled with a file identity other than `fingerprint`. Unit and integration tests have been updated to reflect these changes. AI tools used: Cursor.
3075a5c to
4fc0195
Compare
|
@orestisfl, @cmacknz it's ready for review :) |
compression config
I confirmed, it behaves like that |
Enabling GZIP ingestion has no effect on the paths glob matching. It always matches all files, regardless of their format. Filestream will ingest anything, if it isn't a plain file, it'll ingest garbage, a string representation of the data on the file. For example, a gzip file would end up like: "message":"\u0013\ufffd˙4_\ufffd\ufffd\ufffd\ufffd>\ufffdy\u0008\ufffd*V\ufffdM\ufffd;=\u0002u=$\ufffd\u000e\ufffd\u0001\u0008\u0003\ufffdJ\ufffd\ufffd\ufffd#\ufffdǎ\ufffd\ufffd\u001c8\ufffd\ufffd\u0006K\ufffd!n\u000f\ufffd2[\ufffd\ufffd\ufffd4O\ufffd\r\ufffdw$͒d\ufffd\ufffd%\ufffd]{\ufffd\ufffd\ufffd^\u0011\ufffd\u001d\ufffd\ufffd\ufffd\ufffdU\u000b\u0002D\u0018d\ufffd.\ufffdNs\u0013\ufffd1b?\u000eB\u001e\ufffd\ufffd<\ufffd7\ufffdr\ufffdwyS\ufffd\t\u0001\ufffd[\ufffdA\u0004%\ufffdi\ufffd\ufffd&lmRxj\u0010\ufffd@\u001f\u0007\ufffd\u0011,\ufffdy\ufffd\ufffdQ\ufffd\ufffd&\ufffd\ufffdX>\ufffd\ufffd\ufffd <\ufffdǹ\ufffd\ufffd\ufffdw<\u001a\u0000OW$!dp\ufffd\ufffdh\ufffd\ufffd\ufffd8{\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd@]U7Pl\ufffd\ufffdz\u000fS\ufffd,H\u0017\ufffd", |
|
This pull request is now in conflicts. Could you fix it? 🙏 |
orestisfl
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some test suggestions
| Message string `json:"message"` | ||
| } | ||
| events := integration.GetEventsFromFileOutput[event](filebeat, 0, true) | ||
| require.Equal(t, len(events), 1, "expected one event") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| require.Equal(t, len(events), 1, "expected one event") | |
| require.Len(t, events, 1) |
better for debugging in case there is more than one
| { | ||
| name: "compression empty string with gzip_experimental set", | ||
| compression: "compression: \"\"\n gzip_experimental: true", | ||
| }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel it would make this test stronger (albeit less clean) if it could prove that the correct settings (compression: auto) and the exact same steps do read the file contents
Proposed commit message
Checklist
[ ] I have made corresponding changes to the documentation[ ] I have made corresponding change to the default configuration filesstresstest.shscript to run them under stress conditions and race detector to verify their stability../changelog/fragmentsusing the changelog tool.Disruptive User Impact
gzip_experimentalhas been removed. It is ignored and only a warning directing users to usecompressionis logged ifgzip_experimentalconfigured.How to test this PR locally
verify that non-fingerprint file identities and GZIP logs a warning
file_identity.native:To verify it works with compression auto:
To verify
compression: gzipdoes not ingest plain file:{"log.level":"warn","@timestamp":"2025-12-12T09:40:47.453+0100","log.logger":"input.filestream.scanner","log.origin":{"function":"github.com/elastic/beats/v7/filebeat/input/filestream.(*fileScanner).GetFiles","file.name":"filestream/fswatch.go","file.line":511},"message":"cannot create a file descriptor for an ingest target \"/tmp/beats/in/log.ndjson\": failed to create gzip seeker: could not create gzip reader: gzip: invalid header","service.name":"filebeat","id":"gzip-input","ecs.version":"1.6.0"}To verify
compression: ""ingest gzip file as a plain file:Related issues