- 
                Notifications
    
You must be signed in to change notification settings  - Fork 5k
 
feat: Adds file owner and group to the events' meta fields #47331
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| 
          
 💚 CLA has been signed  | 
    
          🤖 GitHub commentsExpand to view the GitHub comments
 Just comment with: 
  | 
    
| 
           This pull request does not have a backport label. 
 To fixup this pull request, you need to add the backport labels for the needed 
  | 
    
22c3da5    to
    87d418f      
    Compare
  
    | 
           I have signed the CLA, not sure why this test fails.  | 
    
| 
           Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)  | 
    
| 
           I think this should be configurable and disabled by default, as suggested in the original issue. All of this metadata ideally would be opt in, with the existing fields enable by default and these new ones disable by default to avoid a breaking change to the metadata. At least in the scope of this PR, the new fields should be opt in. Following the conventions used by filelog with  This metadata increases the storage overhead of every log file we ingest so unless you are specifically trying to query the information it provides, it just drives up the storage cost of logs.  | 
    
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| 
           @belimawr @cmacknz I see and agree. Before I go too far in the implementation, would you have some recommendations/no-go in where the new setting should be added ? 
 Do you think it would have a value here or should I keep it scope and make it be part of a separate PR ?  | 
    
| 
           @Hiruma31, I think you can pass booleans for NewFilemeta: 
 and add a config to filestream, as Craig suggested,  add them as fields on the  beats/filebeat/input/filestream/input.go Line 63 in 4873a73 
 they'd be read on beats/filebeat/input/filestream/input.go Line 105 in 4873a73 
 and the defaultsm, well, if they're booleans and the default is false, nothing to do. The default config comes from: beats/filebeat/input/filestream/config.go Line 132 in 21e39f4 
 @belimawr, do u agree?  | 
    
| 
           @AndersonQ Thanks for the feedback, this sounds in line with the direction I took so I pushed the commit to ease the review and clarify. I also put the PR in Draft until I get around the doc part.  | 
    
569251a    to
    b3f166f      
    Compare
  
    
          🔍 Preview links for changed docs | 
    
| 
           I updated the doc, would it be possible to include the label   | 
    
          
 This is a new feature, it should not be backported, only bug fixes are always backported. Are you in need of this feature in 8.19?  | 
    
          
 Yes, that sounds good. Thanks @AndersonQ!  | 
    
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One last thing that would be nice, but might be a bit hard to get started: adding an integration test.
We have a number of integration tests on filebeat/tests/integration. The use our integration tests framework.
You can use this one as a starting point.
You'd setup Filebeat using the file output, ingest the data and check whether the events contain the correct new fields. The framework has a function to get events from file output: GetEventsFromFileOutput.
Let me know if you need any help with the integration tests, I can help you out if needed.
| offset := int64(0) | ||
| 
               | 
          ||
| in := &FileMetaReader{msgReader(messages), path, createTestFileInfo(), "hash", offset} | ||
| in := &FileMetaReader{msgReader(messages), path, createTestFileInfo(), true, true, "hash", offset} | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to add a unittest to test when the user and group are set to false/not meant to be included in the metadata.
          
 That would be ideal, or at least in Filebeat 8, I'm not too concerned with the minor. Maybe the label should be  I'll address the rest of the comments tonight or tomorrow, thanks a lot for the detailed review @belimawr ! :)  | 
    
          
 We do not run 2 majors concurrently, we keep the last minor from the last major (8.19 in this case) receiving some bug and security fixes, but we do not add new features to it :/ This code will be released in the next 9.x minor after it is merged. If you really, really need it in a 8.19 build, you can build it from your fork.  | 
    
          
 @Hiruma31 The CLA was signed but the 3 of the 4 commits in this PR are not associated with your account (ie. the username is set but the email is empty).  | 
    
34216d2    to
    7a0e59b      
    Compare
  
    7a0e59b    to
    6ed0c58      
    Compare
  
    | 
           @belimawr Alright I did most things: 
  | 
    
Co-authored-by: Tiago Queiroz <[email protected]>
6ed0c58    to
    3ef9a5d      
    Compare
  
    
          
 @Hiruma31, the integration tests do not require any special setup. You should be able to run the locally with: btw, they did not compile: let us know if you have any issue running them.  | 
    
| - type: filestream | ||
| enabled: true | ||
| paths: | ||
| - /var/log/*.log | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The best is always to generate the log file to be ingested. It ensures the tests are deterministic and allow for precise assertions.
You can follow the pattern below. Just generate some lines, ensure every line ends with a \n and the last line of the file is empty.
beats/filebeat/tests/integration/filestream_gzip_test.go
Lines 46 to 52 in 1d8f323
| lines := make([]string, 0, 100) | |
| var content []byte | |
| for i := range 100 { | |
| l := fmt.Sprintf("%d: a log line", i) | |
| lines = append(lines, l) | |
| content = append(content, []byte(l+"\n")...) | |
| } | 
beats/filebeat/tests/integration/filestream_gzip_test.go
Lines 139 to 143 in 1d8f323
| tempDir := filebeat.TempDir() | |
| logPath := filepath.Join(tempDir, "input.log.gz") | |
| err := os.WriteFile(logPath, tc.data, 0644) | |
| require.NoError(t, err) | 
Then, once the file is saved, you can add it to your config
| - /var/log/*.log | |
| - %s | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @AndersonQ's comment.
You can also use integration.WriteLogFile if you don't care about the content. Each line generated by this function is 50 bytes long.
| Message string `json:"message"` | ||
| } | ||
| 
               | 
          ||
| evts := integration.GetEventsFromFileOutput[evt](filebeat, 5, false) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're using the file output, you need to configure it and ensure it's as expected by GetEventsFromFileOutput
// GetEventsFromFileOutput reads all events from file output. If n > 0,
// then it reads up to n events. It assumes the filename
// for the output is 'output' and 'path' is set to the TempDir.
// If waitForFile is true, it will GetEventsFromFileOutput wait up to 45
// seconds for the file to appear.
output.file:
  path: %s
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You will also need to set the filename as it defaults to filebeat and the function expects output. Here is an example of what to configure to use the file output with debug logging:
path.home: %s
queue.mem:
  flush.timeout: 0
output:
  file:
    path: ${path.home}
    filename: "output"
    rotate_on_startup: false
logging:
  level: debug
  selectors:
    - "*"And the code using it to configure Filebeat:
beats/filebeat/tests/integration/journald_test.go
Lines 50 to 55 in be9871f
| yamlCfg := fmt.Sprintf(journaldInputCfg, syslogID, filebeat.TempDir()) | |
| generateJournaldLogs(t, syslogID, 3, 100) | |
| filebeat.WriteConfigFile(yamlCfg) | |
| filebeat.Start() | 
| type evt struct { | ||
| Message string `json:"message"` | ||
| } | ||
| 
               | 
          ||
| evts := integration.GetEventsFromFileOutput[evt](filebeat, 5, false) | ||
| for i, e := range evts { | ||
| owner, err := e.Fields.GetValue("log.file.owner") | ||
| require.NoError(t, err, "event %d: could not get owner field", i) | ||
| require.NotEmpty(t, owner, "event %d: owner field is empty", i) | ||
| 
               | 
          ||
| group, err := e.Fields.GetValue("log.file.group") | ||
| require.NoError(t, err, "event %d: could not get group field", i) | ||
| require.NotEmpty(t, group, "event %d: group field is empty", i) | ||
| } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you probably want to use it like that:
| type evt struct { | |
| Message string `json:"message"` | |
| } | |
| evts := integration.GetEventsFromFileOutput[evt](filebeat, 5, false) | |
| for i, e := range evts { | |
| owner, err := e.Fields.GetValue("log.file.owner") | |
| require.NoError(t, err, "event %d: could not get owner field", i) | |
| require.NotEmpty(t, owner, "event %d: owner field is empty", i) | |
| group, err := e.Fields.GetValue("log.file.group") | |
| require.NoError(t, err, "event %d: could not get group field", i) | |
| require.NotEmpty(t, group, "event %d: group field is empty", i) | |
| } | |
| type evt struct { | |
| LogFileOwner string `json:"log.file.owner"` | |
| LogFileGroup string `json:"log.file.group"` | |
| } | |
| evts := integration.GetEventsFromFileOutput[evt](filebeat, 5, false) | |
| for i, e := range evts { | |
| assert.NotEmpty(t, e.LogFileOwner, "event %d: owner field is empty", i) | |
| assert.NotEmpty(t, e.LogFileGroup, "event %d: group field is empty", i) | |
| } | 
also, as the test is creating the file, you should be able to assert the owner and group to the user running the test instead of just checking it's not empty
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @AndersonQ, it's better to read the group and user from the file an assert they were correctly read by Filebeat instead of just checking if the field is not empty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code is looking pretty good @Hiruma31! There are still a few things to improve on the tests.
Let me know if you get stuck/need more help.
| - type: filestream | ||
| enabled: true | ||
| paths: | ||
| - /var/log/*.log | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @AndersonQ's comment.
You can also use integration.WriteLogFile if you don't care about the content. Each line generated by this function is 50 bytes long.
| Message string `json:"message"` | ||
| } | ||
| 
               | 
          ||
| evts := integration.GetEventsFromFileOutput[evt](filebeat, 5, false) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You will also need to set the filename as it defaults to filebeat and the function expects output. Here is an example of what to configure to use the file output with debug logging:
path.home: %s
queue.mem:
  flush.timeout: 0
output:
  file:
    path: ${path.home}
    filename: "output"
    rotate_on_startup: false
logging:
  level: debug
  selectors:
    - "*"And the code using it to configure Filebeat:
beats/filebeat/tests/integration/journald_test.go
Lines 50 to 55 in be9871f
| yamlCfg := fmt.Sprintf(journaldInputCfg, syslogID, filebeat.TempDir()) | |
| generateJournaldLogs(t, syslogID, 3, 100) | |
| filebeat.WriteConfigFile(yamlCfg) | |
| filebeat.Start() | 
| type evt struct { | ||
| Message string `json:"message"` | ||
| } | ||
| 
               | 
          ||
| evts := integration.GetEventsFromFileOutput[evt](filebeat, 5, false) | ||
| for i, e := range evts { | ||
| owner, err := e.Fields.GetValue("log.file.owner") | ||
| require.NoError(t, err, "event %d: could not get owner field", i) | ||
| require.NotEmpty(t, owner, "event %d: owner field is empty", i) | ||
| 
               | 
          ||
| group, err := e.Fields.GetValue("log.file.group") | ||
| require.NoError(t, err, "event %d: could not get group field", i) | ||
| require.NotEmpty(t, group, "event %d: group field is empty", i) | ||
| } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @AndersonQ, it's better to read the group and user from the file an assert they were correctly read by Filebeat instead of just checking if the field is not empty.
| } | ||
| } | ||
| 
               | 
          ||
| func TestMetaFieldsNew(t *testing.T) { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This name is not very descriptive. What about TestMetaFieldsOwnerAndGroup?
| func TestMetaFieldsNew(t *testing.T) { | |
| func TestMetaFieldsOwnerAndGroup(t *testing.T) { | 
You could even leave the previous test, TestMetaFields with the default values (false) and only have this one enable/test owner and group.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad that's actually an unsaved typo. I'll use your suggestion.
I reworked everything following your suggestion: it is now reading the suer and group dynamically to assert the value, I used integration.WriteLogFile and fixed the configuration. The only thing I didn't do is create a testdata file for the configuration to remain consistent with the rest of the tests.
| expectedFields := mapstr.M{} | ||
| if len(msg.Content) != 0 { | ||
| expectedFields = mapstr.M{ | ||
| "log": mapstr.M{ | ||
| "file": mapstr.M{ | ||
| "path": path, | ||
| "fingerprint": "hash", | ||
| }, | ||
| "offset": offset, | ||
| }, | ||
| } | ||
| in.checkFieldsMethod(t, expectedFields, msg.Fields) | ||
| } else { | ||
| require.Equal(t, expectedFields, msg.Fields) | ||
| } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused here on why you're also testing the case where the fields are empty. I get you were inspired by the test before it, but because this is only testing the features added by this PR we can simplify it:
- Remove the empty message from 
message - Remove the 
len(msg.Content) != 0statement. 
| expectedFields := mapstr.M{} | |
| if len(msg.Content) != 0 { | |
| expectedFields = mapstr.M{ | |
| "log": mapstr.M{ | |
| "file": mapstr.M{ | |
| "path": path, | |
| "fingerprint": "hash", | |
| }, | |
| "offset": offset, | |
| }, | |
| } | |
| in.checkFieldsMethod(t, expectedFields, msg.Fields) | |
| } else { | |
| require.Equal(t, expectedFields, msg.Fields) | |
| } | |
| expectedFields := mapstr.M{ | |
| "log": mapstr.M{ | |
| "file": mapstr.M{ | |
| "path": path, | |
| "fingerprint": "hash", | |
| }, | |
| "offset": offset, | |
| }, | |
| } | |
| in.checkFieldsMethod(t, expectedFields, msg.Fields) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As suggested, I actually simplified the test by just dealing with the enabled situation.
| } | ||
| } | ||
| 
               | 
          ||
| func TestFilestreamHasOwnerAndGroup(t *testing.T) { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another important thing, the tests on this file also run on Windows, but this feature is not supported on Windows.
My recommendation is to put it on a separate file and add the !windows build flag:
//go:build integration && !windowsThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved the integration test to a filestream_other_test.go file, following the convention on other for windowns/others
…le_owner_group_name
3ef9a5d    to
    f00f458      
    Compare
  
    There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there are some fixes that are still required.
Did you have any issue run the test locally? If you had, you'd have caught the issue I found.
Once you fix it, try to run it. As I said, they do not require any special setup, you only need mage to compile the test binary
| cfg := fmt.Sprintf(` | ||
| filebeat.inputs: | ||
| - type: filestream | ||
| enabled: true | ||
| paths: | ||
| - %s | ||
| include_file_owner_name: true | ||
| include_file_owner_group_name: true | ||
| logging: | ||
| level: debug | ||
| metrics: | ||
| enabled: false | ||
| output: | ||
| file: | ||
| path: ${path.home} | ||
| filename: "output" | ||
| rotate_on_startup: false | ||
| `, logFilePath) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your yaml is invalid, there is an indentation issue.
| cfg := fmt.Sprintf(` | |
| filebeat.inputs: | |
| - type: filestream | |
| enabled: true | |
| paths: | |
| - %s | |
| include_file_owner_name: true | |
| include_file_owner_group_name: true | |
| logging: | |
| level: debug | |
| metrics: | |
| enabled: false | |
| output: | |
| file: | |
| path: ${path.home} | |
| filename: "output" | |
| rotate_on_startup: false | |
| `, logFilePath) | |
| cfg := fmt.Sprintf(` | |
| filebeat.inputs: | |
| - type: filestream | |
| enabled: true | |
| paths: | |
| - %s | |
| include_file_owner_name: true | |
| include_file_owner_group_name: true | |
| logging: | |
| level: debug | |
| metrics: | |
| enabled: false | |
| output: | |
| file: | |
| path: ${path.home} | |
| filename: output | |
| rotate_on_startup: false | |
| `, logFilePath) | 
| require.Equal(t, e.LogFileOwner, logFileOwner) | ||
| require.Equal(t, e.LogFileGroup, logFileGroup) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are 3 issues here:
logFileOwnerins't the same type ase.LogFileOwner- expected and actual are swapped. Getting the right helps if the tests fails to have the correct error message
 - better to use 
assert.Equal. That way all assertions will run and all failures are reported at the end of the test instead of aborting the test on the test failure. 
| tempDir := filebeat.TempDir() | ||
| logFilePath := filepath.Join(tempDir, "input.log") | ||
| 
               | 
          ||
| integration.WriteLogFile(t, logFilePath, 5, false) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only 5 lines will generate a file with only 250 bytes, which is not large enough to be ingested by Filestream with the default file identity from 9.0.
You need to update it to at least 25 lines:
| integration.WriteLogFile(t, logFilePath, 5, false) | |
| integration.WriteLogFile(t, logFilePath, 25, false) | 
| cfg := fmt.Sprintf(` | ||
| filebeat.inputs: | ||
| - type: filestream | ||
| enabled: true | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The enabled: true is redundant, you can remove it.
| enabled: true | 
| #### `include_file_owner_name` [filestream-input-include_file_owner_name] | ||
| 
               | 
          ||
| Includes the log file owner to `log.file` metadata. | ||
| This option is not supported on Windows. | ||
| 
               | 
          ||
| 
               | 
          ||
| #### `include_file_owner_group_name` [filestream-input-include_file_owner_group_name] | ||
| 
               | 
          ||
| Includes the log file group to `log.file` metadata. | ||
| This option is not supported on Windows. | ||
| 
               | 
          
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use cumulative docs now, this means there is a single page/version of the documentation to all 9.x released.
So each new feature needs to be tagged with the correct version, for that you need to add the applies_to badge:
```yaml {applies_to}
stack: ga 9.3
```
Here are the docs explaining them: https://elastic.github.io/docs-builder/syntax/applies/#section-level
Sorry for not catching that earlier 🙈
Proposed commit message
Checklist
stresstest.shscript to run them under stress conditions and race detector to verify their stability../changelog/fragmentsusing the changelog tool.Disruptive User Impact
No disruption, only a slightly larger event payload. Error catching is handled the exact same way as other fields.
Author's Checklist
How to test this PR locally
Related issues
Use cases
Screenshots
Logs