@Hiruma31 Hiruma31 commented Oct 25, 2025

Proposed commit message

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works. Where relevant, I have used the stresstest.sh script to run them under stress conditions and race detector to verify their stability.
  • I have added an entry in ./changelog/fragments using the changelog tool.

Disruptive User Impact

No disruption, only a slightly larger event payload. Error catching is handled the exact same way as other fields.

Author's Checklist

  • [ ]

How to test this PR locally

  • Build this version
  • Run the generated Filebeat on a Linux host with a filestream input
  • Observe the event generated in the output; it will include the owner and group fields
{
[...]
  "log": {
    "file": {
      "group": "mgroup",
      "path": "/tmp/test.log",
      "device_id": "426879",
      "inode": "1292",
      "owner": "myuser"
    }
  }
[...]
}

Related issues

Use cases

Screenshots

Logs

@Hiruma31 Hiruma31 requested a review from a team as a code owner October 25, 2025 04:36
@Hiruma31 Hiruma31 requested review from belimawr and mauri870 October 25, 2025 04:36
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Oct 25, 2025
@cla-checker-service

cla-checker-service bot commented Oct 25, 2025

💚 CLA has been signed

@github-actions
Contributor

🤖 GitHub comments

Just comment with:

  • run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)

@mergify
Contributor

mergify bot commented Oct 25, 2025

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @Hiruma31? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit
  • backport-active-all is the label that automatically backports to all active branches.
  • backport-active-8 is the label that automatically backports to all active minor branches for the 8 major.
  • backport-active-9 is the label that automatically backports to all active minor branches for the 9 major.

@Hiruma31 Hiruma31 force-pushed the feat/add-file-owner-and-group branch from 22c3da5 to 87d418f Compare October 25, 2025 04:48
@Hiruma31
Author

I have signed the CLA, not sure why this test fails.

@pierrehilbert pierrehilbert added the Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team label Oct 26, 2025
@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Oct 26, 2025
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@cmacknz
Member

cmacknz commented Oct 27, 2025

I think this should be configurable and disabled by default, as suggested in the original issue.

All of this metadata ideally would be opt in, with the existing fields enabled by default and these new ones disabled by default to avoid a breaking change to the metadata. At least in the scope of this PR, the new fields should be opt in. Following the conventions used by filelog with include_file_owner_name and include_file_owner_group_name seems reasonable to me as names.

This metadata increases the storage overhead of every log file we ingest so unless you are specifically trying to query the information it provides, it just drives up the storage cost of logs.

Contributor

@belimawr belimawr left a comment

Overall the PR looks good, thanks for the contribution @Hiruma31!

However I agree with @cmacknz, this should be disabled by default.

@Hiruma31
Author

@belimawr @cmacknz I see and agree. Before I go too far in the implementation, do you have any recommendations or no-gos on where the new setting should be added?
Right now, since the info is pulled from what is, imho, the most suitable place, I'm contemplating passing 2 booleans through the filestream input all the way to the readfile setFileSystemMetadata, but there might be a better way.
I have a working prototype for Linux that I need to tidy up, but since the function is also used for Windows, I am considering adding a commit on top with 2 things:

  • Support for SID resolution for Windows files
  • A 3rd boolean, defaulting to true, to toggle the human-readable resolution in both

Do you think that would add value here, or should I keep this PR in scope and make it part of a separate PR?

@AndersonQ
Member

@Hiruma31, I think you can pass booleans for NewFilemeta:

func NewFilemeta(r reader.Reader, path string, fi file.ExtendedFileInfo, fingerprint string, offset int64) reader.Reader {

and add a config to filestream, as Craig suggested, include_file_owner_name and include_file_owner_group_name.

add them as fields on the filestream struct:

type filestream struct {

they'd be read on

func configure(cfg *conf.C, log *logp.Logger) (loginp.Prospector, loginp.Harvester, error) {

and the defaults, well, if they're booleans and the default is false, there is nothing to do. The default config comes from:

func defaultConfig() config {

@belimawr, do u agree?
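
The wiring described above can be sketched in isolation. This is only a minimal illustration, not the Beats implementation: `ownerConfig`, `defaultOwnerConfig` and `fileMeta` are hypothetical names, and the only things taken from this thread are the two option names and the `owner`/`group` keys under `log.file`. The point it shows is that Go's zero value for `bool` already gives "disabled by default", so the default config needs no extra lines.

```go
package main

import "fmt"

// ownerConfig is a hypothetical, trimmed-down stand-in for the part of the
// filestream config that would carry the two new options. The struct tags
// mirror the option names proposed in the review.
type ownerConfig struct {
	IncludeFileOwnerName      bool `config:"include_file_owner_name"`
	IncludeFileOwnerGroupName bool `config:"include_file_owner_group_name"`
}

// defaultOwnerConfig returns the zero value: both options are false,
// so nothing has to be set explicitly to keep them disabled.
func defaultOwnerConfig() ownerConfig {
	return ownerConfig{}
}

// fileMeta builds the map that would end up under log.file. The owner and
// group keys are only added when the matching option is enabled.
func fileMeta(cfg ownerConfig, path, owner, group string) map[string]any {
	file := map[string]any{"path": path}
	if cfg.IncludeFileOwnerName {
		file["owner"] = owner
	}
	if cfg.IncludeFileOwnerGroupName {
		file["group"] = group
	}
	return file
}

func main() {
	// Default config: only the existing field is present.
	fmt.Println(fileMeta(defaultOwnerConfig(), "/tmp/test.log", "myuser", "mgroup"))

	// Opt-in config: owner and group are added.
	enabled := ownerConfig{IncludeFileOwnerName: true, IncludeFileOwnerGroupName: true}
	fmt.Println(fileMeta(enabled, "/tmp/test.log", "myuser", "mgroup"))
}
```

Keeping the two flags independent means a user can include the owner without paying for the group field, which matches the storage-cost concern raised earlier in the thread.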

@Hiruma31 Hiruma31 marked this pull request as draft October 28, 2025 11:39
@Hiruma31
Author

@AndersonQ Thanks for the feedback, this sounds in line with the direction I took, so I pushed the commit to ease the review and clarify. I also put the PR in Draft until I get around to the doc part.

@Hiruma31 Hiruma31 force-pushed the feat/add-file-owner-and-group branch 2 times, most recently from 569251a to b3f166f Compare October 29, 2025 01:37
@Hiruma31 Hiruma31 marked this pull request as ready for review October 29, 2025 02:00
@Hiruma31 Hiruma31 requested a review from a team as a code owner October 29, 2025 02:00
@github-actions
Contributor

github-actions bot commented Oct 29, 2025

🔍 Preview links for changed docs

@Hiruma31
Author

Hiruma31 commented Oct 29, 2025

I updated the doc, would it be possible to include the label backport-8.19 too please ?

@belimawr
Contributor

I updated the doc, would it be possible to include the label backport-8.19 too please ?

This is a new feature, it should not be backported, only bug fixes are always backported.

Are you in need of this feature in 8.19?

@belimawr
Contributor

@belimawr, do u agree?

Yes, that sounds good. Thanks @AndersonQ!

Contributor

@belimawr belimawr left a comment

One last thing that would be nice, but might be a bit hard to get started: adding an integration test.

We have a number of integration tests on filebeat/tests/integration. They use our integration tests framework.

You can use this one as a starting point.

You'd set up Filebeat using the file output, ingest the data and check whether the events contain the correct new fields. The framework has a function to get events from file output: GetEventsFromFileOutput.

Let me know if you need any help with the integration tests, I can help you out if needed.

offset := int64(0)

in := &FileMetaReader{msgReader(messages), path, createTestFileInfo(), "hash", offset}
in := &FileMetaReader{msgReader(messages), path, createTestFileInfo(), true, true, "hash", offset}
Contributor

It would be nice to add a unit test for the case where the user and group are set to false, i.e. not meant to be included in the metadata.

@Hiruma31
Author

This is a new feature, it should not be backported, only bug fixes are always backported.

Are you in need of this feature in 8.19?

That would be ideal, or at least in Filebeat 8, I'm not too concerned with the minor. Maybe the label should be backport-active-8 ? I am not sure if you run 2 major concurrently.

I'll address the rest of the comments tonight or tomorrow, thanks a lot for the detailed review @belimawr ! :)

@belimawr
Contributor

That would be ideal, or at least in Filebeat 8, I'm not too concerned with the minor. Maybe the label should be backport-active-8 ? I am not sure if you run 2 major concurrently.

We do not run 2 majors concurrently, we keep the last minor from the last major (8.19 in this case) receiving some bug and security fixes, but we do not add new features to it :/

This code will be released in the next 9.x minor after it is merged.

If you really, really need it in an 8.19 build, you can build it from your fork.

@kruskall
Member

kruskall commented Oct 29, 2025

I have signed the CLA, not sure why this test fails.

@Hiruma31 The CLA was signed, but 3 of the 4 commits in this PR are not associated with your account (i.e. the username is set but the email is empty).

@Hiruma31
Author

@belimawr Alright I did most things:

  • Updated the commits to unify and get the CLA check to pass
  • Ran the format + doc generation as it should
  • Used false for the defaults
  • Updated the unit tests to include one for this
  • Attempted to create an integration test (a bit blindly, I cannot run them locally so I'm counting on the CI)

@AndersonQ
Member

  • Attempted to create an integration test (a bit blindly, I cannot run them locally so I'm counting on the CI)

@Hiruma31, the integration tests do not require any special setup. You should be able to run them locally with:

cd filebeat
mage buildSystemTestBinary
go test -v -tags integration -run TestFilestreamHasOwnerAndGroup_timeout ./tests/integration

btw, they did not compile:

❯ go test -v -tags integration -run TestFilestreamHasOwnerAndGroup_timeout ./tests/integration
# github.com/elastic/beats/v7/filebeat/tests/integration [github.com/elastic/beats/v7/filebeat/tests/integration.test]
tests/integration/filestream_test.go:715:19: e.Fields undefined (type evt has no field or method Fields)
tests/integration/filestream_test.go:719:19: e.Fields undefined (type evt has no field or method Fields)
FAIL	github.com/elastic/beats/v7/filebeat/tests/integration [build failed]
FAIL

let us know if you have any issue running them.

- type: filestream
enabled: true
paths:
- /var/log/*.log
Member

The best approach is always to generate the log file to be ingested. It ensures the tests are deterministic and allows for precise assertions.
You can follow the pattern below. Just generate some lines, ensure every line ends with a \n and the last line of the file is empty.

lines := make([]string, 0, 100)
var content []byte
for i := range 100 {
	l := fmt.Sprintf("%d: a log line", i)
	lines = append(lines, l)
	// Every line, including the last one, ends with "\n".
	content = append(content, []byte(l+"\n")...)
}

tempDir := filebeat.TempDir()
logPath := filepath.Join(tempDir, "input.log")
// Write the generated content so the test controls exactly what is ingested.
err := os.WriteFile(logPath, content, 0644)
require.NoError(t, err)

Then, once the file is saved, you can add it to your config

Suggested change
- /var/log/*.log
- %s

Contributor

I agree with @AndersonQ's comment.

You can also use integration.WriteLogFile if you don't care about the content. Each line generated by this function is 50 bytes long.

Message string `json:"message"`
}

evts := integration.GetEventsFromFileOutput[evt](filebeat, 5, false)
Member

If you're using the file output, you need to configure it and ensure it's as expected by GetEventsFromFileOutput

// GetEventsFromFileOutput reads all events from file output. If n > 0,
// then it reads up to n events. It assumes the filename
// for the output is 'output' and 'path' is set to the TempDir.
// If waitForFile is true, it will GetEventsFromFileOutput wait up to 45
// seconds for the file to appear.

output.file:
  path: %s

Contributor

You will also need to set the filename as it defaults to filebeat and the function expects output. Here is an example of what to configure to use the file output with debug logging:

path.home: %s

queue.mem:
  flush.timeout: 0

output:
  file:
    path: ${path.home}
    filename: "output"
    rotate_on_startup: false

logging:
  level: debug
  selectors:
    - "*"

And the code using it to configure Filebeat:

yamlCfg := fmt.Sprintf(journaldInputCfg, syslogID, filebeat.TempDir())
generateJournaldLogs(t, syslogID, 3, 100)
filebeat.WriteConfigFile(yamlCfg)
filebeat.Start()

Comment on lines 709 to 722
type evt struct {
Message string `json:"message"`
}

evts := integration.GetEventsFromFileOutput[evt](filebeat, 5, false)
for i, e := range evts {
owner, err := e.Fields.GetValue("log.file.owner")
require.NoError(t, err, "event %d: could not get owner field", i)
require.NotEmpty(t, owner, "event %d: owner field is empty", i)

group, err := e.Fields.GetValue("log.file.group")
require.NoError(t, err, "event %d: could not get group field", i)
require.NotEmpty(t, group, "event %d: group field is empty", i)
}
Member

you probably want to use it like that:

Suggested change
type evt struct {
Message string `json:"message"`
}
evts := integration.GetEventsFromFileOutput[evt](filebeat, 5, false)
for i, e := range evts {
owner, err := e.Fields.GetValue("log.file.owner")
require.NoError(t, err, "event %d: could not get owner field", i)
require.NotEmpty(t, owner, "event %d: owner field is empty", i)
group, err := e.Fields.GetValue("log.file.group")
require.NoError(t, err, "event %d: could not get group field", i)
require.NotEmpty(t, group, "event %d: group field is empty", i)
}
type evt struct {
LogFileOwner string `json:"log.file.owner"`
LogFileGroup string `json:"log.file.group"`
}
evts := integration.GetEventsFromFileOutput[evt](filebeat, 5, false)
for i, e := range evts {
assert.NotEmpty(t, e.LogFileOwner, "event %d: owner field is empty", i)
assert.NotEmpty(t, e.LogFileGroup, "event %d: group field is empty", i)
}

Also, as the test is creating the file, you should be able to assert that the owner and group match the user running the test instead of just checking that they're not empty.
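
One detail worth keeping in mind when shaping the event struct: `encoding/json` matches a tag name against keys at a single nesting level, so a tag such as `json:"log.file.owner"` only matches a literal key containing dots. Assuming the file output writes nested objects, as in the sample event in the PR description, a struct that mirrors the nesting decodes the fields directly. The following is a minimal sketch with illustrative names (`evt`, `decodeEvent`), not the test code from this PR:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// evt mirrors the nesting of the sample event: log -> file -> owner/group.
// Only the fields needed for the assertions are declared; unknown keys in
// the event are ignored by encoding/json.
type evt struct {
	Log struct {
		File struct {
			Owner string `json:"owner"`
			Group string `json:"group"`
			Path  string `json:"path"`
		} `json:"file"`
	} `json:"log"`
}

// decodeEvent parses a single JSON-encoded event line.
func decodeEvent(line []byte) (evt, error) {
	var e evt
	err := json.Unmarshal(line, &e)
	return e, err
}

func main() {
	// Values taken from the sample event in the PR description.
	line := []byte(`{"log":{"file":{"group":"mgroup","path":"/tmp/test.log","owner":"myuser"}}}`)

	e, err := decodeEvent(line)
	if err != nil {
		panic(err)
	}
	fmt.Println(e.Log.File.Owner, e.Log.File.Group, e.Log.File.Path)
}
```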

Contributor

I agree with @AndersonQ, it's better to read the group and user from the file and assert they were correctly read by Filebeat instead of just checking if the field is not empty.

Contributor

@belimawr belimawr left a comment

The code is looking pretty good @Hiruma31! There are still a few things to improve on the tests.

Let me know if you get stuck/need more help.

}
}

func TestMetaFieldsNew(t *testing.T) {
Contributor

This name is not very descriptive. What about TestMetaFieldsOwnerAndGroup?

Suggested change
func TestMetaFieldsNew(t *testing.T) {
func TestMetaFieldsOwnerAndGroup(t *testing.T) {

You could even leave the previous test, TestMetaFields with the default values (false) and only have this one enable/test owner and group.

Author

@Hiruma31 Hiruma31 Nov 1, 2025

My bad, that's actually an unsaved typo. I'll use your suggestion.
I reworked everything following your suggestions: it now reads the user and group dynamically to assert the values, I used integration.WriteLogFile, and I fixed the configuration. The only thing I didn't do is create a testdata file for the configuration, to remain consistent with the rest of the tests.

Comment on lines 145 to 159
expectedFields := mapstr.M{}
if len(msg.Content) != 0 {
expectedFields = mapstr.M{
"log": mapstr.M{
"file": mapstr.M{
"path": path,
"fingerprint": "hash",
},
"offset": offset,
},
}
in.checkFieldsMethod(t, expectedFields, msg.Fields)
} else {
require.Equal(t, expectedFields, msg.Fields)
}
Contributor

I'm a bit confused here on why you're also testing the case where the fields are empty. I get you were inspired by the test before it, but because this is only testing the features added by this PR we can simplify it:

  1. Remove the empty message from message
  2. Remove the len(msg.Content) != 0 statement.
Suggested change
expectedFields := mapstr.M{}
if len(msg.Content) != 0 {
expectedFields = mapstr.M{
"log": mapstr.M{
"file": mapstr.M{
"path": path,
"fingerprint": "hash",
},
"offset": offset,
},
}
in.checkFieldsMethod(t, expectedFields, msg.Fields)
} else {
require.Equal(t, expectedFields, msg.Fields)
}
expectedFields := mapstr.M{
"log": mapstr.M{
"file": mapstr.M{
"path": path,
"fingerprint": "hash",
},
"offset": offset,
},
}
in.checkFieldsMethod(t, expectedFields, msg.Fields)

Author

As suggested, I actually simplified the test by just dealing with the enabled situation.

}
}

func TestFilestreamHasOwnerAndGroup(t *testing.T) {
Contributor

Another important thing, the tests on this file also run on Windows, but this feature is not supported on Windows.

My recommendation is to put it on a separate file and add the !windows build flag:

//go:build integration && !windows

Author

I moved the integration test to a filestream_other_test.go file, following the convention used elsewhere for Windows/other platforms.

Member

@AndersonQ AndersonQ left a comment

There are some fixes that are still required.
Did you have any issues running the test locally? If you had run it, you'd have caught the issue I found.
Once you fix it, try to run it. As I said, the tests do not require any special setup; you only need mage to compile the test binary.

Comment on lines +48 to +67
cfg := fmt.Sprintf(`
filebeat.inputs:
- type: filestream
enabled: true
paths:
- %s
include_file_owner_name: true
include_file_owner_group_name: true
logging:
level: debug
metrics:
enabled: false
output:
file:
path: ${path.home}
filename: "output"
rotate_on_startup: false
`, logFilePath)
Member

Your yaml is invalid, there is an indentation issue.

Suggested change
cfg := fmt.Sprintf(`
filebeat.inputs:
- type: filestream
enabled: true
paths:
- %s
include_file_owner_name: true
include_file_owner_group_name: true
logging:
level: debug
metrics:
enabled: false
output:
file:
path: ${path.home}
filename: "output"
rotate_on_startup: false
`, logFilePath)
cfg := fmt.Sprintf(`
filebeat.inputs:
  - type: filestream
    enabled: true
    paths:
      - %s
    include_file_owner_name: true
    include_file_owner_group_name: true
logging:
  level: debug
  metrics:
    enabled: false
output:
  file:
    path: ${path.home}
    filename: output
    rotate_on_startup: false
`, logFilePath)

Comment on lines +97 to +98
require.Equal(t, e.LogFileOwner, logFileOwner)
require.Equal(t, e.LogFileGroup, logFileGroup)
Member

There are 3 issues here:

  • logFileOwner isn't the same type as e.LogFileOwner
  • expected and actual are swapped. Getting the order right ensures the error message is correct if the test fails
  • better to use assert.Equal. That way all assertions will run and all failures are reported at the end of the test instead of aborting the test on the first failure.

tempDir := filebeat.TempDir()
logFilePath := filepath.Join(tempDir, "input.log")

integration.WriteLogFile(t, logFilePath, 5, false)
Contributor

Only 5 lines will generate a file with only 250 bytes, which is not large enough to be ingested by Filestream with the default file identity from 9.0.

You need to update it to at least 25 lines:

Suggested change
integration.WriteLogFile(t, logFilePath, 5, false)
integration.WriteLogFile(t, logFilePath, 25, false)

cfg := fmt.Sprintf(`
filebeat.inputs:
- type: filestream
enabled: true
Contributor

The enabled: true is redundant, you can remove it.

Suggested change
enabled: true

Comment on lines +219 to +229
#### `include_file_owner_name` [filestream-input-include_file_owner_name]

Includes the log file owner to `log.file` metadata.
This option is not supported on Windows.


#### `include_file_owner_group_name` [filestream-input-include_file_owner_group_name]

Includes the log file group to `log.file` metadata.
This option is not supported on Windows.

Contributor

We use cumulative docs now, which means there is a single page/version of the documentation for all 9.x releases.

So each new feature needs to be tagged with the correct version, for that you need to add the applies_to badge:

```yaml {applies_to}
stack: ga 9.3
```

Here are the docs explaining them: https://elastic.github.io/docs-builder/syntax/applies/#section-level

Sorry for not catching that earlier 🙈

Labels

Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Ability to include log file owner and group with a collected log entry

7 participants