Description
This issue is a continuation/deeper analysis of #44443.
When using hints-based autodiscover with modules there is no way to
get hints autodiscover and modules to work perfectly together.
For all the cases below we will use the Nginx module as an example,
running a single Nginx container with Docker and the latest Filebeat
v9.1.5.
To run the Nginx container use the following docker-compose.yml
file:
docker-compose.yaml
version: '3.8'
services:
  nginx:
    image: nginx
    ports:
      - "9080:80"
    labels:
      co.elastic.logs/enabled: true
      co.elastic.logs/module: nginx
      # co.elastic.logs/fileset.stdout: access
      # co.elastic.logs/fileset.stderr: error
    configs:
      - source: nginx_config
        target: /etc/nginx/nginx.conf

configs:
  nginx_config:
    content: |
      events {}
      http {
        access_log /dev/stdout;
        error_log /dev/stderr;
        server {
          listen 80;
          server_name localhost;
          location / {
            return 200 "Welcome to Nginx!\n";
            add_header Content-Type text/plain;
          }
          location /error {
            root /foo/bar/does/not/exist;
            index index.html;
          }
        }
      }

This will deploy an Nginx container with two endpoints:
- http://localhost:9080 always returns a successful request
- http://localhost:9080/error always generates an error and a 404 HTTP status code
1. The baseline
As a baseline, let's start by deploying a standalone Filebeat that
reads logs from an Nginx process running on the same host, using the
Nginx module. Because we're reading files directly from disk, I'll
attach them here as an example.
Example files: logs.tar.gz
With the log archive extracted, run Filebeat with the following
configuration (adjust the output and path settings to your environment):
filebeat.yml
filebeat.modules:
  - module: nginx
    access:
      enabled: true
      var.paths:
        - ./logs/access.log
    error:
      enabled: true
      var.paths:
        - ./logs/error.log

setup.kibana:
  host: "https://localhost:5601"
  ssl.verification_mode: none
  username: "elastic"
  password: "changeme"

output.elasticsearch:
  hosts:
    - https://localhost:9200
  username: elastic
  password: changeme
  ssl.verification_mode: none

logging:
  to_stderr: true

Then set up all assets in Kibana and start Filebeat:
./filebeat setup
./filebeat
Then go to Kibana, open Discover, and look at the ingested logs. They
will be in the past, so make sure you're visualising logs around
2025-10-16T12:22:28.000-04:00. There should be 9 entries in total,
all with their fields correctly parsed:
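You can also confirm the count directly against Elasticsearch, for example with a query like the one below (the filebeat-* index pattern is an assumption based on the default output; adjust it if you changed the output settings):

curl -k -u elastic:changeme "https://localhost:9200/filebeat-*/_count?q=event.module:nginx"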
Now that we have a baseline, let's explore the different modes in
which hints-based autodiscover can work and the issues with each of
them.
2. The simplest configuration possible
Running the Filestream input instead of the Log input, or any other
input defined in the module configuration, brings two problems:
- The Filestream inputs are generated with the same ID, so the first one starts and the others fail to start.
- Most, if not all, modules have their configuration written for the Log input, and many configuration keys are different between the two inputs (see the sketch below), which makes the Filestream input not work as expected.
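To illustrate the second point, here is a non-exhaustive sketch of how a couple of common options are spelled differently between the two inputs (paths are purely illustrative, option names are taken from the respective input documentation):

# Log input (what most modules were written for)
- type: log
  paths:
    - /var/log/nginx/access.log
  exclude_files: ['\.gz$']
  close_inactive: 5m

# Filestream equivalents of the same options
- type: filestream
  paths:
    - /var/log/nginx/access.log
  prospector.scanner.exclude_files: ['\.gz$']
  close.on_state_change.inactive: 5m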
The simplest configuration possible enables hints and uses all the
defaults:
filebeat.autodiscover:
  providers:
    - hints.enabled: true
      type: docker

filebeat.yml

filebeat.autodiscover:
  providers:
    - hints.enabled: true
      type: docker

setup.kibana:
  host: "https://localhost:5601"
  ssl.verification_mode: none
  username: "elastic"
  password: "changeme"

output.elasticsearch:
  hosts:
    - https://localhost:9200
  username: elastic
  password: changeme
  ssl.verification_mode: none

logging:
  to_stderr: true

We can see the modules being enabled:
{
"@timestamp": "2025-10-16T17:12:30.801-0400",
"ecs.version": "1.6.0",
"log.level": "info",
"log.logger": "modules",
"log.origin": {
"file.line": 135,
"file.name": "fileset/modules.go",
"function": "github.com/elastic/beats/v7/filebeat/fileset.newModuleRegistry"
},
"message": "Enabled modules/filesets: nginx (error), nginx (access)",
"service.name": "filebeat"
}
{
"@timestamp": "2025-10-16T17:12:31.161-0400",
"ecs.version": "1.6.0",
"log.level": "info",
"log.logger": "modules",
"log.origin": {
"file.line": 135,
"file.name": "fileset/modules.go",
"function": "github.com/elastic/beats/v7/filebeat/fileset.newModuleRegistry"
},
"message": "Enabled modules/filesets: nginx (error), nginx (access)",
"service.name": "filebeat"
}

Each fileset will start one input by merging the default configuration:
beats/filebeat/autodiscover/builder/hints/config.go
Lines 30 to 66 in 49b225e
defaultCfgRaw := map[string]interface{}{
	"type": "filestream",
	"id":   "container-logs-${data.container.id}",
	"prospector": map[string]interface{}{
		"scanner": map[string]interface{}{
			"fingerprint.enabled": true,
			"symlinks":            true,
		},
	},
	// Prevent partial ingestion when containers stop.
	// Kubernetes is too eager to remove the log files, so Filebeat,
	// sometimes, does not have enough time to ingest the whole file
	// before it is removed.
	"close.on_state_change.removed": false,
	"file_identity.fingerprint":     nil,
	// Enable take over mode to migrate state from the previous
	// configuration version. This prevents re-ingestion of existing
	// files.
	"take_over": map[string]any{
		"enabled": true,
		"from_ids": []string{
			"kubernetes-container-logs-${data.container.id}",
		},
	},
	"parsers": []interface{}{
		map[string]interface{}{
			"container": map[string]interface{}{
				"stream": "all",
				"format": "auto",
			},
		},
	},
	"paths": []string{
		"/var/log/containers/*-${data.container.id}.log",             // Kubernetes
		"/var/lib/docker/containers/${data.container.id}/*-json.log", // Docker
	},
}
with the module's configuration. Because the ID is the same for both
(container-logs-${data.container.id}), the second one to be started fails:
{
"@timestamp": "2025-10-16T17:12:31.162-0400",
"ecs.version": "1.6.0",
"input.cfg": "{\n \"_fileset_name\": \"access\",\n \"_module_name\": \"nginx\",\n \"close\": {\n \"on_state_change\": {\n \"removed\": false\n }\n },\n \"exclude_files\": [\n \".gz$\"\n ],\n \"file_identity\": {\n \"fingerprint\": null\n },\n \"id\": \"container-logs-2f8726ec16a1d165262814523b490f623feb631cd605d137af2a683833605f40\",\n \"parsers\": [\n {\n \"container\": {\n \"format\": \"auto\",\n \"stream\": \"stdout\"\n }\n }\n ],\n \"path\": {\n \"config\": \"/home/tiago/sandbox/issue-foo/nginx-example/issue/beats/filebeat-9.1.5-linux-x86_64\",\n \"data\": \"/home/tiago/sandbox/issue-foo/nginx-example/issue/beats/filebeat-9.1.5-linux-x86_64/data\",\n \"home\": \"/home/tiago/sandbox/issue-foo/nginx-example/issue/beats/filebeat-9.1.5-linux-x86_64\",\n \"logs\": \"/home/tiago/sandbox/issue-foo/nginx-example/issue/beats/filebeat-9.1.5-linux-x86_64/logs\"\n },\n \"paths\": [\n \"/var/log/containers/*-2f8726ec16a1d165262814523b490f623feb631cd605d137af2a683833605f40.log\",\n \"/var/lib/docker/containers/2f8726ec16a1d165262814523b490f623feb631cd605d137af2a683833605f40/*-json.log\"\n ],\n \"pipeline\": \"filebeat-9.1.5-nginx-access-pipeline\",\n \"processors\": [\n {\n \"add_locale\": null\n },\n {\n \"add_fields\": {\n \"fields\": {\n \"ecs\": {\n \"version\": \"1.12.0\"\n }\n },\n \"target\": \"\"\n }\n }\n ],\n \"prospector\": {\n \"scanner\": {\n \"fingerprint\": {\n \"enabled\": true\n },\n \"symlinks\": true\n }\n },\n \"take_over\": {\n \"enabled\": true,\n \"from_ids\": [\n \"kubernetes-container-logs-2f8726ec16a1d165262814523b490f623feb631cd605d137af2a683833605f40\"\n ]\n },\n \"type\": \"filestream\"\n}",
"log.level": "error",
"log.logger": "input",
"log.origin": {
"file.line": 193,
"file.name": "input-logfile/manager.go",
"function": "github.com/elastic/beats/v7/filebeat/input/filestream/internal/input-logfile.(*InputManager).Create"
},
"message": "filestream input ID 'container-logs-2f8726ec16a1d165262814523b490f623feb631cd605d137af2a683833605f40' is duplicated: input will NOT start",
"service.name": "filebeat"
}

At the moment there is no way to make the ID aware of the
autodiscover variables and the module/fileset names. Therefore it is
not possible to use Filestream for hints autodiscover + modules.
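Abridged from the input.cfg logged above, the two resolved inputs conceptually look like the sketch below. The error fileset entries (stream and pipeline name) are an assumption following the same pattern as the access fileset; the key point is the identical id:

# nginx/access fileset (started first)
- type: filestream
  id: container-logs-2f8726ec16a1d165262814523b490f623feb631cd605d137af2a683833605f40
  parsers:
    - container:
        stream: stdout
  pipeline: filebeat-9.1.5-nginx-access-pipeline

# nginx/error fileset (rejected: "filestream input ID ... is duplicated")
- type: filestream
  id: container-logs-2f8726ec16a1d165262814523b490f623feb631cd605d137af2a683833605f40
  parsers:
    - container:
        stream: stderr
  pipeline: filebeat-9.1.5-nginx-error-pipeline   # assumed name, mirroring the access pipeline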
3. Using the container input + setting streams manually
We can provide a custom hints default config and change the input type
to Container. Doing so allows the input to start because the Container
input uses the Log input under the hood, and it is able to configure
the Log input to read the different streams from the container logs
(stdout and stderr) independently (see the sketch after this list).
However, this brings two new problems:
- Different instances of the Log input cannot harvest the same file and stream, so for any module with more than 2 filesets, or multiple filesets reading from the same stream, there will be conflicts in how the input stores its state.
- Data gets duplicated and all entries are listed as coming from the same dataset.
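A minimal sketch of such a configuration, using the standard hints.default_config mechanism and the same Docker log path as the built-in defaults (the rest of the filebeat.yml, Kibana, output and logging, stays the same as in the previous cases):

filebeat.autodiscover:
  providers:
    - type: docker
      hints.enabled: true
      hints.default_config:
        type: container
        paths:
          - /var/lib/docker/containers/${data.container.id}/*.log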
Start Filebeat:
sudo ./filebeat --strict.perms=false
Two different inputs/harvesters are started; note the different harvester_id values:
{
"@timestamp": "2025-10-17T12:01:14.879-0400",
"ecs.version": "1.6.0",
"finished": false,
"harvester_id": "eccf44e2-9192-4558-aefb-926fcb2898b0",
"input_id": "60e76d40-ebac-4f90-a661-da2963820f1c",
"log.level": "info",
"log.logger": "input.harvester",
"log.origin": {
"file.line": 311,
"file.name": "log/harvester.go",
"function": "github.com/elastic/beats/v7/filebeat/input/log.(*Harvester).Run"
},
"message": "Harvester started for paths: [/var/lib/docker/containers/7d4e959ff844fc4379743f78820db8a50ad3ebe0cbe5520032333b53562129ba/*.log]",
"os_id": "6686906-64768",
"service.name": "filebeat",
"source_file": "/var/lib/docker/containers/7d4e959ff844fc4379743f78820db8a50ad3ebe0cbe5520032333b53562129ba/7d4e959ff844fc4379743f78820db8a50ad3ebe0cbe5520032333b53562129ba-json.log",
"state_id": "1b59052b95e61943-native::6686906-64768"
}
{
"@timestamp": "2025-10-17T12:01:14.879-0400",
"ecs.version": "1.6.0",
"finished": false,
"harvester_id": "c6975238-7570-44c0-a447-afba59df82ce",
"input_id": "2240e095-0755-46e0-b31c-19906d75b339",
"log.level": "info",
"log.logger": "input.harvester",
"log.origin": {
"file.line": 311,
"file.name": "log/harvester.go",
"function": "github.com/elastic/beats/v7/filebeat/input/log.(*Harvester).Run"
},
"message": "Harvester started for paths: [/var/lib/docker/containers/7d4e959ff844fc4379743f78820db8a50ad3ebe0cbe5520032333b53562129ba/*.log]",
"os_id": "6686906-64768",
"service.name": "filebeat",
"source_file": "/var/lib/docker/containers/7d4e959ff844fc4379743f78820db8a50ad3ebe0cbe5520032333b53562129ba/7d4e959ff844fc4379743f78820db8a50ad3ebe0cbe5520032333b53562129ba-json.log",
"state_id": "d35e05a633229937-native::6686906-64768"
}
Once Filebeat is up and running we can start the Nginx container,
keeping it attached to the console to see the logs.
Before starting the container, uncomment the two commented-out
labels, giving you:
labels:
  co.elastic.logs/enabled: true
  co.elastic.logs/module: nginx
  co.elastic.logs/fileset.stdout: access
  co.elastic.logs/fileset.stderr: error

docker compose up
Making two requests to / and one to /error:
curl http://localhost:9080/
curl http://localhost:9080/
curl http://localhost:9080/error
We can see the logs from the Nginx container; there are a few
initialisation logs, then the entries for the access and error logs:
WARN[0000] /home/tiago/sandbox/issue-foo/nginx-example/issue/beats/docker-compose.yaml: the attribute `version` is obsolete, it will be ignored, please remove it to avoid potential confusion
Attaching to nginx-1
nginx-1 | /docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
nginx-1 | /docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
nginx-1 | /docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
nginx-1 | 10-listen-on-ipv6-by-default.sh: info: IPv6 listen already enabled
nginx-1 | /docker-entrypoint.sh: Sourcing /docker-entrypoint.d/15-local-resolvers.envsh
nginx-1 | /docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
nginx-1 | /docker-entrypoint.sh: Launching /docker-entrypoint.d/30-tune-worker-processes.sh
nginx-1 | /docker-entrypoint.sh: Configuration complete; ready for start up
nginx-1 | 172.25.0.1 - - [17/Oct/2025:16:01:43 +0000] "GET / HTTP/1.1" 200 18 "-" "curl/8.16.0"
nginx-1 | 172.25.0.1 - - [17/Oct/2025:16:01:44 +0000] "GET / HTTP/1.1" 200 18 "-" "curl/8.16.0"
nginx-1 | 172.25.0.1 - - [17/Oct/2025:16:02:29 +0000] "GET /error HTTP/1.1" 404 153 "-" "curl/8.16.0"
nginx-1 | 2025/10/17 16:02:29 [error] 22#22: *3 open() "/foo/bar/does/not/exist/error" failed (2: No such file or directory), client: 172.25.0.1, server: localhost, request: "GET /error HTTP/1.1", host: "localhost:9080"
However, when looking in Kibana, we see duplicated entries and only one dataset:
Looking at the original log file we can see Nginx is correctly logging
to stdout/stderr:
{
"log": "/docker-entrypoint.sh: Launching /docker-entrypoint.d/30-tune-worker-processes.sh\n",
"stream": "stdout",
"time": "2025-10-17T16:01:14.014527585Z"
}
{
"log": "/docker-entrypoint.sh: Configuration complete; ready for start up\n",
"stream": "stdout",
"time": "2025-10-17T16:01:14.015718076Z"
}
{
"log": "172.25.0.1 - - [17/Oct/2025:16:01:43 +0000] \"GET / HTTP/1.1\" 200 18 \"-\" \"curl/8.16.0\"\n",
"stream": "stdout",
"time": "2025-10-17T16:01:43.122783796Z"
}
{
"log": "172.25.0.1 - - [17/Oct/2025:16:01:44 +0000] \"GET / HTTP/1.1\" 200 18 \"-\" \"curl/8.16.0\"\n",
"stream": "stdout",
"time": "2025-10-17T16:01:44.058688247Z"
}
{
"log": "2025/10/17 16:02:29 [error] 22#22: *3 open() \"/foo/bar/does/not/exist/error\" failed (2: No such file or directory), client: 172.25.0.1, server: localhost, request: \"GET /error HTTP/1.1\", host: \"localhost:9080\"\n"
,
"stream": "stderr",
"time": "2025-10-17T16:02:29.24672494Z"
}
{
"log": "172.25.0.1 - - [17/Oct/2025:16:02:29 +0000] \"GET /error HTTP/1.1\" 404 153 \"-\" \"curl/8.16.0\"\n",
"stream": "stdout",
"time": "2025-10-17T16:02:29.246738188Z"
}

4. Using the container input without specifying any fileset
Using the same Filebeat configuration as in case 3 above and the original
docker-compose.yml that does not specify any fileset, there are even
more issues:
- All filesets are enabled, reading from all streams (stderr and stdout).
- Because all inputs are reading from all streams, we effectively have 3 different instances of the Log input sharing the state of the same file, which will cause issues (see the sketch below).
- The data duplication is even worse; some entries have 3 duplicates, for a total of 4 entries from a single log line.
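Roughly, the hints builder now generates one Container/Log input per enabled fileset, all pointed at the same JSON log file and reading every stream. A sketch of the effective configuration, assuming one input per fileset (the exact set of filesets depends on the module):

# One input per enabled fileset, all reading all streams of the same file,
# so they end up sharing a single registry state entry (visible below as
# the identical state_id "native::6686901-64768" in every harvester log).
- type: container
  stream: all   # no fileset/stream hint, so every stream is read
  paths:
    - /var/lib/docker/containers/${data.container.id}/*.log
# ...repeated for each of the other enabled filesets, differing only in
# the ingest pipeline and fileset metadata attached to the events.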
We can see the 3 inputs being started; note the different harvester_id values:
{
"@timestamp": "2025-10-17T12:33:32.416-0400",
"ecs.version": "1.6.0",
"finished": false,
"harvester_id": "4c92fd5e-dce3-4cad-92d0-32834b36e6b3",
"input_id": "b2638e3e-7ee7-4a65-abaa-f0cba270a7bb",
"log.level": "info",
"log.logger": "input.harvester",
"log.origin": {
"file.line": 311,
"file.name": "log/harvester.go",
"function": "github.com/elastic/beats/v7/filebeat/input/log.(*Harvester).Run"
},
"message": "Harvester started for paths: [/var/lib/docker/containers/fa5b93d8d08039eb89a7a6d4f66c00d2c6f856505744058971cbe178dcfe2a78/*.log]",
"os_id": "6686901-64768",
"service.name": "filebeat",
"source_file": "/var/lib/docker/containers/fa5b93d8d08039eb89a7a6d4f66c00d2c6f856505744058971cbe178dcfe2a78/fa5b93d8d08039eb89a7a6d4f66c00d2c6f856505744058971cbe178dcfe2a78-json.log",
"state_id": "native::6686901-64768"
}
{
"@timestamp": "2025-10-17T12:33:32.416-0400",
"ecs.version": "1.6.0",
"finished": false,
"harvester_id": "ada57fb4-985b-4207-8456-d0089207cf9b",
"input_id": "4857f463-67d5-4e20-a7d8-650cf52ebd2b",
"log.level": "info",
"log.logger": "input.harvester",
"log.origin": {
"file.line": 311,
"file.name": "log/harvester.go",
"function": "github.com/elastic/beats/v7/filebeat/input/log.(*Harvester).Run"
},
"message": "Harvester started for paths: [/var/lib/docker/containers/fa5b93d8d08039eb89a7a6d4f66c00d2c6f856505744058971cbe178dcfe2a78/*.log]",
"os_id": "6686901-64768",
"service.name": "filebeat",
"source_file": "/var/lib/docker/containers/fa5b93d8d08039eb89a7a6d4f66c00d2c6f856505744058971cbe178dcfe2a78/fa5b93d8d08039eb89a7a6d4f66c00d2c6f856505744058971cbe178dcfe2a78-json.log",
"state_id": "native::6686901-64768"
}
{
"@timestamp": "2025-10-17T12:33:32.416-0400",
"ecs.version": "1.6.0",
"finished": false,
"harvester_id": "847d48da-0532-4d17-9108-41ea7eeee99f",
"input_id": "56cefe67-004d-442e-a2d2-9c92de4e4d40",
"log.level": "info",
"log.logger": "input.harvester",
"log.origin": {
"file.line": 311,
"file.name": "log/harvester.go",
"function": "github.com/elastic/beats/v7/filebeat/input/log.(*Harvester).Run"
},
"message": "Harvester started for paths: [/var/lib/docker/containers/fa5b93d8d08039eb89a7a6d4f66c00d2c6f856505744058971cbe178dcfe2a78/*.log]",
"os_id": "6686901-64768",
"service.name": "filebeat",
"source_file": "/var/lib/docker/containers/fa5b93d8d08039eb89a7a6d4f66c00d2c6f856505744058971cbe178dcfe2a78/fa5b93d8d08039eb89a7a6d4f66c00d2c6f856505744058971cbe178dcfe2a78-json.log",
"state_id": "native::6686901-64768"
}

The container logs:
nginx-1 | /docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
nginx-1 | /docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
nginx-1 | /docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
nginx-1 | 10-listen-on-ipv6-by-default.sh: info: Getting the checksum of /etc/nginx/conf.d/default.conf
nginx-1 | 10-listen-on-ipv6-by-default.sh: info: Enabled listen on IPv6 in /etc/nginx/conf.d/default.conf
nginx-1 | /docker-entrypoint.sh: Sourcing /docker-entrypoint.d/15-local-resolvers.envsh
nginx-1 | /docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
nginx-1 | /docker-entrypoint.sh: Launching /docker-entrypoint.d/30-tune-worker-processes.sh
nginx-1 | /docker-entrypoint.sh: Configuration complete; ready for start up
nginx-1 | 172.25.0.1 - - [17/Oct/2025:16:34:13 +0000] "GET / HTTP/1.1" 200 18 "-" "curl/8.16.0"
nginx-1 | 172.25.0.1 - - [17/Oct/2025:16:34:14 +0000] "GET / HTTP/1.1" 200 18 "-" "curl/8.16.0"
nginx-1 | 2025/10/17 16:34:16 [error] 29#29: *3 open() "/foo/bar/does/not/exist/error" failed (2: No such file o
nginx-1 | 172.25.0.1 - - [17/Oct/2025:16:34:16 +0000] "GET /error HTTP/1.1" 404 153 "-" "curl/8.16.0"
Data duplication in Kibana; also note that the nginx.error dataset
has two entries of the access log combined into a single event.