Conversation

@AndersonQ
Member

@AndersonQ AndersonQ commented Oct 29, 2025

Proposed commit message

filebeat/docs: add how to ingest k8s rotated logs

Update the existing docs and configuration examples to explain how
to ingest Kubernetes rotated logs, including GZIP-compressed logs.

Checklist

  • [ ] My code follows the style guidelines of this project
  • [ ] I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • [ ] I have added tests that prove my fix is effective or that my feature works. Where relevant, I have used the stresstest.sh script to run them under stress conditions and race detector to verify their stability.
  • I have added an entry in ./changelog/fragments using the changelog tool.

Disruptive User Impact

Enabling the ingestion of rotated logs on an existing deployment causes a one-time re-ingestion of the logs.

Author's Checklist


How to test this PR locally

Objective

To verify the correctness of Filebeat configurations for ingesting rotated Kubernetes logs, including gzip-compressed files. This guide tests two primary collection methods: static inputs (without autodiscover) and dynamic inputs (with autodiscover).

Background

Filebeat supports two main methods for collecting Kubernetes logs. A key difference between them involves metadata enrichment, specifically a limitation of the add_kubernetes_metadata processor, which is required when not using autodiscover.

This processor's logs_path matcher infers metadata by parsing the log file's path. The level of detail available depends entirely on the path structure:

  • /var/log/containers/*.log: These paths (which are symlinks) contain the container ID in their filenames. The processor uses this ID to look up all metadata, including pod and container details (e.g., kubernetes.container.name).
  • /var/log/pods/*/*/*.log*: These paths for rotated logs only contain the Pod UID. The processor can use this to find pod-level metadata (e.g., kubernetes.pod.name), but it cannot determine which specific container in that pod the log belongs to.

Therefore, when using static inputs, only logs from /var/log/containers/ will receive full container-level metadata.

The autodiscover method does not have this limitation and adds container-level metadata to all logs, including rotated ones.
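To make the difference between the two path layouts concrete, here is a minimal Python sketch of which lookup key each kind of path carries. It is not Filebeat's implementation: the function name, the regular expressions and the sample paths are illustrative assumptions. The sketch assumes the usual kubelet layouts, `<pod>_<namespace>_<container>-<container id>.log` under /var/log/containers/ and `<namespace>_<pod>_<pod uid>/<container>/` under /var/log/pods/.

```python
import re
from typing import Optional, Tuple

# Assumed layout: /var/log/containers/<pod>_<namespace>_<container>-<64-hex container id>.log
CONTAINER_LOG = re.compile(r"^/var/log/containers/.+-(?P<cid>[0-9a-f]{64})\.log$")

# Assumed layout: /var/log/pods/<namespace>_<pod>_<pod uid>/<container>/<file>
POD_LOG = re.compile(
    r"^/var/log/pods/[^/_]+_[^/_]+_(?P<uid>[0-9a-f-]{36})/[^/]+/[^/]+$"
)


def lookup_key(path: str) -> Optional[Tuple[str, str]]:
    """Return the kind of identifier found in a log path and its value.

    Hypothetical helper: it mirrors the description above (container ID for
    /var/log/containers/, pod UID for /var/log/pods/), not Filebeat's code.
    """
    match = CONTAINER_LOG.match(path)
    if match:
        return ("container_id", match.group("cid"))
    match = POD_LOG.match(path)
    if match:
        return ("pod_uid", match.group("uid"))
    return None
```

A container ID identifies one container, so pod-level and container-level metadata can both be looked up from it. A pod UID identifies only the pod, which is why the static input for /var/log/pods/ stops at pod-level metadata.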

Prerequisites

  1. Elastic Stack: An available cluster to receive logs.
  2. Kind Cluster: A local Kubernetes cluster: kind create cluster
  3. Useful Elasticsearch Queries:
  • Get number of logs for flog per log path:
POST /.ds-filebeat-*/_search
{
  "size": 0,
  "query": {
    "wildcard": {
      "log.file.path": {
        "value": "/var/*flog*"
      }
    }
  },
  "aggs": {
    "files_count": {
      "terms": {
        "field": "log.file.path"
      }
    }
  }
}
  • Get the total number of flog logs:
POST /.ds-filebeat-*/_search
{
  "size": 0,
  "query": {
    "wildcard": {
      "log.file.path": {
        "value": "/var/*flog*"
      }
    }
  },
  "aggs": {
    "total_documents": {
      "value_count": {
        "field": "log.file.path" 
      }
    }
  }
}
  • Delete all logs ingested by Filebeat using a delete-by-query request:
POST /.ds-filebeat-*/_delete_by_query
{"query": {"match_all": {}}}
  • Delete all logs ingested by Filebeat by deleting the data stream:
DELETE _data_stream/filebeat-9.2.0
  4. Useful commands:
  • Exec into the k8s node: docker exec -it kind-control-plane bash
  • Check the number of logs for flog: wc -l /var/log/pods/default*/*/0.log
  5. flog is configured to generate logs at a rate that worked on my machine.
    You might need to adjust the -d parameter if your machine is significantly faster or slower.
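The wc -l command above counts only the active 0.log file. As a rough cross-check that also covers rotated and gzip-compressed files, the following Python sketch counts lines across everything matching *.log* in one container log directory. The function name is hypothetical, and the sketch assumes Python 3 is available wherever the directory (or a copy of it) can be read, which may not be the case on the kind node itself.

```python
import gzip
from pathlib import Path


def count_log_lines(container_dir: str) -> int:
    """Count lines in every file matching *.log* in one container log directory.

    Files ending in .gz are decompressed on the fly; everything else is read
    as plain text. Hypothetical helper for manual verification only.
    """
    total = 0
    for path in sorted(Path(container_dir).glob("*.log*")):
        opener = gzip.open if path.suffix == ".gz" else open
        with opener(path, "rb") as handle:
            total += sum(1 for _ in handle)
    return total
```

Called with a directory such as /var/log/pods/<namespace>_<pod>_<uid>/<container>, the result can be compared with the document count returned by Elasticsearch once the job has completed.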

Test Case 1: Static Inputs (Without Autodiscover)

Fresh Deployment

This test validates log ingestion on a new Filebeat deployment configured to read both active and rotated logs from the start.

  1. Generate Logs: Create a Kubernetes Job to run flog and wait for it to complete.
flog.yaml:
apiVersion: batch/v1
kind: Job
metadata:
   name: flog-log-generator
spec:
   template:
      spec:
         containers:
            - name: flog
              image: mingrammer/flog
              # a "-d" value that is too small won't give the kubelet time to rotate the files
              args: ["-t", "stdout", "-f", "json", "-n", "185000", "-d", "250us"]
         restartPolicy: Never
   backoffLimit: 4
❯ kubectl get pods 
NAME                       READY   STATUS      RESTARTS   AGE
flog-log-generator-wbgz7   0/1     Completed   0          2m17s
  2. Deploy Filebeat: Deploy Filebeat using the filebeat-kubernetes.yaml manifest, which contains both filestream inputs (one for active logs, one for rotated).
filebeat-kubernetes.yaml:
apiVersion: v1
kind: ConfigMap
metadata:
   name: filebeat-config
   namespace: kube-system
   labels:
      k8s-app: filebeat
data:
   filebeat.yml: |-
      filebeat.inputs:
        ## To enable collection of rotated pod logs, replace the above `filebeat.inputs` configuration with this.
        ## WARNING:
        ##    - enabling rotated pod logs ingestion might cause data re-ingestion, refer to the docs for details: https://www.elastic.co/docs/reference/beats/filebeat/running-on-kubernetes#_kubernetes_deploy_manifests
        ##    - container metadata isn't available when collecting logs from /var/log/pods/. Refer to add_kubernetes_metadata docs for details: https://www.elastic.co/docs/reference/beats/filebeat/add-kubernetes-metadata#_logs_path
        - type: filestream
          id: kubernetes-pod-logs
          gzip_experimental: true # BETA: enable gzip decompression. Refer to the docs for details: https://www.elastic.co/docs/reference/beats/filebeat/filebeat-input-filestream#reading-gzip-files
          parsers:
            - container: ~
          paths:
            - /var/log/pods/*/*/*.log*
          prospector:
            scanner:
              fingerprint.enabled: true
              symlinks: true
          file_identity.fingerprint: ~
          processors:
            - add_kubernetes_metadata:
                host: ${NODE_NAME}
                default_indexers.enabled: false
                default_matchers.enabled: false
                indexers:
                  - pod_uid:
                matchers:
                  - logs_path:
                      logs_path: "/var/log/pods/"
                      resource_type: "pod"

      processors:
        - add_cloud_metadata:
        - add_host_metadata:

      cloud.id: ${ELASTIC_CLOUD_ID}
      cloud.auth: ${ELASTIC_CLOUD_AUTH}

      output.elasticsearch:
        hosts: ['${ELASTICSEARCH_HOST:elasticsearch}:${ELASTICSEARCH_PORT:9200}']
        username: ${ELASTICSEARCH_USERNAME}
        password: ${ELASTICSEARCH_PASSWORD}
  3. Verify Ingestion:
    • Use the "Get total log count" query. The total number of documents should be exactly 185000.
    • Use the "Get log count per file path" query. You should see documents ingested from different paths under /var/log/pods/ (including .gz files).
  4. Verify Metadata:
    • Query for logs. They must contain kubernetes.pod.name but will not contain kubernetes.container.name (this is expected behaviour).
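The metadata check in the last step can be automated once the documents have been fetched. The Python sketch below is one way to do it, assuming each document's `_source` is available as a nested dictionary; both helper names are hypothetical and nothing here is part of Filebeat or an Elasticsearch client.

```python
from typing import Any, Dict, Optional


def get_field(source: Dict[str, Any], dotted: str) -> Optional[Any]:
    """Return the value at a dotted field path in a nested dict, or None."""
    current: Any = source
    for part in dotted.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def has_expected_static_metadata(source: Dict[str, Any]) -> bool:
    """True when pod-level metadata is present and the container name is absent.

    This is the expected shape for logs read from /var/log/pods/ by a static
    input, as described in the step above.
    """
    return (
        get_field(source, "kubernetes.pod.name") is not None
        and get_field(source, "kubernetes.container.name") is None
    )
```

Running `has_expected_static_metadata` over the `_source` of every returned hit and asserting that all results are true gives a quick pass or fail signal for this test case.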

Updating an Existing Deployment

This test validates that a Filebeat instance will correctly ingest rotated logs after being updated.

  1. Deploy "Old" Filebeat: Deploy Filebeat using the filebeat-kubernetes.yaml below:
filebeat-kubernetes.yaml:
apiVersion: v1
kind: ConfigMap
metadata:
   name: filebeat-config
   namespace: kube-system
   labels:
      k8s-app: filebeat
data:
   filebeat.yml: |-
      filebeat.inputs:
        - type: filestream
          id: kubernetes-container-logs
          paths:
            - /var/log/containers/*.log
          parsers:
            - container: ~
          prospector:
            scanner:
              fingerprint.enabled: true
              symlinks: true
          file_identity.fingerprint: ~
          processors:
            - add_kubernetes_metadata:
                host: ${NODE_NAME}
                matchers:
                  - logs_path:
                      logs_path: "/var/log/containers/"

      processors:
        - add_cloud_metadata:
        - add_host_metadata:

      cloud.id: ${ELASTIC_CLOUD_ID}
      cloud.auth: ${ELASTIC_CLOUD_AUTH}

      output.elasticsearch:
        hosts: ['${ELASTICSEARCH_HOST:elasticsearch}:${ELASTICSEARCH_PORT:9200}']
        username: ${ELASTICSEARCH_USERNAME}
        password: ${ELASTICSEARCH_PASSWORD}
  2. Generate Logs: Apply the flog.yaml and wait for it to complete.
  3. Verify Initial Ingestion: Check Elasticsearch. You will only see a fraction of the 185000 logs, sourced exclusively from /var/log/containers/.
  4. Update Filebeat: Apply the following filebeat-kubernetes.yaml:
filebeat-kubernetes.yaml:
apiVersion: v1
kind: ConfigMap
metadata:
   name: filebeat-config
   namespace: kube-system
   labels:
      k8s-app: filebeat
data:
   filebeat.yml: |-
      filebeat.inputs:
            ## To enable collection of rotated pod logs, replace the above `filebeat.inputs` configuration with this.
            ## WARNING:
            ##    - enabling rotated pod logs ingestion might cause data re-ingestion, refer to the docs for details: https://www.elastic.co/docs/reference/beats/filebeat/running-on-kubernetes#_kubernetes_deploy_manifests
            ##    - container metadata isn't available when collecting logs from /var/log/pods/. Refer to add_kubernetes_metadata docs for details: https://www.elastic.co/docs/reference/beats/filebeat/add-kubernetes-metadata#_logs_path
        - type: filestream
          id: kubernetes-container-logs
          gzip_experimental: true # BETA: enable gzip decompression. Refer to the docs for details: https://www.elastic.co/docs/reference/beats/filebeat/filebeat-input-filestream#reading-gzip-files
          parsers:
            - container: ~
          paths:
            - /var/log/pods/*/*/*.log*
          prospector:
            scanner:
              fingerprint.enabled: true
              symlinks: true
          file_identity.fingerprint: ~
          processors:
            - add_kubernetes_metadata:
                host: ${NODE_NAME}
                default_indexers.enabled: false
                default_matchers.enabled: false
                indexers:
                  - pod_uid:
                matchers:
                  - logs_path:
                      logs_path: "/var/log/pods/"
                      resource_type: "pod"

      processors:
        - add_cloud_metadata:
        - add_host_metadata:

      cloud.id: ${ELASTIC_CLOUD_ID}
      cloud.auth: ${ELASTIC_CLOUD_AUTH}

      output.elasticsearch:
        hosts: ['${ELASTICSEARCH_HOST:elasticsearch}:${ELASTICSEARCH_PORT:9200}']
        username: ${ELASTICSEARCH_USERNAME}
        password: ${ELASTICSEARCH_PASSWORD}
  5. Delete the Filebeat pod so that it picks up the new config.
  6. Verify Final Ingestion:
    • Wait for Filebeat to restart and harvest the new paths.
    • Use the "Get total log count" query. The total count should now increase from 185000.
    • Verify that new logs appear from /var/log/pods/, including .gz files.
    • Verify the metadata difference as described above.
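To check where the documents came from without reading the buckets by eye, the response of the per-path query can be summarized with a short Python sketch like the one below. It assumes the response has been parsed into a dictionary and that the aggregation is named files_count, as in the query from the prerequisites; the function name and the summary keys are illustrative.

```python
from typing import Any, Dict


def summarize_paths(response: Dict[str, Any]) -> Dict[str, int]:
    """Sum document counts per log root from a terms aggregation response.

    "pods_gz" is the subset of "pods" that came from files ending in .gz.
    """
    summary = {"containers": 0, "pods": 0, "pods_gz": 0}
    for bucket in response["aggregations"]["files_count"]["buckets"]:
        path, count = bucket["key"], bucket["doc_count"]
        if path.startswith("/var/log/containers/"):
            summary["containers"] += count
        elif path.startswith("/var/log/pods/"):
            summary["pods"] += count
            if path.endswith(".gz"):
                summary["pods_gz"] += count
    return summary
```

After the update, both "pods" and "pods_gz" should be greater than zero. Note that a terms aggregation returns only the top 10 buckets unless a larger `size` is set inside the `terms` block, so the summary may be incomplete when many files were ingested.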

Test Case 2: Autodiscover

This test validates log ingestion using hints-based autodiscover.

  1. Generate Logs (and wait):
    • Apply the flog.yaml (see below, note the container name flog-2 and different args).
    • Crucially, wait ~30 seconds after applying the job. This allows flog to generate logs and kubelet to perform at least two rotations before Filebeat is deployed.
flog.yaml:
apiVersion: batch/v1
kind: Job
metadata:
   name: flog-log-generator
spec:
   template:
      spec:
         containers:
            - name: flog-2
              image: mingrammer/flog
              args: ["-t", "stdout", "-f", "json", "-d", "100us", "-l"]
         restartPolicy: Never
   backoffLimit: 4
  2. Deploy Filebeat: Deploy Filebeat using the autodiscover-enabled ConfigMap.
filebeat-kubernetes-ConfigMap.yaml:
apiVersion: v1
kind: ConfigMap
metadata:
   name: filebeat-config
   namespace: kube-system
   labels:
      k8s-app: filebeat
data:
   filebeat.yml: |-
      filebeat.autodiscover:
       providers:
         - type: kubernetes
           node: ${NODE_NAME}
           cleanup_timeout: 1h
           hints.enabled: true
           hints.default_config:
             type: filestream
             id: kubernetes-container-logs-${data.kubernetes.pod.name}-${data.kubernetes.container.id}
             gzip_experimental: true # BETA: enable gzip decompression. Refer to the docs for details: https://www.elastic.co/docs/reference/beats/filebeat/filebeat-input-filestream#reading-gzip-files
             paths:
             - /var/log/containers/*-${data.kubernetes.container.id}.log
             - /var/log/pods/${data.kubernetes.namespace}_${data.kubernetes.pod.name}_${data.kubernetes.pod.uid}/${data.kubernetes.container.name}/*.log.* # Uncomment to ingest rotated pod logs. It might cause data re-ingestion, refer to the docs for details: https://www.elastic.co/docs/reference/beats/filebeat/running-on-kubernetes#_kubernetes_deploy_manifests. TODO(AndersonQ): update link to anchor to the new docs section.
             parsers:
             - container: ~
             prospector:
              scanner:
                fingerprint.enabled: true
                symlinks: true
             file_identity.fingerprint: ~

      processors:
        - add_cloud_metadata:
        - add_host_metadata:

      cloud.id: ${ELASTIC_CLOUD_ID}
      cloud.auth: ${ELASTIC_CLOUD_AUTH}

      output.elasticsearch:
        hosts: ['${ELASTICSEARCH_HOST:elasticsearch}:${ELASTICSEARCH_PORT:9200}']
        username: ${ELASTICSEARCH_USERNAME}
        password: ${ELASTICSEARCH_PASSWORD}
  3. Verify Ingestion:
    • Wait for Filebeat to ingest some data.
    • Use the "Get log count per file path" query. You should see documents ingested from both the /var/log/containers/ path and the /var/log/pods/ path.
  4. Verify Metadata:
    • Query for logs from both /var/log/containers/ and /var/log/pods/.
    • Unlike the static method, all logs (including rotated and compressed) must contain full Kubernetes metadata, including kubernetes.pod.name and kubernetes.container.name. This confirms autodiscover is enriching all paths correctly.
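The two path templates in the autodiscover ConfigMap above are filled in per container, which is a plausible reading of why container-level metadata is available for rotated files too. The Python sketch below only illustrates that expansion: the `render` and `matching_files` helpers and the sample values are hypothetical, and `fnmatch` is a stand-in for Filebeat's own glob matching (unlike a real path glob, its `*` also matches `/`).

```python
import fnmatch
from typing import Dict, List

# Path templates copied from hints.default_config in the ConfigMap above.
PATH_TEMPLATES = [
    "/var/log/containers/*-${data.kubernetes.container.id}.log",
    "/var/log/pods/${data.kubernetes.namespace}_${data.kubernetes.pod.name}"
    "_${data.kubernetes.pod.uid}/${data.kubernetes.container.name}/*.log.*",
]


def render(template: str, values: Dict[str, str]) -> str:
    """Replace each ${key} placeholder with its value (illustrative only)."""
    for key, value in values.items():
        template = template.replace("${" + key + "}", value)
    return template


def matching_files(files: List[str], values: Dict[str, str]) -> List[str]:
    """Return the files that match any rendered template for one container."""
    globs = [render(template, values) for template in PATH_TEMPLATES]
    return [f for f in files if any(fnmatch.fnmatchcase(f, g) for g in globs)]
```

With these templates the active log is picked up through its /var/log/containers/ symlink, while the `*.log.*` pattern matches only rotated files in the container's own directory under /var/log/pods/; the plain 0.log there is not matched a second time.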

Related issues

Use cases

Ingest Kubernetes rotated logs

@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Oct 29, 2025
@AndersonQ AndersonQ self-assigned this Oct 29, 2025
@github-actions
Contributor

🤖 GitHub comments

Expand to view the GitHub comments

Just comment with:

  • run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)

@AndersonQ AndersonQ added the Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team label Oct 29, 2025
@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Oct 29, 2025
@mergify
Contributor

mergify bot commented Oct 29, 2025

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @AndersonQ? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit
  • backport-active-all is the label that automatically backports to all active branches.
  • backport-active-8 is the label that automatically backports to all active minor branches for the 8 major.
  • backport-active-9 is the label that automatically backports to all active minor branches for the 9 major.

@github-actions
Contributor

github-actions bot commented Oct 29, 2025

Update the existing docs and configuration examples to explain how
to ingest Kubernetes rotated logs, including GZIP-compressed logs.
@AndersonQ AndersonQ force-pushed the 47383-k8s-rotated-gzip-docs branch from 839704f to 7042e97 Compare November 4, 2025 07:55
@AndersonQ AndersonQ changed the title [WIP][k8s docs] include how to ingest rotated container log files, including GZIP compressed [k8s docs] include how to ingest rotated container log files, including GZIP compressed Nov 4, 2025
@AndersonQ AndersonQ marked this pull request as ready for review November 4, 2025 07:56
@AndersonQ AndersonQ requested review from a team as code owners November 4, 2025 07:56
@AndersonQ AndersonQ requested a review from pchila November 4, 2025 07:56
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@pierrehilbert pierrehilbert added Team:Docs Label for the Observability docs team Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team labels Nov 4, 2025
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

Labels

Team:Docs Label for the Observability docs team Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[k8s docs] Update k8s docs to include how to ingest rotated container log files, including GZIP compressed

3 participants