Eventrouter - memory leak, fails silently after 3GB #1993

Open
@jeremych1000

Bugs should be filed for issues encountered whilst operating logging-operator.
You should first attempt to resolve your issues through the community support
channels, e.g. Slack, in order to rule out individual configuration errors. #logging-operator
Please provide as much detail as possible.

Describe the bug:
We use eventtailer with logging-operator to log Kubernetes events. The image used is 0.4.0 from https://github.com/kube-logging/eventrouter. I see it's a fork and that #1966 recognises that this isn't great.

We've seen eventrouter's memory consumption grow linearly over time, and it failed silently after consuming 3GB of memory. Restarting it fixed the problem.

Expected behaviour:
Memory usage of the event tailer should stay more or less constant.

Steps to reproduce the bug:
Monitor event tailer memory usage across a few days. Here's a screenshot.

The dropoff is caused by us merging a change which added requests/limits at 4pm on 19/3. You can still see the memory leak, as the graph continues to go up before the pod gets OOMKilled.

[Image: screenshot of event tailer memory usage over time]
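
For anyone trying to reproduce this, the growth can be put into numbers rather than read off a graph. The following is only a minimal sketch: it assumes memory samples have been exported from whatever monitoring is in use as (seconds, bytes) pairs, and the function names and sample values are hypothetical, not measurements from our cluster. It fits a least-squares line to the samples and estimates how long until a given memory limit is reached.

```python
MIB = 1024 * 1024


def leak_rate(samples):
    """Least-squares slope, in bytes per second, of (t_seconds, mem_bytes) samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one timestamp")
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    return cov / var_t


def seconds_until_limit(samples, limit_bytes):
    """Estimated seconds from the latest sample until limit_bytes is reached.

    Returns None when memory is flat or shrinking (no leak by this measure).
    """
    rate = leak_rate(samples)
    if rate <= 0:
        return None
    _, latest_mem = max(samples)  # sample with the largest timestamp
    return max(0.0, (limit_bytes - latest_mem) / rate)


if __name__ == "__main__":
    # Hypothetical hourly samples, for illustration only.
    samples = [(0, 100 * MIB), (3600, 110 * MIB), (7200, 120 * MIB)]
    print("leak rate (bytes/s):", leak_rate(samples))
    print("seconds until 250Mi:", seconds_until_limit(samples, 250 * MIB))
```

A roughly constant positive slope across several days would match the linear growth described above. A slope near zero would match the expected behaviour.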

Additional context:
We are running it with vanilla config. Manifest posted below.

Environment details:

  • Kubernetes version (e.g. v1.15.2): v1.30.6
  • Cloud-provider/provisioner (e.g. AKS, GKE, EKS, PKE etc): on prem
  • logging-operator version (e.g. 2.1.1): 4.10.0, which is not the newest, but the eventrouter image is still 0.4.0
  • Install method (e.g. helm or static manifests): argocd
  • Logs from the misbehaving component (and any other relevant logs): no relevant logs
  • Resource definition (possibly in YAML format) that caused the issue, without sensitive data:
apiVersion: logging-extensions.banzaicloud.io/v1alpha1
kind: EventTailer
metadata: <removed for brevity>
spec:
  containerOverrides:
    resources:
      limits:
        cpu: 50m
        memory: 250Mi
      requests:
        cpu: 10m
        memory: 100Mi
  controlNamespace: <redacted>
  image:
    imagePullSecrets: []
    pullPolicy: IfNotPresent
    repository:  <redacted - we use a proxy>
    tag: 0.4.0
  positionVolume:
    pvc:
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi
        volumeMode: Filesystem
  workloadMetaOverrides:
    labels:
      logging-operator/component: eventTailer

/kind bug
