
Alloy consumes all available RAM when accidentally scraping a "large" sparse file #6014

@dylannorthrup

Description


Component(s)

loki.source.file

What's wrong?

This is a repost of a Promtail bug that has gone unaddressed for more than two years. I am posting it here because @JStickler closed that issue, noting that Promtail will receive no further support or updates.

This report is an updated version of the original, which @tiimwsuqld posted on 15 Nov 2023.

Describe the bug

Alloy consumes all available RAM (it doesn't start swapping) and freezes the VM. The OOM killer doesn't appear to kick in. This happens when /var/log/lastlog matches the path pattern. lastlog is a massively sparse file: its apparent size is huge, but it contains almost no data. Alloy shouldn't consume all memory just because it is reading a large file.

Expected behavior

Alloy should limit its memory usage (even at startup) so that it can't consume all memory on the machine and trigger an OOM kill. Yes, this can be avoided by excluding the lastlog file from processing, but there should still be limits on memory usage.

Environment:
* Infrastructure: Google Cloud VM (e2-standard-2, 8GB RAM)
* Deployment tool: docker-compose

Screenshots, Alloy config, or terminal output

Here is a system exhibiting the "lastlog looks really big but really isn't" issue. ls -s reports the disk space actually allocated, while ls -l reports the apparent file size:

hostname> /bin/ls -sh /var/log/lastlog
76K /var/log/lastlog
hostname> /bin/ls -lh /var/log/lastlog
-rw-rw-r--. 1 root utmp 1.2T Apr  8 14:45 /var/log/lastlog
hostname>

Yes, this can be mitigated by not monitoring lastlog, or by using __path_exclude__ for globs that would include it, but Alloy really shouldn't crash systems when it is pointed at a sparse file.
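A minimal sketch of the __path_exclude__ mitigation, assuming logs are collected with a /var/log/* glob. The component label "logs_base_varlog" is hypothetical; loki.write.logs_base is the component from the configuration in this report. The glob still matches everything else under /var/log, but lastlog is dropped from the targets before loki.source.file opens it:

```alloy
// Hypothetical target: tail everything under /var/log except the sparse lastlog file.
local.file_match "logs_base_varlog" {
  path_targets = [{
    __address__      = "hostname.example.com",
    __path__         = "/var/log/*",
    // Files matching this pattern are excluded from the targets.
    __path_exclude__ = "/var/log/lastlog",
  }]
}

// Tail the remaining files and forward them to the existing loki.write component.
loki.source.file "logs_base_varlog" {
  targets    = local.file_match.logs_base_varlog.targets
  forward_to = [loki.write.logs_base.receiver]
}
```

This only keeps Alloy from opening lastlog; it does not address the underlying unbounded memory usage described in this report.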

Steps to reproduce

  1. Configure a user on a Linux system with a very large UID (1,000,000+ or so). lastlog stores each user's record at an offset proportional to the UID, so its apparent size grows with the largest UID. Run ls -l /var/log/lastlog to verify that the reported file size is at least as large as the system's RAM.
  2. Add a stanza to your config.alloy that watches /var/log/lastlog.
  3. Start Alloy.
  4. Watch memory usage oscillate between normal and 100% as Alloy is OOM-killed and then restarted by systemd.

System information

Linux 5.14.0-611.27.1 x86_64

Software version

Grafana Alloy 1.13.0

Configuration

loki.write "logs_base" {
  endpoint {
    url = "https://loki.example.com/loki/api/v1/push"

    basic_auth {
      username = "loki_writer"
      password = "loki_password"
    }
  }
  external_labels = {}
}

local.file_match "logs_base_lastlog" {
  path_targets = [{
    __address__ = "hostname.example.com",
    __path__    = "/var/log/lastlog",
  }]
}

loki.source.file "logs_base_lastlog" {
  targets    = local.file_match.logs_base_lastlog.targets
  forward_to = [loki.write.logs_base.receiver]
}

Logs

