
prometheus: Clustering mode can leak series through LabelStore #5068

@kgeckhart

Description


Component(s)

Prometheus pipelines

What's wrong?

Every appended series is tracked in the labelstore and is removed from it once the series is determined to be stale. The originator of a series is responsible for sending staleness markers; when clustering is involved, prometheus.scrape handles sending them.

The bug occurs when a target moves from one Alloy instance to another in the cluster. When this happens, we prevent the original instance's scrape from sending staleness markers, because that instance can no longer accurately tell whether the series are stale. Since no staleness markers are sent, the labelstore entries for that target's series are never removed from memory unless the target eventually moves back.
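The failure mode can be modelled with a small hypothetical sketch. The instance type and its scrape and stopScraping methods are illustrative names, not Alloy code, and a plain map stands in for the labelstore.

```go
package main

import "fmt"

// instance is a hypothetical model of one clustered Alloy instance. Its store
// tracks series keys the way the labelstore does: entries are removed only
// when staleness markers are processed.
type instance struct {
	store map[string]bool
}

func newInstance() *instance {
	return &instance{store: make(map[string]bool)}
}

// scrape records every series exposed by the target.
func (i *instance) scrape(series []string) {
	for _, s := range series {
		i.store[s] = true
	}
}

// stopScraping models a scrape loop shutting down. When the target has left
// this instance because of cluster rebalancing, sendStaleness is false, so no
// staleness markers are emitted and nothing is removed from the store.
func (i *instance) stopScraping(series []string, sendStaleness bool) {
	if !sendStaleness {
		return
	}
	for _, s := range series {
		delete(i.store, s)
	}
}

func main() {
	target := []string{`up{instance="a"}`, `http_requests_total{instance="a"}`}
	alloy1, alloy2 := newInstance(), newInstance()

	alloy1.scrape(target)
	// The cluster reassigns the target to alloy2; alloy1 stops without staleness markers.
	alloy1.stopScraping(target, false)
	alloy2.scrape(target)

	// alloy1 still holds entries for a target it no longer scrapes.
	fmt.Println(len(alloy1.store), len(alloy2.store)) // 2 2
}
```

Each move leaves a copy of the target's series behind on the previous owner, which is why repeated rebalancing can make the entry count grow over time.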

Today there is no easy way to map a target to its series (that mapping is buried in Prometheus internals), so we cannot easily remove the series from the labelstore when a target moves. The work in #5062 is intended to substantially mitigate this issue but will not directly solve it. After the labelstore overhaul is complete, we can assess the impact and look for an ideal solution.

Steps to reproduce

Given the complex interactions, this is hard to reproduce, but we run clustered Alloy in many places internally. We use remote write internally, which tracks active series, and because the labelstore must hand out a GlobalRef for each active series, the number of labelstore entries should be roughly equal to the number of active series. At worst it should be roughly 2x when a relabel is involved, since the labelstore holds a GlobalRef for the series both before and after the relabel.

In one of our internal deployments, where ~10% of pipelines include a relabel, this factor is often above 2x and grows toward 3x, whereas I would expect it never to exceed 2x.
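The bound above can be written as a quick calculation. This sketch assumes each active series holds exactly one GlobalRef and each relabelled series holds one extra, so the expected ratio is 1 + f, where f is the fraction of active series that pass through a relabel. Treating "~10% of pipelines" as f = 0.1 is an additional assumption, since pipelines need not carry equal numbers of series. The function names are hypothetical.

```go
package main

import "fmt"

// expectedRatio returns the labelstore-entries-to-active-series ratio expected
// when every active series holds one GlobalRef and each relabelled series holds
// one extra ref (before and after the relabel). relabelFraction is the fraction
// of active series that pass through a relabel, in [0, 1].
func expectedRatio(relabelFraction float64) float64 {
	return 1 + relabelFraction
}

// leakSuspected reports whether an observed ratio exceeds the expected bound.
func leakSuspected(observed, relabelFraction float64) bool {
	return observed > expectedRatio(relabelFraction)
}

func main() {
	// Worst case: every series is relabelled.
	fmt.Println(expectedRatio(1.0)) // 2

	// Assuming ~10% of pipelines corresponds to ~10% of series.
	fmt.Println(expectedRatio(0.1))

	// An observed 3x exceeds even the worst-case bound.
	fmt.Println(leakSuspected(3.0, 1.0)) // true
}
```

Because an observed factor above 2x exceeds the bound even when every series is relabelled, the exact value of f does not change the conclusion that entries are being retained.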



Metadata


Assignees: No one assigned

Labels: bug (Something isn't working)