Skip to content

Add a metric to track delay in ingestion #6748

Open
@rajagopalanand

Description

@rajagopalanand

Description

Ingestion delay is defined as delay between time of the sample accepted in Cortex and time the sample gets scraped

When there are ingestion delays, there can be a difference between recording rule and the underlying metric. This is because when looking back, it would appear as though the samples were there at the time of rule evaluation when in fact the sample might not have been ingested. Even though the samples arrive late, since the timestamp of samples are scrape timestamps, currently there is no way to know if there was a delay in ingestion. This can be confusing to troubleshoot and reason about why the recording rules differ from underlying metric.

Proposed Solution

I'm proposing to create a metric to track delays at the user/tenant level. For example, a histogram metric with buckets that can track at 5s, 30s, and > 30s can be helpful to know how much delay a tenant/user is experiencing. This metric can be used to set rule_query_offset. This new metric can be emitted from Distributor. To reduce # of time series, we might be able to use a native histogram

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions