Description
We've observed unexpected behavior in the rate() function when a counter resets after a container restart: the reset is not handled correctly, leading to misleading data in our Grafana dashboards.
What's your helm version?
3.11.1
What's your kubectl version?
1.31.0
Which chart?
kube-prometheus-stack
What's the chart version?
69.2.3
What happened?
We noticed a sudden drop in the graph for our 'ara' service requests.
Upon investigation, we found that one pod had a container restart, causing its counter to reset.
The rate() function did not handle this reset correctly, resulting in a significant dip in the graph from about 80k requests/s to 12k requests/s.
The behavior persisted even when we focused on a single series, ruling out the sum aggregation as the cause.
What you expected to happen?
We expected the rate() function to handle counter resets gracefully, as per the Prometheus documentation. The function should detect the reset and calculate the rate correctly, maintaining a consistent view of the request rate despite the container restart.
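For context, here is a minimal Python sketch of the reset adjustment the documentation describes. It is simplified (no extrapolation to the range boundaries), and the sample timestamps and values are made up purely for illustration:

```python
# Minimal sketch of the counter-reset adjustment described in the Prometheus docs:
# when a sample is lower than its predecessor, a reset is assumed and the
# pre-reset total is carried forward before computing the per-second rate.
# Simplified: no extrapolation to the range boundaries.

def reset_adjusted_rate(samples, range_seconds):
    """samples: list of (timestamp, counter_value) pairs inside the range window."""
    if len(samples) < 2:
        return None  # rate() needs at least two samples in the window
    adjustment = 0.0
    prev_value = samples[0][1]
    last_adjusted = prev_value
    for _, value in samples[1:]:
        if value < prev_value:        # counter reset detected
            adjustment += prev_value  # carry the pre-reset total forward
        last_adjusted = value + adjustment
        prev_value = value
    return (last_adjusted - samples[0][1]) / range_seconds

# Illustrative only: a counter that resets between t=30s and t=60s
samples = [(0, 1000.0), (30, 2500.0), (60, 200.0)]
print(reset_adjusted_rate(samples, 60))  # ~28.3/s rather than a misleading dip
```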
How to reproduce it?
1. Run a Prometheus query using the rate() function on a counter metric, such as: rate(http_server_duration_milliseconds_count{service_name="ara"}[1m])
2. Trigger a container restart for one of the pods of the service being monitored.
3. Observe the resulting graph in Grafana over the period of the restart (or query the API directly, as in the sketch below).
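To observe the dip without Grafana, the same query can be run against the Prometheus HTTP API. A rough sketch; the server URL and time window below are placeholders, not values from our setup:

```python
# Rough sketch: fetch rate() over the restart window via the Prometheus HTTP API.
# PROMETHEUS_URL and the 30-minute lookback are placeholders for illustration.
import time
import requests

PROMETHEUS_URL = "http://prometheus.example.internal:9090"  # placeholder
QUERY = 'rate(http_server_duration_milliseconds_count{service_name="ara"}[1m])'

end = time.time()
start = end - 30 * 60  # adjust so the window covers the restart

resp = requests.get(
    f"{PROMETHEUS_URL}/api/v1/query_range",
    params={"query": QUERY, "start": start, "end": end, "step": "30s"},
    timeout=10,
)
resp.raise_for_status()

for series in resp.json()["data"]["result"]:
    print(series["metric"].get("pod", "<no pod label>"))
    for ts, value in series["values"]:
        print(f"  {time.strftime('%H:%M:%S', time.gmtime(ts))}  {value}")
```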
Enter the changed values of values.yaml?
scrapeInterval: 30s
scrapeTimeout: 10s
Enter the command that you execute and failing/misfunctioning.
N/A
Anything else we need to know?
We've tested this with both Prometheus and Thanos data sources, yielding the same results.
The issue persists even when isolating a single series, ruling out problems with aggregation.
We've confirmed that the underlying counter did reset, as seen in the raw metric data.
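For reference, one way to confirm the reset from the query side is resets(); a short sketch against the instant-query endpoint, with the server URL and lookback window again being placeholders:

```python
# Sanity check: count counter resets per series with resets().
# PROMETHEUS_URL and the 30-minute lookback are placeholders for illustration.
import requests

PROMETHEUS_URL = "http://prometheus.example.internal:9090"  # placeholder
QUERY = 'resets(http_server_duration_milliseconds_count{service_name="ara"}[30m])'

resp = requests.get(f"{PROMETHEUS_URL}/api/v1/query", params={"query": QUERY}, timeout=10)
resp.raise_for_status()

for series in resp.json()["data"]["result"]:
    pod = series["metric"].get("pod", "<no pod label>")
    _, resets = series["value"]  # instant vector sample: [timestamp, "value"]
    print(f"{pod}: {resets} reset(s) in the last 30m")
```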