
[kube-prometheus-stack] Apparently incorrect shard configuration #6325

@o-alexandre-felipe

Description

Introduction

Assuming that

# For each Prometheus pod, query prometheus_tsdb_head_series and extract the
# pod names embedded in the result; the sed strips the longest run of
# lowercase letters and hyphens (the common name prefix), so only each pod's
# suffix remains. The 'a(nothing)' fallback prints '  (nothing)' after sed
# when grep finds no pod names at all.
for suffix in 0 shard-{1,2,3}-0 ; do
  echo $suffix
  kubectl exec -n monitoring prometheus-kube-prometheus-stack-prometheus-$suffix -- \
    wget -qO- "http://localhost:9090/api/v1/query?query=prometheus_tsdb_head_series" | \
    (grep -o 'prometheus-kube-prometheus-stack-prometheus-[^"]*' || echo 'a(nothing)') | \
    sed 's/[a-z-]*[a-z]/  /'
done

shows, for each shard, which other shards' time series it holds, i.e. which shards it is reading from.

And that

# For a given $suffix from the loop above:
kubectl exec -n monitoring prometheus-kube-prometheus-stack-prometheus-$suffix -- \
  wget -qO- "http://localhost:9090/api/v1/query?query=scrape_series_added"

gives, per scrape target, the number of time series that a shard is scraping directly from the exporters on the nodes.
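To turn that into per-shard totals like the table below, the vector can be aggregated server-side. A minimal sketch, assuming the same pod naming as above and jq available on the client:

# Sum scrape_series_added over all targets, and count the targets, per pod.
for suffix in 0 shard-{1,2,3}-0 ; do
  kubectl exec -n monitoring prometheus-kube-prometheus-stack-prometheus-$suffix -- \
    wget -qO- "http://localhost:9090/api/v1/query?query=sum(scrape_series_added)" \
    | jq -r --arg pod "$suffix" '"\($pod): series \(.data.result[0].value[1])"'
  kubectl exec -n monitoring prometheus-kube-prometheus-stack-prometheus-$suffix -- \
    wget -qO- "http://localhost:9090/api/v1/query?query=count(scrape_series_added)" \
    | jq -r '"  targets: \(.data.result[0].value[1])"'
done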

I get the following data flow configuration.

Direct scraping statistics

Pod                                           Time Series     Scrape Targets
--------------------------------------------------------------------------------
prometheus-0                                  446,853         85
shard-1-0                                     170,059         62
shard-2-0                                     316,709         69
shard-3-0                                     508,354         62
--------------------------------------------------------------------------------
TOTAL                                         1,441,975       278
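The totals can be verified directly in the shell:

# Sum the per-shard series and target counts from the table above.
echo $((446853 + 170059 + 316709 + 508354))   # 1441975 time series
echo $((85 + 62 + 69 + 62))                   # 278 scrape targets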

Sharding configuration

Attempt 1

0
  -2-0
shard-1-0
  -1-0
shard-2-0
  -0
  -3-0
shard-3-0
  (nothing)

When writing down my expectations, I realised that maybe the configuration is not deterministic. So I re-deployed and got:

0
  -2-0
  -2-0
  -0
shard-1-0
  -1-0
  -1-0
shard-2-0
  -0
  -3-0
shard-3-0
  -3-0

In this configuration, prometheus-0 isn't connected to shard-1-0.

A few minutes later I tried again and got:

0
  -2-0
  -0
shard-1-0
  -1-0
shard-2-0
  (nothing)
shard-3-0
  -3-0
  -0

Here prometheus-0 isn't connected to shard-1-0 or shard-2-0.
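To see whether the shard assignment itself should be deterministic, the scrape config the operator generates can be inspected; sharding is implemented with hashmod relabelling on the targets. A sketch, assuming the operator's default secret name and key:

# The operator stores the rendered Prometheus config gzipped in a secret;
# look for the hashmod rules that assign targets to shards.
kubectl get secret -n monitoring prometheus-kube-prometheus-stack-prometheus \
  -o jsonpath='{.data.prometheus\.yaml\.gz}' | base64 -d | gunzip | grep -B2 -A4 hashmod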

What's your helm version?

version.BuildInfo{Version:"v3.16.2", GitCommit:"13654a52f7c70a143b1dd51416d633e1071faffb", GitTreeState:"clean", GoVersion:"go1.22.7"}

What's your kubectl version?

Client Version: v1.33.5
Kustomize Version: v5.6.0
Server Version: v1.33.5-eks-113cf36

Which chart?

kube-prometheus-stack

What's the chart version?

79.5.0

What happened?

I tried to enable sharding with 9 shards, and what I got is that prometheus-0 uses way more resources than the prometheus-shard-* pods. I tried tuning a few parameters; when I reduced to 4 shards I got something that was at least not OOMKilled, but it still shows a spike at startup that I assumed to be buffering (any advice on that is welcome).

Since it wasn't behaving as expected, I checked which targets each shard scrapes and how the data is consolidated, and found that the scraping topology between the shards is apparently not fully connected.
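One way to check the per-shard target lists is the targets API; a minimal sketch, assuming the same pod naming as above and jq on the client:

# Count active targets per scrape pool on each shard.
for suffix in 0 shard-{1,2,3}-0 ; do
  echo "== $suffix =="
  kubectl exec -n monitoring prometheus-kube-prometheus-stack-prometheus-$suffix -- \
    wget -qO- "http://localhost:9090/api/v1/targets?state=active" \
    | jq -r '.data.activeTargets | group_by(.scrapePool) | .[] | "\(.[0].scrapePool): \(length)"'
done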

What you expected to happen?

I expected all the shards to use a similar amount of resources, and prometheus-0 to gather the data from all the other shards over a deterministic topology.

Instead, there was a memory spike, especially on prometheus-0 (e.g. prometheus-5 going to 15 GB while other shards were using about 4 GB), and the shard data transfer is not configured as a fully connected directed acyclic graph.
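For comparing per-shard memory, something like this works (assuming metrics-server is installed and the operator's default pod labels):

# Per-pod memory for all Prometheus shards; even sharding should show
# roughly similar working sets across the pods.
kubectl top pods -n monitoring -l app.kubernetes.io/name=prometheus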

How to reproduce it?

No response

Enter the changed values of values.yaml?

No response

Enter the command that you execute and failing/misfunctioning.

helm upgrade kube-prometheus-stack \
  prometheus-community/kube-prometheus-stack \
  -n monitoring \
  --version 79.5.0 \
  -f values.yml
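For reference, since the values file isn't reproduced above: the sharding knob in this chart maps to the Prometheus CR's spec.shards, so the relevant fragment of values.yml presumably looks something like this (hypothetical; the actual file may differ):

# Hypothetical values.yml fragment for the 4-shard setup.
cat > values.yml <<'EOF'
prometheus:
  prometheusSpec:
    shards: 4
EOF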

Anything else we need to know?

No response
