Description
What is the bug?
I am running Grafana Mimir with Ingest Storage (Kafka) enabled. According to the configuration comments and documentation for `ingester.ring.replication_factor`:

> "This configuration is not used when ingest storage is enabled."
However, I have observed that this configuration is critical for query availability (Read Path) when using the Partition Ring (Kafka).
If this value is left at its default (which appears to be 1 in some contexts, or is simply not explicitly set to 3) while using Kafka, a rolling restart of the Ingester StatefulSet causes immediate query failures.
The error observed in the querier/ruler is:
`partition <ID>: too many unhealthy instances in the ring`
It appears that while the Write Path (Distributor -> Kafka) relies on Kafka's replication, the Read Path (Ingester consuming from Kafka) relies on `ingester.ring.replication_factor` to determine how many Ingesters consume the same partition. If this is 1, a single Ingester restart results in 100% downtime for that partition's data availability.
How to reproduce it?
- Deploy Mimir on Kubernetes (StatefulSet) with `ingest_storage` enabled (using Kafka).
- Do not explicitly set `ingester.ring.replication_factor` to 3 (leave it at default, or set it to 1).
- Start a rolling update of the Ingester StatefulSet (e.g., `kubectl rollout restart statefulset/mimir-ingester`).
- Execute PromQL queries continuously during the rollout.
- Observe that when an Ingester pod restarts (even with a graceful shutdown), queries fail immediately with 500 errors about "unhealthy instances".
What did you think would happen?
Based on the documentation stating "This configuration is not used when ingest storage is enabled", I expected that:
- I did not need to configure `ingester.ring.replication_factor` when using Kafka.
- Mimir would handle Ingester rolling updates gracefully without query interruptions, since Kafka ensures data durability.
Suggestion:
The documentation should be updated to clarify that `ingester.ring.replication_factor` IS used for the Read Path / Partition Ring to ensure high availability during consumption. It should recommend setting this to 3 (or at least >1) even when Kafka is enabled, so that Ingester restarts can be tolerated.
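As a concrete illustration of the suggested workaround, the relevant setting might look like the sketch below. The key layout is assumed from Mimir's standard YAML configuration and should be verified against the version in use:

```yaml
# Sketch (assumed key layout): explicitly set the ingester ring
# replication factor even with ingest storage (Kafka) enabled, so that
# multiple Ingesters consume each partition and a single restarting pod
# does not make that partition's recent data unqueryable.
ingester:
  ring:
    replication_factor: 3
```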
What was your environment?
Environment:
Mimir version: 3.0.0
Deployment: Kubernetes (StatefulSet)
Kafka enabled: Yes
Additional Context:
```yaml
runtimeConfig:
  overrides:
    anonymous:
      ingestion_rate: 8000000
      ingestion_burst_size: 40000000
      max_global_series_per_user: 250000000
      max_label_names_per_series: 100
      ruler_max_rules_per_rule_group: 40

ingest_storage:
  enabled: true
  kafka:
    address: "xxxx:9092"
    topic: "mimir-ingestion"
    client_id: "mimir-ingester"
    consumer_group: ""
    producer_max_record_size_bytes: 10485760
    producer_max_buffered_bytes: 1073741824
    consume_from_position_at_startup: "last-offset"
```
Any additional context to share?
No response