Bug: Increased latency when losing an ingester since 2.15.0 #10764

Open
@raspbeguy

What is the bug?

Since we upgraded to 2.15.0, losing an ingester causes a large increase in distributor latency. Removing the ingester from the ring brings latency back to normal.

How to reproduce it?

On a distributed Mimir cluster with:
- multiple ingester instances
- distributors handling substantial traffic
- ingester.ring.zone_awareness_enabled=true
- ingester.ring.replication_factor=3
- ingester.ring.unregister_on_shutdown=false

shut down an ingester.
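
For reference, the three ring settings listed above would look roughly like this in a YAML configuration file. This is a minimal sketch assuming the dotted names map directly onto nested YAML keys; the rest of the cluster configuration (zones, storage, memberlist, and so on) is omitted and not taken from the report.

```yaml
# Sketch only: the ingester ring settings named in the reproduction steps.
# Everything else in the real configuration is left out.
ingester:
  ring:
    # Spread replicas across availability zones.
    zone_awareness_enabled: true
    # Each series is written to 3 ingesters.
    replication_factor: 3
    # Keep the ingester in the ring after shutdown, so a stopped
    # ingester still appears as a ring member until it is removed.
    unregister_on_shutdown: false
```

With `unregister_on_shutdown: false`, a stopped ingester stays in the ring, which matches the observation that manually removing it restores normal latency.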

What did you think would happen?

Shutting down an ingester shouldn't make latency go up (it did not on previous releases).

What was your environment?

Debian VMs with Mimir 2.15.0 deployed as APT package. 6 distributors, 12 ingesters, spread over 3 geographical zones.

Any additional context to share?

We found no configuration differences between a cluster running 2.15 and a cluster running 2.14.3 in the ingester, distributor, or ingester_client sections.
While the problem is occurring, traces show that distributor requests take much longer than usual.

Some metrics when shutting down an ingester
Image
Image

Trace of a distributor request to ingesters when latency is healthy
Image

Trace of a distributor request to ingesters when latency is degraded
Image
