Thanos, Prometheus and Golang version used:
Thanos: v0.39.2 (judging by historical data, we've seen this behavior on previous releases as well)
Object Storage Provider:
Azure (should not be relevant)
What happened:
Exponential increase in active goroutines on the receive ingestors, probably linked to a mutex within the receive router service. The ingestor goroutine count resets when the router deployments are restarted, so the routers are likely the cause.
Here's an example goroutine:
goroutine 1838951702 [sync.Cond.Wait, 24 minutes]:
sync.runtime_notifyListWait(0xc000d93d50, 0x0)
/go/pkg/mod/golang.org/[email protected]/src/runtime/sema.go:597 +0x159
sync.(*Cond).Wait(0xc005790f98?)
/go/pkg/mod/golang.org/[email protected]/src/sync/cond.go:71 +0x85
google.golang.org/grpc/internal/transport.(*http2Client).keepalive(0xc004116488)
/go/pkg/mod/google.golang.org/[email protected]/internal/transport/http2_client.go:1710 +0x225
created by google.golang.org/grpc/internal/transport.newHTTP2Client in goroutine 1838951656
/go/pkg/mod/google.golang.org/[email protected]/internal/transport/http2_client.go:399 +0x1dab
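For context on why goroutines like this can pile up: each established gRPC client transport starts a keepalive goroutine in newHTTP2Client (as in the stack above), and that goroutine only exits once the connection is closed. The sketch below is not Thanos code, just a minimal illustration of that pattern; the target localhost:10901 is a placeholder, a gRPC server must actually be listening there for the transports (and their keepalive goroutines) to be created, and grpc.NewClient requires a reasonably recent grpc-go (1.63+):

```go
package main

import (
	"fmt"
	"runtime"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	var conns []*grpc.ClientConn
	for i := 0; i < 100; i++ {
		// Each established connection creates an http2Client transport,
		// which starts its own keepalive goroutine.
		conn, err := grpc.NewClient("localhost:10901",
			grpc.WithTransportCredentials(insecure.NewCredentials()))
		if err != nil {
			panic(err)
		}
		conn.Connect() // trigger the actual connection / transport creation
		conns = append(conns, conn)
	}

	time.Sleep(2 * time.Second)
	fmt.Println("goroutines with open connections:", runtime.NumGoroutine())

	// The keepalive goroutines only exit once the connections are closed.
	for _, c := range conns {
		_ = c.Close()
	}
	time.Sleep(2 * time.Second)
	fmt.Println("goroutines after Close:", runtime.NumGoroutine())
}
```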
How to reproduce it (as minimally and precisely as possible):
No consistent way to reproduce it has been found yet, though it happens fairly frequently on our Thanos installation.
Full logs to relevant components:
Logs look normal; there are no warnings or errors beyond what is expected.
Anything else we need to know:
goroutines-receive-ingestor.txt
goroutines-receive-router.txt
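Dumps like the attached files can be captured as full goroutine profiles (debug=2 includes stacks and wait durations). A minimal sketch, assuming pprof is exposed on the receiver's HTTP port (10902 by default); the address and output path are placeholders:

```go
package main

import (
	"io"
	"net/http"
	"os"
)

func main() {
	// debug=2 returns full stack traces for every goroutine.
	resp, err := http.Get("http://localhost:10902/debug/pprof/goroutine?debug=2")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	f, err := os.Create("goroutines-receive.txt")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	if _, err := io.Copy(f, resp.Body); err != nil {
		panic(err)
	}
}
```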
After rotating all receive routers (ingestors have not been restarted):
