Description
We have a use case where, for a single incoming request, a microservice has to make many (on the order of 1000s) outgoing HTTP GET calls to another microservice to fetch some details. Our service is built with Scala, Http4s and Cats-Effect and uses the http4s-blaze-client library for outbound HTTP calls.
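Roughly, the fan-out per incoming request looks like the sketch below (simplified; `fetchDetails`, `ids` and the `details-service` URL are placeholders, not our actual code):

```scala
import cats.Parallel
import cats.effect.Concurrent
import cats.syntax.all._
import org.http4s.client.Client

// Simplified view of the fan-out: every id becomes a GET against the
// downstream service, and all of them are submitted to the shared Blaze
// client concurrently, so anything the connection pool cannot serve
// immediately lands in its wait queue.
def fetchDetails[F[_]: Concurrent: Parallel](client: Client[F], ids: List[String]): F[List[String]] =
  ids.parTraverse(id => client.expect[String](s"http://details-service/details/$id"))
```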
In production we are currently seeing the failure org.http4s.client.WaitQueueFullFailure: Wait queue is full along with org.http4s.client.PoolManager: Max wait queue limit of 1024 reached, not scheduling. Once the service enters this state, it never recovers and we are completely blocked.
Below is the Blaze Client configuration we are using:
BlazeClientBuilder[F](global)
  .withMaxWaitQueueLimit(1024)
  .withRequestTimeout(20.seconds)
  .resource
  .map { client =>
    ResponseLogger(logHeaders = false, logBody = true)(
      RequestLogger(logHeaders = true, logBody = true, redactHeadersWhen = Middleware.SensitiveHeaders)(client)
    )
  }
Initially we were using the default max wait queue limit of 256, then increased it to 512 and later to 1024. Even 1024 is not enough now.
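We have also been wondering whether the connection pool size itself (rather than only the wait queue) is the right knob to turn, something along these lines (a sketch only; the value 50 is illustrative, and as far as I can tell the default pool size is 10):

```scala
import scala.concurrent.ExecutionContext.global
import scala.concurrent.duration._
import cats.effect.{ConcurrentEffect, Resource}
import org.http4s.client.Client
import org.http4s.client.blaze.BlazeClientBuilder

// Illustrative only: raise the pool size together with the wait queue,
// so more requests can actually be in flight before anything has to queue.
def clientResource[F[_]: ConcurrentEffect]: Resource[F, Client[F]] =
  BlazeClientBuilder[F](global)
    .withMaxTotalConnections(50) // illustrative value, not a recommendation
    .withMaxWaitQueueLimit(1024)
    .withRequestTimeout(20.seconds)
    .resource
```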
I am not sure whether this happens when the outbound HTTP requests are slow or time out. It is possible that the API responses are sometimes slow (though they still return within the 20-second timeout we set), but I do not have sufficient evidence to claim that this is the cause.
We are currently using http4s-blaze-client_2.13:0.21.0-M6.
I am not sure that increasing the wait queue size further would help. Is it possible to implement custom logic within the service to check the wait queue size and wait before submitting requests to the client? A rough sketch of what we had in mind is below. Please advise how to get around this issue. Any help would be really appreciated.
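For illustration, this is roughly the kind of client-side throttling we were thinking about: a cats-effect Semaphore capping the number of in-flight outbound calls (a sketch only; `fetchAllDetails`, `ids`, `maxInFlight` and the URL are placeholders, not our real code):

```scala
import cats.Parallel
import cats.effect.Concurrent
import cats.effect.concurrent.Semaphore
import cats.syntax.all._
import org.http4s.client.Client

// Rough sketch: fan out over `ids` in parallel, but let the semaphore cap
// how many requests are actually handed to the Blaze client at once.
// Calls above the cap wait for a permit instead of piling up in the
// client's wait queue.
def fetchAllDetails[F[_]: Concurrent: Parallel](
    client: Client[F],
    ids: List[String],
    maxInFlight: Long
): F[List[String]] =
  Semaphore[F](maxInFlight).flatMap { permits =>
    ids.parTraverse { id =>
      permits.withPermit(
        client.expect[String](s"http://details-service/details/$id")
      )
    }
  }
```

Would this kind of application-level throttling be the recommended approach, or is there a built-in way to achieve the same with the Blaze client configuration?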