Description
Client metrics are recorded by the ClientMetricInterceptor. It records the duration of each action and annotates it with the action name and the result.
Recently, I was trying to understand the impact of a downstream service's outage. The gRPC client metrics showed no record of the outage. After some further investigation, I found logs showing that the gRPC client was throwing exceptions. These exceptions weren't recorded because the only exception that misk makes available as a metric is a SocketTimeoutException.
Why not record metrics for any exceptional circumstance? Given that some kinds of errors are already recorded, it seems natural to include other exceptions too. It would improve visibility into failures like this outage and help us understand our systems.
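
To make the proposal concrete, here is a minimal sketch of the idea in plain Java, with no OkHttp or Prometheus dependency. It is not misk's actual ClientMetricInterceptor code: `CallMetricsSketch`, `recordCall`, and the `action:result` key format are hypothetical names I made up for illustration. The only point is that the result label falls back to the exception's class name for any exception, not just one specific type.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical stand-in for a metrics-recording client interceptor. */
final class CallMetricsSketch {
    // Number of calls per "action:result" key. A real interceptor would
    // feed a histogram or summary instead of these in-memory maps.
    static final Map<String, LongAdder> COUNTS = new ConcurrentHashMap<>();
    // Total observed duration in nanoseconds per "action:result" key.
    static final Map<String, LongAdder> TOTAL_NANOS = new ConcurrentHashMap<>();

    /** Runs the call and records its duration, labelled with the action and the result. */
    static <T> T recordCall(String action, Callable<T> call) throws Exception {
        long start = System.nanoTime();
        String result = "success";
        try {
            return call.call();
        } catch (Exception e) {
            // Any exception becomes a result label, not only SocketTimeoutException.
            result = e.getClass().getSimpleName();
            throw e; // Rethrow so the caller still sees the failure.
        } finally {
            String key = action + ":" + result;
            COUNTS.computeIfAbsent(key, k -> new LongAdder()).increment();
            TOTAL_NANOS.computeIfAbsent(key, k -> new LongAdder())
                .add(System.nanoTime() - start);
        }
    }
}
```

With something like this, a call that fails with a `ConnectException` would show up under a key such as `SomeService.SomeAction:ConnectException`, so an outage like the one above would be visible in the metrics. Using the simple class name keeps the label cardinality bounded by the number of exception types.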