Labels: New Feature (A functional feature targeting the end user.)
Description
Current situation: OTP exposes performance metrics for the TransModel API (a latency breakdown per layer of the routing engine).
While this is useful for investigating performance issues, it cannot be used to monitor service level objectives (SLOs) per API consumer.
Feature request: a new Prometheus-compatible metric that monitors percentile response times per API client.
- The recording is performed at the HTTP request level and does not provide a breakdown per internal layer within OTP.
- API clients are identified by a (configurable) HTTP header, for example "x-client-name" or "et-client-name".
- To reduce the load on the metrics sub-system, metrics are exposed as (configurable) quantile buckets to be aggregated on the Prometheus server side (Grafana, ...).
- To prevent cardinality explosion, only a (configurable) set of API clients is monitored individually. Unknown clients and requests that do not contain the client header are grouped under a common "other" category.
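To make the request concrete, here is a minimal, language-neutral sketch of the two mechanisms described in the list: mapping the client header to a bounded set of label values, and recording response times into configurable buckets. It is not OTP code and does not use any metrics library. The names `CLIENT_HEADER`, `MONITORED_CLIENTS`, `BUCKETS_SECONDS`, `resolve_client` and `ResponseTimeHistogram`, as well as the client names and bucket bounds, are assumptions made up for illustration.

```python
from bisect import bisect_left
from collections import defaultdict

# All names and values below are illustrative assumptions, not OTP configuration.
CLIENT_HEADER = "et-client-name"          # configurable header name
MONITORED_CLIENTS = {"app-a", "app-b"}    # hypothetical clients tracked individually
BUCKETS_SECONDS = [0.1, 0.5, 1.0, 5.0]    # configurable bucket upper bounds
OTHER = "other"


def resolve_client(headers):
    """Map request headers to a bounded set of metric label values."""
    # HTTP header names are case-insensitive, so normalise before the lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    client = normalised.get(CLIENT_HEADER.lower(), "").strip()
    # Unknown clients and requests without the header share one label value,
    # which keeps the number of time series bounded.
    return client if client in MONITORED_CLIENTS else OTHER


class ResponseTimeHistogram:
    """Per-client bucket counters for whole-request response times."""

    def __init__(self, bounds):
        self.bounds = sorted(bounds)
        # One counter per bucket plus a final overflow (+Inf) bucket.
        self.counts = defaultdict(lambda: [0] * (len(self.bounds) + 1))

    def record(self, client, seconds):
        # bisect_left returns the index of the first bound >= seconds,
        # i.e. the smallest bucket whose upper bound covers the observation.
        self.counts[client][bisect_left(self.bounds, seconds)] += 1

    def cumulative(self, client):
        """Return cumulative counts, the shape Prometheus histograms expose."""
        total, result = 0, []
        for count in self.counts.get(client, [0] * (len(self.bounds) + 1)):
            total += count
            result.append(total)
        return result


histogram = ResponseTimeHistogram(BUCKETS_SECONDS)
histogram.record(resolve_client({"ET-Client-Name": "app-a"}), 0.42)
histogram.record(resolve_client({}), 2.5)  # no header, counted under "other"
```

Because only bucket counters are stored per client, the cost of each request is a single increment, and percentiles can be estimated later from the cumulative counts on the monitoring side.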