Add API response time monitoring per API client #7149

@vpaturet

Description

Current situation: OTP exposes performance metrics for the TransModel API (latency breakdown per layer in the routing engine).
While this is useful for investigating performance issues, it cannot be used for monitoring service level objectives (SLOs) per API consumer.

Feature request: a new Prometheus-compatible metric that monitors percentile response times per API client.

  • The recording is performed at the HTTP request level and does not provide a breakdown per internal layer within OTP.
  • API clients are identified by a (configurable) HTTP header,
    for example "x-client-name" or "et-client-name".
  • To reduce the load on the metrics subsystem, metrics are exposed as (configurable) quantile buckets that are aggregated on the Prometheus server side (Grafana, ...).
  • To prevent a cardinality explosion, a (configurable) set of API clients is monitored individually. Unknown clients, and requests that do not contain the client header, are grouped under a common "other" category.
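The requirements listed above can be sketched in plain Java. This is a minimal sketch, not OTP code: it assumes that "quantile buckets" means a Prometheus histogram with fixed `le` upper bounds, and the class `ClientResponseTimeRecorder`, the metric name, the client names and the bucket values are all hypothetical. It shows the three ideas together: the client is read from a configurable header, only an allowlist of clients gets its own label value (everything else becomes "other"), and durations are counted into fixed buckets rendered in the Prometheus text exposition format.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.DoubleAdder;
import java.util.concurrent.atomic.LongAdder;

/**
 * Hypothetical sketch (not OTP code): records HTTP response times per API client
 * as a Prometheus-style histogram with a bounded set of client label values.
 */
final class ClientResponseTimeRecorder {
    static final String OTHER = "other";

    private final String clientHeader;          // configurable, e.g. "et-client-name"
    private final Set<String> monitoredClients; // clients tracked individually (assumed plain identifiers)
    private final double[] upperBounds;         // bucket upper bounds in seconds; +Inf is implicit
    private final Map<String, Series> series = new ConcurrentHashMap<>();

    ClientResponseTimeRecorder(String clientHeader, Set<String> monitoredClients, double[] upperBounds) {
        this.clientHeader = clientHeader;
        this.monitoredClients = Set.copyOf(monitoredClients);
        this.upperBounds = upperBounds.clone();
        Arrays.sort(this.upperBounds);
    }

    /** Returns a monitored client name, or "other" for unknown clients or a missing header. */
    String resolveClient(Map<String, String> headers) {
        for (Map.Entry<String, String> h : headers.entrySet()) {
            if (h.getKey().equalsIgnoreCase(clientHeader)) { // HTTP header names are case-insensitive
                String name = h.getValue();
                return name != null && monitoredClients.contains(name) ? name : OTHER;
            }
        }
        return OTHER;
    }

    /** Records the duration of one request, measured at the HTTP layer. */
    void record(Map<String, String> headers, double seconds) {
        Series s = series.computeIfAbsent(resolveClient(headers), k -> new Series(upperBounds.length));
        int i = 0;
        while (i < upperBounds.length && seconds > upperBounds[i]) {
            i++; // find the first bucket whose upper bound is >= seconds
        }
        s.buckets[i].increment(); // index upperBounds.length is the +Inf overflow bucket
        s.sum.add(seconds);
    }

    /** Renders every series in the Prometheus text format, with cumulative "le" buckets. */
    String scrape(String metric) {
        StringBuilder out = new StringBuilder("# TYPE " + metric + " histogram\n");
        new TreeMap<>(series).forEach((client, s) -> {
            long cumulative = 0;
            for (int i = 0; i <= upperBounds.length; i++) {
                cumulative += s.buckets[i].sum();
                String le = i < upperBounds.length ? Double.toString(upperBounds[i]) : "+Inf";
                out.append(metric).append("_bucket{client=\"").append(client)
                   .append("\",le=\"").append(le).append("\"} ").append(cumulative).append('\n');
            }
            out.append(metric).append("_sum{client=\"").append(client).append("\"} ")
               .append(s.sum.sum()).append('\n');
            // The +Inf bucket equals the total number of observations.
            out.append(metric).append("_count{client=\"").append(client).append("\"} ")
               .append(cumulative).append('\n');
        });
        return out.toString();
    }

    /** Per-client counters: one LongAdder per bucket plus the +Inf overflow bucket. */
    private static final class Series {
        final LongAdder[] buckets;
        final DoubleAdder sum = new DoubleAdder();

        Series(int bounds) {
            buckets = new LongAdder[bounds + 1];
            for (int i = 0; i < buckets.length; i++) {
                buckets[i] = new LongAdder();
            }
        }
    }
}
```

Exposing fixed buckets rather than precomputed percentiles keeps the per-request cost to a couple of counter increments, and, unlike client-side quantiles, bucket counters from several OTP instances can be summed on the server before computing a percentile, e.g. `histogram_quantile(0.95, sum by (client, le) (rate(otp_http_response_time_seconds_bucket[5m])))`. The number of series stays bounded by (monitored clients + 1) × (buckets + 1).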

Metadata

Assignees

No one assigned

Labels

New Feature: A functional feature targeting the end user.

Projects

No projects

Relationships

None yet

Development

No branches or pull requests