Description
Feature description
Integrate Prometheus and Grafana with our OpenTelemetry audit to track key performance and usage metrics. This will let us collect, visualize, and analyze model behavior in real time, improving observability and making debugging easier. The following metrics should be captured: prompt throughput, token throughput, time to first token, and input token count.
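To make the requested metrics concrete, here is a minimal sketch of how they could be derived from per-request timing data and rendered in the Prometheus text exposition format. It uses only the Python standard library and is not the actual integration. Everything named here is an assumption for illustration: the `RequestRecord` shape, the `llm_*` metric names, and the definitions themselves (prompt throughput as requests per second, token throughput as output tokens per second, time to first token as a windowed average). The real definitions still need to be agreed on.

```python
from dataclasses import dataclass


@dataclass
class RequestRecord:
    """One completed model request (hypothetical shape, not from this issue)."""
    start_s: float         # when the request was received
    first_token_s: float   # when the first output token was emitted
    end_s: float           # when the response finished
    input_tokens: int
    output_tokens: int


def summarize(records, window_s):
    """Compute the four requested metrics over a window of `window_s` seconds."""
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    n = len(records)
    total_in = sum(r.input_tokens for r in records)
    total_out = sum(r.output_tokens for r in records)
    ttft = [r.first_token_s - r.start_s for r in records]
    return {
        # Assumption: prompt throughput means requests per second.
        "llm_prompt_throughput_per_second": n / window_s,
        # Assumption: token throughput counts output tokens only.
        "llm_token_throughput_per_second": total_out / window_s,
        # Average time to first token; 0.0 when the window is empty.
        "llm_time_to_first_token_seconds": sum(ttft) / n if n else 0.0,
        # Input tokens seen in this window.
        "llm_input_tokens": float(total_in),
    }


def to_prometheus_text(metrics):
    """Render the metrics as gauges in the Prometheus text exposition format."""
    lines = []
    for name, value in sorted(metrics.items()):
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Example: two requests observed in a 10-second window.
records = [
    RequestRecord(0.0, 0.2, 1.0, input_tokens=100, output_tokens=50),
    RequestRecord(1.0, 1.4, 2.0, input_tokens=300, output_tokens=150),
]
print(to_prometheus_text(summarize(records, window_s=10.0)))
```

In the actual implementation these values would more likely be recorded through OpenTelemetry instruments (counters for token counts, a histogram for time to first token) and scraped by Prometheus, with rates computed at query time in Grafana rather than in application code.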
Motivation
Collecting metrics through OpenTelemetry and exposing them to Prometheus and Grafana lets us monitor request latency, token usage, and system throughput, and set up alerts on anomalies to improve system reliability.
Additional context
No response
Metadata
Status
In review