@@ -5733,9 +5733,82 @@ groups:
 
       - name: Jaeger
         exporters:
-          - name: Embedded exporter
+          - name: Embedded exporter (v2+)
             slug: embedded-exporter
-            doc_url: https://www.jaegertracing.io/docs/latest/monitoring/
+            doc_url: https://www.jaegertracing.io/docs/2.dev/operations/monitoring/
+            comments: |
+              Jaeger v2 is built on OpenTelemetry Collector and exposes metrics on port 8888 (/metrics).
+              It emits standard otelcol_* pipeline metrics alongside Jaeger-specific storage and query metrics.
+              For span ingestion pipeline alerts (refused spans, export failures, queue saturation),
+              use the OpenTelemetry Collector rules instead.
+            rules:
+              - name: Jaeger high storage error rate
+                description: "Jaeger on {{ $labels.instance }} is experiencing {{ $value | humanize }}% storage errors on {{ $labels.operation }}."
+                query: '100 * sum(rate(jaeger_storage_requests_total{result="err"}[1m])) by (instance, job, namespace, operation) / sum(rate(jaeger_storage_requests_total[1m])) by (instance, job, namespace, operation) > 1 and sum(rate(jaeger_storage_requests_total[1m])) by (instance, job, namespace, operation) > 0'
+                severity: warning
+                for: 5m
+              - name: Jaeger slow storage operations
+                description: "Jaeger on {{ $labels.instance }} storage p99 latency for {{ $labels.operation }} is {{ $value | humanizeDuration }}."
+                query: 'histogram_quantile(0.99, sum(rate(jaeger_storage_latency_seconds_bucket[5m])) by (le, instance, job, namespace, operation)) > 1'
+                severity: warning
+                for: 5m
+                comments: |
+                  Threshold of 1s is a rough default. Adjust based on your storage backend and data volume.
+              - name: Jaeger query service high error rate
+                description: "Jaeger query service on {{ $labels.instance }} is returning {{ $value | humanize }}% HTTP 5xx errors."
+                query: '100 * sum(rate(http_server_request_duration_seconds_count{http_route="/api/traces",http_response_status_code=~"5.."}[1m])) by (instance, job, namespace) / sum(rate(http_server_request_duration_seconds_count{http_route="/api/traces"}[1m])) by (instance, job, namespace) > 1 and sum(rate(http_server_request_duration_seconds_count{http_route="/api/traces"}[1m])) by (instance, job, namespace) > 0'
+                severity: warning
+                for: 5m
+                comments: |
+                  Filters on http_route="/api/traces" (the trace search endpoint). The http_server_request_duration_seconds
+                  metric is emitted by the otelhttp middleware used by the Jaeger query service.
+              - name: Jaeger query service slow responses
+                description: "Jaeger query service on {{ $labels.instance }} p99 response latency is {{ $value | humanizeDuration }}."
+                query: 'histogram_quantile(0.99, sum(rate(http_server_request_duration_seconds_bucket{http_route="/api/traces"}[5m])) by (le, instance, job, namespace)) > 2'
+                severity: warning
+                for: 5m
+                comments: |
+                  Threshold of 2s is a rough default. Adjust based on your storage backend and data volume.
+              - name: Jaeger storage completely unavailable
+                description: "Jaeger on {{ $labels.instance }} has 100% storage errors for {{ $labels.operation }}; storage backend may be down."
+                query: 'sum(rate(jaeger_storage_requests_total{result="err"}[1m])) by (instance, job, namespace, operation) > 0 and sum(rate(jaeger_storage_requests_total{result="ok"}[1m])) by (instance, job, namespace, operation) == 0'
+                severity: critical
+                for: 2m
+                comments: |
+                  Fires when all storage operations for a given type are failing and none are succeeding.
+                  Indicates the storage backend (Cassandra, Elasticsearch, etc.) is likely unreachable or misconfigured.
+              - name: Jaeger slow single trace retrieval
+                description: "Jaeger on {{ $labels.instance }} p99 latency for single trace retrieval is {{ $value | humanizeDuration }}."
+                query: 'histogram_quantile(0.99, sum(rate(http_server_request_duration_seconds_bucket{http_route="/api/traces/{traceID}"}[5m])) by (le, instance, job, namespace)) > 5'
+                severity: warning
+                for: 5m
+                comments: |
+                  Single trace retrieval (/api/traces/{traceID}) can be slower than search, especially for large traces.
+                  Threshold of 5s is a rough default.
+              - name: Jaeger service discovery errors
+                description: "Jaeger on {{ $labels.instance }} is returning {{ $value | humanize }}% HTTP 5xx errors on the services endpoint."
+                query: '100 * sum(rate(http_server_request_duration_seconds_count{http_route="/api/services",http_response_status_code=~"5.."}[1m])) by (instance, job, namespace) / sum(rate(http_server_request_duration_seconds_count{http_route="/api/services"}[1m])) by (instance, job, namespace) > 1 and sum(rate(http_server_request_duration_seconds_count{http_route="/api/services"}[1m])) by (instance, job, namespace) > 0'
+                severity: warning
+                for: 5m
+                comments: |
+                  Errors on /api/services indicate the storage backend cannot return the list of instrumented services,
+                  which breaks the Jaeger UI service selector.
+              - name: Jaeger no storage reads succeeding
+                description: "Jaeger on {{ $labels.instance }} has no successful storage reads for {{ $labels.operation }} in the past 15 minutes."
+                query: 'sum(increase(jaeger_storage_requests_total{result="ok"}[15m])) by (instance, job, namespace, operation) == 0 and sum(increase(jaeger_storage_requests_total[15m])) by (instance, job, namespace, operation) > 0'
+                severity: warning
+                for: 5m
+                comments: |
+                  Fires when an operation (e.g. find_traces, get_services) has received requests but none succeeded.
+                  May indicate a persistent storage error or a backend that is slow to recover.
+          - name: Embedded exporter (legacy, <v2)
+            slug: embedded-exporter-legacy
+            doc_url: https://www.jaegertracing.io/docs/1.x/monitoring/
+            comments: |
+              These rules target Jaeger v1.x metrics (jaeger_* prefix).
+              Jaeger v1 reached end-of-life on December 31, 2025.
+              For Jaeger v2+, use the "Embedded exporter (v2+)" rules instead.
+              Note: jaeger-agent was deprecated in v1.35 and removed in v2.0.
             rules:
               - name: Jaeger agent HTTP server errors
                 description: "Jaeger agent on {{ $labels.instance }} is experiencing {{ $value | humanize }}% HTTP server errors."
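
For context, the catalogue entries in this diff are not Prometheus rule files themselves. A minimal sketch of how the first v2 entry might look once written as a standalone Prometheus alerting rule is shown below. The `expr`, `for`, severity, and description values are copied from the diff; the group name, the alert name, and the summary annotation are assumptions made for illustration and may not match what the project actually generates.

```yaml
# Sketch only: the group name, alert name, and summary text are assumed,
# not taken from this change.
groups:
  - name: jaeger-embedded-exporter-v2        # hypothetical group name
    rules:
      - alert: JaegerHighStorageErrorRate    # hypothetical name derived from the rule title
        # Percentage of failed storage requests per operation. The trailing
        # "and ... > 0" clause limits the alert to operations that are
        # actually receiving requests.
        expr: >-
          100 * sum(rate(jaeger_storage_requests_total{result="err"}[1m])) by (instance, job, namespace, operation)
          / sum(rate(jaeger_storage_requests_total[1m])) by (instance, job, namespace, operation) > 1
          and sum(rate(jaeger_storage_requests_total[1m])) by (instance, job, namespace, operation) > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Jaeger high storage error rate (instance {{ $labels.instance }})"
          description: "Jaeger on {{ $labels.instance }} is experiencing {{ $value | humanize }}% storage errors on {{ $labels.operation }}."
```

A file in this shape can be syntax-checked with `promtool check rules <file>` before it is loaded into Prometheus.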