Phase 5: Observability stack — spanmetrics, dashboards, runbook #6427

Draft
pratikmankawde wants to merge 1 commit into pratik/otel-phase4-consensus-tracing from pratik/otel-phase5-docs-deployment

pratikmankawde commented Feb 25, 2026

PR Chain: #6436 → #6437 → #6438 → #6424 → #6425 → #6426 → #6427 (this PR) → #6433 / #6439
Base: pratik/otel-phase4-consensus-tracing

High Level Overview of Change

Add the observability stack for consuming trace data: OTel Collector spanmetrics connector for deriving RED metrics, Prometheus for metrics storage, and three pre-built Grafana dashboards (RPC Performance, Consensus Health, Transaction Overview). Includes an operator runbook for setup, configuration, and troubleshooting.

Context of Change

Phase 5 of the OpenTelemetry distributed tracing project. Phases 2–4 produce trace spans, but operators need dashboards and metrics to get value from that data. The spanmetrics connector in the OTel Collector derives rate/error/duration (RED) metrics from those spans without any additional instrumentation in rippled. Prometheus scrapes these metrics, and Grafana visualizes them.
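To make the RED derivation concrete, here is a minimal Python sketch of the kind of aggregation the spanmetrics connector performs: it counts calls and buckets span durations per (service, span name, status code). It is an illustration under stated assumptions, not the connector's implementation. The Span record, the span name rpc.request used in the usage example, and the bucket bounds in BUCKETS_MS are hypothetical, and the real connector keys on further dimensions (such as span kind) and handles temporality and flushing itself.

```python
from bisect import bisect_left
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical, simplified span record (real OTLP spans carry far more data).
@dataclass(frozen=True)
class Span:
    service: str
    name: str
    status_code: str  # e.g. "STATUS_CODE_UNSET" or "STATUS_CODE_ERROR"
    duration_ms: float

# Assumed bucket upper bounds in milliseconds; one extra slot acts as +Inf.
BUCKETS_MS = [2, 5, 10, 50, 100, 500, 1000]

def derive_red_metrics(spans):
    """Return (calls, histograms) keyed by (service, span name, status code)."""
    calls = defaultdict(int)
    # Per-bucket (non-cumulative) counts; Prometheus exposes cumulative ones.
    histograms = defaultdict(lambda: [0] * (len(BUCKETS_MS) + 1))
    for span in spans:
        key = (span.service, span.name, span.status_code)
        calls[key] += 1  # Rate: a monotonically increasing call counter.
        # Duration: bisect_left picks the first bound >= duration, matching
        # Prometheus "le" (less-than-or-equal) bucket semantics.
        histograms[key][bisect_left(BUCKETS_MS, span.duration_ms)] += 1
    return dict(calls), dict(histograms)

def error_ratio(calls, service, name):
    """Errors: share of calls for one span name whose status is ERROR."""
    total = sum(n for (svc, nm, _), n in calls.items() if (svc, nm) == (service, name))
    errors = calls.get((service, name, "STATUS_CODE_ERROR"), 0)
    return errors / total if total else 0.0
```

For example, two fast successful rpc.request spans and one slow failed one yield a call count of 2 for the UNSET status, 1 for ERROR, and an error ratio of one third. Keeping status code as a dimension is what lets an error-rate panel be computed purely from counters.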

The three dashboards cover the primary tracing domains:

  • RPC Performance: request rate, p95 latency, error rate, latency heatmap
  • Consensus Health: round duration, proposal rate, validation rate
  • Transaction Overview: processing rate, latency, sync vs async path distribution
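A common way to build the p95 latency panels is Prometheus's histogram_quantile function over the spanmetrics duration buckets, for example histogram_quantile(0.95, sum by (le) (rate(traces_span_metrics_duration_milliseconds_bucket[5m]))). The metric name there is an assumption based on the connector's default namespace and millisecond unit, and the PR's dashboards may use different queries. The Python sketch below mirrors the linear interpolation that function performs on cumulative buckets. It is a simplified re-implementation for illustration that assumes non-negative bucket bounds and skips some of Prometheus's edge-case handling.

```python
import math

def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: (upper_bound, cumulative_count) pairs sorted by upper_bound,
    ending with (math.inf, total_count). Bounds are assumed non-negative.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be within [0, 1]")
    if len(buckets) < 2 or buckets[-1][0] != math.inf:
        raise ValueError("need at least one finite bucket plus a +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations in the window
    rank = q * total
    # Index of the first finite bucket whose cumulative count reaches rank.
    b = next((i for i, (_, c) in enumerate(buckets[:-1]) if c >= rank), None)
    if b is None:
        # Rank lands in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    upper, count = buckets[b]
    lower, below = buckets[b - 1] if b > 0 else (0.0, 0)
    in_bucket = count - below
    if in_bucket == 0:
        return lower  # only possible for q == 0 with an empty first bucket
    # Linear interpolation, assuming observations spread evenly in the bucket.
    return lower + (upper - lower) * (rank - below) / in_bucket
```

With buckets of 10, 50 and 100 ms holding cumulative counts 50, 90 and 98 out of 100 requests, the 95th-percentile rank (95) falls in the 50 to 100 ms bucket and interpolates to 81.25 ms. This is why the bucket boundaries chosen in the Collector config directly limit how precise the p95 panels can be.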

Design doc: OpenTelemetryPlan/ directory in this repo.

Type of Change

  • New feature (non-breaking change which adds functionality)
  • Documentation update

API Impact

No API impact. This change only affects the Docker-based observability stack and documentation.

Test Plan

  • Run docker compose up and verify that Prometheus scrapes the spanmetrics endpoint at :8889.
  • Verify Grafana dashboards load with the auto-provisioned Prometheus datasource.
  • Submit transactions and verify metrics flow: rippled → Collector → spanmetrics → Prometheus → Grafana.
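The first check can be scripted. This Python sketch fetches the Collector's Prometheus exporter output and looks for spanmetrics series. It makes two assumptions. First, the exporter serves the default /metrics path on localhost:8889. Second, the series names start with traces_span_metrics_, which is what the connector's default namespace is expected to produce after Prometheus name translation. Adjust the prefix if the Collector config sets its own namespace. The parser only extracts metric names and ignores HELP/TYPE comments.

```python
import re
import urllib.request

# A sample line starts with the metric name, followed by labels or whitespace.
_METRIC_NAME = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(?=[{\s])")

def fetch_metrics(url="http://localhost:8889/metrics", timeout=5.0):
    """Download the Prometheus text exposition from the Collector (assumed URL)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")

def metric_names(exposition_text):
    """Collect the distinct metric names from sample lines."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        match = _METRIC_NAME.match(line)
        if match:
            names.add(match.group(1))
    return names

def has_spanmetrics(names, prefix="traces_span_metrics_"):
    """True if any series carries the (assumed) spanmetrics name prefix."""
    return any(name.startswith(prefix) for name in names)
```

After docker compose up, has_spanmetrics(metric_names(fetch_metrics())) should return True once rippled has emitted spans that reached the Collector.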

pratikmankawde added the DraftRunCI label (Normally CI does not run on draft PRs. This opts in.) Feb 25, 2026
pratikmankawde force-pushed the pratik/otel-phase4-consensus-tracing branch from 4a96959 to 0ad5ca4 February 25, 2026 22:13
pratikmankawde force-pushed the pratik/otel-phase5-docs-deployment branch from 6cff91c to 8440a32 February 25, 2026 22:13
pratikmankawde force-pushed the pratik/otel-phase4-consensus-tracing branch from 0ad5ca4 to 1f7808e February 25, 2026 22:22
pratikmankawde force-pushed the pratik/otel-phase5-docs-deployment branch from 8440a32 to d8c284b February 25, 2026 22:23
pratikmankawde added the DistributedTracingAndObservability label (Distributed Tracing And Observability related changes) Feb 25, 2026

codecov bot commented Feb 25, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 79.7%. Comparing base (735ecd1) to head (85f583f).

Additional details and impacted files

@@                         Coverage Diff                          @@
##           pratik/otel-phase4-consensus-tracing   #6427   +/-   ##
====================================================================
  Coverage                                  79.7%   79.7%           
====================================================================
  Files                                       851     851           
  Lines                                     67952   67952           
  Branches                                   7594    7589    -5     
====================================================================
+ Hits                                      54184   54189    +5     
+ Misses                                    13768   13763    -5     
Files with missing lines Coverage Δ
include/xrpl/telemetry/TraceContextPropagator.h 96.4% <ø> (ø)
src/libxrpl/telemetry/Telemetry.cpp 22.5% <ø> (ø)
src/xrpld/app/main/Application.cpp 70.4% <ø> (ø)

... and 3 files with indirect coverage changes

pratikmankawde force-pushed the pratik/otel-phase4-consensus-tracing branch from 1f7808e to 3f240a3 February 25, 2026 23:49
pratikmankawde force-pushed the pratik/otel-phase5-docs-deployment branch from 1d093fc to ec73991 February 25, 2026 23:49
pratikmankawde force-pushed the pratik/otel-phase4-consensus-tracing branch from 3f240a3 to 3402876 February 26, 2026 00:18
pratikmankawde force-pushed the pratik/otel-phase5-docs-deployment branch from ec73991 to 69b4829 February 26, 2026 00:18
pratikmankawde force-pushed the pratik/otel-phase4-consensus-tracing branch from 3402876 to 55bb221 February 26, 2026 12:07
pratikmankawde force-pushed the pratik/otel-phase5-docs-deployment branch from 46af63d to 2391149 February 26, 2026 12:09
pratikmankawde force-pushed the pratik/otel-phase4-consensus-tracing branch from 55bb221 to 4c67a5f February 26, 2026 12:14
pratikmankawde force-pushed the pratik/otel-phase4-consensus-tracing branch from 4c67a5f to 6e75849 February 27, 2026 18:00
pratikmankawde force-pushed the pratik/otel-phase5-docs-deployment branch from fc1ed3c to 3581839 February 27, 2026 18:01
pratikmankawde force-pushed the pratik/otel-phase4-consensus-tracing branch from 6e75849 to 04e94ce February 27, 2026 18:06
pratikmankawde force-pushed the pratik/otel-phase5-docs-deployment branch from 3581839 to 56cc5e6 February 27, 2026 18:06
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
pratikmankawde force-pushed the pratik/otel-phase5-docs-deployment branch from 56cc5e6 to 85f583f February 27, 2026 18:16
pratikmankawde force-pushed the pratik/otel-phase4-consensus-tracing branch from 04e94ce to 735ecd1 February 27, 2026 18:16