Phase 1b: Telemetry infrastructure — core library, build system, Docker stack#6437

Draft
pratikmankawde wants to merge 1 commit into pratik/otel-phase1a-plan-docs from pratik/otel-phase1b-telemetry-infra

pratikmankawde commented Feb 26, 2026

PR Chain: #6436 → #6437 (this PR) → #6438 → #6424 → #6425 → #6426 → #6427 → #6433 / #6439
Base: pratik/otel-phase1a-plan-docs

High Level Overview of Change

Add the core OpenTelemetry telemetry library, build system integration, Docker observability stack, and Application-level lifecycle management. This provides the foundational infrastructure that all subsequent tracing phases build upon.

Stacked PR chain: PR #6436 (Phase 1a) → Phase 1b (this) → [Phase 1c: RPC integration] → PR #6424 → PR #6425 → PR #6426 → PR #6427

Context of Change

Implements the telemetry subsystem as a libxrpl library component with a clean separation between the tracing API and the OpenTelemetry SDK. The design uses:

  • Conditional compilation (XRPL_ENABLE_TELEMETRY) — zero overhead when disabled
  • NullTelemetry fallback — graceful no-op when OTel SDK initialization fails or is disabled
  • RAII SpanGuard — automatic span lifecycle management with attribute/exception helpers
  • Config-driven setup: a [telemetry] section in the rippled config for endpoint, sampling, and feature flags
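To make the NullTelemetry and SpanGuard bullets concrete, here is a minimal C++ sketch of a null-object telemetry interface paired with an RAII span guard. Every name in it (`Span`, `Telemetry`, `NullTelemetry`, `SpanGuard`, and the `Recording*` test doubles) is hypothetical and only assumes the shape described above; the real headers in this PR may expose a different API, and no OpenTelemetry SDK types are used.

```cpp
#include <cassert>
#include <exception>
#include <memory>
#include <stdexcept>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical interfaces for illustration; not the PR's actual headers.
class Span
{
public:
    virtual ~Span() = default;
    virtual void setAttribute(std::string_view key, std::string_view value) = 0;
    virtual void recordException(std::exception const& e) = 0;
    virtual void end() = 0;
};

class Telemetry
{
public:
    virtual ~Telemetry() = default;
    // May return nullptr when tracing is off; SpanGuard treats that as a no-op.
    virtual std::unique_ptr<Span> startSpan(std::string_view name) = 0;
};

// Null object: no allocation and no SDK calls on the disabled path.
class NullTelemetry final : public Telemetry
{
public:
    std::unique_ptr<Span> startSpan(std::string_view) override { return nullptr; }
};

// RAII guard: ends the span exactly once when the scope exits,
// including during stack unwinding.
class SpanGuard
{
public:
    explicit SpanGuard(std::unique_ptr<Span> span) : span_(std::move(span)) {}
    SpanGuard(SpanGuard const&) = delete;
    SpanGuard& operator=(SpanGuard const&) = delete;
    ~SpanGuard()
    {
        if (span_)
            span_->end();
    }
    void setAttribute(std::string_view key, std::string_view value)
    {
        if (span_)
            span_->setAttribute(key, value);
    }
    void recordException(std::exception const& e)
    {
        if (span_)
            span_->recordException(e);
    }

private:
    std::unique_ptr<Span> span_;
};

// Test doubles that log calls, standing in for an SDK-backed implementation.
class RecordingSpan final : public Span
{
public:
    explicit RecordingSpan(std::vector<std::string>& log) : log_(log) {}
    void setAttribute(std::string_view k, std::string_view v) override
    {
        log_.push_back("attr:" + std::string(k) + "=" + std::string(v));
    }
    void recordException(std::exception const& e) override
    {
        log_.push_back(std::string("exception:") + e.what());
    }
    void end() override { log_.push_back("end"); }

private:
    std::vector<std::string>& log_;
};

class RecordingTelemetry final : public Telemetry
{
public:
    explicit RecordingTelemetry(std::vector<std::string>& log) : log_(log) {}
    std::unique_ptr<Span> startSpan(std::string_view name) override
    {
        log_.push_back("start:" + std::string(name));
        return std::make_unique<RecordingSpan>(log_);
    }

private:
    std::vector<std::string>& log_;
};
```

Returning `nullptr` from the null implementation keeps the disabled path allocation-free; the guard's `if (span_)` checks are the only residual cost. Exceptions must be recorded explicitly via `recordException` inside a `catch` block, because a destructor cannot see the in-flight exception object.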

The Docker stack (OTel Collector → Jaeger → Grafana) provides a local development environment for visualizing traces.

Design doc: OpenTelemetryPlan/ directory.

Type of Change

  • New feature (non-breaking change which adds functionality)

API Impact

  • libxrpl change (any change that may affect libxrpl or dependents of libxrpl)

Before / After

Before: No telemetry or tracing infrastructure exists in rippled.

After: rippled gains a Telemetry subsystem accessible via app.getTelemetry(). When the project is built with the Conan option telemetry=ON and the [telemetry] config section is set, spans are exported via OTLP/gRPC to an OpenTelemetry Collector. When telemetry is disabled (at build time or in config) or SDK initialization fails, all telemetry calls are no-ops.
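The enabled/disabled behavior can be sketched as a factory that picks an implementation once at startup. This assumes a hypothetical `makeOtelTelemetry` function that exists only when `XRPL_ENABLE_TELEMETRY` is defined and returns `nullptr` on initialization failure; `makeTelemetry` and `isEnabled` are likewise illustrative names, not the PR's actual code.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical minimal interface; names are illustrative only.
class Telemetry
{
public:
    virtual ~Telemetry() = default;
    virtual bool isEnabled() const = 0;
};

class NullTelemetry final : public Telemetry
{
public:
    bool isEnabled() const override { return false; }
};

#ifdef XRPL_ENABLE_TELEMETRY
// Hypothetical SDK-backed factory, compiled only in telemetry=ON builds.
// Returns nullptr if exporter/provider initialization fails.
std::unique_ptr<Telemetry> makeOtelTelemetry(std::string const& endpoint);
#endif

// Chooses the implementation once at startup. In builds without
// XRPL_ENABLE_TELEMETRY the SDK branch is not compiled at all, so the
// only remaining cost is calls into the no-op object.
std::unique_ptr<Telemetry> makeTelemetry(bool sectionPresent, std::string const& endpoint)
{
#ifdef XRPL_ENABLE_TELEMETRY
    if (sectionPresent)
    {
        if (auto t = makeOtelTelemetry(endpoint))
            return t;
        // Initialization failed: fall through and degrade to the no-op object.
    }
#else
    (void)sectionPresent;
    (void)endpoint;
#endif
    return std::make_unique<NullTelemetry>();
}
```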

Key files:

  • include/xrpl/telemetry/Telemetry.h — public API (startSpan, sampling predicates)
  • include/xrpl/telemetry/SpanGuard.h — RAII span wrapper
  • src/libxrpl/telemetry/Telemetry.cpp — OTel SDK setup, provider lifecycle
  • src/libxrpl/telemetry/NullTelemetry.cpp — no-op fallback
  • src/xrpld/app/main/Application.cpp — lifecycle integration (init, start, stop)
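For the sampling predicates listed under Telemetry.h, one common approach is a deterministic trace-ID ratio check combined with per-feature flags, similar in spirit to OpenTelemetry's TraceIdRatioBased sampler. The `SamplingConfig` fields and the `sampleByRatio` / `shouldTrace*` functions below are hypothetical and assume the trace ID has already been reduced to 64 bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sampling settings; field names are illustrative only.
struct SamplingConfig
{
    double ratio = 1.0;              // fraction of traces to keep, in [0, 1]
    bool traceRpc = true;            // hypothetical feature flag
    bool traceTransactions = false;  // hypothetical feature flag
};

// Deterministic ratio sampling: the decision depends only on the trace ID,
// so every span in the same trace gets the same answer.
bool sampleByRatio(double ratio, std::uint64_t traceIdBits)
{
    if (!(ratio > 0.0))  // also rejects NaN
        return false;
    if (ratio >= 1.0)
        return true;
    // ratio < 1, so ratio * 2^64 < 2^64 and the conversion is well-defined.
    auto const threshold = static_cast<std::uint64_t>(ratio * 0x1p64);
    return traceIdBits < threshold;
}

bool shouldTraceRpc(SamplingConfig const& cfg, std::uint64_t traceIdBits)
{
    return cfg.traceRpc && sampleByRatio(cfg.ratio, traceIdBits);
}

bool shouldTraceTransactions(SamplingConfig const& cfg, std::uint64_t traceIdBits)
{
    return cfg.traceTransactions && sampleByRatio(cfg.ratio, traceIdBits);
}
```

Checking the feature flag first means a disabled category skips the ratio arithmetic entirely.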

Next Tasks

  • Phase 1c: RPC layer tracing instrumentation using this infrastructure
  • Phase 2+: Transaction tracing, consensus tracing, dashboards

Add the OpenTelemetry telemetry library and supporting infrastructure:

Build system:
- Conan opentelemetry-cpp dependency with OTLP/gRPC exporter
- CMake integration for xrpl_telemetry library target
- Levelization ordering updates

Core library (libxrpl):
- Telemetry class: provider lifecycle, span creation, sampling config
- SpanGuard: RAII span management with attribute/exception helpers
- TelemetryConfig: parse [telemetry] config section
- NullTelemetry: no-op implementation when telemetry is disabled
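A rough sketch of what parsing a [telemetry] section could look like. It works on the raw section body rather than rippled's own config classes, and the keys `enabled`, `endpoint`, and `sampling_ratio`, the `TelemetryConfig` fields, and the reject-on-malformed-value policy are all assumptions for illustration; the authoritative keys are whatever xrpld-example.cfg documents. The default endpoint uses 4317, the standard OTLP/gRPC port.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical config shape; real key names live in xrpld-example.cfg.
struct TelemetryConfig
{
    bool enabled = false;
    std::string endpoint = "localhost:4317";  // 4317: standard OTLP/gRPC port
    double samplingRatio = 1.0;               // keep every trace by default
};

// Strips leading/trailing spaces, tabs and carriage returns.
std::string trim(std::string const& s)
{
    auto const first = s.find_first_not_of(" \t\r");
    if (first == std::string::npos)
        return {};
    auto const last = s.find_last_not_of(" \t\r");
    return s.substr(first, last - first + 1);
}

// Parses the body of a [telemetry] section: one key=value per line,
// '#' starts a comment line, unknown keys are ignored.
// Returns std::nullopt on a malformed value.
std::optional<TelemetryConfig> parseTelemetrySection(std::string const& body)
{
    TelemetryConfig cfg;
    std::istringstream in(body);
    std::string raw;
    while (std::getline(in, raw))
    {
        auto const line = trim(raw);
        if (line.empty() || line.front() == '#')
            continue;
        auto const eq = line.find('=');
        if (eq == std::string::npos)
            continue;
        auto const key = trim(line.substr(0, eq));
        auto const value = trim(line.substr(eq + 1));

        if (key == "enabled")
        {
            if (value != "0" && value != "1")
                return std::nullopt;
            cfg.enabled = (value == "1");
        }
        else if (key == "endpoint")
        {
            if (value.empty())
                return std::nullopt;
            cfg.endpoint = value;
        }
        else if (key == "sampling_ratio")
        {
            try
            {
                std::size_t used = 0;
                double const r = std::stod(value, &used);
                // Reject trailing junk, out-of-range values, and NaN.
                if (used != value.size() || !(r >= 0.0 && r <= 1.0))
                    return std::nullopt;
                cfg.samplingRatio = r;
            }
            catch (std::logic_error const&)  // invalid_argument, out_of_range
            {
                return std::nullopt;
            }
        }
    }
    return cfg;
}
```

Returning `std::nullopt` instead of a partially filled struct would let a caller fall back to the no-op implementation rather than run with a half-valid configuration.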

Application integration:
- Telemetry member in ApplicationImp with start/stop lifecycle
- getTelemetry() interface on Application
- ServiceRegistry telemetry accessor
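One plausible ordering for the start/stop lifecycle, sketched with hypothetical stand-in classes (`Telemetry`, `Application`) that only record their calls: telemetry starts before other services so their early spans have an exporter, and stops after them so spans ended during shutdown can still be flushed. This ordering is an assumption for illustration, not a description of ApplicationImp.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the telemetry subsystem; records calls so the
// ordering can be checked.
class Telemetry
{
public:
    explicit Telemetry(std::vector<std::string>& log) : log_(log) {}
    void start() { log_.push_back("telemetry.start"); }
    // A real implementation would flush pending spans and shut down the exporter here.
    void stop() { log_.push_back("telemetry.stop"); }

private:
    std::vector<std::string>& log_;
};

// Hypothetical stand-in for an application that owns the telemetry member.
class Application
{
public:
    explicit Application(std::vector<std::string>& log)
        : log_(log), telemetry_(std::make_unique<Telemetry>(log))
    {
    }

    Telemetry& getTelemetry() { return *telemetry_; }

    void start()
    {
        telemetry_->start();  // first: other services may emit spans as they start
        log_.push_back("services.start");
    }

    void stop()
    {
        log_.push_back("services.stop");
        telemetry_->stop();  // last: spans ended during shutdown still get flushed
    }

private:
    std::vector<std::string>& log_;
    std::unique_ptr<Telemetry> telemetry_;
};
```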

Docker observability stack:
- OTel Collector, Jaeger, Grafana docker-compose setup
- Collector config with OTLP gRPC receiver and Jaeger exporter

Config and docs:
- Example telemetry config section in xrpld-example.cfg
- Build documentation for telemetry setup

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
pratikmankawde force-pushed the pratik/otel-phase1b-telemetry-infra branch from 85325af to 252a4bb on February 27, 2026 17:53

codecov bot commented Feb 27, 2026

Codecov Report

❌ Patch coverage is 30.17241% with 81 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (pratik/otel-phase1a-plan-docs@a6a6a7c).

Files with missing lines               Patch %   Lines
src/libxrpl/telemetry/Telemetry.cpp    7.1%      79 Missing ⚠️
src/xrpld/app/main/Application.cpp     77.8%     2 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@                       Coverage Diff                       @@
##             pratik/otel-phase1a-plan-docs   #6437   +/-   ##
===============================================================
  Coverage                                 ?   79.7%           
===============================================================
  Files                                    ?     851           
  Lines                                    ?   67872           
  Branches                                 ?    7589           
===============================================================
  Hits                                     ?   54096           
  Misses                                   ?   13776           
  Partials                                 ?       0           
Files with missing lines                     Coverage Δ
include/xrpl/core/ServiceRegistry.h          100.0% <ø> (ø)
include/xrpl/telemetry/Telemetry.h           100.0% <100.0%> (ø)
src/libxrpl/telemetry/TelemetryConfig.cpp    100.0% <100.0%> (ø)
src/xrpld/app/main/Application.cpp           70.2% <77.8%> (ø)
src/libxrpl/telemetry/Telemetry.cpp          7.1% <7.1%> (ø)

Labels:
  • DistributedTracingAndObservability: Distributed Tracing And Observability related changes
  • DraftRunCI: Normally CI does not run on draft PRs. This opts in.