
Prometheus metrics for libp2p protocols#1199

Open
lla-dane wants to merge 16 commits into libp2p:main from lla-dane:metrics

Conversation

@lla-dane
Contributor

@lla-dane lla-dane commented Feb 9, 2026

Introduction

This pull request introduces Prometheus/Grafana metrics for core py-libp2p protocols, for real-time monitoring and analysis.

It enables developers to run a libp2p node and directly inspect internal protocol behavior—such as latency, message propagation, and DHT activity—through standard metrics pipelines.

A working demo (metrics-demo) is included in the examples directory, to showcase how multiple services operate together and how their metrics can be visualized using Prometheus and Grafana.

What's included

The following libp2p services are currently instrumented and exposed via Prometheus metrics:

Ping

  • ping: Round-trip time (RTT) measurements.
  • ping_failure: Failed ping attempts.

Provides visibility into peer-to-peer latency and connectivity reliability.
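
As a rough illustration of how these two metrics could be recorded, here is a minimal sketch using the `prometheus_client` package. The metric names come from the list above; the help strings, bucket boundaries, the `timed_ping` helper, and the synchronous style are assumptions made for the example and are not taken from the PR's code. Note that `prometheus_client` appends `_total` to counter sample names, so a counter declared as `ping_failure` would be scraped as `ping_failure_total`.

```python
import time

from prometheus_client import CollectorRegistry, Counter, Histogram

# A private registry keeps this sketch away from the global default registry.
registry = CollectorRegistry()

# Metric names are from the PR description; help text and buckets are assumed.
PING_RTT = Histogram(
    "ping",
    "Ping round-trip time in seconds",
    buckets=(0.005, 0.01, 0.05, 0.1, 0.5, 1.0, 5.0),
    registry=registry,
)
PING_FAILURE = Counter("ping_failure", "Failed ping attempts", registry=registry)


def timed_ping(do_ping):
    """Run a ping callable (hypothetical) and record RTT or failure."""
    start = time.perf_counter()
    try:
        do_ping()
    except Exception:
        PING_FAILURE.inc()
        return None
    rtt = time.perf_counter() - start
    PING_RTT.observe(rtt)
    return rtt


def failing_ping():
    # Stand-in for a ping that never gets an answer.
    raise TimeoutError("peer did not answer")


timed_ping(lambda: time.sleep(0.001))  # one successful ping
timed_ping(failing_ping)               # one failed ping
```

A histogram, rather than a gauge, lets Grafana compute latency quantiles over a time window from the bucket counts.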

Gossipsub / Pubsub

  • gossipsub_received_total: Messages received
  • gossipsub_publish_total: Messages published
  • gossipsub_subopts_total: Subscription updates
  • gossipsub_control_total: Control messages
  • gossipsub_message_bytes: Message sizes

Enables monitoring of message propagation, throughput, and pubsub activity.
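
The sketch below shows one way these counters and the size histogram might be updated when an RPC arrives or a message is published. It uses `prometheus_client`; the `FakeRpc` dataclass, the two `record_*` helpers, the bucket boundaries, and the choice to observe sizes only on receipt are all hypothetical and do not reflect the PR's actual hook points.

```python
from dataclasses import dataclass, field

from prometheus_client import CollectorRegistry, Counter, Histogram

registry = CollectorRegistry()

# Names follow the list above; help strings and buckets are assumptions.
RECEIVED = Counter("gossipsub_received_total", "Messages received", registry=registry)
PUBLISH = Counter("gossipsub_publish_total", "Messages published", registry=registry)
SUBOPTS = Counter("gossipsub_subopts_total", "Subscription updates", registry=registry)
CONTROL = Counter("gossipsub_control_total", "Control messages", registry=registry)
MESSAGE_BYTES = Histogram(
    "gossipsub_message_bytes",
    "Message sizes in bytes",
    buckets=(64, 256, 1024, 4096, 16384, 65536),
    registry=registry,
)


@dataclass
class FakeRpc:
    """Hypothetical stand-in for a decoded pubsub RPC frame."""

    publish: list = field(default_factory=list)        # message payloads (bytes)
    subscriptions: list = field(default_factory=list)  # (subscribe, topic) pairs
    has_control: bool = False


def record_inbound_rpc(rpc):
    """Update counters for one inbound RPC."""
    for payload in rpc.publish:
        RECEIVED.inc()
        MESSAGE_BYTES.observe(len(payload))
    SUBOPTS.inc(len(rpc.subscriptions))
    if rpc.has_control:
        CONTROL.inc()


def record_publish(payload):
    """Count one locally published message."""
    PUBLISH.inc()


record_inbound_rpc(
    FakeRpc(
        publish=[b"hello", b"x" * 300],
        subscriptions=[(True, "news")],
        has_control=True,
    )
)
record_publish(b"hi")
```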

Kademlia (Kad-DHT)

  • kad_inbound_total: Total inbound requests
  • kad_inbound_find_node: FIND_NODE requests
  • kad_inbound_get_value: GET_VALUE requests
  • kad_inbound_put_value: PUT_VALUE requests
  • kad_inbound_get_providers: GET_PROVIDERS requests
  • kad_inbound_add_provider: ADD_PROVIDER requests
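
To make the relationship between the total and the per-type counters concrete, here is a small sketch with `prometheus_client`. The metric names are the ones listed above; the string keys, the `record_inbound` helper, and the decision to count unknown message types only in the total are assumptions for illustration.

```python
from prometheus_client import CollectorRegistry, Counter

registry = CollectorRegistry()

INBOUND_TOTAL = Counter("kad_inbound_total", "Total inbound requests", registry=registry)

# One counter per request type, keyed by a message-type string (assumed keys).
INBOUND_BY_TYPE = {
    "FIND_NODE": Counter("kad_inbound_find_node", "FIND_NODE requests", registry=registry),
    "GET_VALUE": Counter("kad_inbound_get_value", "GET_VALUE requests", registry=registry),
    "PUT_VALUE": Counter("kad_inbound_put_value", "PUT_VALUE requests", registry=registry),
    "GET_PROVIDERS": Counter(
        "kad_inbound_get_providers", "GET_PROVIDERS requests", registry=registry
    ),
    "ADD_PROVIDER": Counter(
        "kad_inbound_add_provider", "ADD_PROVIDER requests", registry=registry
    ),
}


def record_inbound(message_type):
    """Count every inbound request, plus its type-specific counter if known."""
    INBOUND_TOTAL.inc()
    counter = INBOUND_BY_TYPE.get(message_type)
    if counter is not None:
        counter.inc()


for message_type in ["FIND_NODE", "FIND_NODE", "GET_VALUE", "PING"]:
    record_inbound(message_type)
```

A single counter with a `type` label would be a common alternative design; the separate counters here simply mirror the names in the list.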

Swarm / Connection Lifecycle

  • swarm_incoming_conn: Incoming connections
  • swarm_incoming_conn_error: Incoming connection failures
  • swarm_dial_attempt: Outgoing dial attempts
  • swarm_dial_attempt_error: Dial failures

Tracks connection establishment behavior and network stability.
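
One possible pattern for pairing each attempt counter with its error counter is a small context manager, sketched below with `prometheus_client`. The `track` helper is hypothetical, and whether the PR counts attempts before or after a connection succeeds is not stated here, so treat the ordering as an assumption.

```python
from contextlib import contextmanager

from prometheus_client import CollectorRegistry, Counter

registry = CollectorRegistry()

# Names are from the list above; help strings are assumptions.
INCOMING = Counter("swarm_incoming_conn", "Incoming connections", registry=registry)
INCOMING_ERROR = Counter(
    "swarm_incoming_conn_error", "Incoming connection failures", registry=registry
)
DIAL = Counter("swarm_dial_attempt", "Outgoing dial attempts", registry=registry)
DIAL_ERROR = Counter("swarm_dial_attempt_error", "Dial failures", registry=registry)


@contextmanager
def track(attempts, errors):
    """Count an attempt on entry and an error if the body raises."""
    attempts.inc()
    try:
        yield
    except Exception:
        errors.inc()
        raise


with track(DIAL, DIAL_ERROR):
    pass  # a dial that succeeds

try:
    with track(DIAL, DIAL_ERROR):
        raise ConnectionError("peer unreachable")  # a dial that fails
except ConnectionError:
    pass

with track(INCOMING, INCOMING_ERROR):
    pass  # an accepted inbound connection
```

With both counters available, a failure ratio can be derived on the Prometheus side by dividing the error rate by the attempt rate.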

Demo & Observability Setup

A metrics-demo CLI is included to:

  • Run a libp2p node with Ping, Gossipsub, and Kad-DHT enabled
  • Connect multiple nodes and observe interactions
  • Expose metrics via an HTTP endpoint (localhost:8000)

A Docker-based setup is provided to launch:

  • Prometheus for metrics scraping
  • Grafana for visualization dashboards

This allows real-time inspection of protocol-level behavior across nodes.
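
For readers unfamiliar with how a Python process exposes metrics for scraping, the following is a minimal sketch based on `prometheus_client`. The `demo_events` counter and the `render_metrics` and `serve` helpers are hypothetical; only the port (8000) comes from the description above, and the demo's real wiring may differ.

```python
from prometheus_client import (
    CollectorRegistry,
    Counter,
    generate_latest,
    start_http_server,
)

registry = CollectorRegistry()

# Hypothetical counter, used only to have something to expose.
DEMO_EVENTS = Counter("demo_events", "Hypothetical demo counter", registry=registry)
DEMO_EVENTS.inc(3)


def render_metrics():
    """Return the text exposition that a Prometheus scrape would receive."""
    return generate_latest(registry).decode("utf-8")


def serve(port=8000):
    """Start a background HTTP server that exposes the registry for scraping."""
    start_http_server(port, registry=registry)


print(render_metrics())
```

Calling `serve()` in a long-running process starts a background thread, after which Prometheus can be pointed at `localhost:8000` as a scrape target.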

Necessity

Currently, diagnosing issues in py-libp2p (e.g., latency spikes, dropped messages, or DHT inconsistencies) relies heavily on logs, which are:

  • difficult to aggregate
  • hard to analyze over time
  • unsuitable for production observability

This PR introduces structured, queryable metrics that:

  • enable real-time monitoring
  • integrate with standard observability tooling
  • make debugging and performance analysis significantly easier

Reference

Inspired by the metrics design in the Rust implementation:
https://github.com/libp2p/rust-libp2p/tree/master/misc/metrics

@lla-dane
Contributor Author

metrics-2026-02-08_19.57.51.mp4

Ping latency metrics (histogram) in Grafana.

@lla-dane
Contributor Author

gossipsub-metrics.mp4

Screencast of the gossipsub metrics. The following metrics are recorded:

  • gossipsub_received_total: Messages successfully received
  • gossipsub_publish_total: Messages to be published
  • gossipsub_subopts_total: Messages notifying peer subscriptions
  • gossipsub_control_total: Received control messages
  • gossipsub_message_bytes: Message size in bytes

@lla-dane lla-dane force-pushed the metrics branch 4 times, most recently from 3ab8490 to 1592d66 Compare March 22, 2026 15:39
@lla-dane lla-dane marked this pull request as ready for review March 23, 2026 05:04
@lla-dane lla-dane changed the title WIP: Prometheus metrics for libp2p protocols Prometheus metrics for libp2p protocols Mar 23, 2026
@seetadev
Contributor

@lla-dane: Hi Abhinav, this is a really strong and impactful PR, great work 👏

Love how you’ve brought Prometheus/Grafana observability directly into py-libp2p, the coverage across Ping, Gossipsub, Kad-DHT, and Swarm gives a solid, end-to-end view of protocol behavior. The metrics feel well chosen and immediately useful for debugging and performance analysis.

The metrics-demo + Docker setup is a big win for DX as well, makes it super easy to spin things up and actually see what’s happening across nodes.

Overall, this is a big step toward production-grade observability for py-libp2p. Happy to help test or review further & excited to see this land. We will discuss this in detail tomorrow.

On the same note, it would be great if you could resolve the CI/CD issues.

