Telemetry: Prometheus metrics + health endpoints (4‑PR series) #4376

@Jim8y

Summary

This issue tracks the introduction of first‑class node telemetry for Neo N3 via a new Prometheus‑exporting plugin, health endpoints, and accompanying monitoring assets.

Motivation

Node operators currently rely on ad‑hoc log inspection and RPC polling for visibility. A built‑in telemetry surface enables:

  • consistent operational monitoring across deployments
  • alerting on sync/peer/mempool/resource health
  • easy integration with existing Prometheus/Grafana stacks

Scope / PR stack

The work is delivered as four small, stacked PRs targeting the telemetry staging branch:

  1. Core hooks: [Telemetry 1/4] Add telemetry hooks to Neo core #4372 (telemetry-1)

    • Adds non‑intrusive P2P events in RemoteNode (MessageSent, ConnectionChanged) for plugins to observe network activity.
    • Exposes Plugin.IsStopped as protected internal so plugins can self‑disable safely.
  2. Telemetry plugin & metrics: [Telemetry 2/4] Add Telemetry plugin with metrics collection #4373 (telemetry-2)

    • Introduces the Neo.Plugins.Telemetry plugin with collectors for blockchain, network, mempool, system, and plugin state.
    • Exposes a Prometheus endpoint (default http://localhost:9100/metrics).
    • Uses consistent labels: node_id, network.
  3. Health endpoints: [Telemetry 3/4] Add health check endpoints for telemetry #4374 (telemetry-3)

    • Adds lightweight HTTP /health, /ready, and /live endpoints.
    • Readiness reflects both sync status and peer connectivity.
  4. Docs & monitoring assets: [Telemetry 4/4] Add telemetry documentation and monitoring configs #4375 (telemetry-4)

    • README covering installation, configuration, and a metric reference.
    • Prometheus scrape snippet and alert rules.
    • Grafana dashboard JSON for node operators.

PRs should be merged in order 1 → 4. After that, the telemetry branch can be fast‑forwarded into the target release branch.
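
The readiness rule from PR 3 (ready only when the node is both synced and connected to peers) could look roughly like the sketch below. This is an illustration in Python, not the plugin's actual code: the `NodeStatus` fields and the `max_block_lag` / `min_peers` thresholds and their defaults are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NodeStatus:
    # Hypothetical snapshot of node state; field names are illustrative only.
    local_height: int      # height of the latest block persisted locally
    network_height: int    # best height observed among connected peers
    connected_peers: int   # number of currently connected peers


def is_ready(status: NodeStatus, max_block_lag: int = 2, min_peers: int = 1) -> bool:
    """Return True only if the node is both synced and has enough peers."""
    lag = status.network_height - status.local_height
    synced = lag <= max_block_lag                      # assumed sync criterion
    has_peers = status.connected_peers >= min_peers    # assumed peer criterion
    return synced and has_peers
```

Combining both conditions means a node that is caught up but isolated, or well connected but far behind, would not be reported as ready.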

Usage

Enable the plugin via Plugins/Telemetry/config.json, then point Prometheus at the metrics endpoint (port 9100 by default):

scrape_configs:
  - job_name: neo-node
    static_configs:
      - targets: ['<host>:9100']
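
To sanity-check the endpoint before wiring up Prometheus, something like the following Python sketch can fetch and parse the text output. It assumes the endpoint serves the standard Prometheus text exposition format at the default URL mentioned above; the parser is deliberately simplified (it does not unescape label values or read timestamps), and any metric names you see depend on what the plugin actually exports.

```python
import re
from urllib.request import urlopen

# Matches sample lines like: name{label="value",...} 123.0  (labels optional)
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse Prometheus text exposition into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        m = _SAMPLE.match(line)
        if m is None:
            continue  # ignore anything this simplified parser cannot read
        name, raw_labels, value = m.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def fetch_metrics(url="http://localhost:9100/metrics", timeout=5.0):
    """Fetch the metrics page and return the parsed samples."""
    with urlopen(url, timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

Calling `fetch_metrics()` against a running node should return samples whose label dictionaries include `node_id` and `network`, per the labeling convention in PR 2.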

Health checks:

  • GET /health
  • GET /ready
  • GET /live
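
A small probe for the three endpoints above might look like this Python sketch. Two things are assumptions here, since the issue does not state them: the base URL and port the health endpoints listen on (passed in as `base_url`), and the convention that a 2xx status means "OK" while anything else means "not OK".

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

ENDPOINTS = ("/health", "/ready", "/live")


def probe(base_url, path, timeout=3.0):
    """Return the HTTP status code for one endpoint, or None if unreachable."""
    try:
        with urlopen(base_url.rstrip("/") + path, timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:
        return err.code          # server answered with a non-2xx status
    except (URLError, OSError):
        return None              # connection refused, timeout, DNS failure, ...


def summarize(statuses):
    """Map each path to True when its status is 2xx (assumed 'OK' convention)."""
    return {
        path: code is not None and 200 <= code < 300
        for path, code in statuses.items()
    }


def check_node(base_url):
    """Probe all three endpoints and return a path -> bool summary."""
    return summarize({path: probe(base_url, path) for path in ENDPOINTS})
```

For example, `check_node("http://<host>:<port>")` returns a dictionary keyed by `/health`, `/ready`, and `/live`, which is easy to feed into a script or container health check.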

Validation

  • The plugin builds cleanly at each PR boundary.
  • Collectors are guarded so telemetry cannot destabilize the node unless explicitly configured with ExceptionPolicy = StopNode.
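
The guarding idea can be illustrated with a short Python sketch (the plugin itself is not written in Python, and this is not its implementation). Only the `StopNode` policy name comes from this issue; the `IGNORE` value and the `run_collector` helper are hypothetical.

```python
import logging
from enum import Enum

log = logging.getLogger("telemetry")


class ExceptionPolicy(Enum):
    IGNORE = "Ignore"        # hypothetical value: log the failure and carry on
    STOP_NODE = "StopNode"   # named in the issue: let the failure escalate


def run_collector(collect, policy=ExceptionPolicy.IGNORE):
    """Run one collector; return True on success, False if a failure was swallowed."""
    try:
        collect()
        return True
    except Exception:
        log.exception("collector failed")
        if policy is ExceptionPolicy.STOP_NODE:
            raise            # operator explicitly opted in to escalation
        return False         # default: telemetry errors never reach the node
```

The point of the design is that a broken collector costs at most a missing metric unless the operator has chosen otherwise.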

Follow‑ups

  • Add unit/integration tests for collectors once a Telemetry test harness is agreed.
  • Consider more granular per‑peer metrics and optional OpenTelemetry traces.
