feat: add Prometheus metrics endpoint #595

@Anshumancanrock

Description

Right now there's no way to know what's happening on a running nostream relay without connecting directly to the database. No connection counts, no event throughput, no latency data. You're flying blind.

Proposal

Two things:

A GET /metrics endpoint that returns the standard Prometheus text format. Operators who already run Prometheus + Grafana can point a scrape config at it and start building dashboards without any extra tooling.

Example output:

# HELP nostream_connections_active Current number of active WebSocket connections
# TYPE nostream_connections_active gauge
nostream_connections_active 42

# HELP nostream_connections_total Total WebSocket connections since process start
# TYPE nostream_connections_total counter
nostream_connections_total 18340

# HELP nostream_events_received_total Events received from clients
# TYPE nostream_events_received_total counter
nostream_events_received_total{kind="1"} 52341
nostream_events_received_total{kind="7"} 12032

# HELP nostream_events_accepted_total Events written to the database
# TYPE nostream_events_accepted_total counter
nostream_events_accepted_total 48210

# HELP nostream_events_rejected_total Events rejected before storage
# TYPE nostream_events_rejected_total counter
nostream_events_rejected_total{reason="rate-limited"} 2891
nostream_events_rejected_total{reason="invalid"} 340
nostream_events_rejected_total{reason="blocked"} 900

# HELP nostream_subscriptions_active Current active REQ subscriptions
# TYPE nostream_subscriptions_active gauge
nostream_subscriptions_active 156

# HELP nostream_eose_duration_seconds Time from REQ received to EOSE sent
# TYPE nostream_eose_duration_seconds histogram
nostream_eose_duration_seconds_bucket{le="0.01"} 18200
nostream_eose_duration_seconds_bucket{le="0.05"} 31200
nostream_eose_duration_seconds_bucket{le="0.1"} 38900
nostream_eose_duration_seconds_bucket{le="0.5"} 42100
nostream_eose_duration_seconds_bucket{le="1"} 42800
nostream_eose_duration_seconds_bucket{le="5"} 42990
nostream_eose_duration_seconds_bucket{le="+Inf"} 43001
nostream_eose_duration_seconds_sum 4312.94
nostream_eose_duration_seconds_count 43001

# HELP nostream_db_query_duration_seconds Database query latency
# TYPE nostream_db_query_duration_seconds histogram
nostream_db_query_duration_seconds_bucket{le="0.005"} 39100
nostream_db_query_duration_seconds_bucket{le="0.01"} 45210
nostream_db_query_duration_seconds_bucket{le="0.025"} 47300
nostream_db_query_duration_seconds_bucket{le="0.05"} 48100
nostream_db_query_duration_seconds_bucket{le="0.1"} 48900
nostream_db_query_duration_seconds_bucket{le="0.5"} 49150
nostream_db_query_duration_seconds_bucket{le="+Inf"} 49200
nostream_db_query_duration_seconds_sum 621.33
nostream_db_query_duration_seconds_count 49200
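
As a rough illustration of how output like the above could be produced, here is a minimal sketch of a text-format renderer for a labeled counter and a histogram. All type and function names (CounterSample, HistogramSnapshot, renderCounter, renderHistogram) are hypothetical and not taken from the nostream codebase; the sketch assumes histograms are stored as per-bucket counts and made cumulative at render time, and it does not escape HELP text.

```typescript
// Hypothetical types: none of these names come from the nostream codebase.
type Labels = Record<string, string>

interface CounterSample {
  labels: Labels
  value: number
}

interface HistogramSnapshot {
  bounds: number[] // bucket upper bounds, ascending, without +Inf
  counts: number[] // per-bucket (non-cumulative) counts; last entry is the +Inf overflow
  sum: number // sum of all observed values
}

// Label values must escape backslash, double quote and newline in the text format.
const escapeLabelValue = (v: string): string =>
  v.replace(/\\/g, '\\\\').replace(/"/g, '\\"').replace(/\n/g, '\\n')

const formatLabels = (labels: Labels): string => {
  const pairs = Object.entries(labels).map(([k, v]) => `${k}="${escapeLabelValue(v)}"`)
  return pairs.length > 0 ? `{${pairs.join(',')}}` : ''
}

function renderCounter(name: string, help: string, samples: CounterSample[]): string {
  const lines = [`# HELP ${name} ${help}`, `# TYPE ${name} counter`]
  for (const s of samples) {
    lines.push(`${name}${formatLabels(s.labels)} ${s.value}`)
  }
  return lines.join('\n') + '\n'
}

function renderHistogram(name: string, help: string, h: HistogramSnapshot): string {
  const lines = [`# HELP ${name} ${help}`, `# TYPE ${name} histogram`]
  // Prometheus buckets are cumulative: each le bucket includes all smaller ones.
  let cumulative = 0
  h.bounds.forEach((bound, i) => {
    cumulative += h.counts[i] ?? 0
    lines.push(`${name}_bucket{le="${bound}"} ${cumulative}`)
  })
  cumulative += h.counts[h.bounds.length] ?? 0
  lines.push(`${name}_bucket{le="+Inf"} ${cumulative}`)
  lines.push(`${name}_sum ${h.sum}`)
  lines.push(`${name}_count ${cumulative}`)
  return lines.join('\n') + '\n'
}
```

Storing per-bucket counts keeps each observation to a single increment; the cumulative totals that the format requires are only computed when the endpoint is scraped.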

Also, a simple GET /stats page for operators who don't run Grafana: a server-rendered HTML page using the same Bootstrap template pattern as the existing /, /invoices, and /terms pages.

Both endpoints are disabled by default and opt-in via settings.yaml.
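
A minimal sketch of what "disabled by default" could look like in code. The settings shape (a section with metrics and stats entries, each with an enabled flag) and the isEndpointEnabled helper are assumptions for illustration; the actual key names in settings.yaml are not decided in this proposal.

```typescript
// Hypothetical settings shape: the real settings.yaml keys are not specified here.
interface EndpointSettings {
  enabled?: boolean
}

interface MonitoringSettings {
  metrics?: EndpointSettings
  stats?: EndpointSettings
}

type EndpointName = keyof MonitoringSettings

// A missing section or a missing flag counts as disabled; only a literal `true` opts in.
function isEndpointEnabled(
  settings: MonitoringSettings | undefined,
  name: EndpointName,
): boolean {
  return settings?.[name]?.enabled === true
}
```

A route handler could check this first and respond with 404 when the endpoint is off, so a relay that never opts in exposes nothing new.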

How it works

A MetricsStore singleton holds all counters and histograms in memory. Each component calls into it directly:

  • WebSocketServerAdapter increments connection counters on open/close
  • WebSocketAdapter tracks active subscription count
  • EventMessageHandler increments event counters per kind and per rejection reason
  • SubscribeMessageHandler records EOSE latency
  • EventRepository records query latency
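
The wiring above could be sketched roughly as follows. Only the MetricsStore name comes from this proposal; the method names (add, get, observe, histogram), the timed helper, and the default bucket bounds are assumptions made for illustration, and counters and gauges share one map here purely to keep the sketch short.

```typescript
// Sketch only: apart from MetricsStore, every name here is hypothetical.
type Labels = Record<string, string>

// Default latency bucket upper bounds in seconds (an assumption, not from the proposal).
const DEFAULT_BOUNDS = [0.005, 0.01, 0.025, 0.05, 0.1, 0.5, 1, 5]

class Histogram {
  private readonly bounds: number[]
  private readonly counts: number[]
  private sum = 0

  constructor(bounds: number[]) {
    this.bounds = [...bounds].sort((a, b) => a - b)
    // One slot per bound plus a final overflow slot for +Inf.
    this.counts = new Array<number>(bounds.length + 1).fill(0)
  }

  observe(value: number): void {
    // First bucket whose upper bound is >= value; otherwise the overflow slot.
    let i = this.bounds.findIndex((b) => value <= b)
    if (i === -1) i = this.bounds.length
    this.counts[i] = (this.counts[i] ?? 0) + 1
    this.sum += value
  }

  snapshot(): { bounds: number[]; counts: number[]; sum: number } {
    return { bounds: [...this.bounds], counts: [...this.counts], sum: this.sum }
  }
}

class MetricsStore {
  private static instance: MetricsStore | undefined
  private readonly values = new Map<string, number>()
  private readonly histograms = new Map<string, Histogram>()

  static getInstance(): MetricsStore {
    if (!MetricsStore.instance) MetricsStore.instance = new MetricsStore()
    return MetricsStore.instance
  }

  // Builds a stable key such as name{kind=1} regardless of label insertion order.
  private static key(name: string, labels: Labels): string {
    const parts = Object.keys(labels).sort().map((k) => `${k}=${labels[k]}`)
    return parts.length > 0 ? `${name}{${parts.join(',')}}` : name
  }

  // Counters pass positive deltas only; gauges may also pass negative ones.
  add(name: string, labels: Labels = {}, delta = 1): void {
    const k = MetricsStore.key(name, labels)
    this.values.set(k, (this.values.get(k) ?? 0) + delta)
  }

  get(name: string, labels: Labels = {}): number {
    return this.values.get(MetricsStore.key(name, labels)) ?? 0
  }

  observe(name: string, seconds: number, bounds: number[] = DEFAULT_BOUNDS): void {
    let h = this.histograms.get(name)
    if (!h) {
      h = new Histogram(bounds)
      this.histograms.set(name, h)
    }
    h.observe(seconds)
  }

  histogram(name: string): Histogram | undefined {
    return this.histograms.get(name)
  }
}

// Runs an async operation and records its duration in seconds, even if it throws.
async function timed<T>(name: string, fn: () => Promise<T>): Promise<T> {
  const start = performance.now()
  try {
    return await fn()
  } finally {
    MetricsStore.getInstance().observe(name, (performance.now() - start) / 1000)
  }
}
```

Under these assumptions, a connection handler would call `add('nostream_connections_total')` and `add('nostream_connections_active')` on open and `add('nostream_connections_active', {}, -1)` on close, while the repository could wrap a query in `timed('nostream_db_query_duration_seconds', ...)`.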

Plan

I'll split this into smaller PRs:

PR 1: MetricsStore + settings flag + tests. No routes yet, just the data structure.

PR 2: GET /metrics route + connection and subscription counters wired up. At this point you can actually curl the endpoint.

PR 3: Event counters (received/accepted/rejected, by kind and reason), wired into EventMessageHandler.

PR 4: Latency histograms (EOSE duration in the subscribe handler, query duration in the event repository).

PR 5: GET /stats HTML page for operators who prefer a browser over Prometheus.

PR 6: Docs, with a metric reference and a docker-compose example with a Prometheus + Grafana sidecar for operators who want to set up the full stack.

Labels

enhancement: New feature or request