feat: add Prometheus metrics foundation with /metrics admin endpoint #80

Open
nerdalert wants to merge 1 commit into praxis-proxy:main from nerdalert:brent-prom-basics
@nerdalert
Member

Trying to keep this from getting obnoxiously large. This PR is scoped to the metrics foundation: /metrics, the Prometheus recorder, and low-cardinality HTTP request metrics, so that cardinality is a first-class concern from the start. With /metrics in place we can start thinking about token usage, quota, rate-limit, PoCs, and how to scale them.

Summary

  • Add /metrics scrape endpoint to the admin listener alongside /healthy and /ready
  • Install metrics + metrics-exporter-prometheus as workspace dependencies
  • Emit praxis_http_requests_total counter and praxis_http_request_duration_seconds histogram from both HTTP handler logging hooks
  • Labels: method, status_class (2xx/3xx/4xx/5xx), route (placeholder unknown), cluster (from router or none)

Details

  • The metrics module (protocol/src/http/pingora/metrics.rs) installs a global Prometheus recorder via OnceLock when the admin listener is configured. Both with_body and no_body handlers call emit_request_metrics() in their logging() hook, reading the response status from session.response_written() and the cluster name from a metrics_cluster snapshot on PingoraRequestCtx (since cluster is consumed by filter context construction before logging runs).
  • Body-byte histograms (request_body_bytes, response_body_bytes) are intentionally excluded — the ctx byte counters are only populated when filter body hooks run, so the no-body handler path would emit misleading zeros. These will be reintroduced with independent transport-level byte counting.
  • Added method_label() in metrics.rs that passes through the 9 RFC 9110 standard methods and collapses everything else to "OTHER". The caller in emit_request_metrics now runs the raw method through this function before labeling. Custom methods like PURGE or garbage strings can no longer create unbounded cardinality.
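For reviewers who want a concrete picture of the method normalization described above, here is a minimal sketch. It is not the code from this PR: the `&str` parameter is an assumption made to keep the example self-contained, and treating PATCH (RFC 5789) as the ninth pass-through method alongside the eight methods RFC 9110 defines is also an assumption.

```rust
/// Hypothetical sketch of a method normalizer; not the code from this PR.
/// Maps a raw HTTP method token to a label value drawn from a fixed set.
pub fn method_label(raw: &str) -> &'static str {
    // Method tokens are case-sensitive, so "get" is not GET and falls
    // through to OTHER along with custom methods such as PURGE.
    match raw {
        "GET" => "GET",
        "HEAD" => "HEAD",
        "POST" => "POST",
        "PUT" => "PUT",
        "DELETE" => "DELETE",
        "CONNECT" => "CONNECT",
        "OPTIONS" => "OPTIONS",
        "TRACE" => "TRACE",
        // Assumption: PATCH (RFC 5789) is the ninth pass-through method.
        "PATCH" => "PATCH",
        // Everything else shares a single label value.
        _ => "OTHER",
    }
}
```

Returning `&'static str` rather than echoing the input means the label value can never carry request-controlled bytes, so in this sketch the label has at most ten distinct values.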

Deferred to follow-up PRs to try and keep this already too large PR reviewable 😬: active request gauge, TCP metrics, upstream metrics, error counters, configurable label sets, path grouping, route name schema changes.

Cardinality

I intentionally kept the initial label set conservative to give some time to think about scaling.

  • status_class is used instead of raw status codes to avoid creating one series per status.
  • route is currently emitted as unknown instead of using raw request paths. Raw paths are intentionally avoided because IDs, tenant names, model names, or arbitrary user-controlled path segments can create unbounded series.
  • cluster comes from the configured router cluster name, or none when no cluster was selected. This assumes clusters remain operator-controlled config values, not per-request dynamic values.
  • method is the only request-derived label. It stays low-cardinality because method_label() passes through only the standard methods and collapses everything else to OTHER.
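To make the bound concrete, the status bucketing and the resulting worst-case series count can be sketched roughly as follows. This is illustrative only: the PR's status_class() may treat 1xx and out-of-range codes differently, and both the catch-all `"other"` bucket and the `max_series` helper are assumptions introduced here.

```rust
/// Hypothetical sketch; the PR's status_class() may differ, for example
/// in how it handles 1xx and out-of-range codes.
pub fn status_class(code: u16) -> &'static str {
    match code {
        200..=299 => "2xx",
        300..=399 => "3xx",
        400..=499 => "4xx",
        500..=599 => "5xx",
        // Assumption: anything else (including 1xx) shares one bucket.
        _ => "other",
    }
}

/// Hypothetical helper: upper bound on the number of series one metric
/// can produce, as the product of the per-label value counts.
pub fn max_series(
    methods: usize,
    status_classes: usize,
    routes: usize,
    clusters: usize,
) -> usize {
    methods * status_classes * routes * clusters
}
```

Under these assumptions, ten method values (nine pass-through plus OTHER), five status buckets, one route value (unknown), and three configured clusters plus none give `max_series(10, 5, 1, 4)`, that is, at most 200 series per metric regardless of traffic.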

Follow-up work for #8 should keep this rule: metrics labels must come from bounded config or normalized enums, not raw request data. Route naming/path grouping should be added before route is populated with anything more specific than unknown.
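One possible way to hold that rule for config-derived labels is to resolve every label value against the configured set before it reaches the recorder. The sketch below is not part of this PR; `ClusterLabels` and its methods are hypothetical names, and mapping unrecognized names to none (rather than only the no-cluster-selected case) is an assumption.

```rust
use std::collections::BTreeSet;

/// Hypothetical guard, not part of this PR: only names present in the
/// operator-supplied config may become label values.
pub struct ClusterLabels {
    configured: BTreeSet<String>,
}

impl ClusterLabels {
    /// Builds the allow-list from the configured cluster names.
    pub fn new<I, S>(names: I) -> Self
    where
        I: IntoIterator<Item = S>,
        S: Into<String>,
    {
        Self {
            configured: names.into_iter().map(Into::into).collect(),
        }
    }

    /// Returns the configured name when `selected` matches one,
    /// otherwise "none". The returned string always comes from config
    /// or from a literal, never from the caller's input.
    pub fn label<'a>(&'a self, selected: Option<&str>) -> &'a str {
        selected
            .and_then(|name| self.configured.get(name))
            .map(String::as_str)
            .unwrap_or("none")
    }

    /// Upper bound on distinct label values: configured names plus "none".
    pub fn max_values(&self) -> usize {
        self.configured.len() + 1
    }
}
```

With this shape, a per-request dynamic value that slipped into the cluster field would land in none instead of creating a new series, and `max_values()` gives a number that could be logged or asserted at startup.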

Test plan

  • Unit tests for status_class() conversion (7 cases covering all classes + edge cases)
  • Unit test for prometheus_response() — verifies 200 status, correct content type, valid UTF-8 body
  • All 224 protocol crate tests pass
  • Clippy clean, cargo +nightly fmt clean
  • Manual: start proxy with admin config, curl localhost:9901/metrics returns Prometheus text format
  • Manual: send requests through proxy, verify praxis_http_requests_total and praxis_http_request_duration_seconds appear with correct labels

Manual validation/demo output: https://gist.github.com/nerdalert/e35609d4b38693fb27ae01e4b9644abd

Refs #8

Ty!

@praxis-bot marked this pull request as draft on April 28, 2026 06:07
@praxis-bot
Collaborator

Converted to draft: required checks failing.

- Add /metrics scrape endpoint to the admin listener alongside /healthy and /ready.
- Install metrics + metrics-exporter-prometheus as workspace dependencies.
- Emit praxis_http_requests_total counter and
  praxis_http_request_duration_seconds histogram from both HTTP handler logging hooks.
- Labels: method, status_class (2xx/3xx/4xx/5xx), route (placeholder unknown),
  cluster (from router or none).

Signed-off-by: Brent Salisbury <bsalisbu@redhat.com>
@nerdalert marked this pull request as ready for review on April 29, 2026 23:48
@nerdalert requested a review from a team on April 29, 2026 23:48