Metrics

The gateway exposes OpenTelemetry metrics via a Prometheus exporter. When enabled, metrics are available at GET /metrics in the standard Prometheus text format.
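
To check the endpoint from a script, the text format can be parsed with a few lines of Python. This is an illustrative sketch, not part of the gateway: `parse_metrics` and `fetch_metrics` are hypothetical helper names, the base URL assumes the gateway listens on `localhost:8080` as in the scrape example in this document, and escape sequences inside label values are left undecoded.

```python
import re
from urllib.request import urlopen

# Matches sample lines such as:  name{label="value"} 12.5   or   name 3
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse Prometheus text exposition into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank lines and HELP/TYPE comments
            continue
        match = _SAMPLE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def fetch_metrics(base_url="http://localhost:8080"):
    """Fetch and parse /metrics. The default base_url is an assumption."""
    with urlopen(base_url + "/metrics", timeout=5) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

Calling `fetch_metrics()` against a running gateway with metrics enabled should return one tuple per exported series.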

Enabling Metrics

metrics:
  enabled: true

Instruments

All metric names are prefixed with `llm_gateway.`.
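
The Prometheus queries in this document use underscore-separated names such as `llm_gateway_requests_total` and `llm_gateway_request_duration_seconds_sum` rather than the dotted instrument names. The sketch below approximates that mapping. `prometheus_name` is a hypothetical helper, and the rules it encodes (dots become underscores, a unit suffix is appended, counters gain `_total`) are an assumption inferred from those queries; the exact behavior depends on the exporter version and configuration.

```python
def prometheus_name(instrument, kind, unit=None):
    """Approximate the Prometheus series name for an instrument.

    Assumed rules: dots become underscores, the unit (if any) is appended,
    and counters receive a `_total` suffix.
    """
    name = instrument.replace(".", "_")
    if unit and not name.endswith("_" + unit):
        name += "_" + unit
    if kind == "counter" and not name.endswith("_total"):
        name += "_total"
    return name


if __name__ == "__main__":
    print(prometheus_name("llm_gateway.requests", "counter"))
    print(prometheus_name("llm_gateway.request.duration", "histogram", "seconds"))
```

Histograms are additionally exposed as `_bucket`, `_sum`, and `_count` series on top of the base name.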

Request Metrics

llm_gateway.requests (counter) -- total chat completion requests.

Attribute	Values	Description
`provider`	`openai`, `anthropic`, `ollama`	which provider handled the request
`model`	model name	the model used
`streaming`	`true`, `false`	whether the request was streaming
`key`	key name or empty	the API key name (when virtual API keys are enabled)

llm_gateway.request.duration (histogram, seconds) -- end-to-end request duration including upstream provider latency.

Attribute	Values	Description
`provider`	`openai`, `anthropic`, `ollama`	which provider handled the request
`model`	model name	the model used
`key`	key name or empty	the API key name (when virtual API keys are enabled)

llm_gateway.requests.inflight (up-down counter) -- number of requests currently being processed. Incremented when a request enters the handler, decremented when it completes. Useful for understanding concurrency and detecting request pileups.
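
The increment-on-entry, decrement-on-completion pattern can be sketched as a context manager. This is not the gateway's implementation: `UpDownCounter` is a hypothetical in-memory stand-in for the real instrument, and `track_inflight` and `handle_request` are illustrative names. The `finally` block is the important part, since it keeps the count accurate when a handler raises.

```python
import threading
from contextlib import contextmanager


class UpDownCounter:
    """Hypothetical in-memory stand-in for an up-down counter instrument."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def add(self, delta):
        with self._lock:
            self._value += delta

    @property
    def value(self):
        with self._lock:
            return self._value


@contextmanager
def track_inflight(counter):
    """Increment on entry and decrement on exit, even if the body raises."""
    counter.add(1)
    try:
        yield
    finally:
        counter.add(-1)


def handle_request(counter, work):
    # `work` is a placeholder for the real request handler body.
    with track_inflight(counter):
        return work()
```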

Token Metrics

llm_gateway.tokens.prompt (counter) -- total prompt (input) tokens across all requests.

Attribute	Values	Description
`provider`	provider name	which provider reported the usage
`model`	model name	the model used

llm_gateway.tokens.completion (counter) -- total completion (output) tokens across all requests.

Attribute	Values	Description
`provider`	provider name	which provider reported the usage
`model`	model name	the model used

Token metrics are recorded from the `usage` field of non-streaming chat completion responses. Streaming responses typically do not include token counts, so streamed requests usually do not contribute to these counters.
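
As a rough illustration of how token counts could be read from a response body, the sketch below assumes OpenAI-style field names (`prompt_tokens` and `completion_tokens` inside `usage`). `record_usage` is a hypothetical helper, and plain `collections.Counter` objects stand in for the real counter instruments.

```python
from collections import Counter


def record_usage(prompt_counter, completion_counter, provider, model, response):
    """Add token counts from a non-streaming response body to the counters.

    Returns False when the body carries no usable `usage` object, which is
    the typical case for streaming responses.
    """
    usage = response.get("usage")
    if not isinstance(usage, dict):
        return False
    key = (provider, model)  # mirrors the provider and model attributes
    prompt_counter[key] += int(usage.get("prompt_tokens", 0))
    completion_counter[key] += int(usage.get("completion_tokens", 0))
    return True
```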

Routing Metrics

llm_gateway.routing.decisions (counter) -- routing decisions, counted each time the router selects a model.

Attribute	Values	Description
`method`	`explicit`, `heuristic`, `semantic`, `classifier`, `default`	which routing layer made the decision

A high proportion of `default` decisions may indicate that thresholds are too strict or that route examples don't cover your traffic well.
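
To put a number on that proportion, the per-method counts can be turned into shares. `method_shares` is a hypothetical helper and the counts in the usage note are made-up illustration values, not measurements.

```python
def method_shares(counts):
    """Return each routing method's share of all decisions (0.0 to 1.0)."""
    total = sum(counts.values())
    if total == 0:
        return {}  # no decisions recorded yet
    return {method: count / total for method, count in counts.items()}
```

For example, `method_shares({"semantic": 60, "default": 40})["default"]` gives the fraction of decisions that fell through to the default route.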

Error Metrics

llm_gateway.provider.errors (counter) -- errors returned by upstream providers.

Attribute	Values	Description
`error_type`	`invalid_request_error`, `authentication_error`, `rate_limit_error`, `server_error`, `not_found_error`, `service_unavailable`, `unknown`	the error category

Health Metrics

llm_gateway.endpoint.healthy (up-down counter) -- per-endpoint health status in multi-endpoint mode. The value is 1 for a healthy endpoint and 0 for an unhealthy one. The `endpoint` attribute identifies the endpoint by name.
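
Because an up-down counter is additive, its value only stays at 0 or 1 if deltas are emitted on state changes rather than on every health check. The sketch below shows one way to do that; it is an assumption about how such a metric can be driven, not a description of the gateway's code. `EndpointHealth` and the `emit` callback are hypothetical names.

```python
class EndpointHealth:
    """Keeps a 0/1 health value per endpoint by emitting deltas on change."""

    def __init__(self, emit):
        # emit(delta, endpoint) would wrap the real instrument's add() call.
        self._emit = emit
        self._state = {}

    def set(self, endpoint, healthy):
        previous = self._state.get(endpoint)
        if previous is None:
            # First report: emit 1 or 0 so the series exists either way.
            delta = 1 if healthy else 0
        elif healthy == previous:
            return  # unchanged; another delta would push the value outside 0..1
        else:
            delta = 1 if healthy else -1
        self._state[endpoint] = healthy
        self._emit(delta, endpoint)
```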

Prometheus Scraping

Point your Prometheus instance at the gateway's /metrics endpoint:

# prometheus.yml
scrape_configs:
  - job_name: llm-gateway
    scrape_interval: 15s
    static_configs:
      - targets: ["localhost:8080"]

Useful Queries

Requests per minute by provider:

sum by (provider) (rate(llm_gateway_requests_total[5m])) * 60

Average request duration by model:

sum by (model) (rate(llm_gateway_request_duration_seconds_sum[5m])) / sum by (model) (rate(llm_gateway_request_duration_seconds_count[5m]))
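
The division works because both rates share the same time window, so the window length cancels and what remains is the increase in total seconds divided by the increase in request count. The same arithmetic on two scrapes of the `_sum` and `_count` series can be sketched as follows; `average_duration` is a hypothetical helper and its arguments are raw counter readings.

```python
def average_duration(sum_start, count_start, sum_end, count_end):
    """Mean request duration (seconds) between two scrapes.

    Returns None when no requests completed in the window or when a
    counter reset makes the count go backwards.
    """
    delta_count = count_end - count_start
    if delta_count <= 0:
        return None
    return (sum_end - sum_start) / delta_count
```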

Token throughput (tokens per second):

rate(llm_gateway_tokens_prompt_total[5m]) + rate(llm_gateway_tokens_completion_total[5m])

Error rate as a percentage of total requests:

sum(rate(llm_gateway_provider_errors_total[5m])) / sum(rate(llm_gateway_requests_total[5m])) * 100

Routing method distribution:

rate(llm_gateway_routing_decisions_total[5m])

Current in-flight requests:

llm_gateway_requests_inflight
