
feat: Add OpenTelemetry support for observability #184

@sysit

Description

It would be great to have native OpenTelemetry (OTel) support in vllm-mlx for production observability.

Motivation

OpenTelemetry has become the industry standard for distributed tracing, metrics, and logs. Many organizations use OTel-compatible backends (Jaeger, Prometheus, Grafana, Datadog, etc.) for monitoring their ML inference services.

Proposed Features

1. Metrics

  • Request latency (P50, P95, P99)
  • Tokens per second (input/output)
  • Queue length / pending requests
  • GPU/memory utilization
  • Batch size distribution
  • Time-to-first-token (TTFT)
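
To make the latency metrics concrete, here is a rough stdlib-only sketch of the percentile aggregation these metrics imply (nearest-rank method); a real implementation would record these as OTel histograms via the OpenTelemetry SDK rather than computing percentiles by hand. The sample latencies are made up for illustration.

```python
import math

def percentile(samples, q):
    """Nearest-rank percentile for q in (0, 100]."""
    xs = sorted(samples)
    k = max(0, math.ceil(q / 100 * len(xs)) - 1)
    return xs[k]

# Hypothetical per-request latencies in milliseconds
latencies_ms = [12, 15, 11, 120, 14, 13, 16, 90, 15, 14]

# The P50/P95/P99 summary a dashboard would surface
summary = {f"p{q}": percentile(latencies_ms, q) for q in (50, 95, 99)}
```

In practice the OTel SDK computes these bucketed distributions server-side (e.g. in Prometheus/Grafana) from an exported histogram, so the instrumentation only needs to record raw durations.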

2. Traces

  • Request lifecycle (receive → queue → prefill → decode → response)
  • Model inference spans
  • Token generation steps
  • Tool call execution (for MCP)
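
As a sketch of the proposed span hierarchy, the context manager below records nested spans with parent links, mimicking what `tracer.start_as_current_span` does in the OpenTelemetry SDK; the span names mirror the request lifecycle above, and everything here is illustrative stdlib code, not the SDK API.

```python
import time
from contextlib import contextmanager

SPANS = []   # (name, parent_name, duration_s); a real exporter ships these via OTLP
_stack = []  # names of currently open spans

@contextmanager
def span(name):
    """Minimal stand-in for tracer.start_as_current_span(name)."""
    parent = _stack[-1] if _stack else None
    _stack.append(name)
    start = time.perf_counter()
    try:
        yield
    finally:
        _stack.pop()
        SPANS.append((name, parent, time.perf_counter() - start))

# Request lifecycle: each stage becomes a child span of the request span
with span("request"):
    with span("queue"):
        pass  # waiting for a batch slot
    with span("prefill"):
        pass  # prompt processing
    with span("decode"):
        pass  # token generation
```

Child spans close before their parent, so a trace viewer like Jaeger can reconstruct the queue → prefill → decode timeline from the parent links.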

3. Logs (optional)

  • Structured logging with trace correlation
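
Trace correlation means each log line carries the active trace ID so a backend can link logs to their trace. A minimal sketch with the stdlib `logging` module (the trace ID here is passed explicitly; with the OTel SDK it would be read from the active span context):

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log record, including a trace_id field."""
    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "message": record.getMessage(),
            # With OTel, this would come from the current span context
            "trace_id": getattr(record, "trace_id", None),
        })

logger = logging.getLogger("vllm-mlx")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)

# trace_id value is a made-up example
logger.warning("slow request", extra={"trace_id": "4bf92f3577b34da6a3ce929d0e0e4736"})
```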

Configuration

Environment variables following OTel conventions:

OTEL_ENABLED=true
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
OTEL_SERVICE_NAME=vllm-mlx
OTEL_TRACES_SAMPLER=parentbased_traceidratio
OTEL_TRACES_SAMPLER_ARG=0.1
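
A sketch of how vllm-mlx might parse these variables into a config object (`OtelConfig` and `load_otel_config` are hypothetical names; note that `OTEL_ENABLED` is a project-level switch proposed here, not a variable defined by the OTel spec, which uses `OTEL_SDK_DISABLED` instead):

```python
import os
from dataclasses import dataclass

@dataclass
class OtelConfig:
    enabled: bool
    endpoint: str
    service_name: str
    sampler: str
    sampler_arg: float

def load_otel_config(env=os.environ):
    """Read OTEL_* variables, falling back to sensible defaults."""
    return OtelConfig(
        enabled=env.get("OTEL_ENABLED", "false").lower() == "true",
        endpoint=env.get("OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:4317"),
        service_name=env.get("OTEL_SERVICE_NAME", "vllm-mlx"),
        sampler=env.get("OTEL_TRACES_SAMPLER", "parentbased_always_on"),
        sampler_arg=float(env.get("OTEL_TRACES_SAMPLER_ARG", "1.0")),
    )
```

Defaulting `enabled` to false keeps observability strictly opt-in, so existing deployments see no behavior change.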

Use Cases

  • Production monitoring dashboards (Grafana)
  • Debug slow requests with distributed tracing
  • Performance optimization with detailed metrics
  • SLA/SLO monitoring
  • Cost attribution per model/request

Additional Context

This would complement existing monitoring approaches and enable seamless integration with modern observability stacks without vendor lock-in.

Happy to contribute or discuss implementation details!
