Demonstrates using LangChain with OpenTelemetry to:
- Redact LLM outputs before forwarding traces to LangSmith
- Derive span metrics (latency, throughput, error rate) from traces
- Visualize those metrics in Grafana via Prometheus
┌─────────────────────┐
│ Python App │
│ (LangChain + OTel) │
└────────┬────────────┘
│ OTLP/HTTP :4318
▼
┌───────────────────────────────────────────────────────┐
│ OTel Collector │
│ │
│ Processors: │
│ batch → transform/mask_llm_output │
│ (replaces gen_ai.completion with REDACTED) │
│ │
│ Connectors: │
│ spanmetrics → derives metrics from trace spans │
│ (latency histograms, call counts, by model/op) │
│ │
│ Exporters: │
│ ┌───────────┬──────────────┬──────────┬───────────┐ │
│ │ debug │ file │ otlphttp/│prometheus │ │
│ │ │ (collector. │ langsmith│ :8889 │ │
│ │ │ log) │ │ │ │
│ └───────────┴──────────────┴──────────┴───────────┘ │
└──────────────────────┬─────────────────┬──────────────┘
│ OTLP/HTTP │ /metrics
▼ ▼
┌──────────────────┐ ┌──────────────┐
│ LangSmith │ │ Prometheus │
│ (redacted │ │ :9090 │
│ traces) │ └──────┬───────┘
└──────────────────┘ │
▼
┌──────────────┐
│ Grafana │
│ :3000 │
└──────────────┘
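The collector pipeline in the diagram can be sketched in YAML roughly as follows. This is an illustrative sketch, not the repo's actual config: the file name, the LangSmith OTLP endpoint, and the exact OTTL statement are assumptions, so consult the collector config shipped with the repo.

```yaml
# otel-collector-config.yaml (sketch; names and endpoint are assumptions)
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
  transform/mask_llm_output:
    trace_statements:
      - context: span
        statements:
          # Redact the LLM completion before traces leave the collector
          - set(attributes["gen_ai.completion"], "REDACTED") where attributes["gen_ai.completion"] != nil

connectors:
  spanmetrics:
    namespace: langchain          # yields langchain_calls_total etc.
    histogram:
      unit: ms                    # yields langchain_duration_milliseconds_*
    dimensions:
      - name: gen_ai.system
      - name: gen_ai.request.model
      - name: gen_ai.operation.name

exporters:
  debug:
  file:
    path: /var/log/collector.log
  otlphttp/langsmith:
    endpoint: https://api.smith.langchain.com/otel   # assumed LangSmith OTLP endpoint
    headers:
      x-api-key: ${env:LANGSMITH_API_KEY}
  prometheus:
    endpoint: 0.0.0.0:8889

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch, transform/mask_llm_output]
      exporters: [debug, file, otlphttp/langsmith, spanmetrics]
    metrics:
      receivers: [spanmetrics]
      exporters: [prometheus]
```

Note that `spanmetrics` appears as an *exporter* of the traces pipeline and a *receiver* of the metrics pipeline: connectors bridge two pipelines, which is how metrics get derived from spans without a second ingestion path.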
- Copy the `.env` file and fill in your API keys:

```
LANGSMITH_OTEL_ENABLED="true"
LANGSMITH_TRACING="true"
LANGSMITH_OTEL_ONLY="true"
LANGSMITH_API_KEY="<your-langsmith-api-key>"
LANGSMITH_PROJECT="<your-langsmith-project>"
OPENAI_API_KEY="<your-openai-api-key>"
```
The `LANGSMITH_API_KEY` variable is passed through to the OTel collector container via docker-compose, where it is used as the `x-api-key` header when exporting to LangSmith.
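A minimal sketch of that docker-compose wiring (the service name and image are assumptions; the repo's own compose file may differ):

```yaml
services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib   # assumed image; repo may pin a version
    environment:
      - LANGSMITH_API_KEY=${LANGSMITH_API_KEY}    # read from .env, used for the x-api-key header
    ports:
      - "4318:4318"   # OTLP/HTTP receiver
      - "8889:8889"   # Prometheus metrics endpoint
```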
```
docker compose up
```

This starts:
- OTel Collector — receives traces on `:4318`, exposes metrics on `:8889`
- Prometheus — scrapes collector metrics, available at localhost:9090
- Grafana — pre-configured dashboard, available at localhost:3000 (no login required)
```
uv sync
uv run main.py
```

This invokes a LangChain chain across multiple topics. Traces are sent via OTLP to the collector, which:
- Redacts the LLM output and forwards traces to LangSmith
- Derives span metrics (latency, call count) and exposes them to Prometheus
Open localhost:3000 and navigate to the LangChain Span Metrics dashboard.
You'll see:
- Request rate — calls per second over time
- Latency percentiles — p50/p95/p99 latency by operation
- Total calls & error rate — summary stats
- Average latency by operation — bar gauge
- Latency heatmap — distribution over time
- Breakdowns by model and operation — pie charts
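The percentile panels can also be reproduced by hand in Prometheus (localhost:9090). A sketch of a p95 query over the histogram the collector exposes — the `span_name` grouping label is an assumption based on the spanmetrics connector's default dimensions:

```promql
histogram_quantile(
  0.95,
  sum(rate(langchain_duration_milliseconds_bucket[5m])) by (le, span_name)
)
```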
Run `uv run main.py` multiple times to generate more data points.
The OTel collector's spanmetrics connector automatically derives the following metrics from trace spans:
| Metric | Description |
|---|---|
| `langchain_calls_total` | Total number of span calls, labeled by operation and model |
| `langchain_duration_milliseconds_*` | Latency histogram (bucket/sum/count) for each span |

These are broken down by the dimensions `gen_ai.system`, `gen_ai.request.model`, and `gen_ai.operation.name`.
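Since dots are not valid in Prometheus label names, these dimensions surface with underscores (e.g. `gen_ai_request_model`). A sketch of a per-model call-rate query using that convention:

```promql
sum(rate(langchain_calls_total[5m])) by (gen_ai_request_model)
```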
