
@IlumCI IlumCI commented Nov 12, 2025

Description

This PR adds OpenTelemetry integration to the Swarms framework, enabling distributed tracing, metrics, and logging across agents and multi-agent structures. The implementation follows OpenTelemetry conventions and provides observability for agent executions, swarm router operations, and LLM calls.

Key Features

  • Distributed Tracing: Automatic span creation for agent runs, LLM calls, and swarm router executions
  • Metrics Collection: Counter and histogram metrics for agent executions, loops, LLM calls, and errors
  • Structured Logging: OpenTelemetry-compatible logging with severity levels and attributes
  • Context Propagation: Trace context propagation for distributed tracing across agents
  • Configurable via Environment Variables: Configuration through OTEL_* environment variables, most of them standard OpenTelemetry SDK variables (OTEL_ENABLED is specific to this integration)
  • Graceful Degradation: Telemetry is optional and gracefully handles missing dependencies

Implementation Details

The telemetry system is integrated at multiple levels:

  • Agent Level: Traces agent runs, LLM calls, tool executions, and records execution metrics
  • Swarm Router Level: Traces swarm router operations and propagates trace context to child swarms
  • Error Tracking: Automatic error recording in spans and metrics for debugging

File Changes

swarms/telemetry/opentelemetry_integration.py (NEW FILE)

  • Core OpenTelemetry integration module
  • Provides trace_span() context manager for creating spans
  • Implements record_metric() for metrics collection
  • Includes log_event() for structured logging
  • Supports trace context propagation via get_current_trace_context() and set_trace_context()
  • Configurable via environment variables (OTEL_ENABLED, OTEL_EXPORTER_OTLP_ENDPOINT, etc.)
  • Gracefully handles missing OpenTelemetry dependencies
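The graceful-degradation behavior described above can be sketched as follows. This is a minimal illustration, not the module's actual code: it assumes trace_span() accepts a span name and an optional attribute dict, and the tracer name "swarms.sketch" is made up for the example. When the opentelemetry package cannot be imported, the context manager yields None and the wrapped code runs unchanged.

```python
from contextlib import contextmanager

try:
    from opentelemetry import trace
    _OTEL_AVAILABLE = True
except ImportError:  # OpenTelemetry is an optional dependency
    trace = None
    _OTEL_AVAILABLE = False


@contextmanager
def trace_span(name, attributes=None):
    """Yield a span if OpenTelemetry is importable, otherwise yield None."""
    if not _OTEL_AVAILABLE:
        yield None
        return
    # "swarms.sketch" is a hypothetical tracer name used only in this example.
    tracer = trace.get_tracer("swarms.sketch")
    with tracer.start_as_current_span(name, attributes=attributes or {}) as span:
        yield span


# The calling code looks the same whether or not telemetry is available.
with trace_span("agent.run", {"agent.name": "MyAgent"}) as span:
    result = "done"
```

Callers that want to attach extra attributes should check for None before touching the span, since that is the value yielded in the degraded case.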

swarms/telemetry/__init__.py

  • Updated to export OpenTelemetry integration functions
  • Conditionally exports telemetry functions when OpenTelemetry packages are available
  • Maintains backward compatibility when OpenTelemetry is not installed

swarms/structs/agent.py

  • Added enable_telemetry parameter to Agent __init__() method
  • Integrated telemetry in _run() method:
    • Creates trace span for agent execution with attributes (agent.id, agent.name, agent.model, etc.)
    • Records metrics for agent executions (total, success, errors)
    • Records loop count metrics
  • Integrated telemetry in call_llm() method:
    • Creates trace span for LLM calls with attributes (model, loop number, task length, etc.)
    • Records metrics for LLM call duration and total calls
    • Records error metrics for failed LLM calls
  • Error handling with telemetry:
    • Records error metrics with error type
    • Logs error events with OpenTelemetry logging
    • Sets span status to ERROR on exceptions
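The error-handling steps listed above (count the error by type, record it on the span, set the span status to ERROR, then re-raise) can be sketched roughly as below. The helper run_with_telemetry() and the in-memory error_counts counter are hypothetical stand-ins for the PR's record_metric() calls; only the opentelemetry.trace calls are real API, and they are skipped when the package is missing.

```python
from collections import Counter
from contextlib import nullcontext

try:
    from opentelemetry import trace
    from opentelemetry.trace import Status, StatusCode
except ImportError:  # telemetry is optional
    trace = None

# Stand-in for an OpenTelemetry counter instrument (hypothetical).
error_counts = Counter()


def run_with_telemetry(fn, span_name="agent.run"):
    """Run fn inside a span, recording the error type if it fails."""
    if trace is None:
        span_cm = nullcontext()  # yields None when telemetry is unavailable
    else:
        # Disable automatic exception handling so it is done explicitly below.
        span_cm = trace.get_tracer("swarms.sketch").start_as_current_span(
            span_name, record_exception=False, set_status_on_exception=False
        )
    with span_cm as span:
        try:
            return fn()
        except Exception as exc:
            error_counts[type(exc).__name__] += 1
            if span is not None:
                span.record_exception(exc)
                span.set_status(Status(StatusCode.ERROR, str(exc)))
            raise  # telemetry must never swallow the original error


value = run_with_telemetry(lambda: 42)
```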

swarms/structs/swarm_router.py

  • Added telemetry_enabled parameter to SwarmRouter __init__() method
  • Integrated telemetry in _run() method:
    • Creates trace span for swarm router execution with attributes (router.id, router.name, swarm_type, etc.)
    • Records metrics for swarm router executions (total, errors)
    • Propagates trace context to child swarms for distributed tracing
  • Error handling with telemetry:
    • Records error metrics with error type and swarm type
    • Sets span status to ERROR on exceptions
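To illustrate what propagating trace context to a child swarm involves, here is a standard-library sketch of the W3C traceparent header that OpenTelemetry's default propagator carries. It is not the PR's get_current_trace_context() or set_trace_context() implementation; the function names are invented for the example. The point is that a child keeps the parent's trace ID and gets a fresh span ID.

```python
import re
import secrets

# W3C Trace Context, version 00: version-traceid-spanid-flags, lowercase hex.
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(trace_id=None, sampled=True):
    """Build a traceparent value, reusing trace_id when one is given."""
    trace_id = trace_id or secrets.token_hex(16)  # 16 bytes -> 32 hex chars
    span_id = secrets.token_hex(8)                # 8 bytes -> 16 hex chars
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"


def parse_traceparent(header):
    """Return the header's fields as a dict, or None if it is malformed."""
    match = _TRACEPARENT.match(header)
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    return {
        "trace_id": trace_id,
        "span_id": span_id,
        "sampled": bool(int(flags, 16) & 1),
    }


def child_carrier(parent_carrier):
    """Derive a carrier for a child swarm that stays in the parent's trace."""
    parent = parse_traceparent(parent_carrier.get("traceparent", ""))
    if parent is None:
        return {"traceparent": new_traceparent()}  # start a new trace
    return {"traceparent": new_traceparent(parent["trace_id"], parent["sampled"])}


router_carrier = {"traceparent": new_traceparent()}
agent_carrier = child_carrier(router_carrier)
```

In a real integration the carrier dict would typically be filled and read by OpenTelemetry's propagator rather than by hand.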

Dependencies

  • opentelemetry-api>=1.20.0
  • opentelemetry-sdk>=1.20.0
  • opentelemetry-exporter-otlp>=1.20.0

Note: These dependencies are optional. The framework works without them, but telemetry features will be disabled.

Configuration

Telemetry is configured via environment variables:

  • OTEL_ENABLED: Enable/disable OpenTelemetry (default: "true")
  • OTEL_SERVICE_NAME: Service name for traces (default: "swarms")
  • OTEL_EXPORTER_OTLP_ENDPOINT: OTLP endpoint URL (e.g., "http://localhost:4317")
  • OTEL_EXPORTER_OTLP_HEADERS: Headers for OTLP exporter (JSON format)
  • OTEL_TRACES_EXPORTER: Traces exporter (default: "otlp")
  • OTEL_METRICS_EXPORTER: Metrics exporter (default: "otlp")
  • OTEL_LOGS_EXPORTER: Logs exporter (default: "otlp")
  • OTEL_SDK_DISABLED: Disable OpenTelemetry SDK (default: "false")
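As a rough sketch of how these variables and defaults might be read, consider the loader below. TelemetryConfig and load_config() are hypothetical names, and three behaviors are assumptions rather than facts from the PR: which spellings count as "true", that OTEL_SDK_DISABLED overrides OTEL_ENABLED, and that malformed JSON headers are ignored.

```python
import json
import os
from dataclasses import dataclass, field
from typing import Dict, Mapping, Optional

_TRUE_VALUES = {"1", "true", "yes", "on"}  # assumption: accepted truthy spellings


def _flag(env: Mapping[str, str], name: str, default: str) -> bool:
    return env.get(name, default).strip().lower() in _TRUE_VALUES


@dataclass
class TelemetryConfig:
    enabled: bool
    service_name: str
    endpoint: Optional[str]
    headers: Dict[str, str] = field(default_factory=dict)
    traces_exporter: str = "otlp"
    metrics_exporter: str = "otlp"
    logs_exporter: str = "otlp"


def load_config(env: Optional[Mapping[str, str]] = None) -> TelemetryConfig:
    """Read the OTEL_* variables listed above, applying the documented defaults."""
    env = os.environ if env is None else env
    raw_headers = env.get("OTEL_EXPORTER_OTLP_HEADERS", "")
    try:
        headers = json.loads(raw_headers) if raw_headers else {}
    except json.JSONDecodeError:
        headers = {}  # assumption: bad headers are ignored, not fatal
    if not isinstance(headers, dict):
        headers = {}
    return TelemetryConfig(
        # Assumption: the SDK kill switch wins over OTEL_ENABLED.
        enabled=_flag(env, "OTEL_ENABLED", "true")
        and not _flag(env, "OTEL_SDK_DISABLED", "false"),
        service_name=env.get("OTEL_SERVICE_NAME", "swarms"),
        endpoint=env.get("OTEL_EXPORTER_OTLP_ENDPOINT"),
        headers=headers,
        traces_exporter=env.get("OTEL_TRACES_EXPORTER", "otlp"),
        metrics_exporter=env.get("OTEL_METRICS_EXPORTER", "otlp"),
        logs_exporter=env.get("OTEL_LOGS_EXPORTER", "otlp"),
    )


config = load_config({"OTEL_SERVICE_NAME": "my-swarm-service"})
```

Passing the environment as a mapping keeps the loader easy to test without mutating os.environ.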

Usage Example

import os
from swarms import Agent

# Configure OpenTelemetry
os.environ["OTEL_ENABLED"] = "true"
os.environ["OTEL_SERVICE_NAME"] = "my-swarm-service"
os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = "http://localhost:4317"

# Create agent with telemetry enabled
agent = Agent(
    agent_name="MyAgent",
    model_name="gpt-4o-mini",
    enable_telemetry=True,  # Enable telemetry
)

# Run agent - traces and metrics will be automatically collected
result = agent.run("Your task here")

Testing

  • Tested telemetry integration with Jaeger, Tempo, and the OpenTelemetry Collector

  • Verified trace propagation across agent hierarchies

  • Confirmed metrics collection and export

  • Tested graceful degradation when OpenTelemetry packages are not installed

  • Video of testing with Jaeger:

bounty2.mp4

Issue: #1199

Tag Maintainer

@kyegomez

Twitter Handle

https://x.com/IlumTheProtogen


📚 Documentation preview 📚: https://swarms--1200.org.readthedocs.build/en/1200/

Added OpenTelemetry integration for tracing and metrics.
Added OpenTelemetry integration functions to __all__ export.
This file implements OpenTelemetry integration for the Swarms framework, enabling distributed tracing, metrics, and logging capabilities. It includes configuration options via environment variables and provides functions for tracing, logging, and metrics recording.
Add telemetry support for agent execution and LLM calls.
Comment on lines +80 to +84
from swarms.telemetry.opentelemetry_integration import (
    trace_span,
    record_metric,
    log_event,
)

Check failure

Code scanning / Pyre

Undefined import Error

Undefined import [21]: Could not find a module corresponding to import swarms.telemetry.opentelemetry_integration.
Comment on lines +30 to +34
from swarms.telemetry.opentelemetry_integration import (
    trace_span,
    record_metric,
    get_current_trace_context,
)

Check failure

Code scanning / Pyre

Undefined import Error

Undefined import [21]: Could not find a module corresponding to import swarms.telemetry.opentelemetry_integration.
span_manager = nullcontext()

try:
    self.swarm = self._create_swarm(task, *args, **kwargs)

Check failure

Code scanning / Pyre

Undefined attribute Error

Undefined attribute [16]: SwarmRouter has no attribute swarm.
span_manager = nullcontext()

try:
    self.swarm = self._create_swarm(task, *args, **kwargs)

Check failure

Code scanning / Pyre

Incompatible parameter type Error

Incompatible parameter type [6]: In call SwarmRouter._create_swarm, for 1st positional argument, expected str but got Optional[str].
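The two Pyre findings on this line could be addressed along the following lines. RouterSketch is a simplified, hypothetical stand-in for SwarmRouter, not the real class: the attribute is declared in __init__ so the checker knows it exists, and the Optional[str] task is narrowed to str before being passed on.

```python
from typing import Any, Optional


class RouterSketch:
    """Hypothetical, reduced stand-in for SwarmRouter."""

    def __init__(self) -> None:
        # Declaring the attribute up front resolves "has no attribute swarm".
        self.swarm: Optional[Any] = None

    def _create_swarm(self, task: str) -> str:
        return f"swarm for: {task}"

    def run(self, task: Optional[str] = None) -> str:
        # Narrow Optional[str] to str before calling _create_swarm.
        if task is None:
            raise ValueError("task must be provided")
        swarm = self._create_swarm(task)
        self.swarm = swarm
        return swarm


router = RouterSketch()
created = router.run("summarize the report")
```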
Comment on lines +9 to +18
from swarms.telemetry.opentelemetry_integration import (
    get_tracer,
    get_meter,
    trace_span,
    trace_function,
    record_metric,
    get_current_trace_context,
    set_trace_context,
    log_event,
)

Check failure

Code scanning / Pyre

Undefined import Error

Undefined import [21]: Could not find a module corresponding to import swarms.telemetry.opentelemetry_integration.
Comment on lines +42 to +44
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import (
    OTLPMetricExporter,
)

Check failure

Code scanning / Pyre

Undefined import Error

Undefined import [21]: Could not find a module corresponding to import opentelemetry.exporter.otlp.proto.grpc.metric_exporter.
Comment on lines +45 to +47
from opentelemetry.exporter.otlp.proto.grpc._log_exporter import (
    OTLPLogExporter,
)

Check failure

Code scanning / Pyre

Undefined import Error

Undefined import [21]: Could not find a module corresponding to import opentelemetry.exporter.otlp.proto.grpc._log_exporter.
from opentelemetry.exporter.otlp.proto.grpc._log_exporter import (
    OTLPLogExporter,
)
from opentelemetry.sdk.resources import Resource

Check failure

Code scanning / Pyre

Undefined import Error

Undefined import [21]: Could not find a module corresponding to import opentelemetry.sdk.resources.
Comment on lines +49 to +51
from opentelemetry.trace.propagation.tracecontext import (
    TraceContextTextMapPropagator,
)

Check failure

Code scanning / Pyre

Undefined import Error

Undefined import [21]: Could not find a module corresponding to import opentelemetry.trace.propagation.tracecontext.
return

try:
from opentelemetry._logs import SeverityNumber

Check failure

Code scanning / Pyre

Undefined import Error

Undefined import [21]: Could not find a module corresponding to import opentelemetry._logs.