
feat: Add OpenTelemetry tracing support to vllm_performance actuator #816

@mgazz

Description

Is your feature request related to a problem? Please describe.

When running vLLM performance experiments with the vllm_performance actuator, we currently lack visibility into the internal request processing pipeline. Specifically, we cannot observe:

  • Pre-processing latency - Time spent preparing requests before inference
  • Post-processing latency - Time spent formatting and returning results after inference
  • Inter-processing times - Gaps and queuing delays between processing stages
  • Request arrival patterns - Temporal distribution and rate of incoming requests

vLLM already supports OpenTelemetry (OTEL) tracing via the --otlp-traces-endpoint flag and OTEL environment variables, but the vllm_performance actuator does not expose this functionality to users.

Describe the solution you'd like

Add OpenTelemetry tracing configuration support to the vllm_performance actuator, allowing users to collect distributed traces from vLLM deployments and send them to observability backends like Jaeger, Tempo, or any OTLP-compatible collector.

Implementation Plan

1. Add OTEL Configuration Parameters

File: plugins/actuators/vllm_performance/ado_actuators/vllm_performance/actuator_parameters.py

Add three new optional parameters to VLLMPerformanceTestParameters:

otel_traces_endpoint: Annotated[
    str | None,
    pydantic.Field(
        description="OpenTelemetry traces endpoint URL (e.g., http://jaeger-collector:4318/v1/traces). If set, enables OTLP tracing in vLLM."
    ),
] = None

otel_traces_protocol: Annotated[
    str,
    pydantic.Field(
        description="OpenTelemetry traces protocol (http/protobuf or grpc)"
    ),
] = "http/protobuf"

otel_service_name: Annotated[
    str,
    pydantic.Field(
        description="Service name for OpenTelemetry traces"
    ),
] = "vllm-server"

2. Update Deployment YAML Generation

File: plugins/actuators/vllm_performance/ado_actuators/vllm_performance/k8s/yaml_support/build_components.py

Change A: Add OTEL parameters to deployment_yaml() function signature:

def deployment_yaml(
    k8s_name: str,
    model: str,
    # ... existing parameters ...
    io_processor_plugin: str | None = None,
    otel_traces_endpoint: str | None = None,
    otel_traces_protocol: str = "http/protobuf",
    otel_service_name: str = "vllm-server",
) -> dict[str, Any]:

Change B: Inject OTEL environment variables into the container spec:

if otel_traces_endpoint is not None:
    container["env"].extend([
        {
            "name": "OTEL_EXPORTER_OTLP_TRACES_ENDPOINT",
            "value": otel_traces_endpoint
        },
        {
            "name": "OTEL_EXPORTER_OTLP_TRACES_PROTOCOL",
            "value": otel_traces_protocol
        },
        {
            "name": "OTEL_SERVICE_NAME",
            "value": otel_service_name
        }
    ])

Change C: Add --otlp-traces-endpoint to vLLM serve command arguments:

if otel_traces_endpoint is not None:
    vllm_serve_args.append("--otlp-traces-endpoint")
    vllm_serve_args.append(otel_traces_endpoint)

3. Update Environment Creation Function

File: plugins/actuators/vllm_performance/ado_actuators/vllm_performance/k8s/create_environment.py

Add OTEL parameters to create_test_environment() function signature and pass them through to ComponentsYaml.deployment_yaml():

def create_test_environment(
    k8s_name: str,
    model: str,
    # ... existing parameters ...
    otel_traces_endpoint: str | None = None,
    otel_traces_protocol: str = "http/protobuf",
    otel_service_name: str = "vllm-server",
    check_interval: int = 5,
    timeout: int = 1200,
) -> None:
    # ...
    deployment_yaml = ComponentsYaml.deployment_yaml(
        # ... existing parameters ...
        otel_traces_endpoint=otel_traces_endpoint,
        otel_traces_protocol=otel_traces_protocol,
        otel_service_name=otel_service_name,
    )

4. Update Experiment Executor

File: plugins/actuators/vllm_performance/ado_actuators/vllm_performance/experiment_executor.py

Pass OTEL parameters from actuator configuration to environment creation:

create_test_environment(
    k8s_name=env.k8s_name,
    model=values.get("model"),
    # ... existing parameters ...
    otel_traces_endpoint=actuator.otel_traces_endpoint,
    otel_traces_protocol=actuator.otel_traces_protocol,
    otel_service_name=actuator.otel_service_name,
)

5. Update Documentation

Add example configuration to actuator configuration YAML files:

actuatorIdentifier: vllm_performance
metadata:
  name: my-vllm-actuator
parameters:
  namespace: my-namespace
  in_cluster: true
  max_environments: 3
  
  # OpenTelemetry Configuration (optional)
  # Uncomment to enable distributed tracing
  # otel_traces_endpoint: "http://jaeger-collector.observability:4318/v1/traces"
  # otel_traces_protocol: "http/protobuf"  # or "grpc"
  # otel_service_name: "vllm-performance-server"

Expected Behavior

When configured, the actuator will:

  1. Inject OTEL environment variables into vLLM deployment pods
  2. Pass --otlp-traces-endpoint to the vLLM serve command
  3. Enable vLLM to export traces to the configured OTLP endpoint
  4. Allow users to visualize request processing pipelines in their observability backend

Testing Plan

  1. Deploy Jaeger or another OTLP collector in the Kubernetes cluster
  2. Create an actuator configuration with OTEL parameters set
  3. Run a vLLM performance experiment
  4. Verify traces appear in the observability backend
  5. Confirm trace data includes pre-processing, inference, and post-processing spans
  6. Validate that existing configurations without OTEL parameters continue to work unchanged

Labels: enhancement (New feature or request)