
feat: Add OpenTelemetry tracing support to vllm_performance actuator #816

@mgazz

Description

Is your feature request related to a problem? Please describe.

When running vLLM performance experiments with the vllm_performance actuator, we currently lack visibility into the internal request processing pipeline. Specifically, we cannot observe:

  • Pre-processing latency - Time spent preparing requests before inference
  • Post-processing latency - Time spent formatting and returning results after inference
  • Inter-processing times - Gaps and queuing delays between processing stages
  • Request arrival patterns - Temporal distribution and rate of incoming requests

vLLM already supports OpenTelemetry (OTEL) tracing via the --otlp-traces-endpoint flag and OTEL environment variables, but the vllm_performance actuator does not expose this functionality to users.

Describe the solution you'd like

Add OpenTelemetry tracing configuration support to the vllm_performance actuator, allowing users to collect distributed traces from vLLM deployments and send them to observability backends like Jaeger, Tempo, or any OTLP-compatible collector.

Implementation Plan

1. Add OTEL Configuration Parameters

File: plugins/actuators/vllm_performance/ado_actuators/vllm_performance/actuator_parameters.py

Add three new optional parameters to VLLMPerformanceTestParameters:

otel_traces_endpoint: Annotated[
    str | None,
    pydantic.Field(
        description="OpenTelemetry traces endpoint URL (e.g., http://jaeger-collector:4318/v1/traces). If set, enables OTLP tracing in vLLM."
    ),
] = None

otel_traces_protocol: Annotated[
    str,
    pydantic.Field(
        description="OpenTelemetry traces protocol (http/protobuf or grpc)"
    ),
] = "http/protobuf"

otel_service_name: Annotated[
    str,
    pydantic.Field(
        description="Service name for OpenTelemetry traces"
    ),
] = "vllm-server"

2. Update Deployment YAML Generation

File: plugins/actuators/vllm_performance/ado_actuators/vllm_performance/k8s/yaml_support/build_components.py

Change A: Add OTEL parameters to deployment_yaml() function signature:

def deployment_yaml(
    k8s_name: str,
    model: str,
    # ... existing parameters ...
    io_processor_plugin: str | None = None,
    otel_traces_endpoint: str | None = None,
    otel_traces_protocol: str = "http/protobuf",
    otel_service_name: str = "vllm-server",
) -> dict[str, Any]:

Change B: Inject OTEL environment variables into the container spec:

if otel_traces_endpoint is not None:
    container["env"].extend([
        {
            "name": "OTEL_EXPORTER_OTLP_TRACES_ENDPOINT",
            "value": otel_traces_endpoint
        },
        {
            "name": "OTEL_EXPORTER_OTLP_TRACES_PROTOCOL",
            "value": otel_traces_protocol
        },
        {
            "name": "OTEL_SERVICE_NAME",
            "value": otel_service_name
        }
    ])

Change C: Add --otlp-traces-endpoint to vLLM serve command arguments:

if otel_traces_endpoint is not None:
    vllm_serve_args.append("--otlp-traces-endpoint")
    vllm_serve_args.append(otel_traces_endpoint)

3. Update Environment Creation Function

File: plugins/actuators/vllm_performance/ado_actuators/vllm_performance/k8s/create_environment.py

Add OTEL parameters to create_test_environment() function signature and pass them through to ComponentsYaml.deployment_yaml():

def create_test_environment(
    k8s_name: str,
    model: str,
    # ... existing parameters ...
    otel_traces_endpoint: str | None = None,
    otel_traces_protocol: str = "http/protobuf",
    otel_service_name: str = "vllm-server",
    check_interval: int = 5,
    timeout: int = 1200,
) -> None:
    # ...
    deployment_yaml = ComponentsYaml.deployment_yaml(
        # ... existing parameters ...
        otel_traces_endpoint=otel_traces_endpoint,
        otel_traces_protocol=otel_traces_protocol,
        otel_service_name=otel_service_name,
    )

4. Update Experiment Executor

File: plugins/actuators/vllm_performance/ado_actuators/vllm_performance/experiment_executor.py

Pass OTEL parameters from actuator configuration to environment creation:

create_test_environment(
    k8s_name=env.k8s_name,
    model=values.get("model"),
    # ... existing parameters ...
    otel_traces_endpoint=actuator.otel_traces_endpoint,
    otel_traces_protocol=actuator.otel_traces_protocol,
    otel_service_name=actuator.otel_service_name,
)

5. Update Documentation

Add example configuration to actuator configuration YAML files:

actuatorIdentifier: vllm_performance
metadata:
  name: my-vllm-actuator
parameters:
  namespace: my-namespace
  in_cluster: true
  max_environments: 3
  
  # OpenTelemetry Configuration (optional)
  # Uncomment to enable distributed tracing
  # otel_traces_endpoint: "http://jaeger-collector.observability:4318/v1/traces"
  # otel_traces_protocol: "http/protobuf"  # or "grpc"
  # otel_service_name: "vllm-performance-server"

Expected Behavior

When configured, the actuator will:

  1. Inject OTEL environment variables into vLLM deployment pods
  2. Pass --otlp-traces-endpoint to the vLLM serve command
  3. Enable vLLM to export traces to the configured OTLP endpoint
  4. Allow users to visualize request processing pipelines in their observability backend

Testing Plan

  1. Deploy Jaeger or another OTLP collector in the Kubernetes cluster
  2. Create an actuator configuration with OTEL parameters set
  3. Run a vLLM performance experiment
  4. Verify traces appear in the observability backend
  5. Confirm trace data includes pre-processing, inference, and post-processing spans
  6. Validate that existing configurations without OTEL parameters continue to work unchanged

Labels: enhancement (New feature or request)