Is your feature request related to a problem? Please describe.
When running vLLM performance experiments with the vllm_performance actuator, we currently lack visibility into the internal request processing pipeline. Specifically, we cannot observe:
- Pre-processing latency - Time spent preparing requests before inference
- Post-processing latency - Time spent formatting and returning results after inference
- Inter-processing times - Gaps and queuing delays between processing stages
- Request arrival patterns - Temporal distribution and rate of incoming requests
vLLM already supports OpenTelemetry (OTEL) tracing via the --otlp-traces-endpoint flag and OTEL environment variables, but the vllm_performance actuator does not expose this functionality to users.
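For reference, vLLM's built-in tracing can be enabled when launching the server directly; a minimal sketch (the model name and collector URL below are placeholders, not values from this proposal):

```shell
# Illustrative launch of a vLLM server with OTLP tracing enabled.
# OTEL_* variables follow the standard OpenTelemetry SDK conventions.
export OTEL_SERVICE_NAME="vllm-server"
export OTEL_EXPORTER_OTLP_TRACES_PROTOCOL="http/protobuf"
vllm serve some-org/some-model \
  --otlp-traces-endpoint "http://jaeger-collector:4318/v1/traces"
```

The actuator change proposed here automates exactly this wiring inside the generated Kubernetes deployment.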
Describe the solution you'd like
Add OpenTelemetry tracing configuration support to the vllm_performance actuator, allowing users to collect distributed traces from vLLM deployments and send them to observability backends like Jaeger, Tempo, or any OTLP-compatible collector.
Implementation Plan
1. Add OTEL Configuration Parameters
File: plugins/actuators/vllm_performance/ado_actuators/vllm_performance/actuator_parameters.py
Add three new optional parameters to VLLMPerformanceTestParameters:
otel_traces_endpoint: Annotated[
    str | None,
    pydantic.Field(
        description="OpenTelemetry traces endpoint URL (e.g., http://jaeger-collector:4318/v1/traces). If set, enables OTLP tracing in vLLM."
    ),
] = None
otel_traces_protocol: Annotated[
    str,
    pydantic.Field(
        description="OpenTelemetry traces protocol (http/protobuf or grpc)"
    ),
] = "http/protobuf"
otel_service_name: Annotated[
    str,
    pydantic.Field(
        description="Service name for OpenTelemetry traces"
    ),
] = "vllm-server"
2. Update Deployment YAML Generation
File: plugins/actuators/vllm_performance/ado_actuators/vllm_performance/k8s/yaml_support/build_components.py
Change A: Add OTEL parameters to deployment_yaml() function signature:
def deployment_yaml(
    k8s_name: str,
    model: str,
    # ... existing parameters ...
    io_processor_plugin: str | None = None,
    otel_traces_endpoint: str | None = None,
    otel_traces_protocol: str = "http/protobuf",
    otel_service_name: str = "vllm-server",
) -> dict[str, Any]:
Change B: Inject OTEL environment variables into the container spec:
if otel_traces_endpoint is not None:
    container["env"].extend([
        {
            "name": "OTEL_EXPORTER_OTLP_TRACES_ENDPOINT",
            "value": otel_traces_endpoint,
        },
        {
            "name": "OTEL_EXPORTER_OTLP_TRACES_PROTOCOL",
            "value": otel_traces_protocol,
        },
        {
            "name": "OTEL_SERVICE_NAME",
            "value": otel_service_name,
        },
    ])
Change C: Add --otlp-traces-endpoint to vLLM serve command arguments:
if otel_traces_endpoint is not None:
    vllm_serve_args.append("--otlp-traces-endpoint")
    vllm_serve_args.append(otel_traces_endpoint)
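Changes B and C together can be exercised end to end with a minimal stand-in container spec (`inject_otel`, the container dict, and the args list here are illustrative, not the actuator's real internals):

```python
def inject_otel(container, vllm_serve_args,
                otel_traces_endpoint=None,
                otel_traces_protocol="http/protobuf",
                otel_service_name="vllm-server"):
    """Sketch of the Change B/C logic: env vars plus the serve flag."""
    if otel_traces_endpoint is None:
        return  # no-op: deployments without OTEL config are untouched
    container.setdefault("env", []).extend([
        {"name": "OTEL_EXPORTER_OTLP_TRACES_ENDPOINT",
         "value": otel_traces_endpoint},
        {"name": "OTEL_EXPORTER_OTLP_TRACES_PROTOCOL",
         "value": otel_traces_protocol},
        {"name": "OTEL_SERVICE_NAME", "value": otel_service_name},
    ])
    vllm_serve_args += ["--otlp-traces-endpoint", otel_traces_endpoint]


container = {"name": "vllm", "env": []}
args = ["vllm", "serve", "some-model"]
inject_otel(container, args,
            otel_traces_endpoint="http://jaeger-collector:4318/v1/traces")
print([e["name"] for e in container["env"]])
print(args[-2:])
```

Guarding both injections behind the same `otel_traces_endpoint is not None` check keeps the env vars and the CLI flag consistent: either both are present or neither is.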
3. Update Environment Creation Function
File: plugins/actuators/vllm_performance/ado_actuators/vllm_performance/k8s/create_environment.py
Add OTEL parameters to create_test_environment() function signature and pass them through to ComponentsYaml.deployment_yaml():
def create_test_environment(
    k8s_name: str,
    model: str,
    # ... existing parameters ...
    otel_traces_endpoint: str | None = None,
    otel_traces_protocol: str = "http/protobuf",
    otel_service_name: str = "vllm-server",
    check_interval: int = 5,
    timeout: int = 1200,
) -> None:
    # ...
    deployment_yaml = ComponentsYaml.deployment_yaml(
        # ... existing parameters ...
        otel_traces_endpoint=otel_traces_endpoint,
        otel_traces_protocol=otel_traces_protocol,
        otel_service_name=otel_service_name,
    )
4. Update Experiment Executor
File: plugins/actuators/vllm_performance/ado_actuators/vllm_performance/experiment_executor.py
Pass OTEL parameters from actuator configuration to environment creation:
create_test_environment(
    k8s_name=env.k8s_name,
    model=values.get("model"),
    # ... existing parameters ...
    otel_traces_endpoint=actuator.otel_traces_endpoint,
    otel_traces_protocol=actuator.otel_traces_protocol,
    otel_service_name=actuator.otel_service_name,
)
5. Update Documentation
Add example configuration to actuator configuration YAML files:
actuatorIdentifier: vllm_performance
metadata:
  name: my-vllm-actuator
parameters:
  namespace: my-namespace
  in_cluster: true
  max_environments: 3
  # OpenTelemetry Configuration (optional)
  # Uncomment to enable distributed tracing
  # otel_traces_endpoint: "http://jaeger-collector.observability:4318/v1/traces"
  # otel_traces_protocol: "http/protobuf"  # or "grpc"
  # otel_service_name: "vllm-performance-server"
Expected Behavior
When configured, the actuator will:
- Inject OTEL environment variables into vLLM deployment pods
- Pass --otlp-traces-endpoint to the vLLM serve command
- Enable vLLM to export traces to the configured OTLP endpoint
- Allow users to visualize request processing pipelines in their observability backend
Testing Plan
- Deploy Jaeger or another OTLP collector in the Kubernetes cluster
- Create an actuator configuration with OTEL parameters set
- Run a vLLM performance experiment
- Verify traces appear in the observability backend
- Confirm trace data includes pre-processing, inference, and post-processing spans
- Validate that existing configurations without OTEL parameters continue to work unchanged
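The last testing item can be expressed as a quick unit-level check: with all OTEL parameters left at their defaults, the rendered env and serve args must be byte-for-byte unchanged. `apply_otel` below is a miniature stand-in for the Change B/C logic, not the real `deployment_yaml()` code:

```python
def apply_otel(env, args, endpoint=None,
               protocol="http/protobuf", service="vllm-server"):
    """Stand-in for the proposed injection logic; opt-in by design."""
    if endpoint is None:  # no endpoint configured -> strict no-op
        return env, args
    env = env + [
        {"name": "OTEL_EXPORTER_OTLP_TRACES_ENDPOINT", "value": endpoint},
        {"name": "OTEL_EXPORTER_OTLP_TRACES_PROTOCOL", "value": protocol},
        {"name": "OTEL_SERVICE_NAME", "value": service},
    ]
    args = args + ["--otlp-traces-endpoint", endpoint]
    return env, args


base_env = [{"name": "HF_HOME", "value": "/models"}]  # illustrative var
base_args = ["vllm", "serve", "some-model"]
env, args = apply_otel(base_env, base_args)
# Defaults must not alter anything existing configurations produce today.
print("backward-compatible:", env == base_env and args == base_args)
```

A check of this shape could live alongside the actuator's existing YAML-generation tests so the backward-compatibility guarantee is enforced in CI rather than only verified manually.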