
[Data][LLM] Ray Data LLM does not have out-of-the-box vLLM metrics integration #58360

@anindya-saha


What happened + What you expected to happen

Background

While vLLM provides Prometheus and Grafana integration, and Ray Serve has LLM observability support, Ray Data LLM does not have out-of-the-box vLLM metrics integration.

Key Differences:

  • Ray Serve: Has a dedicated LLMConfig with a log_engine_metrics=True option (see the sketch after this list)
  • Ray Data LLM: No such configuration option exists, so the integration must be wired up manually
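
For contrast, enabling engine metrics in Ray Serve is a one-line config flag. A hedged sketch based on the Ray Serve LLM API (model details are placeholders):

from ray import serve
from ray.serve.llm import LLMConfig, build_openai_app

# Serve-side contrast: log_engine_metrics forwards vLLM engine metrics
# into Ray's metrics system without any manual wiring.
llm_config = LLMConfig(
    model_loading_config=dict(
        model_id="my-model",  # placeholder
        model_source="Qwen/Qwen2.5-0.5B-Instruct",  # placeholder
    ),
    log_engine_metrics=True,
)
serve.run(build_openai_app({"llm_configs": [llm_config]}))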

Problem

By default, vLLM metrics are not exported via Ray's metrics system when using the Ray Data LLM integration. The "Serve LLM Dashboard" in Grafana therefore shows no metrics for Ray Data LLM workloads, even though vLLM ships Ray metrics wrappers.

Solution

The solution involves leveraging vLLM v1's Ray metrics wrappers (vllm.v1.metrics.ray_wrappers), which forward vLLM counters/histograms into ray.util.metrics. The Ray team (Kourosh Hakhamaneshi) confirmed in a Slack thread that this is the correct way to integrate vLLM metrics with Ray Data LLM.
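
For context, ray.util.metrics exposes Prometheus-style metric types that Ray exports from each node's metrics endpoint (with a ray_ prefix); the wrappers essentially re-emit each vLLM metric through this API. A minimal sketch of that mechanism, with made-up metric names (the real wrappers live in vllm.v1.metrics.ray_wrappers):

from ray.util.metrics import Counter, Histogram

# Re-emit a vLLM-style counter through Ray's metrics API; Ray exports it
# as ray_vllm_generation_tokens on the node's Prometheus endpoint.
generation_tokens = Counter(
    "vllm_generation_tokens",
    description="Number of generated tokens.",
    tag_keys=("model_name",),
)
generation_tokens.inc(128, tags={"model_name": "my-model"})

# Histograms work the same way: observations land in Prometheus buckets.
ttft = Histogram(
    "vllm_time_to_first_token_seconds",
    description="Time to first token.",
    boundaries=[0.01, 0.05, 0.1, 0.5, 1.0, 5.0],
    tag_keys=("model_name",),
)
ttft.observe(0.42, tags={"model_name": "my-model"})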

1. Enable Ray Metrics in the vLLM Engine

Since Ray Data LLM doesn't expose the stat_loggers parameter (unlike Ray Serve's LLMConfig), you need to modify the engine initialization of Ray Data LLM to import and use RayPrometheusStatLogger:

# In ray/llm/_internal/batch/stages/vllm_engine_stage.py,
# inside vLLMEngineWrapper.__init__:
self._vllm_config = engine_args.create_engine_config()

# Added block: attach the Ray-aware stat logger when stats logging is enabled.
if not engine_args.disable_log_stats:
    from vllm.v1.metrics.ray_wrappers import RayPrometheusStatLogger

    self.engine = vllm.AsyncLLMEngine.from_engine_args(
        engine_args, stat_loggers=[RayPrometheusStatLogger]
    )
else:
    self.engine = vllm.AsyncLLMEngine.from_engine_args(engine_args)

Important: vllm.AsyncLLMEngine.from_engine_args accepts a stat_loggers parameter, but that parameter is not exposed through the Ray Data LLM APIs, which is why this manual modification is needed.

2. Configure Ray Data LLM Pipeline

In your pipeline configuration, ensure metrics are enabled:

processor_config = vLLMEngineProcessorConfig(
    model_source=self.config.model_name,
    engine_kwargs={
        # ... other settings ...
        "disable_log_stats": False,  # Enable vLLM stats logging
    },
    runtime_env={
        "env_vars": {
            "VLLM_USE_V1": "1",  # Use vLLM v1 which has Ray metrics support
            # ... other env vars ...
        },
    },
)
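
To complete the picture, here is a hedged usage sketch that wires this config into a processor and runs it over a small dataset (the preprocess/postprocess field names follow the Ray Data LLM examples; adjust them to your model):

import ray
from ray.data.llm import build_llm_processor

processor = build_llm_processor(
    processor_config,
    preprocess=lambda row: dict(
        messages=[{"role": "user", "content": row["prompt"]}],
        sampling_params=dict(max_tokens=64),
    ),
    postprocess=lambda row: dict(answer=row["generated_text"]),
)

ds = ray.data.from_items([{"prompt": "Describe Ray Data LLM metrics."}])
ds = processor(ds)
ds.show(limit=1)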

3. Initialize Ray with Metrics Export

ray.init(
    _metrics_export_port=8080,  # Prometheus metrics endpoint
    include_dashboard=True,
    dashboard_host="0.0.0.0"
)
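
Before touching Grafana, you can sanity-check the export directly. A quick sketch that fetches a node's Prometheus endpoint (the port configured above; each node exposes its own) and lists the vLLM metric families:

import urllib.request

body = urllib.request.urlopen("http://127.0.0.1:8080/metrics").read().decode()
# Keep just the metric names, with or without label sets.
names = sorted(
    {line.split("{")[0].split(" ")[0]
     for line in body.splitlines()
     if line.startswith("ray_vllm")}
)
print("\n".join(names))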

4. Create a Ray Data LLM Dashboard

Note: RayPrometheusStatLogger (in vllm.v1.metrics.ray_wrappers) sanitizes the OpenTelemetry metric names, so all metrics are emitted with the ray_vllm prefix; you can build a dashboard against that prefix and import it into Grafana as a Ray Data LLM dashboard.

A simple way to do this is to copy the Ray Serve LLM dashboard and replace every vllm: prefix in its PromQL queries with ray_vllm_, as in the sketch below.
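
A minimal sketch of that rewrite, assuming the Serve dashboard JSON was exported to serve_llm_dashboard.json (file names and the exact vllm: -> ray_vllm_ mapping are assumptions; verify against your /metrics output):

import json

with open("serve_llm_dashboard.json") as f:
    text = f.read()

# Point every PromQL query at the Ray Data LLM metric names.
text = text.replace("vllm:", "ray_vllm_")

# Re-parse to make sure the result is still valid JSON before importing it.
json.loads(text)

with open("ray_data_llm_dashboard.json", "w") as f:
    f.write(text)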

Once configured, the metrics listed in the vLLM v1 metrics documentation are available via Ray.

I have added my dashboard as a reference: ray_data_llm_dashboard.json.

After these changes I am able to see the vLLM dashboard in Ray Data as well.

I have a working solution in https://github.com/anindya-saha/ray-summit-2025/blob/main/vllm_ray_metrics_integration.md.

Versions / Dependencies

dependencies = [
    "ray[default,data]==2.50.1",
    "transformers[torch]==4.57.1",
    "vllm==0.11.0",
]

Reproduction script

You can use the script from https://github.com/anindya-saha/ray-summit-2025/blob/main/src/01_image_caption_demo.py to reproduce.

Issue Severity

Medium: It is a significant difficulty but I can work around it.

Labels

bug, community-backlog, data, llm, observability, triage, usability
