Description
What happened + What you expected to happen
Background
While vLLM provides Prometheus and Grafana integration, and Ray Serve has LLM observability support, Ray Data LLM does not have out-of-the-box vLLM metrics integration.
Key Differences:
- Ray Serve: Has a dedicated 
LLMConfigwithlog_engine_metrics=Trueoption - Ray Data LLM: No such configuration option exists, requiring manual integration
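For reference, the Ray Serve side needs nothing more than the flag below. This is only a sketch: the model id and model source are illustrative placeholders, not values from this issue.
from ray.serve.llm import LLMConfig

# Sketch: model_id / model_source are placeholders.
llm_config = LLMConfig(
    model_loading_config={
        "model_id": "my-model",
        "model_source": "Qwen/Qwen2.5-0.5B-Instruct",
    },
    log_engine_metrics=True,  # Ray Data LLM has no equivalent switch
)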
 
Problem
By default, vLLM metrics are not automatically exported via Ray's metrics system when using Ray Data LLM integration. The "Serve LLM Dashboard" in Grafana shows no metrics for Ray Data LLM deployments, despite vLLM having Ray metrics wrappers available.
Solution
The solution involves leveraging vLLM v1's Ray metrics wrappers (vllm.v1.metrics.ray_wrappers), which forward vLLM counters/histograms into ray.util.metrics. This approach has been confirmed by the Ray team (Kourosh Hakhamaneshi) in a Slack thread as the correct way to integrate vLLM metrics with Ray Data LLM.
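Conceptually, the wrappers register one ray.util.metrics primitive per vLLM counter/histogram and forward each observation into it; Ray's metrics agent then exports them alongside other Ray metrics. A minimal sketch of that forwarding pattern (the metric names and tag values here are illustrative, not the wrappers' actual code):
from ray.util import metrics

# Illustrative stand-ins for what vllm.v1.metrics.ray_wrappers registers.
request_counter = metrics.Counter(
    "vllm_request_success",
    description="Number of successfully finished requests.",
    tag_keys=("model_name",),
)
ttft_histogram = metrics.Histogram(
    "vllm_time_to_first_token_seconds",
    description="Time to first token.",
    boundaries=[0.01, 0.05, 0.1, 0.5, 1.0, 2.5, 5.0],
    tag_keys=("model_name",),
)

# In the engine's stat-logging loop, observations are forwarded like this.
# Ray prepends "ray_" on export, which is where the ray_vllm prefix comes from.
request_counter.inc(1, tags={"model_name": "my-model"})
ttft_histogram.observe(0.42, tags={"model_name": "my-model"})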
1. Enable Ray Metrics in vLLM Engine
Since Ray Data LLM doesn't expose the stat_loggers parameter (unlike Ray Serve's LLMConfig), you need to modify the engine initialization in Ray Data LLM to import and use RayPrometheusStatLogger:
# In ray/llm/_internal/batch/stages/vllm_engine_stage.py
# In vLLMEngineWrapper.__init__ method:
self._vllm_config = engine_args.create_engine_config()
# Add the block below so vLLM engine stats are forwarded into ray.util.metrics
if not engine_args.disable_log_stats:
    from vllm.v1.metrics.ray_wrappers import RayPrometheusStatLogger
    self.engine = vllm.AsyncLLMEngine.from_engine_args(engine_args, stat_loggers=[RayPrometheusStatLogger])
else:
    self.engine = vllm.AsyncLLMEngine.from_engine_args(engine_args)
Important: The vllm.AsyncLLMEngine accepts a stat_loggers parameter, but this is not exposed in the Ray Data LLM APIs, requiring this manual modification.
2. Configure Ray Data LLM Pipeline
In your pipeline configuration, ensure metrics are enabled:
processor_config = vLLMEngineProcessorConfig(
    model_source=self.config.model_name,
    engine_kwargs={
        # ... other settings ...
        "disable_log_stats": False,  # Enable vLLM stats logging
    },
    runtime_env={
        "env_vars": {
            "VLLM_USE_V1": "1",  # Use vLLM v1 which has Ray metrics support
            # ... other env vars ...
        },
    },
)
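For completeness, here is a sketch of wiring that config into a processor with build_llm_processor. The preprocess/postprocess callbacks and the toy dataset are placeholders, and Ray should already be initialized as shown in the next step.
import ray
from ray.data.llm import build_llm_processor

# Placeholders: adapt the prompt construction to your own dataset schema.
processor = build_llm_processor(
    processor_config,
    preprocess=lambda row: dict(
        messages=[{"role": "user", "content": row["prompt"]}],
        sampling_params=dict(max_tokens=128),
    ),
    postprocess=lambda row: dict(answer=row["generated_text"], **row),
)

ds = ray.data.from_items([{"prompt": "Describe the Ray metrics pipeline."}])
ds = processor(ds)
ds.show()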
3. Initialize Ray with Metrics Export
ray.init(
    _metrics_export_port=8080,  # Prometheus metrics endpoint
    include_dashboard=True,
    dashboard_host="0.0.0.0"
)
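Before building a dashboard, you can sanity-check the export by scraping the metrics port directly while a batch job is running. A sketch, assuming the port from the ray.init call above and that the Prometheus text exposition is served under /metrics:
import urllib.request

# Assumes _metrics_export_port=8080 as configured above.
body = urllib.request.urlopen("http://127.0.0.1:8080/metrics").read().decode()
vllm_lines = [line for line in body.splitlines() if line.startswith("ray_vllm")]
print("\n".join(vllm_lines) or "no ray_vllm metrics exported yet")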
4. Create Ray Data LLM Dashboard
Note: RayPrometheusStatLogger sanitizes the OpenTelemetry metric names in vllm.v1.metrics.ray_wrappers, so all metrics are emitted with the ray_vllm prefix.
You can make a copy of the Ray Serve LLM dashboard, replace every vllm: prefix in the PromQL queries with ray_vllm, and import the result as a Ray Data LLM dashboard.
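A throwaway script for that search-and-replace over the exported dashboard JSON (file names are placeholders; verify the exact sanitized prefix against the metrics endpoint output above):
import json

# Placeholder input: export the Serve LLM dashboard from Grafana to this file first.
with open("serve_llm_dashboard.json") as f:
    text = f.read()

# vLLM names such as vllm:num_requests_running come back from Ray as
# ray_vllm_num_requests_running; adjust the prefix if your export differs.
text = text.replace("vllm:", "ray_vllm_")

with open("ray_data_llm_dashboard.json", "w") as f:
    f.write(text)

json.loads(text)  # cheap sanity check that the rewritten dashboard is still valid JSON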
Once configured, the metrics documented for vLLM v1 are available via Ray.
I have added my dashboard as a reference: ray_data_llm_dashboard.json.
After these changes I am able to see the vLLM dashboard for Ray Data as well.

I have a working solution in https://github.com/anindya-saha/ray-summit-2025/blob/main/vllm_ray_metrics_integration.md.
Versions / Dependencies
dependencies = [
    "ray[default,data]==2.50.1",
    "transformers[torch]==4.57.1",
    "vllm==0.11.0",
]
Reproduction script
You can use the script from https://github.com/anindya-saha/ray-summit-2025/blob/main/src/01_image_caption_demo.py to reproduce.
Issue Severity
Medium: It is a significant difficulty but I can work around it.