Description
What happened + What you expected to happen
Background
While vLLM provides Prometheus and Grafana integration, and Ray Serve has LLM observability support, Ray Data LLM does not have out-of-the-box vLLM metrics integration.
Key Differences:
- Ray Serve: Has a dedicated 
LLMConfigwithlog_engine_metrics=Trueoption - Ray Data LLM: No such configuration option exists, requiring manual integration
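For reference, the Ray Serve side needs nothing more than the flag below. This is only a sketch: the model id and model source are illustrative placeholders, not values from this issue.
from ray.serve.llm import LLMConfig

# Sketch: model_id / model_source are placeholders.
llm_config = LLMConfig(
    model_loading_config={
        "model_id": "my-model",
        "model_source": "Qwen/Qwen2.5-0.5B-Instruct",
    },
    log_engine_metrics=True,  # Ray Data LLM has no equivalent switch
)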
 
Problem
By default, vLLM metrics are not automatically exported via Ray's metrics system when using Ray Data LLM integration. The "Serve LLM Dashboard" in Grafana shows no metrics for Ray Data LLM deployments, despite vLLM having Ray metrics wrappers available.
Solution
The solution involves leveraging vLLM v1's Ray metrics wrappers (vllm.v1.metrics.ray_wrappers), which forward vLLM counters/histograms into ray.util.metrics. This approach has been confirmed by the Ray team (Kourosh Hakhamaneshi) in a Slack thread as the correct way to integrate vLLM metrics with Ray Data LLM.
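Conceptually, the wrappers register one ray.util.metrics primitive per vLLM counter/histogram and forward each observation into it; Ray's metrics agent then exports them alongside other Ray metrics. A minimal sketch of that forwarding pattern (the metric names and tag values here are illustrative, not the wrappers' actual code):
from ray.util import metrics

# Illustrative stand-ins for what vllm.v1.metrics.ray_wrappers registers.
request_counter = metrics.Counter(
    "vllm_request_success",
    description="Number of successfully finished requests.",
    tag_keys=("model_name",),
)
ttft_histogram = metrics.Histogram(
    "vllm_time_to_first_token_seconds",
    description="Time to first token.",
    boundaries=[0.01, 0.05, 0.1, 0.5, 1.0, 2.5, 5.0],
    tag_keys=("model_name",),
)

# In the engine's stat-logging loop, observations are forwarded like this.
# Ray prepends "ray_" on export, which is where the ray_vllm prefix comes from.
request_counter.inc(1, tags={"model_name": "my-model"})
ttft_histogram.observe(0.42, tags={"model_name": "my-model"})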
1. Enable Ray Metrics in vLLM Engine
Since Ray Data LLM doesn't expose the stat_loggers parameter (unlike Ray Serve's LLMConfig), you need to modify the engine initialization in Ray Data LLM to import and use RayPrometheusStatLogger:
# In ray/llm/_internal/batch/stages/vllm_engine_stage.py
# In vLLMEngineWrapper.__init__ method:
self._vllm_config = engine_args.create_engine_config()
# Add the block below so vLLM engine stats are forwarded into ray.util.metrics
if not engine_args.disable_log_stats:
    from vllm.v1.metrics.ray_wrappers import RayPrometheusStatLogger
    self.engine = vllm.AsyncLLMEngine.from_engine_args(engine_args, stat_loggers=[RayPrometheusStatLogger])
else:
    self.engine = vllm.AsyncLLMEngine.from_engine_args(engine_args)
Important: The vllm.AsyncLLMEngine accepts a stat_loggers parameter, but this is not exposed in the Ray Data LLM APIs, requiring this manual modification.
2. Configure Ray Data LLM Pipeline
In your pipeline configuration, ensure metrics are enabled:
processor_config = vLLMEngineProcessorConfig(
    model_source=self.config.model_name,
    engine_kwargs={
        # ... other settings ...
        "disable_log_stats": False,  # Enable vLLM stats logging
    },
    runtime_env={
        "env_vars": {
            "VLLM_USE_V1": "1",  # Use vLLM v1 which has Ray metrics support
            # ... other env vars ...
        },
    },
)
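For completeness, here is a sketch of wiring that config into a processor with build_llm_processor. The preprocess/postprocess callbacks and the toy dataset are placeholders, and Ray should already be initialized as shown in the next step.
import ray
from ray.data.llm import build_llm_processor

# Placeholders: adapt the prompt construction to your own dataset schema.
processor = build_llm_processor(
    processor_config,
    preprocess=lambda row: dict(
        messages=[{"role": "user", "content": row["prompt"]}],
        sampling_params=dict(max_tokens=128),
    ),
    postprocess=lambda row: dict(answer=row["generated_text"], **row),
)

ds = ray.data.from_items([{"prompt": "Describe the Ray metrics pipeline."}])
ds = processor(ds)
ds.show()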
3. Initialize Ray with Metrics Export
ray.init(
    _metrics_export_port=8080,  # Prometheus metrics endpoint
    include_dashboard=True,
    dashboard_host="0.0.0.0"
)
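Before building a dashboard, you can sanity-check the export by scraping the metrics port directly while a batch job is running. A sketch, assuming the port from the ray.init call above and that the Prometheus text exposition is served under /metrics:
import urllib.request

# Assumes _metrics_export_port=8080 as configured above.
body = urllib.request.urlopen("http://127.0.0.1:8080/metrics").read().decode()
vllm_lines = [line for line in body.splitlines() if line.startswith("ray_vllm")]
print("\n".join(vllm_lines) or "no ray_vllm metrics exported yet")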
4. Create Ray Data LLM Dashboard
Note: RayPrometheusStatLogger sanitizes the OpenTelemetry metric names in vllm.v1.metrics.ray_wrappers, so all metrics are emitted with the ray_vllm prefix.
You can make a copy of the Ray Serve LLM dashboard, replace every vllm: prefix in the PromQL queries with ray_vllm, and import the result as a Ray Data LLM dashboard.
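A throwaway script for that search-and-replace over the exported dashboard JSON (file names are placeholders; verify the exact sanitized prefix against the metrics endpoint output above):
import json

# Placeholder input: export the Serve LLM dashboard from Grafana to this file first.
with open("serve_llm_dashboard.json") as f:
    text = f.read()

# vLLM names such as vllm:num_requests_running come back from Ray as
# ray_vllm_num_requests_running; adjust the prefix if your export differs.
text = text.replace("vllm:", "ray_vllm_")

with open("ray_data_llm_dashboard.json", "w") as f:
    f.write(text)

json.loads(text)  # cheap sanity check that the rewritten dashboard is still valid JSON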
Once configured, the metrics documented for vLLM v1 are available via Ray.
I have added my dashboard as a reference: ray_data_llm_dashboard.json.
After these changes I am able to see the vLLM dashboard for Ray Data as well.

I have a working solution in https://github.com/anindya-saha/ray-summit-2025/blob/main/vllm_ray_metrics_integration.md.
Versions / Dependencies
dependencies = [
    "ray[default,data]==2.50.1",
    "transformers[torch]==4.57.1",
    "vllm==0.11.0",
]
Reproduction script
You can use the script from https://github.com/anindya-saha/ray-summit-2025/blob/main/src/01_image_caption_demo.py to reproduce.
Issue Severity
Medium: It is a significant difficulty but I can work around it.