Description
Currently, when using ray.data.llm with vLLM, users can specify logprobs=True (or logprobs=N) in the sampling_params dictionary, and vLLM successfully processes this parameter. However, the logprobs data is not returned in the final output from Ray Data.
The issue is that while vLLM generates the logprobs data (as evidenced by the SamplingParams being correctly parsed), this information is dropped during the conversion from vLLM's RequestOutput to Ray's vLLMOutputData format in the from_vllm_engine_output method.
Current behavior:
- `logprobs=True` can be specified in `sampling_params`, and vLLM processes the request with logprobs enabled
- The output shows `SamplingParams(n=1, …, logprobs=1, …)`, indicating vLLM received the parameter
- However, the actual logprobs data (the per-token probability distributions) is not present in the returned rows
Expected behavior:
- When `logprobs` is specified in `sampling_params`, the logprobs data should be included in the output rows
- Users should be able to access logprobs through the postprocessor function
Technical details:
The issue is in `python/ray/llm/_internal/batch/stages/vllm_engine_stage.py`:
- The `vLLMOutputData` model (lines 71-88) does not have a field for logprobs
- The `from_vllm_engine_output` method (lines 91-124) extracts `generated_tokens`, `generated_text`, and `metrics` from vLLM's output, but does not extract `logprobs` from `output.outputs[0].logprobs`
- vLLM's `CompletionOutput` (in `output.outputs[0]`) contains a `logprobs: SampleLogprobs | None` field that is currently being ignored
- Additionally, `output.prompt_logprobs` (of type `PromptLogprobs | None`) may also need to be exposed if users request prompt logprobs
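For context, vLLM's sample logprobs have the shape `list[dict[int, Logprob]]`: one dict per generated token, keyed by candidate token ID. The sketch below uses a stand-in `Logprob` dataclass (mimicking vLLM's, not imported from it) to illustrate that shape and one way the data could be converted to plain dicts so it can flow through Ray Data rows:

```python
from dataclasses import dataclass
from typing import Optional

# Stand-in for vLLM's Logprob dataclass; illustrative only, not imported from vLLM.
@dataclass
class Logprob:
    logprob: float                        # log-probability of this token
    rank: Optional[int] = None            # rank among candidates at this step
    decoded_token: Optional[str] = None   # detokenized text, if available

# SampleLogprobs shape: one dict per generated token, keyed by token ID.
sample_logprobs = [
    {1234: Logprob(logprob=-0.05, rank=1, decoded_token="Hello")},
    {5678: Logprob(logprob=-1.20, rank=2, decoded_token=" world")},
]

# A plain-dict form that could be carried in Ray Data rows
# (no custom classes, so it serializes cleanly).
serializable = [
    {tok_id: {"logprob": lp.logprob, "rank": lp.rank, "decoded_token": lp.decoded_token}
     for tok_id, lp in step.items()}
    for step in sample_logprobs
]
print(serializable[0][1234]["logprob"])  # -0.05
```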
Reproduction:
```python
from ray.data.llm import build_llm_processor

processor = build_llm_processor(
    config,  # an existing vLLM processor config
    preprocess=lambda row: dict(
        messages=[
            {"role": "system", "content": "You are a bot that responds with haikus."},
            {"role": "user", "content": row["item"]},
        ],
        sampling_params=dict(
            temperature=0.3,
            max_tokens=250,
            logprobs=True,  # This is parsed correctly by vLLM
        ),
    ),
    postprocess=lambda row: dict(**row),  # logprobs not found in row
)
```

Use case
Users need access to logprobs for downstream tasks like evaluation/analysis, filtering, debugging, research, etc.
Without access to logprobs, users are unable to perform these analyses even though vLLM supports this feature. This creates a gap between what vLLM can provide and what Ray Data LLM exposes to users.
The fix should be straightforward: extract the logprobs data from vLLM's output object and include it in the vLLMOutputData model so it flows through to the final output rows.
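A minimal sketch of what that fix could look like, using simplified stand-ins for vLLM's `RequestOutput`/`CompletionOutput` and Ray's `vLLMOutputData` (the real classes live in vLLM and Ray; the field names added here are assumptions, not the final API):

```python
from dataclasses import dataclass
from typing import Optional

# Simplified stand-in for vLLM's CompletionOutput (illustrative only).
@dataclass
class CompletionOutput:
    token_ids: list
    text: str
    logprobs: Optional[list] = None  # SampleLogprobs | None in vLLM

# Simplified stand-in for vLLM's RequestOutput (illustrative only).
@dataclass
class RequestOutput:
    outputs: list
    prompt_logprobs: Optional[list] = None  # PromptLogprobs | None in vLLM

# Sketch of the extra fields vLLMOutputData could carry (hypothetical names).
@dataclass
class vLLMOutputData:
    generated_tokens: list
    generated_text: str
    logprobs: Optional[list] = None
    prompt_logprobs: Optional[list] = None

    @classmethod
    def from_vllm_engine_output(cls, output: RequestOutput) -> "vLLMOutputData":
        completion = output.outputs[0]
        return cls(
            generated_tokens=completion.token_ids,
            generated_text=completion.text,
            # The key change: carry the logprobs through instead of dropping them.
            logprobs=completion.logprobs,
            prompt_logprobs=output.prompt_logprobs,
        )

out = RequestOutput(outputs=[CompletionOutput([1, 2], "hi", logprobs=[{1: -0.1}])])
data = vLLMOutputData.from_vllm_engine_output(out)
print(data.logprobs)  # [{1: -0.1}]
```

With a change along these lines, the logprobs would appear in each output row and be reachable from the `postprocess` function.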