Conversation


@nrghosh nrghosh commented Nov 21, 2025

Description

Exposes logprobs support in Ray Data LLM by extracting logprobs and prompt_logprobs from vLLM outputs and including them in the output rows.

Changes

  • Add logprobs and prompt_logprobs fields to vLLMOutputData model
  • Extract logprobs from output.outputs[0].logprobs in from_vllm_engine_output
  • Extract prompt_logprobs from output.prompt_logprobs in from_vllm_engine_output
  • Convert vLLM's Logprob dataclass instances to serializable dicts using dataclasses.asdict()
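The shape of the conversion can be sketched as follows. This is a minimal, self-contained illustration: the `Logprob` dataclass and `fake_output` object below are stand-ins mimicking vLLM's `RequestOutput` structure, not the real vLLM classes.

```python
import dataclasses
from types import SimpleNamespace

# Stand-in for vLLM's Logprob dataclass (illustrative only).
@dataclasses.dataclass
class Logprob:
    logprob: float
    rank: int
    decoded_token: str

# Fake engine output mimicking RequestOutput.outputs[0].logprobs:
# one {token_id: Logprob} dict per generated token.
fake_output = SimpleNamespace(
    outputs=[
        SimpleNamespace(
            logprobs=[
                {123: Logprob(logprob=-0.5, rank=1, decoded_token="hello")},
            ]
        )
    ]
)

# The conversion described above: Logprob instances -> serializable dicts.
logprobs = None
if fake_output.outputs[0].logprobs is not None:
    logprobs = [
        {tid: dataclasses.asdict(lp) for tid, lp in logprob_dict.items()}
        for logprob_dict in fake_output.outputs[0].logprobs
    ]

print(logprobs)
# [{123: {'logprob': -0.5, 'rank': 1, 'decoded_token': 'hello'}}]
```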

Testing

Added unit tests verifying:

  • Extraction of logprobs for generated tokens (including multiple logprobs per token)
  • Extraction of prompt_logprobs (including None entries)
  • Proper handling when logprobs are not requested

Related issues

Closes #58894

Additional information

Extract and surface logprobs from vLLM outputs. Previously, logprobs
could be requested in sampling_params but were not returned in output
rows. This adds logprobs and prompt_logprobs fields to vLLMOutputData
and extracts them from vLLM's RequestOutput.

Signed-off-by: Nikhil Ghosh <[email protected]>
Add unit tests for logprobs and prompt_logprobs extraction from vLLM
outputs, including cases with multiple logprobs per token and None
values.

Signed-off-by: Nikhil Ghosh <[email protected]>
@nrghosh nrghosh added the `go` label (add ONLY when ready to merge, run all tests) Nov 21, 2025
@nrghosh nrghosh marked this pull request as ready for review November 21, 2025 22:00
@nrghosh nrghosh requested a review from a team as a code owner November 21, 2025 22:00

nrghosh commented Nov 21, 2025

/gemini review


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request effectively exposes logprobs and prompt_logprobs from vLLM outputs in Ray Data LLM. The modifications to the vLLMOutputData model and its from_vllm_engine_output factory method are clear and correct. The new unit tests validate the functionality, and I've provided suggestions to enhance them by making assertions more comprehensive. This will improve test robustness and maintainability by verifying the entire data structure.

Signed-off-by: Nikhil Ghosh <[email protected]>
if output.outputs[0].logprobs is not None:
    data.logprobs = [
        {
            token_id: dataclasses.asdict(logprob)
            for token_id, logprob in logprob_dict.items()
        }
        for logprob_dict in output.outputs[0].logprobs
    ]

What are the possible types for the logprob object? Could it be a pydantic model as well as a dataclass? (Think about future changes and the differences between sglang and vLLM.) I am afraid using dataclasses.asdict() might overfit to today's version.

Comment on lines +135 to +143
data.prompt_logprobs = [
    {
        token_id: dataclasses.asdict(logprob)
        for token_id, logprob in logprob_dict.items()
    }
    if logprob_dict is not None
    else None
    for logprob_dict in output.prompt_logprobs
]

Make this part a utility and reuse it between prompt_logprobs and logprobs.


logprobs = [
    {
        123: Logprob(logprob=-0.5, rank=1, decoded_token="hello"),

Question: what is rank here? The rank of the TP workers, or some other rank?

wrapper.shutdown()


def test_vllm_output_data_logprobs():

If I understand correctly, this test does not exercise any code path in the Ray Data LLM code; it only tests the data-type logic in vLLM. Is that intentional?

        111: {"logprob": -0.1, "rank": 1, "decoded_token": "test"},
        222: {"logprob": -0.8, "rank": 2, "decoded_token": "demo"},
    },
]

Same for this test.

@ray-gardener ray-gardener bot added the `data` (Ray Data-related issues) and `llm` labels Nov 22, 2025


Successfully merging this pull request may close these issues.

[data][llm] Expose logprobs support in Ray Data LLM
