[data][llm] Expose logprobs support in Ray Data LLM #58899
base: master
Conversation
Extract and surface logprobs from vLLM outputs. Previously, logprobs could be requested in sampling_params but were not returned in output rows. This adds logprobs and prompt_logprobs fields to vLLMOutputData and extracts them from vLLM's RequestOutput. Signed-off-by: Nikhil Ghosh <[email protected]>
Add unit tests for logprobs and prompt_logprobs extraction from vLLM outputs, including cases with multiple logprobs per token and None values. Signed-off-by: Nikhil Ghosh <[email protected]>
/gemini review
Code Review
This pull request effectively exposes logprobs and prompt_logprobs from vLLM outputs in Ray Data LLM. The modifications to the vLLMOutputData model and its from_vllm_engine_output factory method are clear and correct. The new unit tests validate the functionality, and I've provided suggestions to enhance them by making assertions more comprehensive. This will improve test robustness and maintainability by verifying the entire data structure.
Signed-off-by: Nikhil Ghosh <[email protected]>
```python
if output.outputs[0].logprobs is not None:
    data.logprobs = [
        {
            token_id: dataclasses.asdict(logprob)
            for token_id, logprob in logprob_dict.items()
        }
        for logprob_dict in output.outputs[0].logprobs
    ]
```
What are the possible types of the logprob object? Could it be a pydantic model as well as a dataclass? (Think about future changes, and the differences between SGLang and vLLM.) I am afraid using `dataclasses.asdict()` might overfit to today's version.
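One way to address this concern is a small converter that is agnostic to the engine's representation, handling dataclasses (vLLM today), pydantic v2 models, or plain dicts. This is a hedged sketch, not the PR's implementation; the function name `logprob_to_dict` is hypothetical:

```python
import dataclasses
from typing import Any


def logprob_to_dict(logprob: Any) -> dict:
    """Convert a per-token logprob object to a plain dict, whatever
    the engine happens to return it as."""
    if dataclasses.is_dataclass(logprob) and not isinstance(logprob, type):
        return dataclasses.asdict(logprob)  # vLLM's Logprob dataclass
    if hasattr(logprob, "model_dump"):  # pydantic v2 models
        return logprob.model_dump()
    if isinstance(logprob, dict):  # already serialized
        return logprob
    # Fall back to the object's attribute dict.
    return vars(logprob)
```

This keeps the extraction code working even if a future engine version swaps the underlying type.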
```python
data.prompt_logprobs = [
    {
        token_id: dataclasses.asdict(logprob)
        for token_id, logprob in logprob_dict.items()
    }
    if logprob_dict is not None
    else None
    for logprob_dict in output.prompt_logprobs
]
```
Make this part a utility and reuse it between `prompt_logprobs` and `logprobs`.
```python
logprobs = [
    {
        123: Logprob(logprob=-0.5, rank=1, decoded_token="hello"),
    },
]
```
Question: what is `rank` here? The rank of the TP workers, or some other rank?
```python
wrapper.shutdown()


def test_vllm_output_data_logprobs():
```
If I understand correctly, this test does not exercise any code path in Ray Data LLM; it only tests the data type logic in vLLM. Is that intentional?
```python
        111: {"logprob": -0.1, "rank": 1, "decoded_token": "test"},
        222: {"logprob": -0.8, "rank": 2, "decoded_token": "demo"},
    },
]
```
Same for this test.
Description
Exposes logprobs support in Ray Data LLM by extracting logprobs and prompt_logprobs from vLLM outputs and including them in the output rows.
Changes
- Added `logprobs` and `prompt_logprobs` fields to the `vLLMOutputData` model
- Extracts `output.outputs[0].logprobs` in `from_vllm_engine_output`
- Extracts `output.prompt_logprobs` in `from_vllm_engine_output`
- Converts `Logprob` dataclass instances to serializable dicts using `dataclasses.asdict()`

Testing
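With these changes, each output row carries `logprobs` and `prompt_logprobs` as lists of `{token_id: {logprob, rank, decoded_token}}` dicts, one entry per generated (or prompt) token. A sketch of the expected row shape, with hypothetical token ids and values:

```python
# Hypothetical output row after requesting logprobs in sampling_params.
row = {
    "generated_text": "hello world",
    "logprobs": [
        # One dict per generated token; keys are candidate token ids.
        {
            123: {"logprob": -0.5, "rank": 1, "decoded_token": "hello"},
            456: {"logprob": -1.2, "rank": 2, "decoded_token": "hi"},
        },
    ],
    "prompt_logprobs": [
        None,  # vLLM emits None for the first prompt token
        {789: {"logprob": -0.3, "rank": 1, "decoded_token": "world"}},
    ],
}

# Example downstream use: the top-ranked candidate at the first position.
best_id, best = min(row["logprobs"][0].items(), key=lambda kv: kv[1]["rank"])
```

Here `best_id` would be `123`, the rank-1 candidate for the first generated token.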
Added unit tests verifying:
Related issues
Closes #58894
Additional information