[FIX] fix wrong indexing for hidden states when prefix-cache take effect #6551

Open · wants to merge 1 commit into base: main

Conversation

Dutch-voyage

Motivation

With return_hidden_states=True, the prefill hidden_states are not returned as expected when the prefix cache is in effect.
See the related issue #4997.
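
For reference, this is the shape contract callers expect (a sketch based on the issue discussion; check_shapes is a hypothetical helper, not part of sglang):

def check_shapes(hidden_states, prompt_tokens, hidden_dim):
    # The prefill entry should cover the whole prompt, even when part of it
    # was served from the prefix cache.
    assert list(hidden_states[0].shape) == [prompt_tokens, hidden_dim]
    # Each decode step then contributes exactly one row.
    assert all(list(h.shape) == [1, hidden_dim] for h in hidden_states[1:])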

Modifications

In scheduler_output_processor_mixin.py:116
change

# Bug: advances the offset by the full original prompt length, even though
# with a warm prefix cache only the non-cached suffix of the prompt is
# actually prefilled (and thus present in logits_output.hidden_states).
req.hidden_states.append(
    logits_output.hidden_states[
        hidden_state_offset : (
            hidden_state_offset := hidden_state_offset
            + len(req.origin_input_ids)
        )
    ]
    .cpu()
    .clone()
    .tolist()
)

to

# Fix: advance the offset by extend_input_len, the number of tokens that
# were actually prefilled for this request in this batch.
req.hidden_states.append(
    logits_output.hidden_states[
        hidden_state_offset : (
            hidden_state_offset := hidden_state_offset
            + req.extend_input_len
        )
    ]
    .cpu()
    .clone()
    .tolist()
)
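
For intuition, a minimal sketch of the offset arithmetic (illustrative numbers, not sglang code). logits_output.hidden_states holds one row per token actually prefilled in the batch, so offsets must advance by extend_input_len:

# Two requests, 6-token prompts, first 4 tokens of each already cached.
origin_lens = [6, 6]   # len(req.origin_input_ids) per request
extend_lens = [2, 2]   # req.extend_input_len per request
# hidden_states has sum(extend_lens) == 4 rows.

offset = 0
for n in origin_lens:  # buggy: (0, 6), then (6, 12) -> past the end
    print("buggy slice:", (offset, offset + n))
    offset += n

offset = 0
for n in extend_lens:  # fixed: (0, 2), then (2, 4) -> in bounds
    print("fixed slice:", (offset, offset + n))
    offset += n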

@Dutch-voyage Dutch-voyage changed the title [FIX] fix wrong indeing for hidden states when prefix-cache take effect [FIX] fix wrong indexing for hidden states when prefix-cache take effect May 23, 2025
@Dutch-voyage Dutch-voyage reopened this May 29, 2025
@xiangchensong

The proposed modification may still hit the same issue. Even with the patch applied, sending the same query multiple times still produces hidden states of the wrong shape.

Launch the server below and run the script several times: once the prompt is fully cached, the prefill hidden_states shape collapses to [1, llm_dim], presumably because only a single token is actually recomputed.

python -m sglang.launch_server \
  --model-path Qwen/QwQ-32B \
  --enable-return-hidden-states \
  --host 0.0.0.0 \
  --port 30000 \
  --tp 4 
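
As a cross-check (assuming your sglang version supports the flag), launching with --disable-radix-cache should make the symptom disappear, since nothing is ever served from the prefix cache:

python -m sglang.launch_server \
  --model-path Qwen/QwQ-32B \
  --enable-return-hidden-states \
  --disable-radix-cache \
  --host 0.0.0.0 \
  --port 30000 \
  --tp 4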

import requests
import torch

PORT = 30000
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {
    "temperature": 0.6,
    "top_p": 0.95,
    "max_new_tokens": 10,
}

json_data = {
    "text": prompts,
    "sampling_params": sampling_params,
    "return_hidden_states": True,
}
for _ in range(3):
    print("Sending request to the server...")
    response = requests.post(
        f"http://localhost:{PORT}/generate",
        json=json_data,
    )
    outputs = response.json()
    for prompt, output in zip(prompts, outputs):
        # hidden_states come back as nested Python lists; rebuild tensors.
        for i in range(len(output["meta_info"]["hidden_states"])):
            output["meta_info"]["hidden_states"][i] = torch.tensor(
                output["meta_info"]["hidden_states"][i]
            )
        print("===============================")
        print(
            f"Prompt: {prompt}\n"
            f"Prompt_Tokens: {output['meta_info']['prompt_tokens']}\t"
            f"Completion_tokens: {output['meta_info']['completion_tokens']}"
        )
        shapes = [torch.tensor(i.shape).tolist() for i in output["meta_info"]["hidden_states"]]
        print(f"Hidden States Shape: {shapes}")
