
Preserve hidden state fields in chat completions #679

Open
satojandro wants to merge 4 commits into nearai:main from satojandro:main

Conversation

@satojandro

Summary

Adds support for hidden-state passthrough in chat completions.

  • Adds explicit return_hidden_states and layers request fields to ChatCompletionParams
  • Adds flattened extra catch-all fields to ChatCompletionChunk and ChatCompletionResponse
  • Preserves provider-specific response fields such as hidden_states during deserialize/reserialize
  • Updates affected struct literals with default extra values
  • Adds regression coverage for unknown field round-tripping on streamed chunks and full responses
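For illustration, a minimal sketch of the request side, using only the standard library. HiddenStateRequest is a hypothetical, trimmed-down stand-in for ChatCompletionParams (only model plus the two new fields), the u32 element type for layers is an assumption, and omitting unset fields mirrors what #[serde(skip_serializing_if = "Option::is_none")] would do; whether the PR uses that attribute is not stated here. The JSON is built by hand only to keep the sketch dependency-free.

```rust
/// Hypothetical, trimmed-down stand-in for ChatCompletionParams.
/// Only `model` and the two new optional fields are modeled.
#[derive(Debug, Default)]
struct HiddenStateRequest {
    model: String,
    /// Ask the backend to return hidden states.
    return_hidden_states: Option<bool>,
    /// Which layers to return (element type assumed to be u32).
    layers: Option<Vec<u32>>,
}

impl HiddenStateRequest {
    /// Builds the JSON request body by hand, skipping `None` fields the way
    /// `#[serde(skip_serializing_if = "Option::is_none")]` would.
    /// No string escaping is done; this is for illustration only.
    fn to_json(&self) -> String {
        let mut parts = vec![format!("\"model\":\"{}\"", self.model)];
        if let Some(flag) = self.return_hidden_states {
            parts.push(format!("\"return_hidden_states\":{}", flag));
        }
        if let Some(layers) = &self.layers {
            let items: Vec<String> = layers.iter().map(|l| l.to_string()).collect();
            parts.push(format!("\"layers\":[{}]", items.join(",")));
        }
        format!("{{{}}}", parts.join(","))
    }
}
```

Under these assumptions, a request with both fields left as None serializes to {"model":"m"}, the same as an ordinary chat request, while return_hidden_states: Some(true) with layers: Some(vec![0, 31]) serializes to {"model":"m","return_hidden_states":true,"layers":[0,31]}.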

Testing

  • cargo fmt --check
  • cargo test -p inference_providers
  • cargo check
  • git diff --check

This mirrors the existing ChatDelta catch-all behavior so unknown upstream fields are not silently dropped.
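The catch-all routes every top-level key the struct does not declare into a flattened extra map and writes those keys back out on serialization. The sketch below models that round trip with the standard library only. JsonObject (field name to raw JSON text), KNOWN_FIELDS, ResponseModel, split_fields, and merge_fields are hypothetical names; the real structs rely on serde's #[serde(flatten)] attribute rather than hand-written split/merge functions.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a JSON object: field name -> raw JSON text of its value.
type JsonObject = BTreeMap<String, String>;

/// Top-level fields the struct declares (an assumed, abbreviated list).
const KNOWN_FIELDS: &[&str] = &["id", "object", "created", "model", "choices", "usage"];

/// Models a response struct with a flattened `extra` catch-all.
#[derive(Debug, Clone, PartialEq)]
struct ResponseModel {
    known: JsonObject,
    /// Unknown, provider-specific fields such as `hidden_states`.
    extra: JsonObject,
}

/// "Deserialize": declared keys go to `known`, everything else to `extra`.
fn split_fields(obj: &JsonObject) -> ResponseModel {
    let mut known = JsonObject::new();
    let mut extra = JsonObject::new();
    for (key, value) in obj {
        if KNOWN_FIELDS.contains(&key.as_str()) {
            known.insert(key.clone(), value.clone());
        } else {
            extra.insert(key.clone(), value.clone());
        }
    }
    ResponseModel { known, extra }
}

/// "Serialize": `extra` is flattened back to the top level next to the known fields.
fn merge_fields(resp: &ResponseModel) -> JsonObject {
    let mut out = resp.known.clone();
    out.extend(resp.extra.clone());
    out
}
```

With the catch-all, an upstream object containing hidden_states comes back out unchanged; a model without it (extra always empty) silently drops the field, which is the behavior the PR is meant to fix.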

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces return_hidden_states and layers fields to ChatCompletionParams to support requesting per-layer hidden state activations from backends like sglang. It also adds an extra map to ChatCompletionChunk and ChatCompletionResponse to preserve provider-specific fields during serialization and deserialization. The review feedback correctly identifies that the newly added fields are hardcoded to None during request conversions and service mappings, which prevents them from being populated from the incoming request's extra map. Actionable code suggestions are provided to extract these fields dynamically.

Comment thread crates/api/src/conversions.rs Outdated
Comment thread crates/services/src/completions/mod.rs Outdated
Comment thread crates/services/src/completions/mod.rs Outdated
@satojandro satojandro marked this pull request as draft May 26, 2026 20:43
satojandro and others added 3 commits May 26, 2026 17:47
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@satojandro satojandro marked this pull request as ready for review May 26, 2026 20:48
@Evrard-Nil Evrard-Nil requested a review from Copilot May 27, 2026 09:06
Contributor

Copilot AI left a comment


Pull request overview

This PR extends the chat-completions normalization layer to support hidden-state passthrough end-to-end, ensuring provider-specific response fields (e.g., hidden_states) are preserved when cloud-api deserializes and reserializes both full responses and streaming chunks.

Changes:

  • Added return_hidden_states and layers to ChatCompletionParams to explicitly request hidden states from backends that support it.
  • Added flattened extra maps to ChatCompletionChunk and ChatCompletionResponse to round-trip unknown/provider-specific response fields.
  • Updated struct literals and added regression tests to ensure unknown fields survive deserialize/reserialize for both streaming and non-streaming responses.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| crates/services/src/inference_provider_pool/mod.rs | Updates test params to include new ChatCompletionParams fields. |
| crates/services/src/completions/mod.rs | Extracts return_hidden_states / layers from request extra when building provider params. |
| crates/inference_providers/tests/integration_tests.rs | Updates integration tests to populate new request fields. |
| crates/inference_providers/src/vllm/mod.rs | Updates vLLM tests to include new request fields. |
| crates/inference_providers/src/models.rs | Adds new request fields, adds extra passthrough on chunk/response, and adds round-trip regression tests; fixes doctest import path. |
| crates/inference_providers/src/mock.rs | Updates mock response builders to initialize new extra field. |
| crates/inference_providers/src/external/openai_compatible.rs | Updates tests to include new request fields. |
| crates/inference_providers/src/external/gemini/mod.rs | Ensures synthesized responses initialize new extra field. |
| crates/inference_providers/src/external/anthropic/mod.rs | Ensures synthesized responses initialize new extra field; updates tests to include new request fields. |
| crates/inference_providers/src/chunk_builder.rs | Ensures built chunks initialize new extra field. |
| crates/api/src/routes/completions.rs | Ensures SSE flush chunks initialize new extra field. |
| crates/api/src/conversions.rs | Extracts return_hidden_states / layers from API request extra when converting to provider params. |


Comment on lines +1141 to +1142
return_hidden_states: extra.remove("return_hidden_states").and_then(|v| v.as_bool()),
layers: extra.remove("layers").and_then(|v| serde_json::from_value(v).ok()),
Comment on lines +1299 to +1300
return_hidden_states: extra.remove("return_hidden_states").and_then(|v| v.as_bool()),
layers: extra.remove("layers").and_then(|v| serde_json::from_value(v).ok()),
Comment on lines +117 to +118
return_hidden_states: extra.remove("return_hidden_states").and_then(|v| v.as_bool()),
layers: extra.remove("layers").and_then(|v| serde_json::from_value(v).ok()),
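The three suggested snippets share one pattern: remove each key from the request's extra map and keep it only if it has the expected type. The sketch below models that with a hypothetical Value enum standing in for serde_json::Value. The as_layer_list helper plays the role of serde_json::from_value(v).ok(), and the element type of layers (i64 here), HiddenStateParams, and take_hidden_state_params are assumptions for illustration.

```rust
use std::collections::HashMap;

/// Minimal hypothetical stand-in for serde_json::Value.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Bool(bool),
    Number(i64),
    String(String),
    Array(Vec<Value>),
}

impl Value {
    /// Mirrors serde_json::Value::as_bool: Some only for booleans.
    fn as_bool(&self) -> Option<bool> {
        match self {
            Value::Bool(b) => Some(*b),
            _ => None,
        }
    }

    /// Plays the role of serde_json::from_value::<Vec<i64>>(v).ok():
    /// Some only if the value is an array whose elements are all integers.
    fn as_layer_list(&self) -> Option<Vec<i64>> {
        match self {
            Value::Array(items) => items
                .iter()
                .map(|item| match item {
                    Value::Number(n) => Some(*n),
                    _ => None,
                })
                .collect(),
            _ => None,
        }
    }
}

#[derive(Debug, PartialEq)]
struct HiddenStateParams {
    return_hidden_states: Option<bool>,
    layers: Option<Vec<i64>>,
}

/// Removes the two keys from `extra` and keeps each value only if it has
/// the expected shape; anything malformed becomes None.
fn take_hidden_state_params(extra: &mut HashMap<String, Value>) -> HiddenStateParams {
    HiddenStateParams {
        return_hidden_states: extra
            .remove("return_hidden_states")
            .and_then(|v| v.as_bool()),
        layers: extra.remove("layers").and_then(|v| v.as_layer_list()),
    }
}
```

Because the keys are removed rather than only read, they presumably cannot be forwarded a second time through extra, and a malformed value (for example a string for return_hidden_states) is dropped as None instead of failing the request.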
@Evrard-Nil
Collaborator

Thanks for the PR. No backend currently has --enable-return-hidden-states enabled, so this would be a no-op. May I ask what your use case is?

@satojandro
Author

satojandro commented May 27, 2026

Hey mate, thanks for the prompt review and reply. I'm doing mechanistic interpretability research that requires per-layer activations from transformer models. SGLang and vLLM both support returning hidden states natively, so as far as I can tell the main gap is the proxy layer stripping them before they reach the client.

I understand that this may currently be a no-op, but it was actually suggested to me by Illia, so I hope we can find a way to enable it. Happy to chat on Telegram or jump on a call, @Evrard-Nil.


Labels

None yet

Projects

None yet


3 participants