
Conversation

Collaborator

@noooop noooop commented Dec 15, 2025

Improve all pooling task

These PRs mostly conflict with each other, so combining them into a series better informs reviewers about what has changed and what still needs to be done afterwards.

Purpose

Make the generate runner support the embed and token_embed tasks, and end the "Improve all pooling tasks" series.

FIX #11905
FIX #24288
FIX #6165
FIX #4435

Test Plan

tests/models/language/pooling/test_extract_hidden_states.py

Test Result

pass


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: wang.yuqi <[email protected]>
@noooop noooop force-pushed the Prompt_Hidden_States branch from 096de98 to 0346166 Compare December 15, 2025 07:20
@noooop noooop changed the title [Model][Last/N] Improve all pooling task | Support Returning Prompt Hidden States [Model][Last/N] Improve all pooling task | Generate runner supports using embed and token_embed tasks. Dec 15, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request updates a test for pooling tasks to align with recent improvements, including support for returning prompt hidden states during generation. The test now uses the 'generate' runner and includes a new case to verify text generation with prefix caching.

My main feedback is that the new test case, while intended to verify the return of prompt hidden states, lacks assertions to confirm their presence and correctness. I've suggested adding these assertions to ensure the feature is properly tested.

Signed-off-by: wang.yuqi <[email protected]>
@noooop noooop force-pushed the Prompt_Hidden_States branch from 63038d5 to 64da7a6 Compare December 15, 2025 09:26
@noooop
Collaborator Author

noooop commented Dec 15, 2025

@DarkLight1337

Are we planning to implement generate runner support for the embed and token_embed tasks?

This very dirty fix makes the tests/models/language/pooling/test_extract_hidden_states.py test pass, but we can't batch generate requests and pooling requests together.

Signed-off-by: wang.yuqi <[email protected]>
Signed-off-by: wang.yuqi <[email protected]>
@DarkLight1337
Member

but we can't batch generate requests and pooling requests together.

I think this limitation is ok if we can alternate between generative and pooling batches

@noooop
Collaborator Author

noooop commented Dec 15, 2025

but we can't batch generate requests and pooling requests together.

I think this limitation is ok if we can alternate between generative and pooling batches

I will refine this PR further if we decide to implement "generate runner support using embed and token_embed tasks".

@noooop
Collaborator Author

noooop commented Dec 15, 2025

Hello @breakices,

PTAL: #24288 (comment)

token_embed can be used to extract Prompt Hidden States

Could using "generate runner support with embed and token_embed tasks" as a way of returning prompt hidden states help with RLVR?
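As a rough illustration of the distinction being discussed (toy NumPy arrays standing in for real model outputs; this is not vLLM's API, and mean pooling is just one possible pooling choice): the token_embed task yields one hidden-state vector per prompt token, i.e. the prompt hidden states, while embed yields a single pooled vector per prompt.

```python
import numpy as np

# Toy stand-in for a model's last-layer hidden states: one row per prompt token.
# (Illustrative only -- real values would come from the model forward pass.)
rng = np.random.default_rng(0)
prompt_len, hidden_size = 5, 8
token_embed = rng.standard_normal((prompt_len, hidden_size))  # "token_embed": per-token vectors

# "embed": one pooled vector per prompt (mean pooling shown here for illustration).
embed = token_embed.mean(axis=0)

print(token_embed.shape)  # (5, 8): the prompt hidden states
print(embed.shape)        # (8,): a single prompt embedding
```

In other words, a consumer that needs prompt hidden states (e.g. for RLVR-style training) would take the token_embed output directly rather than the pooled embed output.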

@noooop
Collaborator Author

noooop commented Dec 15, 2025

Hello @charlotte12l,

A long time has passed since my last comment #24288 (comment), and I'm finally about to implement it.

Could using "generate runner support with embed and token_embed tasks" as a way of returning prompt hidden states help with your use case?
