
Update precise-prefix-cache-scorer to use tokens-based GetPodScores #577

Open
acardace wants to merge 2 commits into llm-d:main from acardace:feat/getpodscores-with-token

Conversation

@acardace
Contributor

Summary

Updates the precise-prefix-cache-scorer to perform tokenization in the scheduler and pass pre-computed tokens to GetPodScores, rather than delegating tokenization to the kv-cache indexer.

Related to llm-d/llm-d-kv-cache#244
Related to #530

Note: This PR depends on llm-d/llm-d-kv-cache#266 and must be merged after it.

Changes

  • Build: Update Makefile PYTHONPATH to reference llm-d-kv-cache module
  • Scorer: Tokenize the prompt in the scheduler, then pass tokens to GetPodScores
  • Tests: Adapt to updated signatures and reuse tokenizer's built-in chat templater

@elevran
Collaborator

elevran commented Jan 21, 2026

/hold for post 0.5

@github-actions github-actions bot added the `hold` label (PRs that are blocked on design, other features, release cycle, etc.) on Jan 21, 2026
@elevran elevran moved this to In progress in llm-d-inference-scheduler Jan 21, 2026
@acardace
Contributor Author

@elevran what's the release cadence for llm-d-kv-cache? Of course the corresponding PR in kv-cache must be merged first and have a tag before merging this.

@elevran
Collaborator

elevran commented Jan 21, 2026

I believe it's ~6 weeks, give or take. @vMaroon can give you a more exact answer. From the inference scheduler's point of view, the hold can be removed, as we cut the 0.5 RC in the next few days.

@acardace acardace force-pushed the feat/getpodscores-with-token branch 2 times, most recently from 5a30a16 to 2859a30 on January 22, 2026 10:17
The new API separates tokenization from scoring, requiring explicit
token processor initialization and a two-step flow: tokenize first,
then get pod scores.

Signed-off-by: Antonio Cardace <acardace@redhat.com>
Adapt tests to the new llm-d-kv-cache API

Signed-off-by: Antonio Cardace <acardace@redhat.com>
@acardace acardace force-pushed the feat/getpodscores-with-token branch from 2859a30 to f6a5830 on January 22, 2026 11:34
@elevran elevran added this to the v0.6 milestone Jan 22, 2026
@elevran elevran removed the `hold` label (PRs that are blocked on design, other features, release cycle, etc.) on Jan 26, 2026
@elevran
Collaborator

elevran commented Jan 26, 2026

@acardace @vMaroon @kfswain
does it make sense to do tokenization as part of the scorer, or should this be more of an "infra" service (perhaps as part of an explicit data preparation phase)?

@acardace
Contributor Author

@acardace @vMaroon @kfswain does it make sense to do tokenization as part of the scorer, or should this be more of an "infra" service (perhaps as part of an explicit data preparation phase)?

My take is that this is just prep work for moving tokenization into a service, possibly inside GAIE. I'm actually working on an RFC to introduce tokenization as a service inside the IGW.

@github-actions

This PR is marked as stale after 21d of inactivity. After an additional 14d of inactivity (7d to become rotten, then 7d more), it will be closed. To prevent this PR from being closed, add a comment or remove the lifecycle/stale label.


Projects

Status: In progress


2 participants