Update precise-prefix-cache-scorer to use tokens-based GetPodScores #577
acardace wants to merge 2 commits into llm-d:main
Conversation
/hold for post 0.5
@elevran what's the release cadence for llm-d-kv-cache? Of course the corresponding PR in kv-cache must be merged first and have a tag before merging this.
I believe it's 6w, give or take. @vMaroon can give you a more exact answer. From the inference scheduler's point of view, the hold can be removed, as we cut the 0.5 RC in the next few days.
Force-pushed from 5a30a16 to 2859a30
The new API separates tokenization from scoring, requiring explicit token processor initialization and a two-step flow: tokenize first, then get pod scores. Signed-off-by: Antonio Cardace <acardace@redhat.com>
Adapt tests to the new llm-d-kv-cache API. Signed-off-by: Antonio Cardace <acardace@redhat.com>
Force-pushed from 2859a30 to f6a5830
My take is that this is just prep work for moving tokenization into a service, possibly inside GAIE. I'm actually working on an RFC to introduce tokenization as a service inside the IGW.
This PR is marked as stale after 21d of inactivity. After an additional 14d of inactivity (7d to become rotten, then 7d more), it will be closed. To prevent this PR from being closed, add a comment or remove the
Summary
Updates the precise-prefix-cache-scorer to perform tokenization in the scheduler and pass pre-computed tokens to GetPodScores, rather than delegating tokenization to the kv-cache indexer.
Related to llm-d/llm-d-kv-cache#244
Related to #530
Note: This PR depends on llm-d/llm-d-kv-cache#266 and must be merged after it.
Changes
Bump the llm-d-kv-cache module; update the GetPodScores call to pass pre-computed tokens
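For illustration, the two-step flow this PR adopts (explicit token processor initialization, tokenize in the scheduler, then pass the tokens to scoring) might look roughly like the sketch below. The type and method shapes here are hypothetical stand-ins, not the actual llm-d-kv-cache API:

```go
package main

import "fmt"

// TokenProcessor is a hypothetical stand-in for the explicitly
// initialized tokenizer the new API requires.
type TokenProcessor struct{}

// Tokenize is a dummy tokenizer: one token per byte of the prompt.
func (tp *TokenProcessor) Tokenize(prompt string) []uint32 {
	toks := make([]uint32, 0, len(prompt))
	for _, b := range []byte(prompt) {
		toks = append(toks, uint32(b))
	}
	return toks
}

// Indexer is a hypothetical stand-in for the kv-cache indexer.
type Indexer struct{}

// GetPodScores now accepts pre-computed tokens instead of the raw
// prompt, so it no longer tokenizes internally.
func (ix *Indexer) GetPodScores(tokens []uint32, pods []string) map[string]float64 {
	scores := make(map[string]float64, len(pods))
	for i, p := range pods {
		// Dummy scoring: longer token sequences and earlier pods score higher.
		scores[p] = float64(len(tokens)) / float64(i+1)
	}
	return scores
}

func main() {
	tp := &TokenProcessor{}        // explicit token processor initialization
	tokens := tp.Tokenize("hello") // step 1: tokenize in the scheduler
	ix := &Indexer{}
	scores := ix.GetPodScores(tokens, []string{"pod-a", "pod-b"}) // step 2: score
	fmt.Println(scores["pod-a"], scores["pod-b"])
}
```

Separating the two steps lets the scheduler tokenize once and reuse the tokens across scorers, which is what makes moving tokenization into its own service feasible later.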