Update precise-prefix-cache-scorer to use tokens-based GetPodScores #577
acardace wants to merge 2 commits into llm-d:main
Conversation
/hold for post 0.5
@elevran what's the release cadence for llm-d-kv-cache? Of course the corresponding PR in kv-cache must be merged first and have a tag before merging this.
I believe it's 6w, give or take. @vMaroon can give you a more exact answer. From the inference scheduler's point of view, the hold can be removed, as we cut the 0.5 RC in the next few days.
Force-pushed from 5a30a16 to 2859a30
The new API separates tokenization from scoring, requiring explicit token processor initialization and a two-step flow: tokenize first, then get pod scores. Signed-off-by: Antonio Cardace <acardace@redhat.com>
Adapt tests to the new llm-d-kv-cache API. Signed-off-by: Antonio Cardace <acardace@redhat.com>
Force-pushed from 2859a30 to f6a5830
My take is that this is just prep work for moving tokenization into a service, possibly inside GAIE. I'm actually working on an RFC to introduce tokenization as a service inside the IGW.
This PR is marked as stale after 21d of inactivity. After an additional 14d of inactivity (7d to become rotten, then 7d more), it will be closed. To prevent this PR from being closed, add a comment or remove the
Summary
Updates the precise-prefix-cache-scorer to perform tokenization in the scheduler and pass pre-computed tokens to GetPodScores, rather than delegating tokenization to the kv-cache indexer.
Related to llm-d/llm-d-kv-cache#244
Related to #530
Note: This PR depends on llm-d/llm-d-kv-cache#266 and must be merged after it.
Changes
Bump the llm-d-kv-cache module; update the GetPodScores call to pass pre-computed tokens
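For illustration, the two-step flow this PR adopts (explicit token processor initialization, tokenize in the scheduler, then pass the tokens to scoring) might look roughly like the sketch below. The type and method shapes here are hypothetical stand-ins, not the actual llm-d-kv-cache API:

```go
package main

import "fmt"

// TokenProcessor is a hypothetical stand-in for the explicitly
// initialized tokenizer the new API requires.
type TokenProcessor struct{}

// Tokenize is a dummy tokenizer: one token per byte of the prompt.
func (tp *TokenProcessor) Tokenize(prompt string) []uint32 {
	toks := make([]uint32, 0, len(prompt))
	for _, b := range []byte(prompt) {
		toks = append(toks, uint32(b))
	}
	return toks
}

// Indexer is a hypothetical stand-in for the kv-cache indexer.
type Indexer struct{}

// GetPodScores now accepts pre-computed tokens instead of the raw
// prompt, so it no longer tokenizes internally.
func (ix *Indexer) GetPodScores(tokens []uint32, pods []string) map[string]float64 {
	scores := make(map[string]float64, len(pods))
	for i, p := range pods {
		// Dummy scoring: longer token sequences and earlier pods score higher.
		scores[p] = float64(len(tokens)) / float64(i+1)
	}
	return scores
}

func main() {
	tp := &TokenProcessor{}        // explicit token processor initialization
	tokens := tp.Tokenize("hello") // step 1: tokenize in the scheduler
	ix := &Indexer{}
	scores := ix.GetPodScores(tokens, []string{"pod-a", "pod-b"}) // step 2: score
	fmt.Println(scores["pod-a"], scores["pod-b"])
}
```

Separating the two steps lets the scheduler tokenize once and reuse the tokens across scorers, which is what makes moving tokenization into its own service feasible later.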