Upgrade github.com/llm-d/llm-d-kv-cache-manager import to v0.3.0#344

Merged
github-actions[bot] merged 1 commit into llm-d:main from vMaroon:kvc-v0.3 on Sep 10, 2025
Conversation

@vMaroon
Member

@vMaroon vMaroon commented Sep 10, 2025

Summary

This PR upgrades the github.com/llm-d/llm-d-kv-cache-manager import to the v0.3.0 release.

Note that this does not yet utilize the new preprocessing package for chat-completions support; that will follow in v0.4.0.
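The change itself amounts to bumping the module version in go.mod. A minimal sketch of the relevant fragment (the consuming module path below is hypothetical, and the previous version is assumed to be v0.2.1 based on the benchmark baseline):

```go
// go.mod (fragment) -- hypothetical consuming module, for illustration only
module github.com/llm-d/llm-d-inference-scheduler

go 1.22

require (
	// upgraded from the assumed prior v0.2.1 to the v0.3.0 release
	github.com/llm-d/llm-d-kv-cache-manager v0.3.0
)
```

In practice this would be produced by running `go get github.com/llm-d/llm-d-kv-cache-manager@v0.3.0` followed by `go mod tidy`.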

Benchmarking & Tests

  • Overall, v0.3.0 includes significant test-coverage boosts
  • Benchmarking data below is from the 37-capacity setup (open the linked context for details)

Summary across QPS

| Experiment | Output toks/s | Requests/s | Success Rate | TTFT p90 (s) | TTFT mean (s) | ITL mean (s) | ITL p50/p90 (s) |
|---|---|---|---|---|---|---|---|
| precise v0.3.0-rc1 | 5533.7 | 6.920 | 100.00% | 0.222 | 0.169 | 0.019 | 0.0000/0.094 |
| precise v0.2.1 | 5650.0 | 6.914 | 100.00% | 0.275 | 0.193 | 0.020 | 0.0000/0.085 |

EPP Queue and KV Cache Metrics Summary

| Experiment | Wait Queue (mean/p90/max) | KV Cache % (mean/p90/max) | Pods | Data Points |
|---|---|---|---|---|
| precise v0.2.1 | 0.2/0/11 | 49.8/70.5/76.9 | 4 | 20824 |
| precise v0.3.0-rc1 | 0.0/0/1 | 49.0/67.0/83.9 | 4 | 52920 |

Cool graphs

[Figures: ttft_tpot_throughput_tripanel, ttft_p90_vs_qps]

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
@nirrozenbaum
Collaborator

/lgtm
/approve

@github-actions github-actions bot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 10, 2025
@github-actions github-actions bot merged commit b78eefc into llm-d:main on Sep 10, 2025
5 checks passed
nirrozenbaum pushed a commit that referenced this pull request Sep 17, 2025
Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>