Conversation
|
@guygir also note that DCO check is failing and should be fixed |
|
This PR should mimic as closely as possible the work done in this kv-cache-manager one: llm-d/llm-d-kv-cache#92 This includes:
/hold |
- Add local clones of gateway-api-inference-extension and llm-d-kv-cache-manager - Update go.mod with replace directives for local dependencies - Modify Dockerfile to include Python 3.12 runtime and dependencies - Add chat completions preprocessing to precise prefix cache scorer - Add chat completions preprocessing to PD profile handler - Update main.go to register custom plugins - Add comprehensive README with build instructions and troubleshooting This enables KV-cache aware routing for chat completion requests by converting them to flattened templated prompts before performing cache similarity matching. Signed-off-by: Guy Girmonsky <guygir@gmail.com>
Signed-off-by: guygir <guygir@gmail.com>
Signed-off-by: guygir <guygir@gmail.com>
Signed-off-by: guygir <guygir@gmail.com>
Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.35.5 to 1.35.7. - [Release notes](https://github.com/crate-ci/typos/releases) - [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md) - [Commits](crate-ci/typos@v1.35.5...v1.35.7) --- updated-dependencies: - dependency-name: crate-ci/typos dependency-version: 1.35.7 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* added comment to stale issues to make sure author doesn't miss the closing of the issue Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * typo Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * Update .github/workflows/stale.yaml Co-authored-by: Etai Lev Ran <elevran@gmail.com> Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> --------- Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> Co-authored-by: Etai Lev Ran <elevran@gmail.com>
* fixed missing dependencies in makefile Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * fixed comment in Makefile Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> --------- Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
* Update RBAC for latest IGW Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * No longer create an InferenceModel object Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> --------- Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
…#345) Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
Bumps [actions/stale](https://github.com/actions/stale) from 9 to 10. - [Release notes](https://github.com/actions/stale/releases) - [Changelog](https://github.com/actions/stale/blob/main/CHANGELOG.md) - [Commits](actions/stale@v9...v10) --- updated-dependencies: - dependency-name: actions/stale dependency-version: '10' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [actions/setup-go](https://github.com/actions/setup-go) from 5 to 6. - [Release notes](https://github.com/actions/setup-go/releases) - [Commits](actions/setup-go@v5...v6) --- updated-dependencies: - dependency-name: actions/setup-go dependency-version: '6' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.35.7 to 1.36.2. - [Release notes](https://github.com/crate-ci/typos/releases) - [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md) - [Commits](crate-ci/typos@v1.35.7...v1.36.2) --- updated-dependencies: - dependency-name: crate-ci/typos dependency-version: 1.36.2 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…llm-d#358) Signed-off-by: Kay Yan <kay.yan@daocloud.io> Co-authored-by: Nir Rozenbaum <nirro@il.ibm.com>
Signed-off-by: Kellen Swain <kfswain@google.com>
…lm-d#372) Signed-off-by: learner0810 <zhongjun.li@daocloud.io>
Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.36.2 to 1.38.1. - [Release notes](https://github.com/crate-ci/typos/releases) - [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md) - [Commits](crate-ci/typos@v1.36.2...v1.38.1) --- updated-dependencies: - dependency-name: crate-ci/typos dependency-version: 1.38.1 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Fix multi-architecture image issues with Kind Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Review fixes Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> --------- Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
…heduler repo (llm-d#379) * Moved prefill header definition to common import Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Moved Routing Sidecar into this repo Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Moved Routing Sidecar tests into this repo Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Moved Routing Sidecar Dockerfile into this repo Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Added Routing Sidecar to Makefile Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Added Routing Sidecar to CI stream Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Fixed lint error Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Review fixes and added version info Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Test Nixl V2 instead of the deleted Nixl V1 Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Fixed lint errors Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> --------- Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
…letions support Removed local repository clones - the upstream v0.3.2 already includes chat completions preprocessing functionality. Simplified Dockerfile to download Python requirements from upstream repository instead of copying local files. This makes the build process cleaner and aligns with upstream practices while maintaining the chat completions feature. Signed-off-by: Guy Girmonsky <guygir@gmail.com>
- Updated to show merged state with upstream (31 commits integrated) - Removed outdated local repository clone information - Documented use of upstream llm-d-kv-cache-manager v0.3.2 - Updated troubleshooting section to reflect simplified build process - Clarified that Docker build is recommended for local development - Updated API changes (request.Body vs request.Data) - Documented proper dependency versions Signed-off-by: Guy Girmonsky <guygir@gmail.com>
The upstream v1.1.0 API changed Content from string to struct with Raw field. Fixed both precise_prefix_cache.go and pd_profile_handler.go to use msg.Content.Raw. Also removed unused prompt variable in pd_profile_handler.go Signed-off-by: Guy Girmonsky <guygir@gmail.com>
Signed-off-by: Guy Girmonsky <guygir@gmail.com>
Python packages (torch, transformers, etc) are already installed in builder stage and copied to runtime. Removed the redundant pip install that was downloading 175MB torch package twice. Also removed python3.12-pip from runtime since we don't need pip in production image. Signed-off-by: Guy Girmonsky <guygir@gmail.com>
Signed-off-by: Guy Girmonsky <guygir@gmail.com>
Signed-off-by: Guy Girmonsky <guygir@gmail.com>
Signed-off-by: Guy Girmonsky <guygir@gmail.com>
Explains that 404MB image had NO chat completions, 233MB adds ALL necessary Python deps for chat preprocessing, and breaks down exactly which packages add how much size. Signed-off-by: Guy Girmonsky <guygir@gmail.com>
Signed-off-by: Guy Girmonsky <guygir@gmail.com>
* Make sure that max_completion_tokens=1 in Prefill Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Remove/undo setting of max_completion_tokens to 1, for decode Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> --------- Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
… from PR branch Signed-off-by: Guy Girmonsky <guygir@gmail.com>
…ories, add Python wrapper script - Restore all Dockerfile comments from OG Dockerfile - Remove empty placeholder directories gateway-api-inference-extension and llm-d-kv-cache-manager - Add scripts/fetch-python-wrapper.sh for reusable Python wrapper fetching - Finalize precise_prefix_cache.go changes (logging and comment cleanup) Signed-off-by: Guy Girmonsky <guygir@gmail.com>
e509328 to
c9dfe7d
Compare
|
Updated the PR, and all previous comments have been addressed. Note on DCO: Two commits in the branch (ec6d849 by learner0810, c40cc15 by Morgan Foster) are currently failing DCO checks. These are upstream commits that came into this branch by merging upstream, and they have the same signoff issues in upstream/main itself (missing/incorrect signoffs). Guidance on how to handle these upstream commits - should they be addressed in upstream/main, or is there another approach that I should take in my branch? |
|
@guygir would you kindly
|
|
closing this in favor of clean (rebased) PR |
render_jinja_template_wrapper.py./usr/local/lib/python3.12/site-packages/.--no-cache-dirto keep image size smaller;torchfiltered out manually for now.PYTHONPATH,HF_HOME).RenderJinjaTemplateRequestfrom messages (usesmsg.Content.Raw), initializes processor, fetches model chat template, renders, returns flattened prompt to KV-cache scoring.PREPROCESSING:INFO/DEBUG logs; will be trimmed post-WIP.llm-d-kv-cache-managerandgateway-api-inference-extension.