feat(tokenization): replace RenderChat with RenderChatCompletion RPC #432
vMaroon merged 29 commits into llm-d:main
Pull request overview
This PR updates the KV-cache manager’s UDS tokenizer client to use the newer vLLM renderer RPCs (RenderChatCompletion / RenderCompletion) instead of the legacy chat-template rendering flow, and adjusts tests/protobuf bindings accordingly.
Changes:
- Switch the Go UDS tokenizer client: `Render` now calls `RenderCompletion` and `RenderChat` calls `RenderChatCompletion`.
- Update Go and Python tests to reflect the new RPCs and response shapes (notably: offsets are no longer asserted).
- Regenerate Go protobuf/grpc bindings to include the new RPCs and message types.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tests/e2e/uds_tokenizer/uds_e2e_test.go | Updates e2e assertions to ignore offsets and validate determinism under the new render RPCs. |
| services/uds_tokenizer/tests/test_renderer.py | Adjusts integration tests for RenderChatCompletion; one assertion was weakened. |
| pkg/tokenization/uds_tokenizer_test.go | Updates mock server + unit tests to cover RenderChatCompletion / RenderCompletion. |
| pkg/tokenization/uds_tokenizer.go | Main client change: builds OpenAI-ish JSON payloads and calls new renderer RPCs. |
| api/tokenizerpb/tokenizer_grpc.pb.go | Regenerated gRPC client/server stubs with new RPC methods. |
| api/tokenizerpb/tokenizer.pb.go | Regenerated protobuf messages for new render request/response + MM feature types. |
| api/indexerpb/indexer_grpc.pb.go | Regenerated header/version metadata. |
| api/indexerpb/indexer.pb.go | Regenerated header/version metadata. |
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
Pull request overview
Copilot reviewed 14 out of 14 changed files in this pull request and generated 8 comments.
```python
    ),
    self.renderer_service.render_chat(chat_request, request.model_name),
    self._loop,
).result()
```
Not yours, but I think we need a timeout in order not to block the whole grpc server.
```python
from concurrent.futures import TimeoutError as FuturesTimeoutError

try:
    result = asyncio.run_coroutine_threadsafe(
        self.renderer_service.render_chat(chat_request, request.model_name),
        self._loop,
    ).result(timeout=30)
except FuturesTimeoutError:
    context.abort(grpc.StatusCode.DEADLINE_EXCEEDED, "render_chat timed out")
```
Since we're already migrating to async here, would it make more sense to add it there?
Let's follow up separately if needed.
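For reference, the timeout pattern discussed in this thread can be exercised in isolation with only the standard library. The coroutine and loop names below are illustrative stand-ins, not code from the PR:

```python
import asyncio
import threading
from concurrent.futures import TimeoutError as FuturesTimeoutError

# Background event loop, analogous to the servicer's self._loop.
loop = asyncio.new_event_loop()
threading.Thread(target=loop.run_forever, daemon=True).start()

async def render_chat(delay: float) -> str:
    # Stand-in for renderer_service.render_chat(...).
    await asyncio.sleep(delay)
    return "rendered"

# Fast case: the coroutine finishes well within the timeout.
ok = asyncio.run_coroutine_threadsafe(render_chat(0.01), loop).result(timeout=1)

# Slow case: .result(timeout=...) raises instead of blocking forever.
slow = asyncio.run_coroutine_threadsafe(render_chat(5), loop)
try:
    slow.result(timeout=0.1)
    timed_out = False
except FuturesTimeoutError:
    timed_out = True
slow.cancel()
loop.call_soon_threadsafe(loop.stop)
```

In the real servicer, the `except` branch would map the timeout to a gRPC status (e.g. `DEADLINE_EXCEEDED`) as in the suggestion above.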
/lgtm
Closes #425. Rebased on top of #461.
Changes Overview
Switches `Render` and `RenderChat` in `UdsTokenizer` to use the new `RenderCompletion` and `RenderChatCompletion` RPCs introduced in #461, replacing the old `RenderChatTemplate` flow.

On the Go side, `RenderChat` now builds a native `RenderChatCompletionRequest` proto (messages, tools, chat_template_kwargs) and returns token IDs directly instead of calling `Encode` on a rendered prompt string. `Render` calls `RenderCompletion` with the prompt list and also returns token IDs directly; neither returns character offsets anymore, since the renderer service doesn't produce them.

Protocol Design
Tools and `chat_template_kwargs` are both serialized as JSON strings in the proto (`tools_json`, `chat_template_kwargs`). This avoids building a typed proto structure for fields that are already arbitrary JSON at the call site (`[]interface{}` in GIE's `ChatCompletionsRequest`), and lets Python deserialize them directly without field renaming or special-casing.

On the Python side, the gRPC servicer and renderer are updated to match: the renderer service methods now accept typed `ChatCompletionRequest` / `CompletionRequest` objects directly instead of going through a JSON round-trip. Proto-to-request conversion uses `MessageToDict` and `json.loads` for the JSON string fields.

Also excludes generated pb.go and pb2 files from golangci-lint and ruff in CI.
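The JSON-string design can be sketched without the generated protos. The helpers below are illustrative stand-ins: plain dicts play the role of the proto message (as `MessageToDict` would produce) and of the typed request's keyword arguments; only the `tools_json` / `chat_template_kwargs` field names follow the PR's description:

```python
import json

def build_request(messages, tools, chat_template_kwargs):
    # Go-client side (stand-in): arbitrary-JSON fields travel as JSON
    # strings rather than typed proto structures.
    return {
        "messages": messages,
        "tools_json": json.dumps(tools) if tools else "",
        "chat_template_kwargs": json.dumps(chat_template_kwargs or {}),
    }

def to_chat_completion_kwargs(req):
    # Python side (stand-in): MessageToDict would yield a dict like `req`;
    # the JSON string fields are decoded with json.loads, with no field
    # renaming or special-casing needed.
    kwargs = {"messages": req["messages"]}
    if req["tools_json"]:
        kwargs["tools"] = json.loads(req["tools_json"])
    kwargs["chat_template_kwargs"] = json.loads(req["chat_template_kwargs"])
    return kwargs

req = build_request(
    [{"role": "user", "content": "hi"}],
    [{"type": "function", "function": {"name": "get_weather"}}],
    {"add_generation_prompt": True},
)
out = to_chat_completion_kwargs(req)
```

The round-trip preserves the nested tool and kwargs structure exactly, which is the property that lets the renderer consume them without a typed proto schema.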