Skip to content

feat(tokenization): replace RenderChat with RenderChatCompletion RPC#432

Merged
vMaroon merged 29 commits intollm-d:mainfrom
sagearc:integrate-renderer-service
Mar 28, 2026
Merged

feat(tokenization): replace RenderChat with RenderChatCompletion RPC#432
vMaroon merged 29 commits intollm-d:mainfrom
sagearc:integrate-renderer-service

Conversation

@sagearc
Copy link
Copy Markdown
Collaborator

@sagearc sagearc commented Mar 17, 2026

Closes #425. Rebased on top of #461.

Changes Overview

Switches Render and RenderChat in UdsTokenizer to use the new RenderCompletion and RenderChatCompletion RPCs introduced in #461, replacing the old RenderChatTemplate flow.

On the Go side, RenderChat now builds a native RenderChatCompletionRequest proto (messages, tools, chat_template_kwargs) and returns token IDs directly instead of calling Encode on a rendered prompt string. Render calls RenderCompletion with the prompt list and returns token IDs directly too — neither returns character offsets anymore since the renderer service doesn't produce them.

Protocol Design

Tools and chat_template_kwargs are both serialized as JSON strings in the proto (tools_json, chat_template_kwargs). This avoids building a typed proto structure for fields that are already arbitrary JSON at the call site ([]interface{} in GIE's ChatCompletionsRequest), and lets Python deserialize them directly without field renaming or special-casing.

On the Python side, the gRPC servicer and renderer are updated to match: the renderer service methods now accept typed ChatCompletionRequest/CompletionRequest objects directly instead of going through a JSON round-trip. Proto-to-request conversion uses MessageToDict and json.loads for the JSON string fields.

Also excludes generated pb.go and pb2 files from golangci-lint and ruff in CI.

@github-actions
Copy link
Copy Markdown

Unsigned commits detected! Please sign your commits.

For instructions on how to set up GPG/SSH signing and verify your commits, please see GitHub Documentation.

@sagearc sagearc force-pushed the integrate-renderer-service branch 2 times, most recently from abcec3d to 49f885d Compare March 25, 2026 11:42
@sagearc sagearc marked this pull request as ready for review March 25, 2026 11:56
Copilot AI review requested due to automatic review settings March 25, 2026 11:56
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the KV-cache manager’s UDS tokenizer client to use the newer vLLM renderer RPCs (RenderChatCompletion / RenderCompletion) instead of the legacy chat-template rendering flow, and adjusts tests/protobuf bindings accordingly.

Changes:

  • Switch Go UDS tokenizer client Render to call RenderCompletion and RenderChat to call RenderChatCompletion.
  • Update Go and Python tests to reflect the new RPCs and response shapes (notably: offsets are no longer asserted).
  • Regenerate Go protobuf/grpc bindings to include the new RPCs and message types.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
tests/e2e/uds_tokenizer/uds_e2e_test.go Updates e2e assertions to ignore offsets and validate determinism under the new render RPCs.
services/uds_tokenizer/tests/test_renderer.py Adjusts integration tests for RenderChatCompletion; one assertion was weakened.
pkg/tokenization/uds_tokenizer_test.go Updates mock server + unit tests to cover RenderChatCompletion / RenderCompletion.
pkg/tokenization/uds_tokenizer.go Main client change: builds OpenAI-ish JSON payloads and calls new renderer RPCs.
api/tokenizerpb/tokenizer_grpc.pb.go Regenerated gRPC client/server stubs with new RPC methods.
api/tokenizerpb/tokenizer.pb.go Regenerated protobuf messages for new render request/response + MM feature types.
api/indexerpb/indexer_grpc.pb.go Regenerated header/version metadata.
api/indexerpb/indexer.pb.go Regenerated header/version metadata.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@sagearc sagearc marked this pull request as draft March 25, 2026 12:19
@sagearc sagearc closed this Mar 25, 2026
@sagearc sagearc force-pushed the integrate-renderer-service branch from 6c6e2c9 to 665ec28 Compare March 25, 2026 14:46
sagearc added 2 commits March 25, 2026 16:48
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
@sagearc sagearc reopened this Mar 25, 2026
@sagearc sagearc force-pushed the integrate-renderer-service branch from 0e2f31b to 2e3be89 Compare March 25, 2026 14:50
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
@sagearc sagearc force-pushed the integrate-renderer-service branch from 2e3be89 to fc368a3 Compare March 25, 2026 14:51
sagearc added 5 commits March 25, 2026 16:54
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
…uest json

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
sagearc added 12 commits March 25, 2026 18:20
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
@sagearc sagearc marked this pull request as ready for review March 25, 2026 17:51
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
@sagearc
Copy link
Copy Markdown
Collaborator Author

sagearc commented Mar 25, 2026

@vMaroon

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 8 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

),
self.renderer_service.render_chat(chat_request, request.model_name),
self._loop,
).result()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not yours, but I think we need a timeout in order not to block the whole grpc server.

from concurrent.futures import TimeoutError as FuturesTimeoutError

try:
    result = asyncio.run_coroutine_threadsafe(
        self.renderer_service.render_chat(chat_request, request.model_name),
        self._loop,
    ).result(timeout=30)
except FuturesTimeoutError:
    context.abort(grpc.StatusCode.DEADLINE_EXCEEDED, "render_chat timed out")

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're already migrating to async here, would it make more sense to add it there?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's follow-up separately if needed.

sagearc added 3 commits March 28, 2026 14:05
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
@sagearc sagearc self-assigned this Mar 28, 2026
@vMaroon
Copy link
Copy Markdown
Member

vMaroon commented Mar 28, 2026

/lgtm
/approve

@github-actions github-actions bot added the lgtm Looks good to me, indicates that a PR is ready to be merged. label Mar 28, 2026
@vMaroon vMaroon merged commit c66fd5c into llm-d:main Mar 28, 2026
11 checks passed
@sagearc sagearc deleted the integrate-renderer-service branch March 28, 2026 14:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

lgtm Looks good to me, indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[MM] update sidecar to use OpenAIServingRender and parse GenerateRequest

4 participants