
[Feat][Router] Migrate custom Chat Completions structs to official SDK types#1550

Merged
Xunzhuo merged 6 commits into vllm-project:main from asaadbalum:issue-1517-stabilize-openai-anthropic-sdk
Apr 3, 2026

Conversation

@asaadbalum
Collaborator

@asaadbalum asaadbalum commented Mar 13, 2026

Summary

Stabilizes OpenAI/Anthropic protocol compatibility by replacing 21 internal ad-hoc JSON structs with official openai-go SDK types across 4 packages, as requested in #1517.

  • pkg/responseapi: Replace 8 custom Chat Completions structs with openai-go SDK types. Handle multimodal content (text + image) via SDK union types. Preserve ToolChoice forwarding through SDK type mapping. Add warnings for silently dropped image content and malformed history items.
  • pkg/classification: Replace 6 custom structs. Use composition for vLLM-specific extra_body extension.
  • pkg/modelselection: Replace 3 custom structs. Use composition for provider error field. Fix API error detection: separate error envelope check from SDK deserialization (ChatCompletion.UnmarshalJSON ignores extra fields on embedded structs).
  • pkg/mcp: Replace 4 custom OpenAI tool structs with SDK types for tool conversion functions.
  • pkg/extproc: Update req_filter_response_api.go to consume openai.ChatCompletion instead of custom type.
  • pkg/anthropic: Add 23 compatibility tests validating OpenAI ↔ Anthropic request/response conversion.
  • docs/agent: Add "API Type Contracts" architectural guardrail enforcing SDK-first type usage. Track deferred pkg/cache migration as tech debt.
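The error-envelope split described for pkg/modelselection can be sketched in a few lines. This is an illustrative, stdlib-only sketch (the names `apiError` and `detectErrorEnvelope` are hypothetical, not the PR's identifiers): check the raw body for an `error` key before handing it to the SDK type, since a permissive `UnmarshalJSON` would otherwise ignore the envelope and yield an empty-looking completion.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// apiError mirrors the OpenAI-style error envelope shape: {"error": {...}}.
// Field names here are illustrative, not taken from the PR.
type apiError struct {
	Message string `json:"message"`
	Type    string `json:"type"`
}

// detectErrorEnvelope returns the provider error if the body carries an
// "error" key, and nil otherwise. Run this before SDK deserialization,
// which would silently ignore the extra field.
func detectErrorEnvelope(body []byte) *apiError {
	var envelope struct {
		Error *apiError `json:"error"`
	}
	if err := json.Unmarshal(body, &envelope); err != nil {
		return nil
	}
	return envelope.Error
}

func main() {
	ok := []byte(`{"id":"chatcmpl-1","choices":[]}`)
	bad := []byte(`{"error":{"message":"model not found","type":"invalid_request_error"}}`)
	fmt.Println(detectErrorEnvelope(ok) == nil)   // true: no envelope, safe to decode
	fmt.Println(detectErrorEnvelope(bad).Message) // model not found
}
```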

Testing

| Layer | Count | Packages |
|---|---|---|
| Unit + wire-format tests | 67 | responseapi (26), anthropic (23), extproc (10), mcp (8) |
| Round-trip mock backend | 7 | responseapi (mock HTTP server, wire-format compliance) |
| Race detector | PASS | All 6 affected packages |
| CI-equivalent lint | PASS | golangci-lint, 0 issues in changed files |
| go vet | PASS | Zero issues |
| go mod tidy | PASS | Clean |
| Agent validation | PASS | Manifests validated |
| E2E production stack | PASS | Full Envoy→ext_proc→Ollama stack deployed, verified with curl + Go SDK deserialization |

Follow-up

pkg/cache/ChatMessage uses a serialized format for Redis cache entries. Migrating it to SDK types would be a breaking change requiring cache format migration (existing cache entries would become unreadable). This should be handled as a separate PR with its own rollout plan.
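One way such a rollout could avoid invalidating existing Redis entries is schema versioning on each cached value. This is a hypothetical sketch of that idea, not code from this PR or the planned follow-up; `versionedEntry` and `decode` are invented names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// versionedEntry tags each cached value with a schema version so a reader
// can pick the right decoder. Untagged (legacy) entries stay readable.
type versionedEntry struct {
	SchemaVersion int             `json:"schema_version"`
	Payload       json.RawMessage `json:"payload"`
}

// decode reports which decoder a cached blob needs: "legacy" for old
// untagged entries, "vN" for version-tagged ones.
func decode(raw []byte) (string, error) {
	var e versionedEntry
	if err := json.Unmarshal(raw, &e); err != nil || e.SchemaVersion == 0 {
		// Old-format entries have no schema_version; fall back to the old decoder.
		return "legacy", nil
	}
	return fmt.Sprintf("v%d", e.SchemaVersion), nil
}

func main() {
	oldEntry := []byte(`{"role":"user","content":"hi"}`)
	newEntry := []byte(`{"schema_version":2,"payload":{"role":"user"}}`)
	a, _ := decode(oldEntry)
	b, _ := decode(newEntry)
	fmt.Println(a, b) // legacy v2
}
```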

Test plan

  • go build compiles all changed packages
  • go test -race passes for all 6 affected packages (responseapi, anthropic, mcp, extproc, classification, modelselection)
  • golangci-lint reports 0 issues in changed files (--new-from-rev)
  • go vet ./... zero issues
  • go mod tidy clean
  • Agent structure validation passes (make agent-validate)
  • Full production stack (Envoy + ext_proc + Ollama) deployed and verified via curl and Go SDK type deserialization

Closes #1517

Signed-off-by: asaadbalum asaad.balum@gmail.com

@netlify

netlify bot commented Mar 13, 2026

Deploy Preview for vllm-semantic-router ready!

| Name | Link |
|---|---|
| 🔨 Latest commit | 73eba69 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/vllm-semantic-router/deploys/69ce85f4d146030007215f70 |
| 😎 Deploy Preview | https://deploy-preview-1550--vllm-semantic-router.netlify.app |

@asaadbalum asaadbalum force-pushed the issue-1517-stabilize-openai-anthropic-sdk branch 2 times, most recently from 7be4cfc to f665e2a on March 13, 2026 13:44
@Xunzhuo
Member

Xunzhuo commented Mar 17, 2026

Sorry about this. #1553 introduced a large recent refactor, which caused conflicts here. Please rebase onto main and resolve them.

@asaadbalum asaadbalum force-pushed the issue-1517-stabilize-openai-anthropic-sdk branch 2 times, most recently from 91226a2 to a0abe7f on March 24, 2026 16:09
@asaadbalum asaadbalum marked this pull request as ready for review March 24, 2026 16:09
@asaadbalum asaadbalum force-pushed the issue-1517-stabilize-openai-anthropic-sdk branch from a0abe7f to ef2b6ba on March 24, 2026 16:25
@github-actions
Contributor

github-actions bot commented Mar 24, 2026

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 docs

Owners: @Xunzhuo
Files changed:

  • docs/agent/architecture-guardrails.md
  • docs/agent/tech-debt/README.md
  • docs/agent/tech-debt/td-038-custom-chat-completions-structs.md

📁 src/semantic-router

Owners: @rootfs, @Xunzhuo, @szedan-rh, @yehuditkerido, @abdallahsamabd, @asaadbalum, @liavweiss, @noalimoy
Files changed:

  • src/semantic-router/pkg/anthropic/compat_test.go
  • src/semantic-router/pkg/classification/vllm_client.go
  • src/semantic-router/pkg/extproc/openai_compat_test.go
  • src/semantic-router/pkg/extproc/req_filter_response_api.go
  • src/semantic-router/pkg/extproc/req_filter_response_api_test.go
  • src/semantic-router/pkg/mcp/types.go
  • src/semantic-router/pkg/mcp/types_test.go
  • src/semantic-router/pkg/modelselection/benchmark_runner.go
  • src/semantic-router/pkg/responseapi/roundtrip_test.go
  • src/semantic-router/pkg/responseapi/translator.go
  • src/semantic-router/pkg/responseapi/translator_test.go

📁 tools

Owners: @Xunzhuo, @yuluo-yx, @samzong
Files changed:

  • tools/agent/repo-manifest.yaml
  • tools/agent/structure-rules.yaml


🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

Contributor

Copilot AI left a comment

Pull request overview

This PR migrates OpenAI Chat Completions request/response handling away from internal ad-hoc JSON structs to the official github.com/openai/openai-go SDK types, and adds compatibility tests/guardrails to prevent future schema drift across the router’s OpenAI ↔ Anthropic interoperability surfaces.

Changes:

  • Replace Response API / MCP / classification / modelselection OpenAI-shaped structs with openai-go SDK types (plus minimal composition where router-specific fields are needed).
  • Update extproc Response API translation to parse upstream responses into openai.ChatCompletion.
  • Add wire-format/round-trip tests and document the SDK-first “API Type Contracts” guardrail; track remaining deferred migrations as tech debt.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| tools/agent/structure-rules.yaml | Marks modelselection benchmark runner as a legacy hotspot with relaxed function checks. |
| tools/agent/repo-manifest.yaml | Registers TD037 in the manifest and doc governance block. |
| src/semantic-router/pkg/responseapi/translator.go | Replaces custom Chat Completions structs with openai.ChatCompletionNewParams / openai.ChatCompletion and adds multimodal/tool-choice mapping. |
| src/semantic-router/pkg/responseapi/translator_test.go | Adds unit coverage for translation behavior, tool choice mapping, and multimodal content handling. |
| src/semantic-router/pkg/responseapi/roundtrip_test.go | Adds mock-backend and wire-format compliance tests for request/response round-trips. |
| src/semantic-router/pkg/modelselection/benchmark_runner.go | Uses SDK types for request messages and improves provider error detection via an error envelope. |
| src/semantic-router/pkg/mcp/types.go | Migrates MCP ↔ OpenAI tool conversion to SDK tool/tool-call types. |
| src/semantic-router/pkg/mcp/types_test.go | Adds tests ensuring MCP conversion is SDK-wire-compatible and robust to bad arguments. |
| src/semantic-router/pkg/extproc/req_filter_response_api.go | Parses Chat Completions responses into openai.ChatCompletion before translating to Response API. |
| src/semantic-router/pkg/extproc/openai_compat_test.go | Adds regression tests ensuring internal parsing/serialization remains compatible with SDK types. |
| src/semantic-router/pkg/classification/vllm_client.go | Switches classifier vLLM client to SDK response type and SDK message unions in requests. |
| src/semantic-router/pkg/anthropic/compat_test.go | Adds OpenAI ↔ Anthropic conversion compatibility tests validated against official SDK types. |
| docs/agent/tech-debt/td-037-custom-chat-completions-structs.md | Introduces TD037 to track remaining custom Chat Completions type debt (e.g., cache/memory serialization). |
| docs/agent/tech-debt/README.md | Links TD037 in the index and coverage map. |
| docs/agent/architecture-guardrails.md | Adds "API Type Contracts" guardrail enforcing SDK-first type usage. |

```go
if len(parts) == 1 && (parts[0].Type == ContentTypeInputText || parts[0].Type == ContentTypeOutputText) {
	return parts[0].Text
func convertFunctionTool(fn *FunctionDef) openai.ChatCompletionToolParam {
	param := openai.ChatCompletionToolParam{
```
Copilot AI Mar 25, 2026

openai.ChatCompletionToolParam needs its type set (typically "function"). convertFunctionTool currently constructs a tool param with only Function populated, which can serialize with an empty/missing type and cause OpenAI-compatible backends to reject/ignore the tool. Set Type: "function" when building the SDK tool param.

Suggested change:

```diff
 param := openai.ChatCompletionToolParam{
+	Type: openai.ChatCompletionToolType("function"),
```

Collaborator Author

This is a false positive.
The openai-go SDK's ChatCompletionToolParam.Type field is constant.Function (a ~string type with a custom MarshalJSON). The SDK explicitly documents: "This field can be elided, and will marshal its zero value as 'function'." The SDK's own tests construct ChatCompletionToolParam without setting Type. The zero value serializes correctly to {"type":"function",...}. I've added a wire-format assertion in types_test.go to verify this behavior.

```go
Function: OpenAIToolFunction{
// ConvertToolToOpenAI converts an MCP tool to the official OpenAI SDK tool param type.
func ConvertToolToOpenAI(tool mcp.Tool) openai.ChatCompletionToolParam {
	param := openai.ChatCompletionToolParam{
```
Copilot AI Mar 25, 2026

ConvertToolToOpenAI returns an openai.ChatCompletionToolParam without setting Type. Elsewhere in the repo, function tools always set Type: "function"; leaving it empty produces invalid/unsupported tool JSON for OpenAI-compatible servers. Initialize the returned tool param with Type: "function".

Suggested change:

```diff
 param := openai.ChatCompletionToolParam{
+	Type: "function",
```

Collaborator Author

Same as the previous comment's reply.


```markdown
## Status

Partially resolved — major packages migrated, remaining surfaces tracked below.
```
Member

should the tech debt be closed fully?

Collaborator Author

The remaining items (pkg/cache/ChatMessage and pkg/memory/Message) involve serialized formats used for cache entries. Migrating them to SDK types would be a breaking change: existing cached entries would become unreadable. This is better handled as a follow-up PR with its own rollout plan, as suggested in the "Follow-up" section of the description above.

Member

cool, can you track the follow-up with an issue?

Collaborator Author

asaadbalum Mar 30, 2026

@Xunzhuo created #1685 to track the pkg/cache and pkg/memory migration.

@asaadbalum asaadbalum force-pushed the issue-1517-stabilize-openai-anthropic-sdk branch from ef2b6ba to d0428d4 on March 25, 2026 09:59
@asaadbalum asaadbalum requested a review from Xunzhuo March 25, 2026 10:01
@github-actions
Contributor

github-actions bot commented Mar 25, 2026

✅ Supply Chain Security Report — All Clear

| Scanner | Findings |
|---|---|
| AST Codebase Scan (Py, Go, JS/TS, Rust) | 30 finding(s) — MEDIUM: 23 · LOW: 7 |
| AST PR Diff Scan | No issues detected |
| Regex Fallback Scan | No issues detected |

Scanned at 2026-04-02T15:09:53.685Z · View full workflow logs

@rootfs rootfs requested a review from Copilot March 26, 2026 16:21
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 6 comments.

Comment on lines +87 to 91:

```go
if req.Stream {
	completionReq.StreamOptions = openai.ChatCompletionStreamOptionsParam{
		IncludeUsage: openai.Bool(true),
	}
}
```
Copilot AI Mar 26, 2026

TranslateToCompletionRequest no longer sets the actual stream request flag; it only sets StreamOptions. With openai-go, most backends will not stream unless stream itself is enabled, so this likely breaks streaming behavior compared to the previous completionReq.Stream = req.Stream. Set the SDK stream flag (per the SDK’s field type) in addition to StreamOptions.

Collaborator Author

Good catch — fixed. ChatCompletionNewParams does not have a Stream field (it uses request options internally). We now inject stream:true into the translated JSON via sjson.SetBytes in req_filter_response_api.go. Two tests added.
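The PR injects the flag with `sjson.SetBytes`; a dependency-free sketch of the same idea (round-tripping through a map, which is heavier than sjson's in-place edit) looks like this. `injectStreamFlag` is an invented name for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// injectStreamFlag sets "stream": true on an already-marshaled request body.
// This stdlib version decodes to a map and re-encodes; sjson.SetBytes does
// the equivalent edit without a full round-trip.
func injectStreamFlag(body []byte) ([]byte, error) {
	var m map[string]json.RawMessage
	if err := json.Unmarshal(body, &m); err != nil {
		return nil, err
	}
	m["stream"] = json.RawMessage("true")
	return json.Marshal(m)
}

func main() {
	req := []byte(`{"model":"qwen2.5","messages":[{"role":"user","content":"hi"}]}`)
	out, err := injectStreamFlag(req)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Note that re-encoding a map sorts keys alphabetically, which is harmless for JSON but changes byte order; sjson avoids that too.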

Comment on lines +109 to +110:

```go
return openai.ChatCompletionToolChoiceOptionUnionParam{
	OfAuto: openai.String(tc),
```
Copilot AI Mar 26, 2026

The string tool_choice mapping always populates OfAuto, even for "none" and "required". If the SDK marshaler interprets OfAuto as the literal "auto" variant (ignoring the string value), this will emit an incorrect wire value and change tool-selection semantics. Map known strings to their corresponding union variants (e.g., auto/none/required), and treat unknown strings as empty/invalid.

Suggested change:

```diff
-return openai.ChatCompletionToolChoiceOptionUnionParam{
-	OfAuto: openai.String(tc),
+switch tc {
+case "auto":
+	return openai.ChatCompletionToolChoiceOptionUnionParam{
+		OfAuto: openai.String(tc),
+	}
+case "none":
+	return openai.ChatCompletionToolChoiceOptionUnionParam{
+		OfNone: openai.String(tc),
+	}
+case "required":
+	return openai.ChatCompletionToolChoiceOptionUnionParam{
+		OfRequired: openai.String(tc),
+	}
+default:
+	// Unknown string values are treated as invalid/empty.
+	return openai.ChatCompletionToolChoiceOptionUnionParam{}
```

Collaborator Author

False positive. In openai-go v1.12.0, ChatCompletionToolChoiceOptionUnionParam only has OfAuto (param.Opt[string]) and OfChatCompletionNamedToolChoice — there is no OfNone or OfRequired (the suggested code would not compile). OfAuto is a generic string wrapper; OfAuto with value "none" marshals to "none". Verified by TestConvertToolChoice_StringNone and TestConvertToolChoice_StringRequired.

Comment on lines +355 to 361:

```go
func convertFunctionTool(fn *FunctionDef) openai.ChatCompletionToolParam {
	param := openai.ChatCompletionToolParam{
		Function: openai.FunctionDefinitionParam{
			Name:        fn.Name,
			Description: openai.String(fn.Description),
		},
	}
```
Copilot AI Mar 26, 2026

openai.ChatCompletionToolParam typically requires the "type": "function" discriminator in the JSON payload for OpenAI-compatible backends. This constructor doesn’t set the tool type explicitly, which can cause the marshaled request to omit type and be rejected. Set the SDK tool type field to the function discriminator when building the param.

Collaborator Author

False positive. ChatCompletionToolParam.Type is constant.Function, whose MarshalJSON calls marshalString which substitutes Default() -> "function" for the zero value. The library's own tests do not set this field. Wire-format verified by TestConvertToolToOpenAI_RoundTrip_JSON.

Comment on lines +401 to 407:

```diff
 func ConvertToolToOpenAI(tool mcp.Tool) openai.ChatCompletionToolParam {
 	param := openai.ChatCompletionToolParam{
 		Function: openai.FunctionDefinitionParam{
 			Name:        tool.Name,
-			Description: tool.Description,
+			Description: openai.String(tool.Description),
 		},
 	}
```
Copilot AI Mar 26, 2026

Similar to the responseapi translator, this conversion doesn’t set the tool type discriminator (usually "function"). Even though tests currently expect "type": "function", relying on implicit SDK behavior is brittle and can break when the SDK changes. Set the tool type explicitly to ensure stable wire-format output.

Collaborator Author

Same as above — constant.Function auto-marshals its zero value to "function". Verified by TestConvertToolToOpenAI_RoundTrip_JSON which asserts parsed["type"] == "function".

```diff
 }
 
-return msg, nil
+return messageForRole(role, ""), nil
```
Copilot AI Mar 26, 2026

For non-history inputs, malformed content JSON now gets silently translated into an empty message (no error). This can cause hard-to-debug data loss (the request “succeeds” but with blank user content). Consider returning an error when current-request InputItem.Content fails to parse as either a string or content-parts array; keep the “warn-and-skip” behavior only for history items where best-effort recovery is intended.

Suggested change:

```diff
-return messageForRole(role, ""), nil
+return nil, fmt.Errorf("responseapi: failed to parse InputItem.Content as string or content-parts JSON")
```

Collaborator Author

Pre-existing behavior, not a regression. The PR preserved existing semantics. Improving error handling for malformed content is a valid follow-up but outside the scope of this type-migration PR.

Comment on lines +168 to +171:

```go
usage := &Usage{
	InputTokens:  int(resp.Usage.PromptTokens),
	OutputTokens: int(resp.Usage.CompletionTokens),
	TotalTokens:  int(resp.Usage.TotalTokens),
```
Copilot AI Mar 26, 2026

This always returns a non-nil Usage even when the upstream response omitted the usage field (SDK struct would stay zero-valued). If the Response API previously used nil to mean “usage not provided,” this is a behavior change and may be misleading operationally (looks like real 0-token usage). Consider only setting Usage when at least one token count is non-zero, or detect presence of the usage key during unmarshal (e.g., via a lightweight envelope struct / map[string]json.RawMessage) to preserve the distinction.

Suggested change:

```diff
-usage := &Usage{
-	InputTokens:  int(resp.Usage.PromptTokens),
-	OutputTokens: int(resp.Usage.CompletionTokens),
-	TotalTokens:  int(resp.Usage.TotalTokens),
+var usage *Usage
+if resp.Usage.PromptTokens != 0 || resp.Usage.CompletionTokens != 0 || resp.Usage.TotalTokens != 0 {
+	usage = &Usage{
+		InputTokens:  int(resp.Usage.PromptTokens),
+		OutputTokens: int(resp.Usage.CompletionTokens),
+		TotalTokens:  int(resp.Usage.TotalTokens),
+	}
```

Collaborator Author

Consistent with the library — ChatCompletion.Usage is a struct (not pointer), always populated. The old custom type had the same behavior. Changing this would be a behavioral change outside the scope of this migration.
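For reference, the presence-detection envelope the review suggested can be sketched with the stdlib alone. This is a hypothetical illustration (`usagePresent` is an invented name), showing how a raw-message map distinguishes an omitted `usage` key from a genuinely zero-valued one, which a non-pointer struct field cannot do.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// usagePresent reports whether the response body actually carried a "usage"
// key. A zero-valued struct field looks identical for "absent" and
// "present with all-zero counts"; a RawMessage envelope preserves the difference.
func usagePresent(body []byte) bool {
	var envelope map[string]json.RawMessage
	if err := json.Unmarshal(body, &envelope); err != nil {
		return false
	}
	_, ok := envelope["usage"]
	return ok
}

func main() {
	withUsage := []byte(`{"id":"1","usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}`)
	without := []byte(`{"id":"1"}`)
	fmt.Println(usagePresent(withUsage), usagePresent(without)) // true false
}
```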

@asaadbalum asaadbalum force-pushed the issue-1517-stabilize-openai-anthropic-sdk branch from 4a2ca13 to d838634 on March 31, 2026 21:32
@Xunzhuo
Member

Xunzhuo commented Apr 2, 2026

can you fix the CI? others LGTM

Replaces custom Chat Completions structs with the upstream Go SDK types
across the ext_proc pipeline, Response API translator, MCP bridge, and
model selection benchmark runner. Ensures wire-format compatibility with
upstream protocol specifications.

Key changes:
- Use SDK ChatCompletionNewParams / ChatCompletion types throughout
- Inject stream flag via sjson since the SDK manages it internally
- Convert tool_choice using SDK union types
- Add wire-format round-trip tests for protocol compatibility
- Extract helper functions to reduce cyclomatic/cognitive complexity

Closes: vllm-project#1517

Signed-off-by: Alon Baluom <alon.baluom@gmail.com>
Signed-off-by: asaadbalum <asaad.balum@gmail.com>
@asaadbalum
Collaborator Author

asaadbalum commented Apr 2, 2026

can you fix the CI? others LGTM

Fixed 3 CI violations; the remaining CI failure comes from pre-existing golangci-lint violations on main, exposed because the new lint pipeline flags all issues in touched files, not just new ones. PR #1701 adds the exclusion rules; once it merges, updating this branch should make the check pass.

@Xunzhuo Xunzhuo merged commit e3a5a66 into vllm-project:main Apr 3, 2026
33 checks passed
@asaadbalum asaadbalum deleted the issue-1517-stabilize-openai-anthropic-sdk branch April 3, 2026 06:13


Development

Successfully merging this pull request may close these issues.

feature: stabilize OpenAI/Anthropic protocol compatibility and official SDK usage