feat: support Gemini native client endpoints (/v1beta/models/{model}:generateContent) #1960
Description
Summary
Gemini CLI and the Google AI SDK hard-code the native Gemini API path format:
POST /v1beta/models/{model}:generateContent
POST /v1beta/models/{model}:streamGenerateContent
The model name is embedded in the URL, not the request body. The current extproc server uses an exact-path hash map (processorFactories), so these paths fall through with a 404 and cannot be routed.
This issue tracks adding first-class support for Gemini native client endpoints so that tools like Gemini CLI can be pointed at the gateway without modification.
Proposed Changes
1. Prefix-based path dispatch in Server (internal/extproc/server.go)
Add RegisterPrefix(prefix string, factory ProcessorFactory) alongside the existing Register. When an exact match is not found, fall back to a longest-prefix match over all registered prefix factories. This is a general mechanism — not Gemini-specific.
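A minimal sketch of the intended dispatch order, using placeholder string values in place of the real `ProcessorFactory` type (the field names `exact` and `prefixes` are illustrative, not the actual fields in `internal/extproc/server.go`):

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// server illustrates the proposed two-level dispatch: an exact-path map
// consulted first, then a longest-prefix fallback over registered prefixes.
type server struct {
	exact    map[string]string // path -> factory name (stand-in for ProcessorFactory)
	prefixes map[string]string // prefix -> factory name
}

var errNoProcessor = errors.New("no processor for path")

func (s *server) processorFor(path string) (string, error) {
	if f, ok := s.exact[path]; ok {
		return f, nil // exact match always wins
	}
	best, bestLen := "", -1
	for p, f := range s.prefixes {
		if strings.HasPrefix(path, p) && len(p) > bestLen {
			best, bestLen = f, len(p) // keep the longest matching prefix
		}
	}
	if bestLen < 0 {
		return "", errNoProcessor
	}
	return best, nil
}

func main() {
	s := &server{
		exact:    map[string]string{"/v1/chat/completions": "openai"},
		prefixes: map[string]string{"/v1beta/models/": "gemini", "/v1beta/": "generic"},
	}
	f, _ := s.processorFor("/v1beta/models/gemini-2.0-flash:generateContent")
	fmt.Println(f) // longest prefix /v1beta/models/ wins -> gemini
}
```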
2. NewGeminiProcessorFactory (internal/extproc/processor_impl.go)
A ProcessorFactory that:
- Parses the model name and streaming flag from the URL path (`extractGeminiModelFromPath`)
- Passes them as a pre-populated `GenerateContentEndpointSpec` to `newRouterProcessor` (router branch) or `newUpstreamProcessor` (upstream branch)
- Uses `NoopTracer`, since Gemini passthrough does not need tracing
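The path-parsing step can be sketched as follows. The helper name `extractGeminiModel` and its `(model, streaming, ok)` return shape are assumptions standing in for the real `extractGeminiModelFromPath`, and any query string (e.g. `?alt=sse`) is assumed to be stripped before the call:

```go
package main

import (
	"fmt"
	"strings"
)

// extractGeminiModel parses a Gemini-native path such as
// /v1beta/models/gemini-2.0-flash:streamGenerateContent into the model
// name and a streaming flag. modelPrefix is the registered prefix, e.g.
// "/v1beta/models/".
func extractGeminiModel(path, modelPrefix string) (model string, streaming bool, ok bool) {
	rest, found := strings.CutPrefix(path, modelPrefix)
	if !found {
		return "", false, false
	}
	model, method, found := strings.Cut(rest, ":")
	if !found || model == "" {
		return "", false, false // no colon: not a generateContent-style call
	}
	switch method {
	case "generateContent":
		return model, false, true
	case "streamGenerateContent":
		return model, true, true
	}
	return "", false, false // unsupported method suffix
}

func main() {
	m, s, _ := extractGeminiModel("/v1beta/models/gemini-2.0-flash:streamGenerateContent", "/v1beta/models/")
	fmt.Println(m, s) // gemini-2.0-flash true
}
```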
3. GenerateContentEndpointSpec (internal/endpointspec/gemini_generatecontent.go)
Endpoint spec carrying ModelFromPath string and Streaming bool. The existing GeminiToGCPVertexAI translator reads these to build the correct Vertex AI path suffix.
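A sketch of the spec and the Vertex AI path suffix it drives. The struct fields mirror the description above; `vertexAIPath` is a hypothetical helper (the real translator's API may differ), and the project/location values are placeholders:

```go
package main

import "fmt"

// GenerateContentEndpointSpec carries what the translator needs that is
// not in the request body: the model parsed from the URL and whether the
// client asked for the streaming method.
type GenerateContentEndpointSpec struct {
	ModelFromPath string // e.g. "gemini-2.0-flash"
	Streaming     bool   // true for :streamGenerateContent
}

// vertexAIPath builds the Vertex AI publisher-model path that :path is
// rewritten to, following the documented Vertex AI generateContent URL shape.
func vertexAIPath(spec GenerateContentEndpointSpec, project, location string) string {
	method := "generateContent"
	if spec.Streaming {
		method = "streamGenerateContent"
	}
	return fmt.Sprintf("/v1/projects/%s/locations/%s/publishers/google/models/%s:%s",
		project, location, spec.ModelFromPath, method)
}

func main() {
	spec := GenerateContentEndpointSpec{ModelFromPath: "gemini-2.0-flash", Streaming: true}
	fmt.Println(vertexAIPath(spec, "my-project", "us-central1"))
}
```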
4. GeminiToGCPVertexAITranslator (internal/translator/gemini_gcpvertexai.go)
A near-passthrough translator: rewrites :path to the full Vertex AI generateContent / streamGenerateContent URL. Also strips FunctionResponse.ID from all parts before forwarding — the Google AI SDK (used by Gemini CLI in gemini-api auth mode) populates this field, but Vertex AI /v1 rejects it as an unknown field. Vertex AI matches function_response to function_call by name, not id, so stripping is safe.
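The stripping step can be sketched on the raw JSON body as below; the real translator most likely operates on typed request structs rather than `map[string]any`, but the field names follow the public Gemini REST JSON shape:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// stripFunctionResponseIDs removes the "id" field from every
// functionResponse part in a generateContent request body, since
// Vertex AI /v1 rejects it as an unknown field and matches
// function_response to function_call by name instead.
func stripFunctionResponseIDs(body []byte) ([]byte, error) {
	var req map[string]any
	if err := json.Unmarshal(body, &req); err != nil {
		return nil, err
	}
	contents, _ := req["contents"].([]any)
	for _, c := range contents {
		content, _ := c.(map[string]any)
		parts, _ := content["parts"].([]any)
		for _, p := range parts {
			part, _ := p.(map[string]any)
			if fr, ok := part["functionResponse"].(map[string]any); ok {
				delete(fr, "id")
			}
		}
	}
	return json.Marshal(req)
}

func main() {
	in := []byte(`{"contents":[{"role":"user","parts":[{"functionResponse":{"id":"call-1","name":"get_weather","response":{"temp":21}}}]}]}`)
	out, _ := stripFunctionResponseIDs(in)
	fmt.Println(string(out)) // "id" is gone; other fields survive
}
```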
5. EndpointPrefixes.Gemini (internal/internalapi/internalapi.go)
Adds a gemini key to ParseEndpointPrefixes, defaulting to /v1beta, so operators can remap the prefix if needed.
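The defaulting behavior can be sketched as below; `geminiPrefix` is an illustrative helper, not the actual `ParseEndpointPrefixes` code:

```go
package main

import "fmt"

// geminiPrefix resolves the Gemini endpoint prefix from the parsed
// endpoint-prefix map, falling back to /v1beta when the operator has
// not set a "gemini" key.
func geminiPrefix(prefixes map[string]string) string {
	if p, ok := prefixes["gemini"]; ok && p != "" {
		return p
	}
	return "/v1beta"
}

func main() {
	fmt.Println(geminiPrefix(nil))                                        // default
	fmt.Println(geminiPrefix(map[string]string{"gemini": "/api/v1beta"})) // operator override
}
```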
6. Wire-up in cmd/extproc/mainlib/main.go
```go
geminiModelPrefix := strings.TrimRight(path.Join(flags.rootPrefix, endpointPrefixes.Gemini), "/") + "/models/"
server.RegisterPrefix(geminiModelPrefix, extproc.NewGeminiProcessorFactory(generateContentMetricsFactory))
```
Tests
- `TestServer_ProcessorForPath_PrefixMatch` — exact match wins over prefix, longest prefix wins, unknown paths return `errNoProcessor`
- `TestExtractGeminiModelFromPath` — table-driven: standard, streaming, with root prefix, missing colon, empty string
- `TestNewGeminiProcessorFactory` — router (non-streaming), router (streaming), upstream branch
Notes
- No changes to `filterapi`, controller, or CRD schema — the `GCPVertexAI` backend schema is reused as-is
- The translator is a passthrough for response bodies; token usage extraction is not implemented (Gemini native responses use a different token field structure — can be a follow-up)
- Streaming detection uses the `:streamGenerateContent` method suffix in the URL path, not a body field
cc @mathetake @yuzisun — would love your thoughts on the prefix dispatch approach and whether RegisterPrefix belongs on Server or should be a separate routing layer.