feat: add GCP Vertex AI image generation support#1905
altale wants to merge 10 commits into envoyproxy:main
Conversation
Add support for translating OpenAI /v1/images/generations requests to GCP Vertex AI image generation APIs with two translation paths:

- Imagen models (imagen-*): uses the predict endpoint with parameter mapping for aspect ratio, image size, output format, and compression quality.
- Gemini image models: uses the generateContent endpoint.

Includes 19 unit tests and 3 end-to-end integration tests.

Supersedes envoyproxy#1806

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: zhihao.zhao <zhihao.zhao@lycorp.co.jp>
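The aspect-ratio mapping mentioned above can be sketched roughly as follows. This is an illustrative standalone sketch, not the PR's actual translator code: `sizeToAspectRatio` and the supported-ratio table are assumptions, based on the fixed aspect ratios the Imagen predict API accepts.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// Supported Imagen aspect ratios (assumed set, for illustration).
var ratios = map[string]float64{
	"1:1":  1.0,
	"3:4":  3.0 / 4.0,
	"4:3":  4.0 / 3.0,
	"9:16": 9.0 / 16.0,
	"16:9": 16.0 / 9.0,
}

// sizeToAspectRatio maps an OpenAI-style "WxH" size string to the
// closest supported Imagen aspect ratio.
func sizeToAspectRatio(size string) (string, error) {
	parts := strings.Split(size, "x")
	if len(parts) != 2 {
		return "", fmt.Errorf("invalid size %q", size)
	}
	w, err := strconv.Atoi(parts[0])
	if err != nil {
		return "", err
	}
	h, err := strconv.Atoi(parts[1])
	if err != nil {
		return "", err
	}
	target := float64(w) / float64(h)
	best, bestDiff := "", math.MaxFloat64
	for name, r := range ratios {
		if d := math.Abs(r - target); d < bestDiff {
			best, bestDiff = name, d
		}
	}
	return best, nil
}

func main() {
	for _, s := range []string{"1024x1024", "1024x1792", "1792x1024"} {
		r, _ := sizeToAspectRatio(s)
		fmt.Println(s, "->", r)
	}
}
```

With this mapping, "1024x1024" resolves to "1:1", and the portrait/landscape DALL-E-style sizes snap to "9:16" and "16:9" respectively.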
- preserve explicit output_compression=0 for Imagen requests
- enforce cross-provider size/quality/output_format validation
- keep backend default behavior for size="" and size="auto"
- fix Gemini image usage metrics (input/output/total token usage)
- update and deduplicate translator tests and data-plane fixture

Signed-off-by: zhihao.zhao <zhihao.zhao@lycorp.co.jp>
/gemini summary
Summary of Changes

This pull request introduces a new feature that enables the AI Gateway to translate OpenAI-compatible image generation requests to GCP Vertex AI, supporting both Imagen and Gemini models. It includes request and response translation logic, size and format handling, and error management to ensure compatibility and proper functionality with the Vertex AI service.
/gemini review
Code Review
This pull request adds support for GCP Vertex AI image generation, including both Imagen and Gemini models. The changes are well-structured, with a new translator to handle the conversion from OpenAI's API format to GCP's. The implementation correctly distinguishes between the two backend paths (predict for Imagen, generateContent for Gemini) and handles parameter mapping and response translation. The addition of comprehensive unit and end-to-end tests ensures the new functionality is robust. I have one suggestion to refactor a part of the code for better readability.
o.isImagenModel = strings.HasPrefix(o.requestModel, "imagen")
var path string
if o.isImagenModel {
	var imgGenReq *gcp.ImagePredictRequest
	imgGenReq, err = openAIToImagenRequest(openAIReq)
	if err != nil {
		return
	}
	newBody, err = json.Marshal(imgGenReq)
	path = buildGCPModelPathSuffix(gcpModelPublisherGoogle, string(o.requestModel), gcpMethodPredict)
} else {
	var geminiReq *gcp.GenerateContentRequest
	geminiReq, err = openAIToGeminiRequest(openAIReq)
	if err != nil {
		return
	}
	newBody, err = json.Marshal(geminiReq)
	path = buildGCPModelPathSuffix(gcpModelPublisherGoogle, string(o.requestModel), gcpMethodGenerateContent)
}
if err != nil {
	err = fmt.Errorf("failed to encode request: %w", err)
	return
}
This block can be refactored to improve readability and separate the logic for creating the backend request from the JSON marshaling. By using an any type for the backend request, you can avoid duplicating the marshaling and error handling logic.
o.isImagenModel = strings.HasPrefix(o.requestModel, "imagen")
var path string
var backendReq any
if o.isImagenModel {
	backendReq, err = openAIToImagenRequest(openAIReq)
	if err != nil {
		return
	}
	path = buildGCPModelPathSuffix(gcpModelPublisherGoogle, string(o.requestModel), gcpMethodPredict)
} else {
	backendReq, err = openAIToGeminiRequest(openAIReq)
	if err != nil {
		return
	}
	path = buildGCPModelPathSuffix(gcpModelPublisherGoogle, string(o.requestModel), gcpMethodGenerateContent)
}
newBody, err = json.Marshal(backendReq)
if err != nil {
	err = fmt.Errorf("failed to encode request: %w", err)
	return
}
I prefer to keep the explicit typed branches here (ImagePredictRequest vs GenerateContentRequest) for clarity and type safety.
Codecov Report

❌ Patch coverage is

Additional details and impacted files

@@ Coverage Diff @@
##             main    #1905      +/-   ##
==========================================
+ Coverage   84.22%   84.32%   +0.09%
==========================================
  Files         128      129       +1
  Lines       17828    17992     +164
==========================================
+ Hits        15016    15171     +155
- Misses       1868     1874       +6
- Partials      944      947       +3

☔ View full report in Codecov by Sentry.
Hi @nacx. An update on the earlier CI failure in "E2E Test for Inference Extensions": I re-ran the same changes in my own fork's CI, and that job passed successfully. Also, the scope of this PR is unrelated to the inference-extension e2e logic (it mainly touches the Vertex AI image generation translation paths and related tests/docs, not the inference-extension controller/e2e framework itself). Could you please re-run the failed upstream job to confirm? Thank you so much.
Description
This PR adds OpenAI-compatible image generation translation for GCP Vertex AI with two backend paths:

- Imagen models (imagen-*): the predict endpoint
- Gemini image models: the generateContent endpoint
Related Issues/PRs (if applicable)
#1806
Special notes for reviewers (if applicable)
Behavior Summary