feat: add vertexai inference provider (#33)
Conversation
Walkthrough

Adds Vertex AI as a remote inference provider in the distribution configs and docs, and adds the `google-cloud-aiplatform` dependency to the container image.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor User
    participant App
    participant Router
    participant VertexProvider as Vertex AI Provider
    participant VertexAPI as Vertex AI API
    User->>App: Submit inference request
    App->>Router: Select inference provider
    alt VERTEX_AI_PROJECT set
        Router->>VertexProvider: Route request to Vertex AI
        Note right of VertexProvider: Container image includes google-cloud-aiplatform
        VertexProvider->>VertexAPI: Call Vertex AI (project/location)
        VertexAPI-->>VertexProvider: Return result
        VertexProvider-->>Router: Forward result
    else Not set
        Router->>App: Use other configured provider
    end
    Router-->>App: Return result
    App-->>User: Render output
```
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
distribution/build.yaml (1)
39-43: Source of truth: include Vertex AI deps here, not in the Containerfile.

To persist across regenerations, add the new packages to `additional_pip_packages` (and then revert the Containerfile changes).
Suggested addition (outside the changed hunk, in additional_pip_packages):
```yaml
additional_pip_packages:
  - aiosqlite
  - sqlalchemy[asyncio]
  - asyncpg
  - psycopg2-binary
  - google-cloud-aiplatform
  - litellm
```
🧹 Nitpick comments (3)
distribution/Containerfile (1)
17-17: Pin or bound `google-cloud-aiplatform` to avoid upstream breakage.

An unpinned `google-cloud-aiplatform` can pull breaking transitive updates (grpc, google-auth). After moving it to the generator, add a conservative upper bound (`<2`) or a known-good pin that matches the provider in llama-stack 0.2.x.
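A minimal sketch of the bound, assuming the dependency stays in the Containerfile for now (the exact line and lower-bound version here are illustrative, not taken from this repo):

```shell
# Containerfile: conservative bound instead of a floating latest
RUN pip install --no-cache-dir "google-cloud-aiplatform>=1.38,<2"
```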
distribution/run.yaml (2)
45-49: Add usage docs/env guidance for ADC.

Make it explicit that `GOOGLE_APPLICATION_CREDENTIALS` (or GCP Workload Identity) must be provided at runtime; otherwise calls will fail with 401/403. A brief comment in the README or deployment notes is sufficient.
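For example, a deployment note could sketch mounting a service-account key into the container. Image name, paths, and port below are placeholders, not taken from this repo:

```shell
podman run \
  -e VERTEX_AI_PROJECT=my-gcp-project \
  -e GOOGLE_APPLICATION_CREDENTIALS=/run/secrets/sa.json \
  -v ./sa.json:/run/secrets/sa.json:ro \
  -p 8321:8321 \
  my-distro-image:latest
```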
162-173: No model mapped to the new provider.

As-is, the vertexai provider won't be used unless a model entry points to it. Consider adding a sample model bound to `provider_id: vertexai` and gating it via env vars (similar to the vllm/milvus patterns).

Example (outside the changed hunk):

```yaml
models:
  - metadata: {}
    model_id: ${env.VERTEX_MODEL:=}
    provider_id: ${env.VERTEX_AI_PROJECT:+vertexai}
    provider_model_id: ${env.VERTEX_PROVIDER_MODEL_ID:=}  # e.g., publishers/google/models/gemini-1.5-pro
    model_type: llm
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- distribution/Containerfile (1 hunks)
- distribution/build.yaml (1 hunks)
- distribution/run.yaml (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build-test-push (linux/amd64)
🔇 Additional comments (4)
distribution/Containerfile (1)
20-20: Verify the litellm/openai interplay.

litellm brings its own OpenAI dependency; you also install openai separately. Please confirm version compatibility to avoid resolver conflicts or shadowed imports at runtime.
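One way to spot a mismatch before baking the image is to compare what litellm declares for openai against the separately installed version. A hedged, stdlib-only sketch (package names are illustrative, nothing here is specific to this repo):

```python
# Hedged sketch: inspect what an installed distribution declares for a
# given dependency, using only the standard library.
from importlib import metadata

def declared_requirements(dist_name: str, dep_prefix: str) -> list[str]:
    """Requirement strings of dist_name that start with dep_prefix,
    or [] if the distribution is not installed."""
    try:
        reqs = metadata.requires(dist_name) or []
    except metadata.PackageNotFoundError:
        return []
    return [r for r in reqs if r.lower().startswith(dep_prefix.lower())]

# e.g. compare litellm's declared "openai" requirement against the
# separately installed openai before pinning both in the image
print(declared_requirements("litellm", "openai"))
```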
distribution/build.yaml (2)
10-10: LGTM: provider type added.

Adding `remote::vertexai` to the distribution spec looks good.
38-38: Python base mismatch: 3.11 here vs 3.12 in the Containerfile.

The spec references python-311, while the generated Containerfile uses python-312. Please align them to avoid subtle ABI/package issues.
distribution/run.yaml (1)
45-49: Confirm llama-stack 0.2.21 includes `remote::vertexai`.

If the provider landed after 0.2.21, initialization will fail. Please verify against the installed llama-stack version or bump accordingly.
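One possible check, assuming the `llama` CLI that ships with llama-stack exposes a provider listing in this form (the command shape is an assumption; verify against your installed version):

```shell
# Hedged: list registered inference providers and look for vertexai
llama stack list-providers inference | grep -i vertexai
```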
leseb
left a comment
holding this one for downstream first
Force-pushed: ae61a99 → b2dac1f (Compare)
Actionable comments posted: 0
♻️ Duplicate comments (1)
distribution/run.yaml (1)
45-49: Gate the Vertex AI provider behind an env var to avoid startup failures without GCP creds.

Mirror the existing Milvus pattern so the provider is only registered when VERTEX_AI_PROJECT is set. This prevents init errors when ADC/project are absent.
Apply:

```diff
-  - provider_id: vertexai
+  - provider_id: ${env.VERTEX_AI_PROJECT:+vertexai}
     provider_type: remote::vertexai
     config:
       project: ${env.VERTEX_AI_PROJECT:=}
       location: ${env.VERTEX_AI_LOCATION:=us-central1}
```

Please verify boot succeeds both with and without VERTEX_AI_PROJECT set.
🧹 Nitpick comments (1)
distribution/run.yaml (1)
45-49: Optional: align provider_id naming with the others (e.g., a "-inference" suffix).

Most inference providers use a "-inference" suffix (e.g., vllm-inference, bedrock-inference). Consider renaming for consistency.
If you adopt the env-gated form above, change just the suffix:

```diff
-  - provider_id: ${env.VERTEX_AI_PROJECT:+vertexai}
+  - provider_id: ${env.VERTEX_AI_PROJECT:+vertexai-inference}
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- distribution/Containerfile (1 hunks)
- distribution/README.md (1 hunks)
- distribution/build.yaml (1 hunks)
- distribution/run.yaml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- distribution/README.md
- distribution/Containerfile
- distribution/build.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build-test-push (linux/amd64)
Force-pushed: b2dac1f → efc7ed8 (Compare)
Force-pushed: d931c79 → 6869e23 (Compare)
Force-pushed: 6869e23 → 8c8522b (Compare)
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
Force-pushed: 8c8522b → d2b6a0b (Compare)
chore: bump wheel release to get 0.3.5.1
What does this PR do?
Adds VertexAI provider to the midstream distro image
Test Plan
No testing at this time
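As a lightweight follow-up, a smoke check could confirm the provider registers once the image boots. The endpoint path and port below are assumptions based on llama-stack defaults, not verified against this distro:

```shell
# Hedged: query a locally running stack and look for the new provider
curl -s http://localhost:8321/v1/providers | grep -i vertexai
```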
Summary by CodeRabbit

- New Features
- Chores
- Documentation