
feat: add vertexai inference provider#33

Merged
nathan-weinberg merged 1 commit into opendatahub-io:main from nathan-weinberg:vertexai
Sep 18, 2025

Conversation

@nathan-weinberg
Collaborator

@nathan-weinberg nathan-weinberg commented Sep 15, 2025

What does this PR do?

Adds VertexAI provider to the midstream distro image

Test Plan

No testing at this time

Summary by CodeRabbit

  • New Features

    • Added support for Google Cloud Vertex AI as a remote inference provider; configurable via environment variables for project and location (default: us-central1).
  • Chores

    • Added the Google Cloud AI client library to the runtime image dependencies; may modestly increase build time and image size.
  • Documentation

    • Updated provider listings and README to include the new Vertex AI option.
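
The env-var gating and default described above can be illustrated with a minimal Bash sketch. Note that run.yaml uses llama-stack's own `${env.VAR:=default}` / `${env.VAR:+value}` substitution syntax; the closest plain-shell analogues used below are `:-` and `:+`, and the project id is a placeholder, not a value from this PR:

```shell
# Sketch, assuming Bash analogues of the run.yaml env expansions.
unset VERTEX_AI_PROJECT VERTEX_AI_LOCATION

# ":+" yields "vertexai" only when VERTEX_AI_PROJECT is set and non-empty,
# mirroring provider_id: ${env.VERTEX_AI_PROJECT:+vertexai}
provider_id="${VERTEX_AI_PROJECT:+vertexai}"
echo "unset -> provider_id='${provider_id}'"   # empty: provider disabled

export VERTEX_AI_PROJECT="my-gcp-project"      # placeholder value
provider_id="${VERTEX_AI_PROJECT:+vertexai}"
# ":-" falls back to the default region, mirroring
# ${env.VERTEX_AI_LOCATION:=us-central1}
location="${VERTEX_AI_LOCATION:-us-central1}"
echo "set   -> provider_id='${provider_id}' location='${location}'"
```

With `VERTEX_AI_PROJECT` unset the provider id expands to an empty string, so the provider is effectively disabled; once it is set, the id becomes `vertexai` and the location defaults to `us-central1`.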

@coderabbitai
Contributor

coderabbitai bot commented Sep 15, 2025

Walkthrough

Adds Vertex AI as a remote inference provider in distribution configs and docs, and adds google-cloud-aiplatform to the distribution container image's pip install block; no runtime entrypoint or exported declarations were changed.

Changes

| Cohort / File(s) | Summary of Changes |
| --- | --- |
| Container image deps<br>`distribution/Containerfile` | Added `google-cloud-aiplatform` to the existing `pip install` RUN block; `litellm` remains present and unchanged. |
| Inference provider config<br>`distribution/build.yaml`, `distribution/run.yaml` | Added Vertex AI provider: inserted `- provider_type: remote::vertexai` in `distribution/build.yaml`; added a conditional provider entry in `distribution/run.yaml` with `provider_id: ${env.VERTEX_AI_PROJECT:+vertexai}`, `provider_type: remote::vertexai`, and `config.project` / `config.location` using `${env.VERTEX_AI_PROJECT:=}` and `${env.VERTEX_AI_LOCATION:=us-central1}`. |
| Docs<br>`distribution/README.md` | Added Vertex AI to the inference provider listing. |
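
Based on the keys listed for `distribution/run.yaml`, the added provider entry would look roughly like the following (the `providers.inference` nesting is assumed from llama-stack's usual run.yaml layout, not copied from the diff):

```yaml
providers:
  inference:
    - provider_id: ${env.VERTEX_AI_PROJECT:+vertexai}
      provider_type: remote::vertexai
      config:
        project: ${env.VERTEX_AI_PROJECT:=}
        location: ${env.VERTEX_AI_LOCATION:=us-central1}
```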

Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor User
    participant App
    participant Router
    participant VertexProvider as Vertex AI Provider
    participant VertexAPI as Vertex AI API

    User->>App: Submit inference request
    App->>Router: Select inference provider
    alt VERTEX_AI_PROJECT set
        Router->>VertexProvider: Route request to Vertex AI
        Note right of VertexProvider #E8F8F5: Container image includes `google-cloud-aiplatform`
        VertexProvider->>VertexAPI: Call Vertex AI (project/location)
        VertexAPI-->>VertexProvider: Return result
        VertexProvider-->>Router: Forward result
    else Not set
        Router->>App: Use other configured provider
    end
    Router-->>App: Return result
    App-->>User: Render output
```

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

I hop through builds with tiny feet,
A new cloud friend makes my route complete,
Vertex whispers in the sky so high,
Packages snug, we give tests a try,
🥕 — from a rabbit who helped deploy.

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped; CodeRabbit's high-level summary is enabled. |
| Title Check | ✅ Passed | The title "feat: add vertexai inference provider" concisely and accurately describes the primary change in the PR: adding a Vertex AI (vertexai) inference provider across the distribution (configs, README, and container). It is specific, short, and follows conventional commit style, so a reviewer can quickly understand the main intent. |
| Docstring Coverage | ✅ Passed | No functions found in the changes; docstring coverage check skipped. |

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8c8522b and d2b6a0b.

📒 Files selected for processing (4)
  • distribution/Containerfile (1 hunks)
  • distribution/README.md (1 hunks)
  • distribution/build.yaml (1 hunks)
  • distribution/run.yaml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • distribution/run.yaml
  • distribution/build.yaml
  • distribution/Containerfile
  • distribution/README.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build-test-push (linux/amd64)


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
distribution/build.yaml (1)

39-43: Source of truth: include Vertex AI deps here, not in Containerfile.

To persist across regenerations, add the new packages to additional_pip_packages (and then revert the Containerfile changes).

Suggested addition (outside the changed hunk, in `additional_pip_packages`):

```yaml
additional_pip_packages:
  - aiosqlite
  - sqlalchemy[asyncio]
  - asyncpg
  - psycopg2-binary
  - google-cloud-aiplatform
  - litellm
```
🧹 Nitpick comments (3)
distribution/Containerfile (1)

17-17: Pin or bound google-cloud-aiplatform to avoid upstream breakage.

Unpinned aiplatform can pull breaking transitive updates (grpc, google-auth). After moving to the generator, add a conservative upper-bound (<2) or a known-good pin that matches the provider in llama-stack 0.2.x.
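
For example, the bound could be expressed directly in the Containerfile line; the `<2` ceiling comes from the suggestion above, and whether to also add a lower bound (a known-good pin) depends on what llama-stack 0.2.x actually requires:

```dockerfile
# Upper-bound the SDK so a future 2.x release cannot break the image build.
RUN pip install --no-cache-dir "google-cloud-aiplatform<2"
```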

distribution/run.yaml (2)

45-49: Add usage docs/env guidance for ADC.

Make it explicit that GOOGLE_APPLICATION_CREDENTIALS (or GCP Workload Identity) must be provided at runtime; otherwise calls will 401/403. A brief comment in README or deployment notes is sufficient.
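
A hedged deployment sketch; the credentials path and project id below are placeholders, not values from this PR:

```shell
# Application Default Credentials must be reachable at runtime, or Vertex AI
# calls will fail with 401/403 (alternatively, use GCP Workload Identity).
export GOOGLE_APPLICATION_CREDENTIALS="/run/secrets/gcp-sa.json"  # placeholder path
export VERTEX_AI_PROJECT="my-gcp-project"                          # placeholder project id
export VERTEX_AI_LOCATION="us-central1"                            # PR default
```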


162-173: No model mapped to the new provider.

As-is, the vertexai provider won’t be used unless a model entry points to it. Consider adding a sample model bound to provider_id vertexai and gate it via env vars (similar to vllm/milvus patterns).

Example (outside the changed hunk):

```yaml
models:
  - metadata: {}
    model_id: ${env.VERTEX_MODEL:=}
    provider_id: ${env.VERTEX_AI_PROJECT:+vertexai}
    provider_model_id: ${env.VERTEX_PROVIDER_MODEL_ID:=}  # e.g., publishers/google/models/gemini-1.5-pro
    model_type: llm
```
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 756fad8 and 8838451.

📒 Files selected for processing (3)
  • distribution/Containerfile (1 hunks)
  • distribution/build.yaml (1 hunks)
  • distribution/run.yaml (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build-test-push (linux/amd64)
🔇 Additional comments (4)
distribution/Containerfile (1)

20-20: Verify litellm/openai interplay.

litellm brings its own OpenAI dependency; you also install openai separately. Please confirm version compatibility to avoid resolver conflicts or shadowed imports at runtime.

distribution/build.yaml (2)

10-10: LGTM: provider type added.

Adding remote::vertexai to the distribution spec looks good.


38-38: Python base mismatch: 3.11 here vs 3.12 in Containerfile.

The spec references python-311, while the generated Containerfile uses python-312. Please align to avoid subtle ABI/package issues.

distribution/run.yaml (1)

45-49: Confirm llama-stack 0.2.21 includes remote::vertexai.

If the provider landed after 0.2.21, initialization will fail. Please verify against the installed llama-stack version or bump accordingly.

Collaborator

@leseb leseb left a comment


holding this one for downstream first

@nathan-weinberg nathan-weinberg added the do-not-merge Apply to PRs that should not be merged (yet) label Sep 15, 2025
@nathan-weinberg nathan-weinberg force-pushed the vertexai branch 4 times, most recently from ae61a99 to b2dac1f on September 16, 2025 20:07
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
distribution/run.yaml (1)

45-49: Gate Vertex AI provider behind env to avoid startup failures without GCP creds.

Mirror the existing Milvus pattern so the provider is only registered when VERTEX_AI_PROJECT is set. This prevents init errors when ADC/project are absent.

Apply:

```diff
-  - provider_id: vertexai
+  - provider_id: ${env.VERTEX_AI_PROJECT:+vertexai}
     provider_type: remote::vertexai
     config:
       project: ${env.VERTEX_AI_PROJECT:=}
       location: ${env.VERTEX_AI_LOCATION:=us-central1}
```

Please verify boot succeeds both with and without VERTEX_AI_PROJECT set.

🧹 Nitpick comments (1)
distribution/run.yaml (1)

45-49: Optional: align provider_id naming with others (e.g., “-inference”).

Most inference providers use a “-inference” suffix (e.g., vllm-inference, bedrock-inference). Consider renaming for consistency.

If you adopt the env-gated form above, change just the suffix:

```diff
-  - provider_id: ${env.VERTEX_AI_PROJECT:+vertexai}
+  - provider_id: ${env.VERTEX_AI_PROJECT:+vertexai-inference}
```
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ae61a99 and b2dac1f.

📒 Files selected for processing (4)
  • distribution/Containerfile (1 hunks)
  • distribution/README.md (1 hunks)
  • distribution/build.yaml (1 hunks)
  • distribution/run.yaml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • distribution/README.md
  • distribution/Containerfile
  • distribution/build.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build-test-push (linux/amd64)

@nathan-weinberg nathan-weinberg removed the do-not-merge Apply to PRs that should not be merged (yet) label Sep 18, 2025
@nathan-weinberg nathan-weinberg force-pushed the vertexai branch 2 times, most recently from d931c79 to 6869e23 on September 18, 2025 15:05
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
@nathan-weinberg nathan-weinberg merged commit 0512b5d into opendatahub-io:main Sep 18, 2025
5 checks passed
@nathan-weinberg nathan-weinberg deleted the vertexai branch September 18, 2025 18:17
@coderabbitai coderabbitai bot mentioned this pull request Sep 19, 2025
leseb added a commit to leseb/llama-stack-distribution that referenced this pull request Jan 9, 2026
chore: bump wheel release to get 0.3.5.1
3 participants