
Commit b6cc231

Fix Vertex AI model routing and v1beta1 for preview models (#29)
* fix: Include model name in Vertex AI request params and use v1beta1 for preview models

  Three fixes for the Vertex AI provider:
  1. Include the "model" key in build_request_params output so chat/chat_stream
     use the correct model name in URL construction instead of falling back to
     the default "gemini-2.0-flash"
  2. Use the v1beta1 API version for preview/experimental models, which require
     it (v1 returns 404 for these models)
  3. Add debug logging for request URLs to aid troubleshooting

* chore: Bump version to 0.12.7, add changelog entry

  The changelog documents the Vertex AI model routing fix and v1beta1 support
  for preview/experimental models.

* fix: Correct changelog version numbers (0.13.x → 0.12.x)

* fix: Vertex AI v1/v1beta1 bug, global endpoint support, input validation

  The v1beta1 fix from 584f569 had a critical bug: Model.parse stored a
  hardcoded v1 URL in model.base_url when GOOGLE_CLOUD_PROJECT was set,
  bypassing the provider's v1beta1 selection logic. Preview models still
  404'd in the most common setup.

  Fix: default_base_url(:vertex_ai) now returns nil — the URL is built at
  request time by the provider with proper v1/v1beta1 selection.

  Also adds:
  - Global endpoint support (required for Gemini 3.x preview models)
  - GOOGLE_CLOUD_LOCATION env var as a fallback for GOOGLE_CLOUD_REGION
  - Input validation for project ID and region with helpful error messages
  - Comprehensive docs (README + moduledoc) with model table and setup guide
  - Integration test exercising Flash + Pro on the global endpoint
  - Multi-region example script

  Tested: 782 tests, 0 failures. Integration-tested with a service account
  against gemini-3.1-pro-preview and gemini-3-flash-preview on the global
  endpoint.
1 parent a172877 commit b6cc231
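The core of the final fix, choosing v1 vs v1beta1 and the regional vs global hostname at request time rather than caching a URL at parse time, can be sketched as below. Module and function names here are illustrative assumptions, not the library's actual internals:

```elixir
defmodule VertexURL do
  @moduledoc false
  # Assumption: preview/experimental models are detected by substring;
  # the real provider's rule may differ.
  @preview_markers ["preview", "exp"]

  # Preview/experimental models require the v1beta1 API surface;
  # stable models use v1.
  def api_version(model) do
    if String.contains?(model, @preview_markers), do: "v1beta1", else: "v1"
  end

  # Build the base URL at request time so the API version can depend on
  # the model, and the hostname on the region ("global" has no prefix).
  def base_url(project, location, model) do
    host =
      if location == "global",
        do: "aiplatform.googleapis.com",
        else: "#{location}-aiplatform.googleapis.com"

    "https://#{host}/#{api_version(model)}/projects/#{project}/locations/#{location}"
  end
end
```

Deferring URL construction this way is what lets a preview model resolve to `v1beta1` even when `GOOGLE_CLOUD_PROJECT` is set at parse time.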

File tree

9 files changed: +818, -109 lines


CHANGELOG.md

Lines changed: 27 additions & 2 deletions
@@ -2,7 +2,32 @@
 
 All notable changes to this project will be documented in this file.
 
-## [0.13.2] - 2026-03-07
+## [0.12.8] - 2026-03-12
+
+### Fixed
+
+- **Vertex AI v1/v1beta1 bug**: `Model.parse("vertex_ai:gemini-2.5-pro-preview-06-05")` with `GOOGLE_CLOUD_PROJECT` set was storing a hardcoded `v1` URL in `model.base_url`, causing the provider's `v1beta1` selection logic to be bypassed. Preview models now correctly use `v1beta1` at request time.
+
+### Added
+
+- **Vertex AI input validation**: Project ID and region from environment variables are now validated with helpful error messages instead of producing opaque DNS/HTTP errors.
+- **`GOOGLE_CLOUD_LOCATION` support**: Added as a fallback for `GOOGLE_CLOUD_REGION`, consistent with other Google Cloud libraries and tooling.
+- Multi-region example script: `examples/providers/vertex_ai_multi_region.exs`
+
+## [0.12.7] - 2026-03-10
+
+### Fixed
+
+- **Vertex AI model routing**: Fixed `build_request_params/3` not including the `"model"` key in the params map, causing `chat/2` and `chat_stream/2` to always fall back to `"gemini-2.0-flash"` regardless of the requested model.
+- **Vertex AI 404 on preview models**: Use `v1beta1` API version for preview and experimental models (e.g., `gemini-3.1-pro-preview`). The `v1` endpoint returns 404 for these models.
+
+### Added
+
+- `Nous.Providers.VertexAI.api_version_for_model/1` — returns `"v1beta1"` for preview/experimental models, `"v1"` for stable models.
+- `Nous.Providers.VertexAI.endpoint/3` now accepts an optional model name to select the correct API version.
+- Debug logging for Vertex AI request URLs.
+
+## [0.12.6] - 2026-03-07
 
 ### Added
 
@@ -12,7 +37,7 @@ All notable changes to this project will be documented in this file.
 - New config options: `:auto_update_memory`, `:auto_update_every`, `:reflection_model`, `:reflection_max_tokens`, `:reflection_max_messages`, `:reflection_max_memories`
 - New example: `examples/memory/auto_update.exs`
 
-## [0.13.1] - 2026-03-06
+## [0.12.5] - 2026-03-06
 
 ### Added
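The `GOOGLE_CLOUD_LOCATION` fallback noted in the 0.12.8 entry amounts to simple precedence logic; a minimal sketch follows, where the module name is a hypothetical stand-in and `us-central1` is the default documented elsewhere in this commit:

```elixir
defmodule RegionLookup do
  @moduledoc false
  # Default region when neither env var is set (per this commit's docs).
  @default_region "us-central1"

  # GOOGLE_CLOUD_REGION wins if both are set; GOOGLE_CLOUD_LOCATION is the
  # fallback, consistent with other Google Cloud tooling.
  def resolve(env \\ System.get_env()) do
    env["GOOGLE_CLOUD_REGION"] || env["GOOGLE_CLOUD_LOCATION"] || @default_region
  end
end
```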

README.md

Lines changed: 92 additions & 24 deletions
@@ -93,7 +93,7 @@ IO.puts("Tokens: #{result.usage.total_tokens}")
 | OpenAI | `openai:gpt-4` ||
 | Anthropic | `anthropic:claude-sonnet-4-5-20250929` ||
 | Google Gemini | `gemini:gemini-2.0-flash` ||
-| Google Vertex AI | `vertex_ai:gemini-2.0-flash` ||
+| Google Vertex AI | `vertex_ai:gemini-3.1-pro-preview` ||
 | Groq | `groq:llama-3.1-70b-versatile` ||
 | Ollama | `ollama:llama2` ||
 | OpenRouter | `openrouter:anthropic/claude-3.5-sonnet` ||
@@ -108,15 +108,46 @@ All HTTP providers use pure Elixir HTTP clients (Req + Finch). LlamaCpp runs in-
 agent = Nous.new("lmstudio:qwen3")                        # Local (free)
 agent = Nous.new("openai:gpt-4")                          # OpenAI
 agent = Nous.new("anthropic:claude-sonnet-4-5-20250929")  # Anthropic
-agent = Nous.new("vertex_ai:gemini-2.0-flash")            # Google Vertex AI
+agent = Nous.new("vertex_ai:gemini-3.1-pro-preview")      # Google Vertex AI
 agent = Nous.new("llamacpp:local", llamacpp_model: llm)   # Local NIF
 ```
 
 ### Google Vertex AI Setup
 
-Vertex AI provides enterprise access to Gemini models. To use it with a service account:
+Vertex AI provides enterprise access to Gemini models via Google Cloud. It supports
+VPC-SC, CMEK, IAM, regional/global endpoints, and all the latest Gemini models.
 
-**1. Create a service account:**
+#### Supported Models
+
+| Model | Model ID | Endpoint | API Version |
+|-------|----------|----------|-------------|
+| Gemini 3.1 Pro (preview) | `gemini-3.1-pro-preview` | global only | v1beta1 |
+| Gemini 3 Flash (preview) | `gemini-3-flash-preview` | global only | v1beta1 |
+| Gemini 3.1 Flash-Lite (preview) | `gemini-3.1-flash-lite-preview` | global only | v1beta1 |
+| Gemini 2.5 Pro | `gemini-2.5-pro` | regional + global | v1 |
+| Gemini 2.5 Flash | `gemini-2.5-flash` | regional + global | v1 |
+| Gemini 2.0 Flash | `gemini-2.0-flash` | regional + global | v1 |
+
+> **Note:** Preview and experimental models automatically use the `v1beta1` API version.
+> The Gemini 3.x preview models are **global endpoint only** — set `GOOGLE_CLOUD_LOCATION=global`.
+
+#### Regional vs Global Endpoints
+
+Vertex AI offers two endpoint types:
+
+- **Regional** (e.g., `us-central1`, `europe-west1`): Low-latency, data residency guarantees
+  ```
+  https://us-central1-aiplatform.googleapis.com/v1/projects/{project}/locations/us-central1
+  ```
+- **Global**: Higher availability, required for Gemini 3.x preview models
+  ```
+  https://aiplatform.googleapis.com/v1beta1/projects/{project}/locations/global
+  ```
+
+The provider automatically selects the correct hostname and API version based on the
+region and model name. Set `GOOGLE_CLOUD_LOCATION=global` for Gemini 3.x preview models.
+
+#### Step 1: Create a Service Account
 
 ```bash
 export PROJECT_ID="your-project-id"
@@ -129,64 +160,101 @@ gcloud iam service-accounts create nous-vertex-ai \
   --display-name="Nous Vertex AI" \
   --project=$PROJECT_ID
 
-# Grant permission
+# Grant the Vertex AI User role
 gcloud projects add-iam-policy-binding $PROJECT_ID \
   --member="serviceAccount:nous-vertex-ai@${PROJECT_ID}.iam.gserviceaccount.com" \
   --role="roles/aiplatform.user"
 
-# Download key and store as env var
-gcloud iam service-accounts keys create /tmp/sa.json \
+# Download the key file
+gcloud iam service-accounts keys create /tmp/sa-key.json \
   --iam-account="nous-vertex-ai@${PROJECT_ID}.iam.gserviceaccount.com"
+```
+
+#### Step 2: Set Environment Variables
+
+```bash
+# Load the service account JSON into an env var (recommended — no file path dependency)
+export GOOGLE_CREDENTIALS="$(cat /tmp/sa-key.json)"
+
+# Required: your GCP project ID
+export GOOGLE_CLOUD_PROJECT="your-project-id"
 
-# Set the env vars
-export GOOGLE_CREDENTIALS="$(cat /tmp/sa.json)"
-export GOOGLE_CLOUD_PROJECT="$PROJECT_ID"
-export GOOGLE_CLOUD_REGION="us-central1"
+# Required for Gemini 3.x preview models (global endpoint only)
+export GOOGLE_CLOUD_LOCATION="global"
+
+# Or use a regional endpoint for stable models:
+# export GOOGLE_CLOUD_LOCATION="us-central1"
+# export GOOGLE_CLOUD_LOCATION="europe-west1"
 ```
 
-**2. Add Goth to your deps** (handles token refresh from the service account):
+Both `GOOGLE_CLOUD_REGION` and `GOOGLE_CLOUD_LOCATION` are supported (consistent with
+other Google Cloud libraries). `GOOGLE_CLOUD_REGION` takes precedence if both are set.
+Defaults to `us-central1` if neither is set.
+
+#### Step 3: Add Goth to Your Application
+
+Goth handles OAuth2 token fetching and auto-refresh from the service account credentials.
 
 ```elixir
+# mix.exs
 {:goth, "~> 1.4"}
 ```
 
-**3. Start Goth in your supervision tree:**
-
 ```elixir
+# application.ex — start Goth in your supervision tree
 credentials = System.get_env("GOOGLE_CREDENTIALS") |> Jason.decode!()
 
 children = [
   {Goth, name: MyApp.Goth, source: {:service_account, credentials}}
 ]
 ```
 
-**4. Configure Nous to use Goth:**
+#### Step 4: Configure and Use
 
 ```elixir
-# Option A: Via app config (recommended for production)
+# Option A: App config (recommended for production)
 # config/config.exs
 config :nous, :vertex_ai, goth: MyApp.Goth
 
-# Then just use it — no extra options needed:
-agent = Nous.new("vertex_ai:gemini-2.0-flash")
+# Then use it — Goth handles token refresh automatically:
+agent = Nous.new("vertex_ai:gemini-3.1-pro-preview")
 {:ok, result} = Nous.run(agent, "Hello from Vertex AI!")
 ```
 
 ```elixir
-# Option B: Per-model (useful for multiple projects/regions)
-agent = Nous.new("vertex_ai:gemini-2.0-flash",
+# Option B: Per-model Goth (useful for multiple projects)
+agent = Nous.new("vertex_ai:gemini-3-flash-preview",
   default_settings: %{goth: MyApp.Goth}
 )
 ```
 
 ```elixir
-# Option C: Direct access token (no Goth needed, e.g. for quick testing)
-export VERTEX_AI_ACCESS_TOKEN="$(gcloud auth print-access-token)"
+# Option C: Explicit base_url (for custom endpoint or specific region)
+alias Nous.Providers.VertexAI
+
+agent = Nous.new("vertex_ai:gemini-3.1-pro-preview",
+  base_url: VertexAI.endpoint("my-project", "global", "gemini-3.1-pro-preview"),
+  default_settings: %{goth: MyApp.Goth}
+)
+```
 
-agent = Nous.new("vertex_ai:gemini-2.0-flash")
+```elixir
+# Option D: Quick testing with gcloud CLI (no Goth needed)
+# export VERTEX_AI_ACCESS_TOKEN="$(gcloud auth print-access-token)"
+agent = Nous.new("vertex_ai:gemini-3.1-pro-preview")
 ```
 
-See [`examples/providers/vertex_ai_goth_test.exs`](examples/providers/vertex_ai_goth_test.exs) for a runnable example.
+#### Input Validation
+
+The provider validates `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` at request time
+and returns helpful error messages for invalid values instead of opaque DNS or HTTP errors.
+
+#### Examples
+
+- [`examples/providers/vertex_ai.exs`](examples/providers/vertex_ai.exs) — Basic usage with access token
+- [`examples/providers/vertex_ai_goth_test.exs`](examples/providers/vertex_ai_goth_test.exs) — Service account with Goth
+- [`examples/providers/vertex_ai_multi_region.exs`](examples/providers/vertex_ai_multi_region.exs) — Multi-region + v1/v1beta1 demo
+- [`examples/providers/vertex_ai_integration_test.exs`](examples/providers/vertex_ai_integration_test.exs) — Full integration test (Flash + Pro, streaming + non-streaming)
 
 ## Features
 
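The input validation added to the README above can be sketched as follows. The patterns and error messages are assumptions for illustration; the real provider's rules may differ:

```elixir
defmodule VertexValidate do
  @moduledoc false
  # Assumed shape of GCP project IDs: 6-30 chars, lowercase letters,
  # digits, hyphens, starting with a letter, not ending with a hyphen.
  @project_re ~r/^[a-z][a-z0-9-]{4,28}[a-z0-9]$/
  # Regions look like `us-central1`; `global` is also accepted.
  @region_re ~r/^[a-z]+-[a-z]+\d+$/

  def validate_project(nil), do: {:error, "GOOGLE_CLOUD_PROJECT is not set"}

  def validate_project(id) do
    if Regex.match?(@project_re, id),
      do: :ok,
      else: {:error, "invalid GCP project ID: #{inspect(id)}"}
  end

  def validate_region("global"), do: :ok

  def validate_region(region) do
    if Regex.match?(@region_re, region),
      do: :ok,
      else: {:error, "invalid region: #{inspect(region)} (e.g. us-central1 or global)"}
  end
end
```

Failing fast here is what replaces the opaque DNS errors: a bad region never reaches hostname construction.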

examples/providers/vertex_ai_goth_test.exs

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,7 @@
55
# Prerequisites:
66
# export GOOGLE_CREDENTIALS='{"type":"service_account","project_id":"...","private_key":"...",...}'
77
# export GOOGLE_CLOUD_PROJECT="your-project-id"
8-
# export GOOGLE_CLOUD_REGION="europe-west1" # optional, defaults to europe-west1 (Frankfurt)
8+
# export GOOGLE_CLOUD_REGION="us-central1" # optional, defaults to us-central1
99
#
1010
# Run:
1111
# mix run test_vertex_ai.exs
@@ -25,7 +25,7 @@ end
2525

2626
IO.puts("=== Vertex AI Test with Service Account ===\n")
2727
IO.puts("Project: #{project}")
28-
IO.puts("Region: #{System.get_env("GOOGLE_CLOUD_REGION", "europe-west1")}\n")
28+
IO.puts("Region: #{System.get_env("GOOGLE_CLOUD_REGION", "us-central1")}\n")
2929

3030
# Start Goth with service account credentials from env var
3131
credentials = Jason.decode!(credentials_json)
@@ -38,7 +38,7 @@ IO.puts("Goth started successfully.\n")
3838
IO.puts("--- Test 1: Non-streaming ---")
3939

4040
agent =
41-
Nous.new("vertex_ai:gemini-3.1-pro",
41+
Nous.new("vertex_ai:gemini-2.0-flash",
4242
instructions: "You are a helpful assistant. Be concise.",
4343
default_settings: %{goth: Nous.TestGoth}
4444
)
