
Commit a636795

Merge pull request #323 from Portkey-AI/fix/vertex-thinking
update docs on how to disable thinking for gemini models
2 parents: 857c493 + 6038898

File tree: 2 files changed (+205, −18 lines)


integrations/llms/gemini.mdx

Lines changed: 187 additions & 11 deletions
@@ -238,21 +238,197 @@ Grounding is invoked by passing the `google_search` tool (for newer models like
 If you mix regular tools with grounding tools, vertex might throw an error saying only one tool can be used at a time.
 </Warning>
 
-## gemini-2.0-flash-thinking-exp and other thinking models
+## Thinking models
 
-`gemini-2.0-flash-thinking-exp` models return a Chain of Thought response along with the actual inference text,
-this is not openai compatible, however, Portkey supports this by adding a `\r\n\r\n` and appending the two responses together.
-You can split the response along this pattern to get the Chain of Thought response and the actual inference text.
-
-If you require the Chain of Thought response along with the actual inference text, pass the [strict open ai compliance flag](/product/ai-gateway/strict-open-ai-compliance) as `false` in the request.
-
-If you want to get the inference text only, pass the [strict open ai compliance flag](/product/ai-gateway/strict-open-ai-compliance) as `true` in the request.
+<CodeGroup>
+```py Python
+from portkey_ai import Portkey
+
+# Initialize the Portkey client
+portkey = Portkey(
+    api_key="PORTKEY_API_KEY",   # Replace with your Portkey API key
+    virtual_key="VIRTUAL_KEY",   # Add your provider's virtual key
+    strict_open_ai_compliance=False
+)
+
+# Create the request
+response = portkey.chat.completions.create(
+    model="gemini-2.5-flash-preview-04-17",
+    max_tokens=3000,
+    thinking={
+        "type": "enabled",
+        "budget_tokens": 2030
+    },
+    stream=True,
+    messages=[
+        {
+            "role": "user",
+            "content": [
+                {
+                    "type": "text",
+                    "text": "when does the flight from new york to bengaluru land tomorrow, what time, what is its flight number, and what is its baggage belt?"
+                }
+            ]
+        }
+    ]
+)
+print(response)
+```
+```ts NodeJS
+import Portkey from 'portkey-ai';
+
+// Initialize the Portkey client
+const portkey = new Portkey({
+    apiKey: "PORTKEY_API_KEY",   // Replace with your Portkey API key
+    virtualKey: "VIRTUAL_KEY",   // Your Vertex AI virtual key
+    strictOpenAiCompliance: false
+});
+
+// Generate a chat completion
+async function getChatCompletionFunctions() {
+    const response = await portkey.chat.completions.create({
+        model: "gemini-2.5-flash-preview-04-17",
+        max_tokens: 3000,
+        thinking: {
+            type: "enabled",
+            budget_tokens: 2030
+        },
+        stream: true,
+        messages: [
+            {
+                role: "user",
+                content: [
+                    {
+                        type: "text",
+                        text: "when does the flight from new york to bengaluru land tomorrow, what time, what is its flight number, and what is its baggage belt?"
+                    }
+                ]
+            }
+        ]
+    });
+    console.log(response);
+}
+
+// Call the function
+getChatCompletionFunctions();
+```
+```js OpenAI NodeJS
+import OpenAI from 'openai'; // We're using the v4 SDK
+import { PORTKEY_GATEWAY_URL, createHeaders } from 'portkey-ai'
+
+const openai = new OpenAI({
+    apiKey: 'VERTEX_API_KEY', // defaults to process.env["OPENAI_API_KEY"]
+    baseURL: PORTKEY_GATEWAY_URL,
+    defaultHeaders: createHeaders({
+        provider: "vertex-ai",
+        apiKey: "PORTKEY_API_KEY", // defaults to process.env["PORTKEY_API_KEY"]
+        strictOpenAiCompliance: false
+    })
+});
+
+// Generate a chat completion with streaming
+async function getChatCompletionFunctions() {
+    const response = await openai.chat.completions.create({
+        model: "gemini-2.5-flash-preview-04-17",
+        max_tokens: 3000,
+        thinking: {
+            type: "enabled",
+            budget_tokens: 2030
+        },
+        stream: true,
+        messages: [
+            {
+                role: "user",
+                content: [
+                    {
+                        type: "text",
+                        text: "when does the flight from new york to bengaluru land tomorrow, what time, what is its flight number, and what is its baggage belt?"
+                    }
+                ]
+            }
+        ]
+    });
+
+    console.log(response);
+}
+await getChatCompletionFunctions();
+```
+```py OpenAI Python
+from openai import OpenAI
+from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders
+
+openai = OpenAI(
+    api_key='VERTEX_API_KEY',
+    base_url=PORTKEY_GATEWAY_URL,
+    default_headers=createHeaders(
+        provider="vertex-ai",
+        api_key="PORTKEY_API_KEY",
+        strict_open_ai_compliance=False
+    )
+)
 
-## Managing Google Gemini Prompts
 
-You can manage all prompts to Google Gemini in the [Prompt Library](/product/prompt-library). All the current models of Google Gemini are supported and you can easily start testing different prompts.
+response = openai.chat.completions.create(
+    model="gemini-2.5-flash-preview-04-17",
+    max_tokens=3000,
+    thinking={
+        "type": "enabled",
+        "budget_tokens": 2030
+    },
+    stream=True,
+    messages=[
+        {
+            "role": "user",
+            "content": [
+                {
+                    "type": "text",
+                    "text": "when does the flight from new york to bengaluru land tomorrow, what time, what is its flight number, and what is its baggage belt?"
+                }
+            ]
+        }
+    ]
+)
+
+print(response)
+```
+```sh cURL
+curl "https://api.portkey.ai/v1/chat/completions" \
+  -H "Content-Type: application/json" \
+  -H "x-portkey-api-key: $PORTKEY_API_KEY" \
+  -H "x-portkey-provider: vertex-ai" \
+  -H "x-api-key: $VERTEX_API_KEY" \
+  -H "x-portkey-strict-open-ai-compliance: false" \
+  -d '{
+    "model": "gemini-2.5-flash-preview-04-17",
+    "max_tokens": 3000,
+    "thinking": {
+        "type": "enabled",
+        "budget_tokens": 2030
+    },
+    "stream": true,
+    "messages": [
+        {
+            "role": "user",
+            "content": [
+                {
+                    "type": "text",
+                    "text": "when does the flight from new york to bengaluru land tomorrow, what time, what is its flight number, and what is its baggage belt?"
+                }
+            ]
+        }
+    ]
+}'
+```
+</CodeGroup>
 
-Once you're ready with your prompt, you can use the `portkey.prompts.completions.create` interface to use the prompt in your application.
+<Note>
+To disable thinking for Gemini models like `google.gemini-2.5-flash-preview-04-17`, you must explicitly set `budget_tokens` to `0` and `type` to `disabled`:
+```json
+"thinking": {
+    "type": "disabled",
+    "budget_tokens": 0
+}
+```
+</Note>
 
 <Info>
 Gemini grounding mode may not work via Portkey SDK. Contact support@portkey.ai for assistance.
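
For quick reference, here is a minimal sketch of the disable-thinking request end to end, using the same Portkey Python SDK setup as the examples above. The key values are placeholders, and the plain-string `content` form is an OpenAI-compatible shorthand for the array form shown in the diff:

```python
from portkey_ai import Portkey

# Placeholder credentials, mirroring the examples above
portkey = Portkey(
    api_key="PORTKEY_API_KEY",
    virtual_key="VIRTUAL_KEY",
    strict_open_ai_compliance=False,
)

# Per the note above, disabling thinking on Gemini requires BOTH
# "type": "disabled" AND "budget_tokens": 0.
response = portkey.chat.completions.create(
    model="gemini-2.5-flash-preview-04-17",
    max_tokens=3000,
    thinking={"type": "disabled", "budget_tokens": 0},
    messages=[{"role": "user", "content": "Summarize extended thinking in one sentence."}],
)

print(response.choices[0].message.content)
```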

integrations/llms/vertex-ai.mdx

Lines changed: 18 additions & 7 deletions
@@ -264,9 +264,11 @@ curl --location 'https://api.portkey.ai/v1/chat/completions' \
 
 <Note>
 The assistant's thinking response is returned in the `response_chunk.choices[0].delta.content_blocks` array, not the `response.choices[0].message.content` string.
+
+Gemini models do not return their chain-of-thought messages, so `content_blocks` are not required for Gemini models.
 </Note>
 
-Models like `anthropic.claude-3-7-sonnet@20250219` support [extended thinking](https://cloud.google.com/vertex-ai/generative-ai/docs/partner-models/use-claude#claude-3-7-sonnet).
+Models like `google.gemini-2.5-flash-preview-04-17` and `anthropic.claude-3-7-sonnet@20250219` support [extended thinking](https://cloud.google.com/vertex-ai/generative-ai/docs/partner-models/use-claude#claude-3-7-sonnet).
 This is similar to openai thinking, but you get the model's reasoning as it processes the request as well.
 
 Note that you will have to set [`strict_open_ai_compliance=False`](/product/ai-gateway/strict-open-ai-compliance) in the headers to use this feature.
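
To make that note concrete, here is a minimal sketch of consuming such a stream with the Portkey Python SDK. The routing of thinking output to `delta.content_blocks` (versus the final answer in `delta.content`) comes from the note above; the per-block field names (`type`, `thinking`) are assumptions for illustration and may differ from the actual payload:

```python
from portkey_ai import Portkey

portkey = Portkey(
    api_key="PORTKEY_API_KEY",         # placeholder
    virtual_key="VERTEX_VIRTUAL_KEY",  # placeholder
    strict_open_ai_compliance=False,
)

stream = portkey.chat.completions.create(
    model="anthropic.claude-3-7-sonnet@20250219",
    max_tokens=3000,
    thinking={"type": "enabled", "budget_tokens": 2030},
    stream=True,
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)

for chunk in stream:
    delta = chunk.choices[0].delta
    # Thinking arrives in `content_blocks` (per the note above); the final
    # answer arrives in the usual `content` field.
    for block in getattr(delta, "content_blocks", None) or []:
        if block.get("type") == "thinking":  # assumed block shape
            print("thinking:", block.get("thinking", ""))
    if getattr(delta, "content", None):
        print("answer:", delta.content)
```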
@@ -484,6 +486,16 @@ Note that you will have to set [`strict_open_ai_compliance=False`](/product/ai-g
 ```
 </CodeGroup>
 
+<Note>
+To disable thinking for Gemini models like `google.gemini-2.5-flash-preview-04-17`, you must explicitly set `budget_tokens` to `0` and `type` to `disabled`:
+```json
+"thinking": {
+    "type": "disabled",
+    "budget_tokens": 0
+}
+```
+</Note>
+
 ### Multi turn conversation
 
 <CodeGroup>
@@ -737,19 +749,18 @@ Note that you will have to set [`strict_open_ai_compliance=False`](/product/ai-g
 ```
 </CodeGroup>
 
-<Note>
-This same message format also works for all other media types — just send your media file in the `url` field, like `"url": "gs://cloud-samples-data/video/animals.mp4"` for google cloud urls and `"url":"https://download.samplelib.com/mp3/sample-3s.mp3"` for public urls
-
-Your URL should have the file extension, this is used for inferring `MIME_TYPE` which is a required parameter for prompting Gemini models with files
-</Note>
-
 ### Sending `base64` Image
 
 Here, you can send the `base64` image data along with the `url` field too:
 
 ```json
 "url": "data:image/png;base64,UklGRkacAABXRUJQVlA4IDqcAAC....."
 ```
+<Note>
+This same message format also works for all other media types — just send your media file in the `url` field, like `"url": "gs://cloud-samples-data/video/animals.mp4"` for Google Cloud URLs and `"url": "https://download.samplelib.com/mp3/sample-3s.mp3"` for public URLs.
+
+Your URL should include the file extension; it is used to infer the `MIME_TYPE`, which is a required parameter for prompting Gemini models with files.
+</Note>
 
 ## Text Embedding Models
 
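One way to produce that `data:` URL is a small standard-library sketch like the following; the file path and MIME type are placeholders, and the MIME type must match the file, since it determines the `MIME_TYPE` Gemini sees:

```python
import base64
from pathlib import Path

def to_data_url(path: str, mime_type: str) -> str:
    """Build a data URL suitable for the `url` field shown above."""
    encoded = base64.b64encode(Path(path).read_bytes()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"

# Example usage (placeholder path)
image_url = to_data_url("sample.png", "image/png")
# -> "data:image/png;base64,iVBORw0KGgo..."
```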