If you mix regular tools with grounding tools, Vertex AI might throw an error saying only one tool can be used at a time.
</Warning>

## Thinking models

If you require the model's Chain of Thought along with the inference text, pass the [strict open ai compliance flag](/product/ai-gateway/strict-open-ai-compliance) as `false` in the request; pass it as `true` to receive the inference text only.

<CodeGroup>
```py Python
from portkey_ai import Portkey

# Initialize the Portkey client
portkey = Portkey(
    api_key="PORTKEY_API_KEY",       # Replace with your Portkey API key
    virtual_key="VIRTUAL_KEY",       # Add your provider's virtual key
    strict_open_ai_compliance=False
)

# Create the request
response = portkey.chat.completions.create(
    model="gemini-2.5-flash-preview-04-17",
    max_tokens=3000,
    thinking={
        "type": "enabled",
        "budget_tokens": 2030
    },
    stream=True,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "when does the flight from new york to bengaluru land tomorrow, what time, what is its flight number, and what is its baggage belt?"
                }
            ]
        }
    ]
)
print(response)
```
```ts NodeJS
import Portkey from 'portkey-ai';

// Initialize the Portkey client
const portkey = new Portkey({
  apiKey: "PORTKEY_API_KEY",      // Replace with your Portkey API key
  virtualKey: "VIRTUAL_KEY",      // Your Vertex AI virtual key
  strictOpenAiCompliance: false
});

// Create the request (same parameters as the Python example)
const response = await portkey.chat.completions.create({
  model: "gemini-2.5-flash-preview-04-17",
  max_tokens: 3000,
  thinking: { type: "enabled", budget_tokens: 2030 },
  stream: true,
  messages: [{ role: "user", content: "when does the flight from new york to bengaluru land tomorrow, what time, what is its flight number, and what is its baggage belt?" }]
});
console.log(response);
```
</CodeGroup>
You can manage all prompts to Google Gemini in the [Prompt Library](/product/prompt-library). All the current models of Google Gemini are supported and you can easily start testing different prompts.

The same request can also be sent through the OpenAI SDK, assuming an `openai` client already configured to route through the Portkey gateway (its initialization is not shown here):

```py OpenAI Python SDK
response = openai.chat.completions.create(
    model="gemini-2.5-flash-preview-04-17",
    max_tokens=3000,
    thinking={
        "type": "enabled",
        "budget_tokens": 2030
    },
    stream=True,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "when does the flight from new york to bengaluru land tomorrow, what time, what is its flight number, and what is its baggage belt?"
                }
            ]
        }
    ]
)
```

<Note>
The assistant's thinking response is returned in the `response_chunk.choices[0].delta.content_blocks` array, not in the `response.choices[0].message.content` string.

Gemini models do not return their chain-of-thought messages, so `content_blocks` are not required for Gemini models.
</Note>
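When streaming, you can separate the thinking output from the answer text as chunks arrive. Below is a minimal sketch, not confirmed by this page: the exact block shapes (`{"type": "thinking", "thinking": ...}` and `{"type": "text", "text": ...}`) are an assumption, and the mock chunks stand in for a real streamed response.

```python
def split_stream(chunks):
    """Collect thinking and answer text from streamed chunks.

    Assumes each chunk is dict-like:
    {"choices": [{"delta": {"content_blocks": [...]}}]}, where a block is
    {"type": "thinking", "thinking": "..."} or {"type": "text", "text": "..."}
    (hypothetical shapes for illustration).
    """
    thinking_parts, text_parts = [], []
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"]
        for block in delta.get("content_blocks") or []:
            if block.get("type") == "thinking":
                thinking_parts.append(block.get("thinking", ""))
            elif block.get("type") == "text":
                text_parts.append(block.get("text", ""))
    return "".join(thinking_parts), "".join(text_parts)

# Mock chunks standing in for a streamed response:
mock_chunks = [
    {"choices": [{"delta": {"content_blocks": [
        {"type": "thinking", "thinking": "Look up the flight first."}]}}]},
    {"choices": [{"delta": {"content_blocks": [
        {"type": "text", "text": "I don't have access to live flight data."}]}}]},
]
thinking, answer = split_stream(mock_chunks)
print(thinking)  # Look up the flight first.
print(answer)    # I don't have access to live flight data.
```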

Models like `google.gemini-2.5-flash-preview-04-17` and `anthropic.claude-3-7-sonnet@20250219` support [extended thinking](https://cloud.google.com/vertex-ai/generative-ai/docs/partner-models/use-claude#claude-3-7-sonnet).
This is similar to OpenAI thinking, but you also get the model's reasoning as it processes the request.

Note that you will have to set [`strict_open_ai_compliance=False`](/product/ai-gateway/strict-open-ai-compliance) in the headers to use this feature.
<Note>
To disable thinking for Gemini models like `google.gemini-2.5-flash-preview-04-17`, you must explicitly set `budget_tokens` to `0` and `type` to `disabled`:
```json
"thinking": {
    "type": "disabled",
    "budget_tokens": 0
}
```
</Note>
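As a concrete sketch, the disabled configuration slots into the same request shape as the earlier examples (the prompt text here is hypothetical; only the `thinking` object differs from the enabled case):

```python
# Request parameters with thinking explicitly turned off (sketch; mirrors
# the earlier examples, only the `thinking` object changes).
request_kwargs = {
    "model": "gemini-2.5-flash-preview-04-17",
    "max_tokens": 3000,
    "thinking": {
        "type": "disabled",  # must be "disabled"...
        "budget_tokens": 0,  # ...with budget_tokens set to 0
    },
    "messages": [{"role": "user", "content": "Summarize this page in one line."}],
}

# response = portkey.chat.completions.create(**request_kwargs)
print(request_kwargs["thinking"])  # {'type': 'disabled', 'budget_tokens': 0}
```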
### Multi turn conversation
<CodeGroup>
</CodeGroup>
### Sending `base64` Image

Here, you can send the `base64` image data along with the `url` field too:
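For instance, a `base64` data URL can be built and dropped into the same message shape (a sketch: the image bytes are placeholder data, and the `image_url` content type follows the OpenAI-compatible message format):

```python
import base64

# Read and base64-encode the image. A real file would be read with:
#   with open("image.png", "rb") as f: raw = f.read()
raw = b"\x89PNG placeholder bytes"  # placeholder data for illustration
encoded = base64.b64encode(raw).decode("utf-8")

message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe this image."},
        {
            "type": "image_url",
            # base64 payloads go in the same `url` field, as a data URL
            "image_url": {"url": f"data:image/png;base64,{encoded}"},
        },
    ],
}
print(message["content"][1]["image_url"]["url"][:22])  # data:image/png;base64,
```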

This same message format also works for all other media types — just send your media file in the `url` field, like `"url": "gs://cloud-samples-data/video/animals.mp4"` for Google Cloud URLs and `"url": "https://download.samplelib.com/mp3/sample-3s.mp3"` for public URLs.

Your URL should include the file extension; it is used to infer the `MIME_TYPE`, which is a required parameter for prompting Gemini models with files.