GoogleLLMService: apply model-aware default thinking config for Gemini 3+ #3993

@yydrift-code

Description

pipecat version

0.0.104

Python version

3.12

Operating System

macOS

Issue description

GoogleLLMService currently applies a default thinking override only for gemini-2.5-flash* via _maybe_unset_thinking_budget() in pipecat/services/google/llm.py.

That works for Gemini 2.5 Flash, but Gemini 3+ uses a different thinking control surface (thinking_level rather than thinking_budget). If a caller uses GoogleLLMService or GoogleVertexLLMService with a Gemini 3 Flash model and does not pass an explicit params.thinking, Pipecat leaves thinking_config unset and the model falls back to provider defaults.

For latency-sensitive voice use cases this is surprising because:

  • gemini-2.5-flash* gets an automatic low-latency default (thinking_budget=0)
  • gemini-3* gets no automatic default at all, even though the model family requires a different control surface

This is especially easy to hit now that newer Gemini 3 / 3.1 Flash models are becoming default choices.

Related Google docs:

Reproduction steps

  1. Initialize GoogleLLMService or GoogleVertexLLMService with a Gemini 3 Flash model, for example gemini-3.1-flash-lite-preview.
  2. Do not pass params.thinking.
  3. Inspect the generation params or run a normal call.
  4. Observe that Pipecat does not populate thinking_config at all for Gemini 3 models.

Minimal example:

from pipecat.services.google.llm_vertex import GoogleVertexLLMService

llm = GoogleVertexLLMService(
    credentials_path="...",
    project_id="...",
    location="europe-west2",
    model="gemini-3.1-flash-lite-preview",
    system_instruction="You are a voice assistant.",
)

params = llm._build_generation_params(system_instruction="x", tools=[], tool_config=None)
llm._maybe_unset_thinking_budget(params)
print(params)
# No thinking_config is added for Gemini 3 models.

Expected behavior

If the caller did not specify params.thinking, Pipecat should apply model-aware defaults:

  • gemini-2.5-flash* -> thinking_budget=0, include_thoughts=False
  • gemini-3-flash* / gemini-3.1-flash* -> thinking_level="minimal", include_thoughts=False
  • do not override explicit user-provided thinking config
  • leave Pro-family models untouched unless a clearly safe default exists
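The expected defaults above can be sketched as a pure mapping from model name to thinking config. This is illustrative only: the function name and the dict shapes are hypothetical stand-ins for whatever types pipecat actually uses (e.g. google.genai's ThinkingConfig).

```python
def default_thinking_config(model: str):
    """Return the proposed default thinking config for a model, or None.

    Hypothetical sketch; not pipecat's actual API.
    """
    name = model.lower()
    if name.startswith("gemini-2.5-flash"):
        # 2.5 Flash exposes a numeric thinking budget; 0 disables thinking.
        return {"thinking_budget": 0, "include_thoughts": False}
    if name.startswith(("gemini-3-flash", "gemini-3.1-flash")):
        # Gemini 3+ exposes thinking_level rather than thinking_budget.
        return {"thinking_level": "minimal", "include_thoughts": False}
    # Pro-family and unknown models: no default override.
    return None
```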

Actual behavior

Only the Gemini 2.5 Flash family gets a default low-thinking config. Gemini 3+ models get provider-default thinking behavior unless every application remembers to configure it explicitly.

Suggested fix

Replace _maybe_unset_thinking_budget() with a model-aware default-thinking helper, for example:

  • detect the model family
  • apply the appropriate control surface for that family
  • skip entirely when thinking_config is already present

This would make latency behavior much more predictable across Gemini model upgrades.
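The three bullets above could be combined into a single helper, sketched below. The name apply_default_thinking and the plain-dict params are assumptions for illustration; a real implementation would operate on pipecat's generation params object and google.genai config types.

```python
def apply_default_thinking(params: dict, model: str) -> dict:
    """Apply a model-aware low-latency thinking default unless one is set.

    Hypothetical replacement for _maybe_unset_thinking_budget().
    """
    # Skip entirely when a thinking config is already present: never
    # override an explicit user-provided value.
    if params.get("thinking_config") is not None:
        return params
    name = model.lower()
    if name.startswith("gemini-2.5-flash"):
        # 2.5 Flash family: numeric budget, 0 disables thinking.
        params["thinking_config"] = {"thinking_budget": 0, "include_thoughts": False}
    elif name.startswith(("gemini-3-flash", "gemini-3.1-flash")):
        # Gemini 3+ family: level-based control surface.
        params["thinking_config"] = {"thinking_level": "minimal", "include_thoughts": False}
    # Pro-family and unknown models are left untouched.
    return params
```

Because the helper keys off the model family rather than a single prefix, upgrading from a 2.5 Flash model to a 3.x Flash model would preserve the low-latency default automatically.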
