Description
pipecat version
0.0.104
Python version
3.12
Operating System
macOS
Issue description
GoogleLLMService currently applies a default thinking override only for gemini-2.5-flash* via _maybe_unset_thinking_budget() in pipecat/services/google/llm.py.
That works for Gemini 2.5 Flash, but Gemini 3+ uses a different thinking surface (thinking_level, not thinking_budget). If a caller uses GoogleLLMService or GoogleVertexLLMService with a Gemini 3 Flash model and does not pass an explicit params.thinking, Pipecat leaves thinking_config unset and the model falls back to provider defaults.
For latency-sensitive voice use cases this is surprising because:
- `gemini-2.5-flash*` gets an automatic low-latency default (`thinking_budget=0`)
- `gemini-3*` gets no automatic default at all, even though the model family requires a different control surface
This is especially easy to hit now that newer Gemini 3 / 3.1 Flash models are becoming default choices.
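To make the mismatch concrete, the two families expose roughly these low-latency settings. This is a sketch using the field names discussed in this report as plain dicts; the real `google.genai` `ThinkingConfig` types may differ by SDK version:

```python
# Low-latency "no thinking" settings per model family, shown as plain
# dicts for illustration (not a verified SDK surface).

# Gemini 2.5 Flash: numeric budget, where 0 disables thinking.
low_latency_gemini_25_flash = {"thinking_budget": 0, "include_thoughts": False}

# Gemini 3 Flash: categorical level, where "minimal" is the low-latency choice.
low_latency_gemini_3_flash = {"thinking_level": "minimal", "include_thoughts": False}
```

Note that neither setting is a drop-in replacement for the other, which is why a single hard-coded default cannot cover both families.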
Related Google docs:
Reproduction steps
- Initialize `GoogleLLMService` or `GoogleVertexLLMService` with a Gemini 3 Flash model, for example `gemini-3.1-flash-lite-preview`.
- Do not pass `params.thinking`.
- Inspect the generation params or run a normal call.
- Observe that Pipecat does not populate `thinking_config` at all for Gemini 3 models.
Minimal example:
```python
from pipecat.services.google.llm_vertex import GoogleVertexLLMService

llm = GoogleVertexLLMService(
    credentials_path="...",
    project_id="...",
    location="europe-west2",
    model="gemini-3.1-flash-lite-preview",
    system_instruction="You are a voice assistant.",
)

params = llm._build_generation_params(system_instruction="x", tools=[], tool_config=None)
llm._maybe_unset_thinking_budget(params)
print(params)
# No thinking_config is added for Gemini 3 models.
```

Expected behavior
If the caller did not specify params.thinking, Pipecat should apply model-aware defaults:
- `gemini-2.5-flash*` -> `thinking_budget=0`, `include_thoughts=False`
- `gemini-3-flash*` / `gemini-3.1-flash*` -> `thinking_level="minimal"`, `include_thoughts=False`
- do not override explicit user-provided thinking config
- probably leave Pro-family models untouched unless there is a clear safe default
Actual behavior
Only the Gemini 2.5 Flash family gets a default low-thinking config. Gemini 3+ models get provider-default thinking behavior unless every application remembers to configure it explicitly.
Suggested fix
Replace `_maybe_unset_thinking_budget()` with a model-aware default-thinking helper, for example:
- detect the model family
- apply the appropriate control surface for that family
- skip entirely when `thinking_config` is already present
This would make latency behavior much more predictable across Gemini model upgrades.