Summary
When using Gemini 3 models (`gemini-3-flash-preview`, `gemini-3-pro-preview`) with LangExtract, API calls time out because Gemini 3 defaults to `thinking_level: high`, which adds significant latency. There is currently no way to pass `thinking_config` to reduce this latency.
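For context, a minimal reproduction looks roughly like the sketch below (the input text, prompt, and `examples` are placeholders; whether the call actually times out depends on the client's default deadline):

```python
import langextract as lx

# Sketch of a failing call: with Gemini 3's default thinking_level of "high",
# even a simple extraction can exceed the client timeout.
result = lx.extract(
    text_or_documents="Patient reports mild headache.",  # placeholder input
    prompt_description="Extract symptoms.",              # placeholder prompt
    examples=examples,                                   # your few-shot examples
    model_id="gemini-3-flash-preview",
)
```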
Problem
The `_API_CONFIG_KEYS` allowlist in `langextract/providers/gemini.py` (lines 40-48) does not include `thinking_config`:
```python
_API_CONFIG_KEYS: Final[set[str]] = {
    'response_mime_type',
    'response_schema',
    'safety_settings',
    'system_instruction',
    'tools',
    'stop_sequences',
    'candidate_count',
}
```

This means any `thinking_config` passed via `language_model_params` gets filtered out (lines 186-188):
```python
self._extra_kwargs = {
    k: v for k, v in (kwargs or {}).items() if k in _API_CONFIG_KEYS
}
```
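The effect is easy to see in isolation. This standalone sketch replicates the filter with the allowlist above (it is not the provider code itself); the `thinking_config` entry is silently discarded:

```python
# Standalone reproduction of the filtering logic shown above.
_API_CONFIG_KEYS = {
    'response_mime_type',
    'response_schema',
    'safety_settings',
    'system_instruction',
    'tools',
    'stop_sequences',
    'candidate_count',
}

kwargs = {
    'response_mime_type': 'application/json',
    'thinking_config': {'thinking_level': 'minimal'},  # dropped: not in allowlist
}

extra_kwargs = {k: v for k, v in kwargs.items() if k in _API_CONFIG_KEYS}
print(extra_kwargs)  # {'response_mime_type': 'application/json'}
```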
Impact
- Gemini 3 Flash (designed for speed) times out on simple extraction tasks
- Users cannot set `thinking_level: "minimal"` to reduce latency
- Forces users to fall back to older models like `gemini-2.5-flash` instead of newer Gemini 3 models
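Until the allowlist is changed, one possible stopgap (untested, and it mutates a private module attribute that may change without notice, so treat it strictly as a hack) is to extend the set at runtime before calling `lx.extract`:

```python
# Untested workaround sketch: extend the private allowlist so the provider
# stops filtering out thinking_config. Relies on internal implementation details.
from langextract.providers import gemini

gemini._API_CONFIG_KEYS.add('thinking_config')
```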
Proposed Solution
Add `thinking_config` to `_API_CONFIG_KEYS`:

```python
_API_CONFIG_KEYS: Final[set[str]] = {
    'response_mime_type',
    'response_schema',
    'safety_settings',
    'system_instruction',
    'tools',
    'stop_sequences',
    'candidate_count',
    'thinking_config',  # Add this for Gemini 3 support
}
```

This would allow users to pass:
```python
result = lx.extract(
    text_or_documents=text,
    prompt_description=prompt,
    examples=examples,
    model_id="gemini-3-flash-preview",
    language_model_params={
        "thinking_config": {"thinking_level": "minimal"}
    },
)
```
Environment
- LangExtract version: 1.1.1
- Python: 3.12
- Models tested: `gemini-3-flash-preview`, `gemini-3-pro-preview`