[BUG] Litellm: max_tokens no supported #341
Conversation
Pull Request Overview
This PR updates model references, test expectations, and LLM token-handling logic to support new LLM variants and token limit parameters.
- Tests now import and use `get_gpt_o4_model`, `get_gpt_mini_model`, and `get_gemini_flash_model`, and adjust expected output values.
- The LLM class gains `DEFAULT_OUTPUT_TOKENS` and `thinking_token_limit`, and refactors thinking-mode parameter handling, replacing manual `asyncio` loop management with `asyncio.run`.
- Global model getters are extended, and the extractor handles dict sources via `DocumentLoaderData`.
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tests/test_process.py | Switched to direct model string in splitter and adjusted expected age. |
| tests/test_extractor.py | Expanded model getters, renamed budget fields, added thinking-mode tests, and updated assertions. |
| extract_thinker/llm.py | Introduced default output tokens, thinking-token-limit, and refactored async and thinking-parameter flows. |
| extract_thinker/global_models.py | Added new model getters for GPT and Gemini Flash models. |
| extract_thinker/extractor.py | Auto-loads DocumentLoaderData when source is a dict. |
| extract_thinker/concatenation_handler.py | Added guard against empty responses in JSON continuation check. |
Comments suppressed due to low confidence (4)
tests/test_extractor.py:310
- The assertion uses `threshold=0.8` (80%), but the error message says 90%. Update the message to match the actual threshold.
assert semantically_similar(
result_1.title,
result_2.title,
threshold=0.8
), "Titles are not semantically similar enough (threshold: 90%)"
extract_thinker/global_models.py:9
- [nitpick] Both `get_lite_model()` and `get_big_model()` return the same string; if they are intended to be different, adjust accordingly or clarify why they match.
return "gemini/gemini-2.5-flash-preview-05-20"
extract_thinker/extractor.py:219
- Ensure `DocumentLoaderData` is imported at the top of the module; otherwise, this will raise a `NameError` at runtime.
self.document_loader = DocumentLoaderData()
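The fix is a module-level import. The exact import path below is an assumption based on the package layout, not taken from the PR:

```python
# Top of extract_thinker/extractor.py (import path assumed):
from extract_thinker.document_loader.document_loader_data import DocumentLoaderData
```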
extract_thinker/llm.py:175
- Confirm that `import asyncio` exists at module scope. If it was removed, calls to `asyncio.run` will fail with `NameError`.
result = asyncio.run(
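For reference, the pattern the refactor relies on is a module-scope import plus `asyncio.run` in the synchronous wrapper. The method names below are placeholders for illustration, not the PR's actual code:

```python
import asyncio  # must stay at module scope in extract_thinker/llm.py


class LLM:
    def request(self, messages):
        # asyncio.run creates, runs, and closes the event loop itself, which is
        # what replaces the old manual loop management.
        return asyncio.run(self._arequest(messages))

    async def _arequest(self, messages):
        # Placeholder for the real async litellm / router call.
        return messages
```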
```diff
+#TODO handbook is 403 error now. So we need to refactor the test to use a different url.
+
 # Check handbook data
-assert "FCA Handbook" in result.title, f"Expected title to contain 'FCA Handbook', but got: {result.title}"
+#assert "FCA Handbook" in result.title, f"Expected title to contain 'FCA Handbook', but got: {result.title}"
+assert result.title == "OpenAI"
```
Copilot AI (Jun 9, 2025)
[nitpick] Remove or resolve this TODO before merging to avoid lingering technical debt in tests.
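One way to resolve the TODO instead of leaving commented-out assertions in the suite is to skip the handbook case explicitly until a stable URL is chosen. The test name and reason string below are illustrative, not from the PR:

```python
import pytest


@pytest.mark.skip(reason="FCA Handbook URL currently returns 403; swap in a stable test URL")
def test_handbook_extraction():
    ...
```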
```python
if __name__ == "__main__":
    test_chart_with_content()
```
Copilot AI (Jun 9, 2025)
[nitpick] This `__main__` block and manual test call are unnecessary for pytest; consider removing it.
Suggested change:
```diff
-if __name__ == "__main__":
-    test_chart_with_content()
+# Removed the __main__ block and manual test call.
```
```diff
 # Arrange
 process, classifications = setup_process_and_classifications()
-process.load_splitter(ImageSplitter(get_big_model()))
+process.load_splitter(ImageSplitter("gemini/gemini-2.5-flash-preview-05-20"))
```
Copilot AI (Jun 9, 2025)
[nitpick] Rather than hard-coding the model string, use `get_big_model()` for consistency and easier future updates.
Suggested change:
```diff
-process.load_splitter(ImageSplitter("gemini/gemini-2.5-flash-preview-05-20"))
+process.load_splitter(ImageSplitter(get_big_model()))
```
```python
def _request_with_router(self, messages: List[Dict[str, str]], response_model: Optional[str]) -> Any:
    """Handle request using router with or without thinking parameter"""
    max_tokens = self.DEFAULT_OUTPUT_TOKENS
```
Copilot AI (Jun 9, 2025)
[nitpick] The logic to compute `max_tokens` is duplicated across several methods; consider extracting it into a helper to reduce duplication.
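A possible shape for such a helper, under the assumption that thinking mode adds `thinking_token_limit` on top of `DEFAULT_OUTPUT_TOKENS`; the helper name and the exact budgeting rule are assumptions, not the PR's implementation:

```python
def _resolve_max_tokens(self, thinking_enabled: bool) -> int:
    # One place to compute the output budget, so _request_with_router and the
    # other request paths stop duplicating this logic.
    max_tokens = self.DEFAULT_OUTPUT_TOKENS
    if thinking_enabled and self.thinking_token_limit:
        max_tokens += self.thinking_token_limit
    return max_tokens
```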