-
Notifications
You must be signed in to change notification settings - Fork 905
fix(models): raise ContextWindowOverflowException for ollama/llama/mistral/writer #2958
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -15,7 +15,7 @@ | |
| from typing_extensions import Unpack, override | ||
|
|
||
| from ..types.content import ContentBlock, Messages | ||
| from ..types.exceptions import ModelThrottledException | ||
| from ..types.exceptions import ContextWindowOverflowException, ModelThrottledException | ||
| from ..types.streaming import StreamEvent | ||
| from ..types.tools import ToolChoice, ToolResult, ToolSpec, ToolUse | ||
| from ._validation import _has_location_source, validate_config_keys, warn_on_tool_choice_not_supported | ||
|
|
@@ -29,6 +29,15 @@ | |
| class WriterModel(Model): | ||
| """Writer API model provider implementation.""" | ||
|
|
||
| OVERFLOW_MESSAGES = { | ||
| "this model's maximum context length is", | ||
| "exceed context limit", | ||
| "model's maximum context limit", | ||
| "is longer than the model's context length", | ||
| "prompt is too long", | ||
| "too many tokens", | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Issue: This Suggestion: If LlamaAPI and Writer genuinely share the same OpenAI-compatible error vocabulary, consider hoisting the shared strings to a single module-level constant (mirroring how |
||
| } | ||
|
|
||
| class WriterConfig(BaseModelConfig, total=False): | ||
| """Configuration options for Writer API. | ||
|
|
||
|
|
@@ -397,6 +406,10 @@ async def stream( | |
| response = await self.client.chat.chat(**request) | ||
| except writerai.RateLimitError as e: | ||
| raise ModelThrottledException(str(e)) from e | ||
| except writerai.BadRequestError as e: | ||
| if any(message in str(e).lower() for message in self.OVERFLOW_MESSAGES): | ||
| raise ContextWindowOverflowException(str(e)) from e | ||
| raise | ||
|
|
||
| yield self.format_chunk({"chunk_type": "message_start"}) | ||
| yield self.format_chunk({"chunk_type": "content_block_start", "data_type": "text"}) | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issue (verification): The substring matching is the crux of this fix — if these strings don't match what the provider actually returns, the
exceptbranch becomes silent dead code and the bug persists undetected. The unit tests assert against these same hardcoded strings, so they'd pass even if the strings are wrong.Suggestion: Confirm these strings against real provider error text (an integ test that intentionally overflows the context, or a captured real error in the PR description). Worth noting where each string came from, since several read like reasonable guesses (e.g.
"input too large","too many tokens").