feat(models): update Gemini model context windows and output limits #602

CleoMenezesJr wants to merge 1 commit into
Conversation
Vasanthdev2004
left a comment
Reviewed on head d5f5ce7 — adds context window and max output token entries for native Gemini models. CI green ✅
The intent is right — without these entries, native Gemini sessions fall back to 8k context and trigger premature compaction. But two of the five values need correction based on Google's official specs:
🔴 Incorrect values:
- `gemini-3.1-pro` context window: 2,097,152 (2M) — Google's official model card states: "token context window of up to 1M". Should be 1,048,576.
- `gemini-3-flash` max output tokens: 8,192 — Per sim.ai model tracking and Google's API docs, Gemini 3 Flash supports up to 65,536 output tokens (same as 2.5 Pro and 2.5 Flash). Setting this to 8k would unnecessarily limit responses from the model.
✅ Correct values:
| Model | Context | Max Output | Source |
|---|---|---|---|
| gemini-2.0-flash | 1,048,576 | 8,192 | Correct per Google specs |
| gemini-2.5-flash | 1,048,576 | 65,536 | Correct |
| gemini-2.5-pro | 1,048,576 | 65,536 | Correct (was already in file, just reordered) |
| gemini-3-flash | 1,048,576 | Needs fix | |
| gemini-3.1-pro | Needs fix | 65,536 | |
Verdict: Needs changes — two values don't match Google's published specs. Once corrected, this is approve-ready.
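For concreteness, here is a minimal sketch of what the corrected entries could look like. The map names and file layout are hypothetical (the actual structure of `openaiContextWindows.ts` may differ); only the numbers come from the review above.

```typescript
// Hypothetical sketch of the corrected Gemini entries; real map names
// in openaiContextWindows.ts may differ.
const contextWindows: Record<string, number> = {
  "gemini-2.0-flash": 1_048_576,
  "gemini-2.5-flash": 1_048_576,
  "gemini-2.5-pro": 1_048_576,
  "gemini-3-flash": 1_048_576,
  "gemini-3.1-pro": 1_048_576, // not 2_097_152; model card says "up to 1M"
};

const maxOutputTokens: Record<string, number> = {
  "gemini-2.0-flash": 8_192,
  "gemini-2.5-flash": 65_536,
  "gemini-2.5-pro": 65_536,
  "gemini-3-flash": 65_536, // not 8_192; matches 2.5 Pro and 2.5 Flash
  "gemini-3.1-pro": 65_536,
};
```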
gnanam1990
left a comment
I can’t approve this yet. The direction is right, but some of the Gemini context-window and output-limit values still need correction before merge. Since this data feeds runtime limit handling, I’d want the numbers fixed first, then I’m happy to recheck.
Force-pushed from bbf4f56 to 7def95e
Vasanthdev2004
left a comment
Re-reviewed on head 7def95e — both blockers from the previous review are fixed ✅
| Model | Context | Max Output | Status |
|---|---|---|---|
| gemini-2.0-flash | 1,048,576 | 8,192 | ✅ |
| gemini-2.5-flash | 1,048,576 | 65,536 | ✅ |
| gemini-2.5-pro | 1,048,576 | 65,536 | ✅ |
| gemini-3-flash | 1,048,576 | 65,536 | ✅ (was 8,192) |
| gemini-3.1-pro | 1,048,576 | 65,536 | ✅ (was 2,097,152) |
All values now match Google's published model cards. CI green. Pure data addition, no logic changes. Approved ✅
@gnanam1990 kindly have a look again bro
gnanam1990
left a comment
Still request changes.
bun run build passes on the current head, but I’m still not comfortable approving the exact Gemini metadata values being added here. This PR changes runtime context-window and max-output tables, and those numbers directly feed warning thresholds, blocking behavior, and auto-compact decisions. Before merging, I’d want the added Gemini limits verified against a clear source of truth or tightened so we’re not shipping incorrect runtime metadata.
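To illustrate why these numbers are load-bearing, here is a minimal sketch of how a context-window entry could feed a warning/compaction decision. The function name, thresholds, and 8k fallback are assumptions for illustration, loosely modeled on the `calculateTokenWarningState` behavior described in this thread, not the actual implementation.

```typescript
// Hypothetical sketch: how a missing or wrong context-window entry can
// flip a session into blocking/auto-compact territory.
type WarningState = "ok" | "warning" | "blocking";

const DEFAULT_CONTEXT_WINDOW = 8_192; // assumed fallback when a model has no entry

function warningState(
  usedTokens: number,
  contextWindow: number | undefined,
): WarningState {
  const limit = contextWindow ?? DEFAULT_CONTEXT_WINDOW;
  const ratio = usedTokens / limit;
  if (ratio >= 0.95) return "blocking"; // would trigger auto-compaction
  if (ratio >= 0.8) return "warning";
  return "ok";
}
```

With the correct 1M window, a session that has used 20k tokens is nowhere near the limit; with the 8k fallback, the very same session is already past the blocking threshold, which matches the premature-compaction symptom this PR targets.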
auriti
left a comment
Thanks for adding the missing Gemini native context windows — this is a real problem (the 8k fallback triggers premature compaction).
However, most of this PR is already covered by current main and by #783 (which adds the same entries plus OpenRouter variants and output token fallbacks). Here's the overlap:
| Model | main (v0.5.2) | PR #783 | This PR |
|---|---|---|---|
| gemini-2.0-flash | ✅ | ✅ | ✅ |
| gemini-2.5-flash | ✅ | ✅ | ✅ |
| gemini-2.5-pro | ✅ | ✅ | ✅ |
| gemini-3.1-pro | ✅ | ✅ | ✅ |
| gemini-3-flash | ❌ | ❌ | ✅ |
| gemini-3-flash-preview | ❌ | ✅ | ❌ |
| gemini-3.1-pro-preview | ❌ | ✅ | ❌ |
| google/gemini-3* (OpenRouter) | ❌ | ✅ | ❌ |
The only entry unique to this PR is gemini-3-flash (without -preview). A couple of notes on that:
- Naming: Google's current API uses `gemini-3-flash-preview` — the `-preview` suffix is required until the model reaches GA. `gemini-3-flash` without the suffix will likely not match any model ID today and would be a no-op entry. When Google promotes it to GA, the name may change — at which point we'd add it.
- Context window values look correct: 1M for both Flash and Pro, 65k output tokens. These match the official specs.
Suggestion: If #783 lands first, this PR would be fully superseded. If you'd like to contribute the gemini-3-flash (GA name) entry as a forward-looking addition, I'd suggest rebasing on top of #783 and adding just that one entry — but only after confirming the model ID works against the native Gemini endpoint.
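The no-op concern above can be sketched in a few lines. This assumes the table is keyed by exact model ID (the names `geminiContextWindows` and `lookup` are hypothetical); if the real code does prefix or fuzzy matching, the conclusion would change.

```typescript
// Hypothetical sketch: an entry keyed "gemini-3-flash" never matches if the
// API only ever reports "gemini-3-flash-preview", assuming exact-ID lookup.
const geminiContextWindows: Record<string, number> = {
  "gemini-3-flash": 1_048_576,
};

function lookup(modelId: string): number | undefined {
  return geminiContextWindows[modelId]; // exact match only
}
```

Under that assumption, `lookup("gemini-3-flash-preview")` returns `undefined` and the 8k fallback still applies, so the GA-named entry buys nothing until Google actually ships that model ID.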
Summary
- Adds context window and max output token entries to `openaiContextWindows.ts`, covering `gemini-2.0-flash`, `gemini-2.5-flash`, `gemini-2.5-pro`, `gemini-3-flash`, and `gemini-3.1-pro` (used via `CLAUDE_CODE_USE_GEMINI`).
- Without these entries, the fallback limit causes `calculateTokenWarningState` to signal a blocking limit at session start, triggering auto-compaction before any real context is consumed.

Impact

Testing

- `bun run build`
- `bun run smoke`
- `bun run dev:gemini` using `gemini-2.5-pro` or `gemini-3-flash`, send a message, and confirm that the token indicator reflects the correct context window instead of triggering an immediate compaction prompt.

Notes

- Tested with `CLAUDE_CODE_USE_GEMINI=1` with `gemini-2.5-pro` and `gemini-3-flash` via the native endpoint.
- Max output tokens for `gemini-3-flash` and `gemini-2.0-flash` are set conservatively at 8k. These should be updated if Google raises the limits in a future release.