feat(models): update Gemini model context windows and output limits #602

Closed
CleoMenezesJr wants to merge 1 commit into Gitlawb:main from CleoMenezesJr:gemini-models

Conversation


@CleoMenezesJr CleoMenezesJr commented Apr 11, 2026

Summary

  • What changed: Added context window and max output token entries for native Google Gemini models in openaiContextWindows.ts — covering gemini-2.0-flash, gemini-2.5-flash, gemini-2.5-pro, gemini-3-flash, and gemini-3.1-pro (used via CLAUDE_CODE_USE_GEMINI).
  • Why it changed: Without these entries, the system falls back to an 8k default context window. This causes calculateTokenWarningState to signal a blocking limit at session start, triggering auto-compaction before any real context is consumed.
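Concretely, the missing-entry fallback described above can be sketched like this (hypothetical shapes and names — the actual structure of openaiContextWindows.ts may differ; the gemini-3.* values are omitted since they were revised during review):

```typescript
// Illustrative sketch only -- field names and table shape are assumptions,
// not the actual openaiContextWindows.ts structure.
interface ModelLimits {
  contextWindow: number;
  maxOutputTokens: number;
}

const geminiLimits: Record<string, ModelLimits> = {
  "gemini-2.0-flash": { contextWindow: 1_048_576, maxOutputTokens: 8_192 },
  "gemini-2.5-flash": { contextWindow: 1_048_576, maxOutputTokens: 65_536 },
  "gemini-2.5-pro":   { contextWindow: 1_048_576, maxOutputTokens: 65_536 },
  // gemini-3.* entries omitted here; their values changed during review.
};

// Without a matching entry, an 8k default applies -- which is why the token
// warning fires at session start for unlisted Gemini models.
const DEFAULT_CONTEXT_WINDOW = 8_192;

function contextWindowFor(model: string): number {
  return geminiLimits[model]?.contextWindow ?? DEFAULT_CONTEXT_WINDOW;
}
```

Under this sketch, any model ID not present in the table silently inherits the 8k default, and calculateTokenWarningState would treat the session as nearly full from the first message.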

Impact

  • user-facing: Gemini sessions no longer compact prematurely. Token warning thresholds now reflect the actual model limits (e.g., 1M tokens for Flash, 2M for Gemini 3.1 Pro).
  • developer/maintainer: Pure data addition — no logic changed. Entries follow the existing ascending-version ordering used for other provider families in the same file.

Testing

  • bun run build
  • bun run smoke
  • focused tests: launch with bun run dev:gemini using gemini-2.5-pro or gemini-3-flash, send a message, and confirm that the token indicator reflects the correct context window instead of triggering an immediate compaction prompt.

Notes

  • provider/model path tested: CLAUDE_CODE_USE_GEMINI=1 with gemini-2.5-pro and gemini-3-flash via native endpoint.
  • screenshots attached (if UI changed): N/A
  • follow-up work or known limitations: Output token limits for gemini-3-flash and gemini-2.0-flash are set conservatively at 8k. These should be updated if Google raises the limits in a future release.

@kevincodex1 kevincodex1 requested a review from auriti April 12, 2026 07:23
kevincodex1 previously approved these changes Apr 12, 2026
Collaborator

@Vasanthdev2004 Vasanthdev2004 left a comment


Reviewed on head d5f5ce7 — adds context window and max output token entries for native Gemini models. CI green ✅

The intent is right — without these entries, native Gemini sessions fall back to 8k context and trigger premature compaction. But two of the five values need correction based on Google's official specs:

🔴 Incorrect values:

  1. gemini-3.1-pro context window: 2,097,152 (2M) — Google's official model card states: "token context window of up to 1M". Should be 1,048,576.

  2. gemini-3-flash max output tokens: 8,192 — Per sim.ai model tracking and Google's API docs, Gemini 3 Flash supports up to 65,536 output tokens (same as 2.5 Pro and 2.5 Flash). Setting this to 8k would unnecessarily limit responses from the model.

✅ Correct values:

| Model | Context | Max Output | Source |
| --- | --- | --- | --- |
| gemini-2.0-flash | 1,048,576 | 8,192 | Correct per Google specs |
| gemini-2.5-flash | 1,048,576 | 65,536 | Correct |
| gemini-2.5-pro | 1,048,576 | 65,536 | Correct (was already in file, just reordered) |
| gemini-3-flash | 1,048,576 | ~~8,192~~ 65,536 | Needs fix |
| gemini-3.1-pro | ~~2,097,152~~ 1,048,576 | 65,536 | Needs fix |

Verdict: Needs changes — two values don't match Google's published specs. Once corrected, this is approve-ready.

Collaborator

@gnanam1990 gnanam1990 left a comment


I can’t approve this yet. The direction is right, but some of the Gemini context-window and output-limit values still need correction before merge. Since this data feeds runtime limit handling, I’d want the numbers fixed first, then I’m happy to recheck.

Collaborator

@Vasanthdev2004 Vasanthdev2004 left a comment


Re-reviewed on head 7def95e — both blockers from the previous review are fixed ✅

| Model | Context | Max Output | Status |
| --- | --- | --- | --- |
| gemini-2.0-flash | 1,048,576 | 8,192 | ✅ |
| gemini-2.5-flash | 1,048,576 | 65,536 | ✅ |
| gemini-2.5-pro | 1,048,576 | 65,536 | ✅ |
| gemini-3-flash | 1,048,576 | 65,536 | ✅ (was 8,192) |
| gemini-3.1-pro | 1,048,576 | 65,536 | ✅ (was 2,097,152) |

All values now match Google's published model cards. CI green. Pure data addition, no logic changes. Approved ✅

@kevincodex1
Contributor

@gnanam1990 kindly have a look again bro

Collaborator

@gnanam1990 gnanam1990 left a comment


.

Collaborator

@gnanam1990 gnanam1990 left a comment


Still request changes.

bun run build passes on the current head, but I’m still not comfortable approving the exact Gemini metadata values being added here. This PR changes runtime context-window and max-output tables, and those numbers directly feed warning thresholds, blocking behavior, and auto-compact decisions. Before merging, I’d want the added Gemini limits verified against a clear source of truth or tightened so we’re not shipping incorrect runtime metadata.
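One way to get the source-of-truth guarantee asked for above is a small CI check that pins the added table rows against spec values transcribed from Google's model cards, so a typo in the data fails the build instead of shipping. A sketch under assumed names and shapes (this is not the repo's actual test API):

```typescript
// Hypothetical CI guard: compare the limits table against values pinned
// from Google's published model cards. Shapes/names are illustrative.
type Limits = { context: number; maxOutput: number };

// Pinned from the review tables in this thread.
const pinnedSpecs: Record<string, Limits> = {
  "gemini-2.0-flash": { context: 1_048_576, maxOutput: 8_192 },
  "gemini-2.5-flash": { context: 1_048_576, maxOutput: 65_536 },
  "gemini-2.5-pro":   { context: 1_048_576, maxOutput: 65_536 },
};

// Returns a list of human-readable mismatches; empty means the table passes.
function validate(table: Record<string, Limits>): string[] {
  const errors: string[] = [];
  for (const [model, spec] of Object.entries(pinnedSpecs)) {
    const row = table[model];
    if (!row) {
      errors.push(`${model}: missing entry`);
      continue;
    }
    if (row.context !== spec.context) {
      errors.push(`${model}: context ${row.context} != ${spec.context}`);
    }
    if (row.maxOutput !== spec.maxOutput) {
      errors.push(`${model}: maxOutput ${row.maxOutput} != ${spec.maxOutput}`);
    }
  }
  return errors;
}
```

Wiring such a check into `bun run build` or the smoke suite would address the "clear source of truth" concern without adding any runtime logic.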

Collaborator

@auriti auriti left a comment


Thanks for adding the missing Gemini native context windows — this is a real problem (the 8k fallback triggers premature compaction).

However, most of this PR is already covered by current main and by #783 (which adds the same entries plus OpenRouter variants and output token fallbacks). Here's the overlap:

Model main (v0.5.2) PR #783 This PR
gemini-2.0-flash
gemini-2.5-flash
gemini-2.5-pro
gemini-3.1-pro
gemini-3-flash
gemini-3-flash-preview
gemini-3.1-pro-preview
google/gemini-3* (OpenRouter)

The only entry unique to this PR is gemini-3-flash (without -preview). A couple of notes on that:

  1. Naming: Google's current API uses gemini-3-flash-preview — the -preview suffix is required until the model reaches GA. gemini-3-flash without the suffix will likely not match any model ID today and would be a no-op entry. When Google promotes it to GA, the name may change — at which point we'd add it.

  2. Context window values look correct: 1M for both Flash and Pro, 65k output tokens. These match the official specs.

Suggestion: If #783 lands first, this PR would be fully superseded. If you'd like to contribute the gemini-3-flash (GA name) entry as a forward-looking addition, I'd suggest rebasing on top of #783 and adding just that one entry — but only after confirming the model ID works against the native Gemini endpoint.
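The no-op point in note 1 is easy to see under an exact-key lookup (assumed here for illustration; the codebase's actual matching may normalize or prefix-match model names):

```typescript
// Hypothetical exact-match lookup: an entry keyed by the GA name is never
// consulted while Google's API only accepts the -preview model ID.
const table: Record<string, number> = { "gemini-3-flash": 1_048_576 };

const requestedModel = "gemini-3-flash-preview"; // model ID the API accepts today
const contextWindow = table[requestedModel] ?? 8_192; // falls back to the 8k default
```

Until the GA model ID exists, the `gemini-3-flash` row is dead data and every real request still hits the fallback path.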
